arxiv.org

New submission 04/07/2023, 10:34:44

1. Public_Reviews 14 Jul 2026
  
  in eLife (unscoped)
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  This work provides a new dataset of 71,688 images of different ape species across a variety of environmental and behavioral conditions, along with pose annotations per image. The authors demonstrate the value of their dataset by training pose estimation networks (HRNet-W48) on both their own dataset and other primate datasets (OpenMonkeyPose for monkeys, COCO for humans), ultimately showing that the model trained on their dataset had the best performance (measured by PCK and AUC). Together with ablation studies in which they train pose estimation models with either specific species removed or a certain percentage of the images removed, they provide solid evidence that their large, specialized dataset is uniquely positioned to aid in the task of pose estimation for ape species.
  
  The diversity and size of the dataset make it particularly useful: it covers a wide range of ape species and poses, which makes it well suited for training off-the-shelf pose estimation networks or for contributing to the training of a large foundational pose estimation model. In conjunction with new tools focused on extracting behavioral dynamics from pose, this dataset can be especially useful for understanding the basis of ape behaviors.
  
  We thank the reviewer for the kind comments.
  
  Since the dataset provided is the first large, public dataset of its kind exclusively for ape species, more details should be provided on how the data were annotated, as well as summaries of the dataset statistics. In addition, the authors should provide the full list of hyperparameters for each model that was used for evaluation (e.g., mmpose config files, textual descriptions of augmentation/optimization parameters).
  
  We have added more details on the annotation process and have included the list of instructions sent to the annotators. We have also included mmpose configs with the code provided. The following files include the relevant details:
  
  File including the list of instructions sent to the annotators:
  
  OpenMonkeyWild Photograph Rubric.pdf
  
  Mmpose configs:
  
  i) TopDownOAPDataset.py
  
  ii) animal_oap_dataset.py
  
  iii) __init__.py
  
  iv) hrnet_w48_oap_256x192_full.py
  
  Anaconda environment files:
  
  i) OpenApePose.yml
  
  ii) requirements.txt
  
  Overall this work is a terrific contribution to the field and is likely to have a significant impact on both computer vision and animal behavior.
  
  Strengths:
  
  Open source dataset with excellent annotations on the format, as well as example code provided for working with it.
  
  Properties of the dataset are mostly well described.
  
  Comparison to pose estimation models trained on humans vs monkeys, finding that models trained on human data generalized better to apes than the ones trained on monkeys, in accordance with phylogenetic similarity. This provides evidence for an important consideration in the field: how well can we expect pose estimation models to generalize to new species when using data from closely or distantly related ones?
  
  Sample efficiency experiments reflect an important property of pose estimation systems, which indicates how much data would be necessary to generate similar datasets in other species, as well as how much data may be required for fine-tuning these types of models (also characterized via ablation experiments where some species are left out).
  
  The sample efficiency experiments also reveal important insights about scaling properties of different model architectures, finding that HRNet saturates in performance improvements as a function of dataset size sooner than other architectures like CPMs (even though HRNets still perform better overall).
  
  We thank the reviewer for the kind comments.
  
  Weaknesses:
  
  More details on training hyperparameters used (preferably full config if trained via mmpose).
  
  We have now included mmpose configs and anaconda environment files that allow researchers to use the dataset with specific versions of mmpose and other packages we trained our models with. The list of files is provided above.
  
  Should include dataset datasheet, as described in Gebru et al 2021 (arXiv:1803.09010).
  
  We have included a datasheet for our dataset in the appendix lines 621-764.
  
  Should include crowdsourced annotation datasheet, as described in Diaz et al 2022 (arXiv:2206.08931). Alternatively, the specific instructions that were provided to Hive/annotators would be highly relevant to convey what annotation protocols were employed here.
  
  We have included the list of instructions sent to the Hive annotators in the supplementary materials. File: OpenMonkeyWild Photograph Rubric.pdf
  
  Should include model cards, as described in Mitchell et al (arXiv:1810.03993).
  
  We have included a model card for the included model in the results section line 359. See Author response image 1:
  
  Author response image 1.
  
  It would be useful to include more information on the source of the data as they are collected from many different sites and from many different individuals, some of which may introduce structural biases such as lighting conditions due to geography and time of year.
  
  We agree that the source could introduce structural biases. This is why we included images from so many different sources and captured images at different times from the same source, in the hope that a large variety of background and lighting conditions would be represented. However, doing so limits our ability to document the background and lighting conditions of each source separately.
  
  Is there a reason not to use OKS? This incorporates several factors such as landmark visibility, scale, and landmark type-specific annotation variability as in Ronchi & Perona 2017 (arXiv:1707.05388). The latter (variability) could use the human pose values (for landmarks types that are shared), the least variable keypoint class in humans (eyes) as a conservative estimate of accuracy, or leverage a unique aspect of this work (crowdsourced annotations) which affords the ability to estimate these values empirically.
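  For readers weighing the reviewer's suggestion, the sketch below computes COCO-style OKS for a single pose, OKS = sum_i exp(-d_i^2 / (2 s^2 k_i^2)) * [v_i > 0] / sum_i [v_i > 0], with k_i = 2 * sigma_i and s^2 the object area. It is an illustration under stated assumptions, not part of the OpenApePose code: the function name is hypothetical, and this response gives no per-keypoint sigma_i values for the 16 ape landmarks, so any values passed in would have to be borrowed from shared human keypoints or estimated from repeated crowd annotations, as the reviewer proposes.

```python
import numpy as np


def object_keypoint_similarity(pred, gt, visible, area, sigmas):
    """COCO-style OKS between one predicted and one ground-truth pose.

    pred, gt : (K, 2) arrays of (x, y) keypoint coordinates in pixels
    visible  : (K,) boolean mask; only labelled ground-truth keypoints count
    area     : object scale s^2 (COCO uses the instance area in pixels^2)
    sigmas   : (K,) per-keypoint standard deviations; k_i = 2 * sigma_i
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    k = 2.0 * np.asarray(sigmas, dtype=float)
    if not visible.any():
        return float("nan")  # OKS is undefined without labelled keypoints
    d2 = np.sum((pred - gt) ** 2, axis=1)  # squared distances d_i^2
    e = d2 / (2.0 * area * k ** 2)         # d_i^2 / (2 s^2 k_i^2)
    return float(np.mean(np.exp(-e[visible])))
```

  A perfect prediction scores 1.0; the score decays with distance, and more slowly for larger objects or for keypoints with a larger sigma_i, which is how OKS folds scale and keypoint-specific annotation variability into one number.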
  
  The focus of this work is on overall keypoint localization accuracy, and hence we wanted a metric that is easy to interpret and implement; in this case we made use of PCK (Percentage of Correct Keypoints). PCK is a simple and widely used metric that measures the percentage of keypoints localized within a certain distance threshold of their corresponding ground-truth keypoints.
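  A minimal sketch of PCK and of an AUC summary over thresholds, assuming NumPy arrays of predicted and ground-truth coordinates. The function names, the per-instance normalisation length (for example the longer bounding-box side), the 0.2 default threshold and the threshold grid used for the AUC are all assumptions for illustration; the manuscript's exact settings may differ.

```python
import numpy as np


def pck(pred, gt, visible, norm, thr=0.2):
    """Fraction of labelled keypoints within thr * norm of the ground truth.

    pred, gt : (N, K, 2) predicted / ground-truth coordinates in pixels
    visible  : (N, K) boolean mask of labelled ground-truth keypoints
    norm     : (N,) per-instance normalisation length in pixels (assumption)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    dist = np.linalg.norm(pred - gt, axis=-1) / np.asarray(norm, dtype=float)[:, None]
    correct = (dist <= thr) & visible
    return float(correct.sum() / visible.sum())


def pck_auc(pred, gt, visible, norm, max_thr=0.5, steps=20):
    """Mean PCK over evenly spaced thresholds in (0, max_thr], a Riemann
    approximation of the normalised area under the PCK-vs-threshold curve."""
    thresholds = np.linspace(0.0, max_thr, steps + 1)[1:]
    return float(np.mean([pck(pred, gt, visible, norm, t) for t in thresholds]))
```

  Averaging PCK over a grid of thresholds is one common way to turn the threshold choice into a single AUC number; it rewards predictions that are close at tight thresholds rather than only counting hits at one cut-off.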
  
  A reporting of the scales present in the dataset would be useful (e.g., histogram of unnormalized bounding boxes) and would align well with existing pose dataset papers such as MS-COCO (arXiv:1405.0312) which reports the distribution of instance sizes and instance density per image.
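  A histogram like the one requested can be produced from the raw boxes in a few lines. This sketch assumes the boxes have already been loaded as [x, y, w, h] in pixels; the field names of the released annotation file are not given here, so loading is left out, and the choice of sqrt(w * h) as the scale measure is an assumption.

```python
import numpy as np


def bbox_scale_histogram(bboxes, bins=10):
    """Histogram of unnormalised bounding-box scales.

    bboxes : iterable of [x, y, w, h] boxes in pixels (assumed layout)
    bins   : passed straight to np.histogram (a count or explicit edges)
    Returns (counts, bin_edges) for the scale sqrt(w * h) of each box.
    """
    boxes = np.asarray(list(bboxes), dtype=float).reshape(-1, 4)
    scale = np.sqrt(boxes[:, 2] * boxes[:, 3])  # geometric mean of width and height
    return np.histogram(scale, bins=bins)
```

  Using the geometric mean of width and height keeps elongated boxes (for example, an ape lying down) on the same pixel scale as square ones.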
  
  We have now included a histogram of unnormalized bounding boxes in the manuscript, see Author response image 2:
  
  Author response image 2.
  
  Reviewer #2 (Public Review):
  
  The authors present the OpenApePose database constituting a collection of over 70000 ape images which will be important for many applications within primatology and the behavioural sciences. The authors have also rigorously tested the utility of this database in comparison to available Pose image databases for monkeys and humans to clearly demonstrate its solid potential.
  
  We thank the reviewer for the kind comments.
  
  However, the variation in the database with regards to individuals, background, source/setting is not clearly articulated and would be beneficial information for those wishing to make use of this resource in the future. At present, there is also a lack of clarity as to how this image database can be extrapolated to aid video data analyses which would be highly beneficial as well.
  
  I have two major concerns with regard to the manuscript as it currently stands which I think if addressed would aid the clarity and utility of this database for readers.
  
  (1) Human annotators are mentioned as annotating the 16 landmarks manually for all images, but there is no assessment of inter-observer reliability or similar. I think something to this end is currently missing, along with how many annotators there were. This will be essential for others who may want to use this database in the future.
  
  We thank the reviewer for pointing this out. Inter-observer reliability is important for ensuring the quality of the annotations. We first used Amazon MTurk to crowdsource annotations and found that the inter-observer reliability and the annotation quality were poor. This was the reason for choosing a commercial service such as Hive AI. As the crowdsourcing and quality control are managed by Hive through their internal procedures, we do not have access to data that would allow us to assess inter-observer reliability. However, the annotation quality was assessed by first author ND through manual inspection of the annotations visualized on all of the images in the database. Additionally, the high out-of-sample performance in our ablation experiments further validates the quality of the annotations.
  
  Relevant to this comment, in your description of the database, a table or similar could be included, providing the number of images from each source/setting per species and/or the number of individuals: something to give a brief overview of the variation beyond species (subspecies information would also be beneficial, for example).
  
  Our goal was to obtain as many images as possible from the most commonly studied ape species. In order to ensure a large enough database, we focused only on the species and combined images from as many sources as possible to reach our goal of ~10,000 images per species. With the wide range of people involved in obtaining the images, we could not ensure that all the photographers had the necessary expertise to differentiate individuals and subspecies of the subjects they were photographing. We could only ensure that the right species was being photographed. Hence, we cannot include more detailed information.
  
  (2) You mention around line 195 that you used a specific function for splitting up the dataset into training, validation, and test but there is no information given as to whether this was simply random or if an attempt to balance across species, individuals, background/source was made. I would actually think that a balanced approach would be more appropriate/useful here so whether or not this was done, and the reasoning behind that must be justified.
  
  This is especially relevant given that in one test you report balancing across species (for the sample size subsampling procedure).
  
  We created the training set to reflect the species composition of the whole dataset, but used test sets balanced by species. This was done to give a sense of the performance of a model that could be trained with the entire dataset, which does not have the species fully balanced. We believe that researchers interested in training models using this dataset for behavior tracking applications would use the entire dataset to fully leverage the variation in the dataset. However, for those interested in training models with balanced species, we provide an annotation file with all the images included, which would allow researchers to create their own training and test sets that meet their specific needs. We have added this justification in the manuscript to guide other users with different needs. Lines 530-534: “We did not balance our training set for the species as we wanted to utilize the full variation in the dataset and assess models trained with the proportion of species as reflected in the dataset. We provide annotations including the entire dataset to allow others to create their own training/validation/test sets that suit their needs.”
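  One way to reproduce the scheme described above (species-balanced validation and test sets, with everything else left in training so that training keeps the dataset's own species mix) is sketched here. The record layout with a "species" key and the function name are hypothetical; this is not the authors' splitting function.

```python
import random
from collections import defaultdict


def species_balanced_split(records, n_test_per_species, n_val_per_species, seed=0):
    """Split annotation records into (train, val, test) lists.

    records : list of dicts, each with a "species" key (hypothetical field name)
    Val and test receive up to the same number of images per species; all
    remaining images go to training, preserving the original species mix.
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec["species"]].append(rec)

    train, val, test = [], [], []
    for species in sorted(by_species):  # sorted for a reproducible order
        items = by_species[species][:]
        rng.shuffle(items)
        cut_test = n_test_per_species
        cut_val = n_test_per_species + n_val_per_species
        test.extend(items[:cut_test])
        val.extend(items[cut_test:cut_val])
        train.extend(items[cut_val:])
    return train, val, test
```

  A species with fewer images than the requested test and validation counts simply contributes what it has, so the held-out sets are only balanced when every species is large enough.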
  
  And another perhaps major concern that I think should also be addressed somewhere is the fact that this is an image database tested on images while the abstract and manuscript mention the importance of pose estimation for video datasets, yet the current manuscript does not provide any clear test of video datasets nor engage with the practicalities associated with using this image-based database for applications to video datasets. Somewhere this needs to be added to clarify its practical utility.
  
  We thank the reviewer for this important suggestion. Since we can separate a video into its constituent frames, one can indeed use the provided model or other models trained using this dataset for inference on the frames, thus allowing video tracking applications. We now include a short video clip of a chimpanzee with inferences from the provided model visualized in the supplementary materials.
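  Frame-by-frame inference can be sketched as below. Reading frames uses OpenCV's `cv2.VideoCapture`; `estimate_pose` is a hypothetical callable standing in for the trained model (for example, a detector plus a top-down HRNet wrapped in one function) and is not part of the released code.

```python
from typing import Callable, Iterable, Iterator, Tuple

import numpy as np


def read_frames(path: str) -> Iterator[np.ndarray]:
    """Yield BGR frames from a video file using OpenCV (opencv-python)."""
    import cv2  # imported lazily so track_video works without OpenCV installed

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or read error
                break
            yield frame
    finally:
        cap.release()


def track_video(
    frames: Iterable[np.ndarray],
    estimate_pose: Callable[[np.ndarray], np.ndarray],
) -> Iterator[Tuple[int, np.ndarray]]:
    """Apply a single-image pose model to every frame independently."""
    for idx, frame in enumerate(frames):
        yield idx, estimate_pose(frame)
```

  Usage would look like `for idx, kpts in track_video(read_frames("clip.mp4"), estimate_pose): ...`, where the file name is a placeholder. Because each frame is processed independently, this sketch makes no use of temporal continuity between frames.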
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Please provide a more thorough description of the annotation procedure (i.e., the instructions given to crowd workers)! See public review for reference on dataset annotation reporting cards.
  
  We have included the list of instructions for Hive annotators in the supplementary materials.
  
  An estimate of the crowd worker accuracy and variability would be super valuable!
  
  While we agree that this is useful, we do not have access to Hive internal data on crowd worker IDs that could allow us to estimate these metrics. Furthermore, we assessed each image manually to ensure good annotation quality.
  
  In the methods section it is reported that images were discarded because they were either too blurry, small, or highly occluded. Further quantification could be provided. How many images were discarded per species?
  
  It’s not really clear to us why this is interesting or important. We used a large number of photographers and annotators, some of whom gave a high ratio of great images; some of whom gave a poor ratio. But it’s not clear what those ratios tell us.
  
  Placing the numerical values at the end of the bars would make the graphs more readable in Figures 4 and 5.
  
  We thank the reviewer for this suggestion. While we agree that this can help, we do not have space to include the numbers in a readable font size, and smaller fonts that would fit may not be readable for all readers. We have included the numerical values in the main text in the results section for those interested and hope that the figures provide a qualitative sense of the results to the readers.
  
Tags: AuthorResponse
Annotators: Public_Reviews
URL: arxiv.org/abs/2212.00741
www.biorxiv.org

Decoupling AMPK from fatty acid synthesis allows maintenance of fitness late in life

1. Public_Reviews 14 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This rigorous and creative study uses an elegant combination of metabolomics, transcriptomics, and budding yeast molecular genetics to discover that (i) activating AMPK to maintain mitochondrial respiration fueled by cytosolic Acetyl CoA and (ii) increasing fatty acid synthesis independent of respiration drive independent pathways that increase the fitness of replicatively-aged budding yeast cells, albeit without increasing their lifespan. This work will be of interest to scientists in the field of aging and metabolism. Some clarifications in the text would address the following concerns, which would increase the impact of the study:
  
  (1) What does activation of AMPK (via PGPD-Sak1 expression) do to the replicative lifespan? How many bud scars, in general, do the older subpopulations that nevertheless have less Tom70 (increased mitochondrial fitness) have after the 48 h timepoint that they are examining? How many divisions occurred in this 48 h time period, i.e., is it long enough for all cells to reach the end of their replicative lifespan? This information is important to rule out that a subset of the mutant cells simply divided faster and hence had more divisions within 48 h (growing faster and living longer are different things). Having identical growth curves doesn't indicate per se that they all divide at the same rate, as there may be a subpopulation that divides faster and a subpopulation that doesn't grow so well.
  
  Increasing AMPK activity increases replicative lifespan [PMID: 25869125], but given our finding that AMPK activation splits the population, such replicative lifespan assays are hard to interpret. Bud scar counts have a similar issue. Hence we restricted the lifespan and bud scar analyses to wt and A2A, which are more homogeneous (Figures S2 B and E). A2A cells at 48 h have ~25% more bud scars than wt cells. Yes, by 48 h most of the cells have lost viability (Figure 2E). The reviewer is correct that you can't properly compare the lifespan curves if the cells divide at different rates, hence our follow-up test comparing the viability of wt at 48 h with A2A at 40 h, after we had confirmed that these time points captured cells at equivalent replicative ages (Figure 2D, E). This shows that viability of A2A is slightly lower than wt at matched age, indicating a slightly shorter lifespan.
  
  (2) A2A cells do not have an extended replicative lifespan (RLS) but show an increase in the "low senescence" population (Figure 2). If the cells are not becoming senescent, why don't they have longer RLS? Not having a longer lifespan seems inconsistent with the statement that "bud scar counting confirmed that A2A cells reach a higher age than wild type", which comes back to how many times the cells can divide in the 48hr timepoint studied and their rate of cell division? Also, the lifespan curve shown is plotted against time, not cell division number, which does not take into account different division times of cells within the population (described above). It would be much more useful to show standard lifespan curves showing cell division numbers per lifespan per cell.
  
  Our observation that cells can reach the end of life without senescing is consistent with other studies that have studied the life course of individual cells by microscopy [PMID: 31291577, 32675375]. These studies always highlight some proportion of the cells that reach the end of life with no or minimal senescence, though this fraction varies with the experimental system. The question of why cells lose viability without senescing is a complete unknown in the field, and reflects a wider lack of consensus as to why yeast lose viability with replicative age.
  
  In liquid culture we can only assess viability over time, not cell division number. We agree that this is not optimal, and we are wary of making strong statements on lifespan for exactly the reasons the reviewer notes. Unfortunately, the comparison of liquid and solid media lifespans performed by the Gottschling lab [PMID: 19652178] makes clear that the culture system has a huge effect on lifespan: cells in classical plate-based microdissection assays live far longer than the same strains do in liquid. Lifespans determined by microdissection-based assays are therefore of questionable relevance to ageing studies performed in liquid culture. Senescence cannot be assayed on plates, while microfluidic systems lack the necessary throughput and preclude key techniques like RNA-seq, so liquid culture assays were the only option for this work. We agree that this leaves an unsatisfactory approximation for lifespan measurements, but we consider it critical that everything is measured in the same system. We therefore restricted our conclusion on lifespan to the statement that the lifespan of A2A cells is not extended, which our data in Figures 2D, E and S2B do support (see also our answer to Q1). Because the majority of A2A cells nevertheless show low senescence marks and high fitness at 48 h, we can conclude that lifespan and fitness loss must be separable.
  
  We have added a note of these limitations of lifespan measurements in the materials and methods section of the manuscript.
  
  (3) Increased "fitness" of the old cells is implied from the increased size of the colonies that the old cells can make. However, this is a measure of the fitness of the daughters per se, not the old mother cells. Are the old mothers just passing on healthier mitochondria and more lipids to the daughters, such that they can divide more times? If the aged cells have an "increased fitness", why don't they divide more times themselves (i.e. live longer?).
  
  Yes, colony growth speed is defined by daughter cell replication, but as long as the daughters and subsequent generations divide at the same rate irrespective of whether they come from a young or an old mother, the size of the colony after 24 hours depends on the time it took the initial mother to produce a daughter. This is what the assay really measures. We note that aged wildtype mothers often do not divide at all in the first 24 hours after being put on an agar plate (hence the tiny reported colony size), even though they do eventually produce a daughter which then forms a colony, whereas A2A cells tend to produce the first daughter rapidly whether young or old. It is known that daughters of aged wildtype mothers also divide more slowly, as to some extent do granddaughters (PMID: 2644196), which will also contribute to differences in colony size, and this may well result from a lipid and/or mitochondrial contribution, but the primary driver of colony size at 24 hours is the time the mother took to divide initially. We have added this detail to the materials and methods section of the manuscript.
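  The point that colony size at 24 hours is dominated by the mother's delay before her first division can be made concrete with a toy calculation. It assumes idealised exponential growth with one fixed doubling time once the first daughter appears, and ignores the slower division of daughters and granddaughters of aged mothers; the function name and the numbers (1.5 h doubling time, 6 h versus 12 h lag) are hypothetical and chosen only for illustration.

```python
def colony_size(t_hours, lag_hours, doubling_hours):
    """Idealised number of cells (up to a constant factor) after t_hours.

    lag_hours      : time the mother takes to produce her first daughter
    doubling_hours : fixed doubling time of the colony after that point
    """
    if t_hours <= lag_hours:
        return 1.0  # the mother has not divided yet
    return 2.0 ** ((t_hours - lag_hours) / doubling_hours)


# Two mothers differing only in lag: 6 h versus 12 h, read out at 24 h.
fast = colony_size(24, 6, 1.5)
slow = colony_size(24, 12, 1.5)
print(fast / slow)  # 16.0
```

  In this model the ratio of two colony sizes is 2^((L2 - L1) / tau), independent of the readout time, so a few hours of difference in the mother's first division already produces a large difference in colony size.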
  
  As noted above, the mechanistic basis of lifespan is unknown, but although senescence can shorten lifespan, our work and that of others shows that lifespan is still limited in the absence of senescence.
  
  (4) The statement is made that "these experiments define two classes of aging cells with distinct metabolic needs, coherent with the model of two aging trajectories previously proposed (referencing Nan Hao's work)". However, the big difference here is that in Nan Hao's work, their two aging trajectories influenced the length of lifespan, but that does not appear to be the case here. That distinction should be made clear. Perhaps the authors could also speculate as to why the A2A yeast stops dividing after presumably the same number of cell divisions, even though they have an activated AMPK and activated fatty acid synthesis pathway.
  
  Yes, this is a good point and we have added this distinction to the Discussion:
  
  “Here we have characterised two classes of ageing cells seemingly differentiated by high and low availability of cytosolic Acetyl-CoA, consistent with a previous demonstration that ageing follows two trajectories in yeast though it should be noted that in this previous report, the two trajectories also differed in replicative lifespan (6).”
  
  We would love to speculate on why the A2A cells don't have an extended lifespan, but at this point we don't have a strong hypothesis. We have come up with many theories for this, but none that we haven’t managed to disprove experimentally. One thing worth considering is that many cells which lose replicative viability in liquid culture, and probably in plate assays, remain intact (for example, DNA and RNA integrity is not compromised over 24-48 h), so those cells are probably not dead per se. But we also detect apoptosis-sized DNA fragments, which must come from dead cells, so there is clearly not a single mechanism defining the end of replicative lifespan.
  
  (5) I am a bit confused by the use of the word "senescence" by this lab here and in their previous growth on galactose studies. If yeast don't senesce, which is usually defined as an irreversible arrest of the cell cycle where cells stop dividing, shouldn't the yeast that do not senesce still be dividing and hence have a longer lifespan? Should a different term be used rather than senescence? Such as "fitness late in life". The authors giving their definition of senescence may help reduce this apparent contradiction.
  
  We completely agree that this is confusing and have noted this distinction in the Introduction. Use of the term senescence to mean a loss of fitness late in life in yeast stems from the classical definition of senescence as applied to whole organisms. By contrast, the term senescence as applied to cells has a more specific meaning in terms of the cell cycle, as the reviewer notes. As an individual S. cerevisiae is both a cell and an organism, the terminology clashes. However, the marker we largely employ (Tom70-GFP), which in our hands is a very good proxy for fitness, was originally defined as marking the senescence entry point (SEP), so overall we feel we can't avoid the term.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this study, the authors investigate how cytosolic acetyl-CoA metabolism influences replicative aging in budding yeast. They propose that acetyl-CoA regulates aging through three major pathways: (1) mitochondrial transport to support mitochondrial function, (2) fatty acid synthesis, and (3) global protein acetylation. The data show that AMPK activation promotes mitochondrial import of acetyl-CoA and partially mitigates mitochondrial decline in a subset of aging cells.
  
  Furthermore, the engineered A2A strain, which enhances mitochondrial acetyl-CoA utilization while relieving inhibition of fatty acid synthesis, increases the proportion of cells exhibiting a "low senescence" phenotype.
  
  Overall, this is a thoughtful and potentially impactful study that advances our understanding of metabolic control of aging. Addressing the points below, particularly by refining interpretations and, where feasible, incorporating additional analyses, will further strengthen the manuscript and its conclusions.
  
  Strengths:
  
  The study has several notable strengths. It addresses an important question by shifting the focus from lifespan to preservation of late-life fitness, which is highly relevant to aging biology. The work integrates metabolic, genetic, and functional analyses to link cytosolic acetyl-CoA flux with distinct aging outcomes, and the engineering of the A2A strain provides a clear and elegant demonstration of how coordinated pathway modulation can improve cellular fitness.
  
  Weaknesses:
  
  (1) While the manuscript focuses on mitochondrial transport and fatty acid synthesis, cytosolic acetyl-CoA is also a key regulator of histone acetylation and chromatin silencing. It would strengthen the study to consider whether acetyl-CoA depletion contributes to improved fitness through enhanced rDNA silencing. Given the well-established role of rDNA instability in yeast aging, additional experiments examining rDNA silencing and stability would be valuable. For example, monitoring rDNA copy number changes (not necessarily ERCs) under AMPK activation, oleic acid supplementation, and in the A2A strain, similar to approaches used in the authors' prior work, would help clarify whether chromatin regulation contributes to the observed phenotypes.
  
  We have added data addressing these points to the manuscript and Supplemental Figures 2, 3 and 4, though the outcomes are complex. Histone acetylation changes chromatin accessibility and could therefore alter global gene expression; in accord with this, RNA-seq shows that P<sub>GPD</sub>-SAK1 reduces known age-linked gene expression dysregulation. However, A2A does not further reduce the effect, meaning either that another driver exists in addition to cytosolic acetyl-CoA, or that age-linked gene expression dysregulation is unrelated to cytosolic acetyl-CoA. Oleic acid has little effect on age-linked gene expression dysregulation despite rescuing fitness. With regard to rDNA silencing, transcription of the rDNA intergenic spacer non-coding RNAs promotes ERC formation; we have added data showing that ERC accumulation is not reduced in A2A but is slightly higher, consistent with the higher replicative age of A2A at 48 h, which suggests silencing is not better in A2A. By RNA-seq, these intergenic spacer transcripts are massively upregulated with age, but this will be a consequence of the increased genomic copy number on ERCs; the upregulation is smaller in A2A than in other conditions, but this arises because the log phase spacer transcript levels are higher and so does not reflect better rDNA silencing. We have previously assayed for heritable changes in rDNA copy number arising during ageing and found (to our surprise) absolutely nothing, so we don't expect any changes under these conditions. The upregulation of transcripts from Sir2-repressed telomeric and MAT loci with age is decreased in P<sub>GPD</sub>-SAK1 and A2A, but the effect size is not different from that of any other low-expressed genes, so we do not think there is a particular effect at loci subject to chromatin silencing (see our previous study Zylstra et al PMID 37643194 for evidence that Sir2-mediated gene silencing is not affected by age). We have added our conclusions from these experiments to the Discussion.
  
  (2) The current data do not fully distinguish whether AMPK activation and oleic acid supplementation act on distinct subpopulations of aging cells. An alternative explanation is that oleic acid supplementation enhances mitochondrial function and acts additively with AMPK activation, thereby increasing the fraction of cells in the "low senescence" state. Since this distinction is not central to the main conclusions, I suggest softening the language around subpopulation specificity. Emphasizing instead that the A2A strain coordinately modulates multiple branches of acetyl-CoA metabolism to improve late-life fitness would maintain the strength of the central message without overinterpretation.
  
  We respectfully disagree with the reviewer on this point. We show that P<sub>GPD</sub>-SAK1 rescues senescence in ~half the population by a Cat2/Mls1-dependent mechanism (Figure 1F). We then show that in A2A, which rescues most cells, deletion of CAT2/MLS1 restores senescence in ~half the cells (Figure 3F/G). This cannot be explained by an additive mechanism, as that would result in all cells being partially rescued in both the P<sub>GPD</sub>-SAK1 and the A2A cat2Δ mls1Δ mutants, which is definitely not the case by either Tom70-GFP or fitness. Instead, the population splits into high/low senescence and fit/unfit cells in the different assays.
  
  On the specific point of whether lipid synthesis additively increases mitochondrial function, we have added oxygen consumption rate data showing that A2A cells respire more than P<sub>GPD</sub>-SAK1 at 48h but only by a relatively small amount (Figure S3D), so there is indeed an additive improvement in mitochondrial function, but too little to explain the difference in population fitness in our opinion.
  
  We realise that the reviewer is asking more specifically about oleic acid, but again in the flow data (Figure 4C), what changes with oleic acid or P<sub>GPD</sub>-SAK1 is the proportion of cells in the low Tom70 / high WGA sector. Under an additive effect model, oleic acid or P<sub>GPD</sub>-SAK1 individually would partially reduce Tom70 and partially increase WGA, but the population in the low Tom70 / high WGA sector has the same average Tom70/WGA values in oleic acid, P<sub>GPD</sub>-SAK1 or P<sub>GPD</sub>-SAK1 + oleic acid. It is the proportion of cells in this population that changes. Furthermore, under an additive model, wildtype cells aged with oleic acid would not have higher fitness than P<sub>GPD</sub>-SAK1 or A2A cells (Figure 4D), as these individual cells would lack the mitochondrial upregulation from P<sub>GPD</sub>-SAK1.
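  The distinction between an additive shift and a change in subpopulation proportions can be illustrated with a toy simulation. Everything in it is hypothetical: the Tom70-GFP distributions, the gate value, the fractions and the shift are made-up numbers, and the two-marker Tom70/WGA gate is reduced to a single Tom70 axis. Under a subpopulation model only the fraction of cells inside the gate changes, whereas under an additive model every cell moves and the mean inside the gate moves with it.

```python
import numpy as np

GATE = 2.0  # hypothetical Tom70-GFP threshold (arbitrary log units) for the "low" sector


def simulate(n, frac_fit, shift=0.0, seed=0):
    """Synthetic single-cell Tom70-GFP values for a mixture of fit/unfit cells.

    frac_fit : fraction of cells drawn from the 'fit' (low Tom70) component
    shift    : amount subtracted from every cell (a uniform, additive effect)
    """
    rng = np.random.default_rng(seed)
    fit = rng.random(n) < frac_fit
    tom70 = np.where(fit, rng.normal(1.0, 0.2, n), rng.normal(3.0, 0.2, n))
    return tom70 - shift


def gate_summary(tom70):
    """Return (fraction of cells inside the low-Tom70 gate, their mean Tom70)."""
    inside = tom70 < GATE
    return float(inside.mean()), float(tom70[inside].mean())


control = gate_summary(simulate(10_000, frac_fit=0.3))
subpop = gate_summary(simulate(10_000, frac_fit=0.6, seed=1))               # more cells switch state
additive = gate_summary(simulate(10_000, frac_fit=0.3, shift=0.5, seed=2))  # every cell shifts
```

  In this toy setting `subpop` roughly doubles the gated fraction while its gated mean stays near the control value, and `additive` barely changes the fraction but pulls the gated mean down by about the size of the shift, which is the signature the flow data would show under each model.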
  
  (3) The manuscript proposes that lipid starvation and excess acetyl-CoA are major drivers of senescence in distinct subpopulations of wild-type aging cells. This conclusion is not yet fully supported by the presented data. Direct measurements of age-dependent divergence in acetyl-CoA and fatty acid levels at the single-cell level would be needed to substantiate this model. Based on the current evidence, a more conservative interpretation would be that aging cells exhibit differential sensitivity to perturbations in acetyl-CoA and lipid metabolism. Accordingly, I recommend revising the statement in the Abstract ("We further implicate lipid starvation and excess acetyl coenzyme A availability as major drivers of senescence...") and the corresponding discussion text to better align with the data.
  
  We agree and have adjusted the abstract to make it clearer that the lipid starvation / excess acetyl-coA interpretation is a model.
  
  “Our findings support a model in which lipid starvation and excess acetyl-coenzyme A availability are major drivers of senescence in replicatively aged wild-type yeast.”
  
  Reviewer #3 (Public review):
  
  Summary:
  
  These findings suggest that PGPD-SAK1 yeast show a subpopulation with lowered TOM70-GFP expression in high bud scar staining aged cells. Deletion of CAT2 or MLS1 reduces this effect. A PGPD-SAK1 acc1S1157A double mutant (called "A2A" here) shows an even larger effect of lowered tom70 expression in high bud scar staining aged cells. Utilization of various additional mutants involved in acetyl-CoA transport, carnitine shuttle, respiration, etc., leads the authors to conclude that these shifts in TOM70-GFP in aged cells are linked to the AMPK-fatty acid metabolic regulatory system.
  
  Strengths:
  
  These extensive and clearly described experiments reveal interesting changes in TOM70-GFP intensity in subsets of aged yeast in several mutants eventually identified as linked to the AMPK-fatty acid metabolic regulatory system.
  
  Weaknesses:
  
  (1) 3 biological replicates for mRNASeq is low.
  
  Thank you for pointing this out. We performed another replicate after posting the initial preprint to confirm the finding but did not update the figure in the eLife-reviewed version. We have now added this replicate to the scatter plots and analysis in Figure 1; there are minor changes, but the set of genes we followed up remains highly significant. For ageing experiments, we sequence to n=3 as a first pass, which is sufficient to detect widespread age-linked gene expression effects, and add more replicates if required to solidify findings for specific sets of genes. Hence, the additional RNAseq experiments we have added to the manuscript to address Reviewer 2’s comments on widespread gene expression effects are also n=3-4.
  
  (2) While "Traditional conceptions of ageing implicate a progressive accumulation of damage leading to systemic degradation in performance until death, with evolutionary pressures acting to maximise early life fitness and fecundity at the expense of ageing health." is tangential perhaps to the data and conclusions of the study, both claims of this sentence are at best controversial, and the manuscript is no weaker for their omission.
  
  We would prefer not to remove this sentence, which we see as important to a major message of the manuscript: that ageing does not have to involve a loss of fitness before death. Outside the ageing biology field, ageing is often described as the progressive wearing out of components leading to decline and death (‘like an old car’ is a common analogy); in the ageing field this is certainly controversial, but outside the field it remains the normal understanding. This is what we mean by traditional conceptions, and it is important to consider the contradiction between this widely held viewpoint and our findings (and of course those of many others in the ageing field).
  
  The second part of the sentence about evolutionary pressures alludes to antagonistic pleiotropy, which we have now made explicit. Antagonistic pleiotropies as a driving mechanism for ageing, while not universally accepted, are as far as we can tell the most widely accepted type of theory in the ageing field. Our interpretation that yeast are bet-hedging as a population growth strategy and this drives ageing in the long term is a classic antagonistic pleiotropy and we need to raise this concept in the introduction.
  
  (3) The statement that "Here, we determine the basis of senescence and fitness loss in replicatively ageing yeast" is a bit strong as a summary of the careful work presented here. If the authors had created yeast mutants that retained fitness indefinitely, this strength of claim would be more appropriate.
  
  We agree and have moderated this sentence:
  
  “Here, we show that senescence and fitness loss in replicatively ageing yeast can be almost completely avoided without extension of lifespan by rewiring the conserved AMPK-fatty acid metabolic regulatory system.”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The labelling of Figure 3G horizontal axis needs to be realigned with the data.
  
  Fixed – thank you.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) In Figure 3G, the x-axis labels appear misaligned and should be corrected for clarity.
  
  Fixed – thank you.
  
  (2) Figures S3B and S3C appear to be mislabeled and should be revised.
  
  Fixed – thank you.
  
  (3) On page 6 (3rd paragraph), the statement that the beneficial impact arises from acetyl-CoA removal "rather than a benefit of respiration" may be overstated. The data support a role for acetyl-CoA removal but do not fully exclude a contribution from respiration. A more balanced phrasing would improve accuracy.
  
  We have revised this sentence and also added data:
  
  “Working in sip2Δ to avoid an increase in AMPK activity due to reduced Acetyl-CoA availability, we observed that ald6Δ increased the low senescence population through decreasing Tom70-GFP (S3C), and therefore the beneficial impact of PGPD-SAK1 on this pathway arises primarily through Acetyl-CoA removal. It is possible that respiration is adding to this benefit, and we detect a significant increase in Oxygen Consumption Rate in aged PGPD-SAK1 cells, but the further increase in A2A is smaller and we consider that this cannot fully explain the effect of acc1S1157A.”
  
  Reviewer #3 (Recommendations for the authors):
  
  This manuscript is clearly written, and the data are clearly presented. While 3 biological replicates is inadvisably low for mRNASeq, the subsequent experiments motivated by the genes identified there nevertheless stand on their own as presented.
  
  Thank you.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.27.645766v3
www.biorxiv.org

A network perspective on the role of c-di-GMP-associated protein complexes in biofilm formation

1
1. Public_Reviews 14 Jul 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript by Noirot-Gros et al. presents a herculean effort to map the protein-protein interactome of the c-di-GMP signaling network in Pseudomonas fluorescens (Pf). C-di-GMP, the key driver of biofilm formation in bacteria, is controlled by a highly complex network of synthesis, degradation and effector proteins. Pf is no exception, as it encodes dozens of such proteins. The authors use a genome-wide Yeast Two-Hybrid screen with 10 diguanylate cyclase (DGC) enzymes as bait to assess protein-protein interactions in this network. The results identify over one hundred such interactions with several different hubs, including c-di-GMP signaling, other signaling systems, membrane proteins, etc. The authors then examine the effects of the original bait proteins and the identified interactors on biofilm formation-related phenotypes and swarming using a high-throughput CRISPRi expression knockdown approach. The amount of data generated is quite impressive. Much of the manuscript uses statistical network analysis to group different proteins based on their interactions or impact on phenotypes, which is a high-level analysis that can catalyze further study into this system. The authors chose three specific proteins to assess their impact on cell morphology, DNA repair, and protein localization. Overall, in my view, this is perhaps the best analysis of a c-di-GMP protein-protein interactome, and it provides a multitude of hypotheses to be tested. However, therein lies the weakness of the manuscript in that very few of these hypotheses are actually tested. But such is not the goal of this network analysis type of approach. Overall, I think the work will be highly impactful to those in the c-di-GMP field, and it provides a template for others attempting such analyses of protein-protein interactions.
  
  Strengths:
  
  The manuscript is impressive in the sheer scale of the protein-protein interactions identified, network analysis, and phenotypic analysis of specific proteins in the network. It is an impressive amount of work that could be very useful to the field. It is also statistically rigorous in its analysis of significant interactions or network nodes.
  
  Weaknesses:
  
  The weakness of the manuscript is that, with three exceptions, very few of the hypotheses are actually tested. For example, BifA is shown to be a network hub protein that interacts with many other diguanylate cyclases, and this is hypothesized to be through GGDEF heterodimerization. I appreciate that experimentally testing such a hypothesis is probably another entire manuscript, but some early forays into such ideas could be undertaken using AlphaFold structural modeling of protein-protein interactions compared with GGDEFs that don't form heterodimers. Also, an inherent weakness of such detailed analyses of a c-di-GMP signaling network, in which each diguanylate cyclase and phosphodiesterase may respond to a unique cue, is that the network identified and the conclusions made are highly specific to the experimental conditions in which the work was done. Therefore, it is unclear how broadly these conclusions (i.e. BifA is the central regulator of c-di-GMP signaling) apply to other conditions. But it is impossible to get around such a limitation, and this work can lead to testing the robustness of the identified network in other environments.
  
  We would like to thank the reviewer sincerely for their positive comments on our manuscript and for their constructive feedback. We recognize the limitations arising from the lack of extensive knowledge regarding the environmental cues that trigger the regulation of all CDG activities in P. fluorescens. We hypothesize that DipA acts as a central local hub that positively or negatively regulates the activity of its interacting CDG partners throughout the cell life cycle, lifestyle transitions and environmental signals. Testing this hypothesis would indeed require extensive biochemical and omics approaches. However, strengthening the significance of DipA complexes in silico using AlphaFold is a very appealing proposition and we are currently considering including this analysis in the revised version of the manuscript.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, Noirot-Gros and coworkers investigated the network of c-di-GMP associated protein complexes in Pseudomonas fluorescens. They did so by using a genome-wide yeast two-hybrid screen, and that was further probed by phenotypic screening that focused on biofilm and motility phenotypes. From this network map, they discovered that the phosphodiesterase DipA interacts with the GGDEF domains of many c-di-GMP-binding proteins.
  
  Strengths:
  
  (1) Broadness of screen led to identification of new interactions: The genome-wide yeast two-hybrid screening approach permitted broad investigation of c-di-GMP-associated protein-protein interactions. These interactions included some previously validated interactions as well as newly discovered interactions.
  
  (2) Complementary experimental validation: The proposed network was experimentally validated, including by using a CRISPRi-based approach in which the expression of genes encoding proteins identified in the network was systematically suppressed, and then the impact on the biofilm and motility phenotypes was assessed.
  
  Weaknesses:
  
  The findings would have been strengthened by further biochemical analysis, but this is likely beyond the scope of the paper.
  
  We would like to express our gratitude to the reviewer for their positive assessment, and for taking into account the limitations of the study's scope.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript, Noirot-Gros et al. take an open-ended approach to elucidate the c-di-GMP-associated protein complexes in Pseudomonas fluorescens. Starting with 10 putative cyclic di-GMP proteins, they use a combination of a genome-wide two-hybrid screen followed by CRISPRi-mediated exploration of phenotypes to describe the cyclic di-GMP-associated regulation of biofilm formation, and how it relates to other functions. Overall, this work presents an excellent example of how genome annotations can be further confirmed with the use of integrated functional genomic approaches. Some areas of improvement can be applied to this manuscript to enhance readability and provide a clearer distinction between confirmatory results and new findings, which are provided below:
  
  Strengths:
  
  (1) The authors have explored their findings extensively and provide a comprehensive view of the topic.
  
  (2) The combination of genome-wide explorations of protein-protein interactions with the more focused phenotypic exploration of the interactions found provides a solid framework for the work presented.
  
  Weaknesses:
  
  (1) Overall goal of the work:
  
  While articles that describe open-ended approaches can be comprehensive and descriptive in nature, the authors should have a main overall goal, which can guide the reader through the main and most compelling findings at the end. As written, the overall goal is not clear. The network perspective is interesting, and the focus on biofilm formation appears in the title. Why P. fluorescens? How is cyclic di-GMP-mediated regulation of biofilm formation in P. fluorescens different from P. aeruginosa? Why would it be studied? (Positive or negative regulation of biofilm formation?)
  
  We would like to express our appreciation to the reviewer for their thorough evaluation of our manuscript and for the constructive feedback they provided. The overall goal of this study will be further refined and outlined in the introduction of the revised manuscript.
  
  (2) Abstract:
  
  The abstract is very well written and guides the reader to DipA as a hub protein in the network. From further reading, the article could clarify whether this finding is confirmatory or novel (does DipA play a similar role in P. aeruginosa?). It would be appropriate to mention the role of DipA in other Pseudomonas species from the beginning, and not only in the Discussion section.
  
  (3) Introduction:
  
  The introduction is nicely written. An area of improvement could be giving more attention to protein interactions as relevant to c-di-GMP. The authors could consider an independent paragraph starting with lines 84-85 "Protein-protein interactions involving DGCs, PDEs, and target effectors are crucial in establishing localized signalling through the generation of local pools of c-di-GMP", expanding on this particular aspect with an example of localized signalling, after explaining that localization could help decipher specific functions within the network of DGCs and PDEs. Then go into connecting biofilms with c-di-GMP and protein-protein interactions, using the example of GcbC and LapD.
  
  We propose highlighting the example of the local signalling cascade formed by the tripartite system YdaM, YciR and MlrA. This will be addressed in the revised version of the manuscript.
  
  (4) The rationale of choosing 10 PDEs could be clarified. The nice diagrams shown in the supplementary table could be used as part of Figure 1, so the reader understands why these proteins were used, and what is known about them (for example, add them as Figure 1a).
  
  We propose to include a specific section in the supplementary file to explain the whole rationale behind choosing these CDGs. These proteins were selected based on their involvement in different steps of biofilm formation in Pseudomonas, as well as their role in the ability of P. fluorescens strains to colonize plant roots.
  
  (5) Figures 1b and 2 convey the same information as in Figure 1a. They could be removed without affecting the understanding of the article.
  
  Figure 2 will be moved to the Supplementary Material as part of Figure S1.
  
  (6) CRISPRi and Figure 3. Figure 3 shows the methodology of the CRISPRi phenotypic screening. A diagram showing the CRISPRi system in P. fluorescens could help the non-expert reader. While the choice of 23 proteins related to the emerging hub DipA is clear, the choice of the other 33 genes could be better explained. Are these proteins already related to biofilm formation? Where do they sit in the detected network? How about the other 14 SBW25 genes? The authors could clarify the rationale of the choices. Figure 4 could be combined with Figure 3 or moved to the supplementary material.
  
  A better description of the rationale behind the choice of tested interacting protein partners will be provided. We also agree to combine Figure 4 with Figure 3.
  
  (7) Figures 5, 6 and 7 represent solid network analysis of the findings. Still, they could present the main findings more clearly. The authors conclude at the end of section 3.2.3 that there are networks that exert a "positive role" and a "negative role". The authors could show this in the figures, explaining what those roles are (more biofilm structural coding genes? positive or negative regulation of biofilm formation?).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.05.12.724550v1
s3.amazonaws.com

S04_Wexler-Long.pdf

1
1. Lucas.siegel318 13 Jul 2026
  
  in Public
  
  Even though secular archivists may feel uncomfortable thinking of this work as part of anything other than fulfilling professional responsibilities, we must understand that such work is also a component of a contemporary ritual.
  
  I find the tone of this article quite interesting, as it grants archivists a role almost akin to a secularized priest, administering sacred rituals to the dying, the dead, and their survivors. I had never considered this angle on archival studies, but it is a thought-provoking way of framing this aspect of archival work. I think that it properly conveys the gravity and responsibility of working with donors near the end of their lives.

Annotators

Lucas.siegel318

URL

s3.amazonaws.com/files.commons.gc.cuny.edu/wp-content/blogs.dir/48911/files/2026/01/S04_Wexler-Long.pdf
www.emerald.com

21_3_holden

3
1. LMS26 13 Jul 2026
  
  in Public
  
  representation
  
  I need to push back on this comment. If tokenizing provides queer people a space for representation of their community, but the "token queer" only represents a very small part of the community that is considered "acceptable" or "palatable" by others or makes heterosexual, cisgender people "less uncomfortable," then I don't think that we can really call this "representation" of the community. It is really just a perpetuation of stereotype (as the author mentions), and not at all an authentic space for representation. This may do more harm than good.
2. NehaNair 13 Jul 2026
  
  in Public
  
  Furthermore, the responsibility of educating students on queer issues falls on individuals who are invested in the topic
  
  This idea of 'being invested in the topic' makes me question our education system deeply. The system encourages an individual to funnel into a specific topic as they move ahead in their education - elementary, secondary, undergraduate, graduate and PhD. A vegan may not think about caste while they promote the beef ban. I may advocate for one marginalized group and not the other because I am 'invested in that topic'. Transdisciplinary attempts should be encouraged throughout education so that we can carry this responsibility better. This article comes under Education Leadership, but also Social, Political and Cultural Contexts of education.
3. LMS26 12 Jul 2026
  
  in Public
  
  make so much noise
  
  I have seen this with several marginalized groups - we are forced to overcompensate just to be heard. However, sometimes the very nature of "making so much noise" perpetuates the discrimination these groups experience. I think about friends of mine who are activists combating anti-black racism, and they can be really difficult to approach and learn from. I have also had experiences with women in male-dominated spaces who are extra tough because they feel they need to be (e.g., female border officers who are intentionally mean or aggressive as an overcompensation in response to their marginalization or the assumption that they can't be tough or mean). I am not in any way saying that I'm against activism or speaking out -- I am commenting on how unfair it is that marginalized groups are "forced to make so much noise" and that, in doing so, they may not even achieve their desired outcome.

Annotators

NehaNair

LMS26

URL

emerald.com/jole/article-pdf/21/3/1/10062282/v21_i3_r7.pdf
www.biorxiv.org

Metabolic Trans-Omic Analysis Reveals Key Regulatory Disruption of Energy Metabolism in Alzheimer's Disease

1
1. Public_Reviews 13 Jul 2026
  
  in eLife
  
  Author response:
  
  General Statements:
  
  We thank the reviewers for their critical review of the manuscript and their valuable comments. We have carefully considered the reviewers’ comments and have revised our manuscript accordingly.
  
  Point-by-point description of the revisions:
  
  Reviewer #1 (Evidence, reproducibility and clarity):
  
  Major comments
  
  (1) This study leaves out lipid metabolism as a major energy metabolism pathway relevant to AD. The authors themselves cite the significance of acylcarnitines and CPT1A in AD (pg. 3, lines 32-33, pg. 4, lines 1-2). Lipid metabolism and homeostasis are known to be disrupted in AD<sup>1</sup>. Fatty acid oxidation is a known energy source in the prefrontal cortex<sup>2</sup> and will also generate acetyl-CoA, which this study reveals is significantly decreased in AD. Furthermore, sphingomyelin emerges as one of the major decreased DEMs as well. Thus, lipid metabolism should be highlighted in Figure 3 and discussed throughout the manuscript; otherwise, its omission should be clearly stated and justified.
  
  We appreciate the reviewer’s insightful comment regarding the critical role of lipid metabolism in AD. We recognize that lipid metabolism is a metabolic pathway deeply involved in AD pathology (Baloni et al., 2022, 2020; Varma et al., 2021). Accordingly, we have revised the Limitations section to more strongly emphasize its role as a vital energy source (pg. 13, lines 15-17). Regarding the visualization of lipid metabolism, we extracted the lipid-related pathways from the trans-omic network but found that the regulatory relationships among DEPs and DEMs were excessively complex and interconnected. Interpreting this regulatory network was therefore more challenging than for the other energy production pathways presented in our manuscript, and we concluded that the pathway analysis in our trans-omic network may not be suitable for deeply elucidating lipid dysregulation in AD. We have added a statement acknowledging this as a limitation of our current methodology in the revised manuscript (pg. 13, lines 13-22).
  
  (2) The covariates used for differential analysis should be discussed and justified. Notably, age is used as a covariate for transcriptomic analysis but not proteomic and metabolomic analysis, with no justification. Additionally, given the known importance of lipid metabolism in AD and the putative role of APOE in lipid homeostasis<sup>3</sup>, APOE genetic status should be considered as a covariate, or its omission should be justified.
  
  We appreciate the reviewer’s comment regarding the included covariates in differential analyses of our study. The reason we did not include other variables, such as age at death and RIN, is that these data were not available for each sample. Thus, we referred to the original research articles from which proteomic or metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age. In addition, no metabolites were significantly associated with sex, BMI, and years of education. Regarding the proteomic dataset, in the original article (Johnson et al., 2020), age at death, PMI, and sex were included as covariates in the analyses, though these variables were not found to strongly influence the data (Extended Data Fig.2 in (Johnson et al., 2020)).
  
  (3) The authors make a conclusion statement that suggests intervention: "Collectively, our data suggests that preserving or improving the ability to produce ATP and early intervention in the process of nitrogen metabolism are candidates for the prevention and treatment of dementia" (pg. 12, lines 12-14). This claim is not well-supported by the evidence provided in the study. There are a few limitations: (a) This was an observational, not interventional study; (b) The study did not establish whether the metabolic disruptions are causes or effects in AD; and (c) ATP or other bioenergetic indicators were not directly measured. Therefore, any statements about potential interventions should be removed or qualified as highly speculative.
  
  We agree with the reviewer that the statement regarding potential interventions was not sufficiently supported by our analyses. Accordingly, we have removed the sentence regarding prevention and treatment from the revised manuscript (specifically, we have deleted the final paragraph of the previous manuscript).
  
  (4) In conjunction with the last point, the main conclusion of the study is that energy production is down in AD. The data presented in Figure 3 are consistent with this conclusion, but it is far from definitive due to limitations stated above in comments 3a and 3b. The authors should offer additional support for this conclusion: experimental follow-up, flux modeling, analysis of alternative datasets with ATP measurement, causal inference.
  
  We sincerely thank the reviewer for this valuable and constructive suggestion. Regarding flux modeling, we agree that metabolic flux analysis could provide important mechanistic insight. Indeed, previous studies have applied flux modeling in the context of lipid metabolism in Alzheimer’s disease (Baloni et al., 2022). We also attempted to perform flux modeling focusing on energy metabolism. However, we found it difficult to obtain biologically meaningful and robust results and therefore decided not to include these analyses in the current manuscript.
  
  With respect to ATP measurements, we fully agree that direct evidence of altered ATP levels would further strengthen our conclusion. However, to the best of our knowledge, there are currently no publicly available large-scale datasets that directly measure ATP levels in human postmortem brain tissues. This limitation makes it challenging to incorporate such validation in the present study.
  
  Regarding experimental follow-up, we agree that functional validation is essential to confirm the mechanistic implications of our findings. We are actively considering follow-up experimental studies. However, we consider the present work to be a multi-omic integrative analysis aimed at identifying key molecular alterations and generating biologically important hypotheses. We have revised the Limitations section to more clearly position this manuscript as an observational systems-level analysis (pg. 13, lines 20-22).
  
  (5) The validation analysis did not sufficiently show the generalizability of this study's results. The authors demonstrated a correlation of 0.53 to the MSBB transcriptomics data and 0.60 to the AMP-AD DiverseCohorts proteomics data. Beyond these correlation coefficients, no meaningful comparison between the datasets is offered. How concordant are the differentially expressed features (or pathways) between the datasets? How robust would the trans-omic network be if incorporating the alternate datasets? Is the main conclusion (energy metabolism is down in AD) supported by the validation datasets? We think this analysis should be expanded and described in the main text.
  
  Although the results for external metabolomics datasets are reported in Fig S2C, correlation coefficients with the external data are not reported. The authors state, "Note that each study used different definitions for AD and CT groups, had variations in measurement methods and brain regions analyzed." We appreciate these limitations. However, the external data should be re-analyzed using the same definitions of AD and CT, if possible. The limitations and results (which DEMs are shared between datasets) should be discussed in the main text.
  
  We thank the reviewer for this important comment regarding the generalizability of our findings. In the revised manuscript, we have expanded the validation analyses and summarized the results in Figure S2. First, at the transcriptomic level, Figures S2B and S2C show the overlap between up- and downregulated genes in AD identified in our ROSMAP-derived analyses and those reported in a previously published large-scale meta-analysis of 2,114 postmortem samples across seven brain regions (Wan et al., 2020). A substantial proportion of DEGs were shared, supporting cross-cohort and cross-region robustness to some extent. At the proteomic level, Figure S2E shows a comparison between the ROSMAP and the AMP-AD DiverseCohorts datasets. We highlighted the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3 and calculated a separate correlation coefficient for this subset (Pearson coefficient = 0.86, p-value = 1.5e-7), further supporting our main conclusion. In addition, to assess the concordance between the two datasets in a threshold-independent manner, we performed Rank-Rank Hypergeometric Overlap (RRHO) analysis (Figure S2E). RRHO analysis (Cahill et al., 2018; Plaisier et al., 2010) enables the comparison of ranked protein lists without relying on arbitrary differential expression cutoffs and has been used for cross-dataset comparison in several previous studies (Fröhlich et al., 2024; Maitra et al., 2023). The RRHO heatmaps demonstrated significant enrichment in the concordant quadrants, confirming systematic agreement between datasets beyond simple correlation coefficients. For metabolomics, Figure S2G shows RRHO analyses comparing the ROSMAP metabolomic data with other datasets measured by the same UPLC-MS/MS platform (Batra et al., 2024; Novotny et al., 2023), demonstrating significant concordance in ranked metabolite changes in AD.
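  For readers less familiar with RRHO, the core computation can be illustrated with a minimal sketch. This is not the authors' pipeline and not a library API: it assumes two lists that rank the same features (e.g., proteins) from most increased to most decreased in AD, scans only the concordant "top-versus-top" region of the map, and uses hypothetical helper names (`rrho_map`, `hypergeom_upper_tail`) and made-up toy features. Each cell holds the -log10 hypergeometric p-value for the overlap between the top i features of one list and the top j features of the other, so high values indicate that both datasets rank the same features near the top.

```python
import math


def hypergeom_upper_tail(k, total, n_a, n_b):
    """P(overlap >= k) when sets of n_a and n_b items are drawn at random from `total` items."""
    denom = math.comb(total, n_b)
    top = min(n_a, n_b)
    hits = sum(math.comb(n_a, x) * math.comb(total - n_a, n_b - x) for x in range(k, top + 1))
    return hits / denom


def rrho_map(ranked_a, ranked_b, step=1):
    """Simplified RRHO map (hypothetical helper, concordant top-vs-top region only).

    Both lists must rank the same features, ordered from most up- to most down-regulated.
    Returns a matrix of -log10 p-values for overlaps of the top-i of A and top-j of B.
    """
    if len(ranked_a) != len(ranked_b) or set(ranked_a) != set(ranked_b):
        raise ValueError("both lists must rank the same features")
    total = len(ranked_a)
    cutoffs = range(step, total + 1, step)
    heatmap = []
    for i in cutoffs:
        top_a = set(ranked_a[:i])
        row = []
        for j in cutoffs:
            overlap = len(top_a.intersection(ranked_b[:j]))
            p = hypergeom_upper_tail(overlap, total, i, j)
            # Guard against log10(0) and clamp -0.0 to 0.0 when p == 1.
            row.append(max(0.0, -math.log10(max(p, 1e-300))))
        heatmap.append(row)
    return heatmap


# Toy example with made-up feature names: identical rankings give a strong overlap
# signal, reversed rankings give none at the same cutoffs.
features = list("abcdefghij")
same = rrho_map(features, features)
opposite = rrho_map(features, features[::-1])
print(round(same[4][4], 2), opposite[4][4])  # prints: 2.4 0.0
```

  Published RRHO implementations typically also examine the opposite ends of the ranked lists (the down-down and discordant regions) and assess significance with multiple-testing correction or permutation; the sketch omits both for brevity.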
  
  (6) The glycolysis analysis and discussion needs more development. Glycolysis and gluconeogenesis share many of the same enzymes, but they are not the same pathway and should not be discussed as such. To make a claim about the overall influence of enzyme and metabolite levels on glycolysis, the authors should focus on the energetically committing steps of glycolysis (hexokinase, phosphofructokinase, pyruvate kinase) in Figure 3A, and include the full/current version of the figure in the supplement. Gluconeogenesis-specific enzymes (pyruvate carboxylase, PEPCK) are not mentioned at all - are they among the DEPs/DEGs?
  
  We appreciate the reviewer’s comment regarding the distinction between the glycolysis and gluconeogenesis pathways. Among the gluconeogenesis-specific enzymes, G6PC1, FBP1, PC, and PCK2 were measured in our dataset, but none of them were identified as DEPs. In addition, gluconeogenesis occurs primarily in the liver and kidney rather than in the brain. Given this biological context and the lack of significant changes in the relevant enzymes, we have replaced “glycolysis/gluconeogenesis pathway” with “glycolysis pathway” throughout the revised manuscript.
  
  (7) Given that there wasn't good concordance between the DEGs and DEPs, did including the mRNA and transcription factor layers in the network really add anything useful? It seems like the main conclusions of the manuscript were driven by the protein and metabolite layers only. How many of the DE metabolic enzymes were coregulated at the transcript and protein level? It would be useful to include the 5-layer trans-omic network in the supplement to display these results. Given your network, at what level does it appear that energy metabolism is regulated?
  
  It is true that our primary conclusion regarding the regulation of energy metabolism is driven by changes in protein and metabolite abundance. However, we consider the low concordance between mRNA and protein expression to be itself an important feature of AD pathology, as also reported in previous studies (Johnson et al., 2022; Tasaki et al., 2022). Although we did not analyze this discordance further, we believe that including the TF and mRNA layers in the metabolic trans-omic network provides a more complete, system-wide view of metabolic dysregulation in AD.
  
  Regarding the mRNA changes corresponding to the DEP enzymes, please refer to Figure S7A.
  
  (8) Comment further on the results from Figure 2D. What can be learned from identifying metabolites with the greatest degree centrality? What pathways other than energy metabolism are highlighted by the trans-omic network?
  
  We speculate that energetic indicators, including AMP and acetyl-CoA, and nitrogen metabolism-related metabolites, including Glu, 2-oxoglutarate, and urea, may be key regulators of dysregulated metabolism in AD.
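
Degree centrality, as referred to for Figure 2D, counts the edges incident to each node. A minimal sketch, assuming the network is available as an edge list and treating every edge as undirected; the toy edges are hypothetical and not taken from the trans-omic network. Dividing each degree by (n - 1), as networkx's `degree_centrality` does, gives the normalized version.

```python
from collections import Counter


def degree_centrality(edges):
    """Raw degree of each node, treating every (u, v) edge as undirected."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def top_hubs(edges, k=3):
    """The k nodes with the highest degree, as (node, degree) pairs."""
    return degree_centrality(edges).most_common(k)


# Hypothetical toy edges (metabolite -> regulated reaction), for illustration only.
toy_edges = [
    ("AMP", "reaction_1"), ("AMP", "reaction_2"), ("AMP", "reaction_3"),
    ("acetyl-CoA", "reaction_4"), ("acetyl-CoA", "reaction_5"),
]
print(top_hubs(toy_edges, 2))  # [('AMP', 3), ('acetyl-CoA', 2)]
```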
  
  (9) (Suggestion) We suggest the authors leverage their trans-omic network in additional ways beyond giving a snapshot of a few energy metabolism pathways. The analysis of top DEMs could go further. What pathways are impacted beyond energy metabolism? Among the metabolic reactions allosterically regulated by top DEMs, what metabolic pathways are enriched?
  
  Using Fisher’s exact test, we identified the metabolic pathways enriched among the reactions allosterically regulated by DEMs in AD. The alanine, aspartate, and glutamate metabolism pathway was significantly enriched among reactions regulated by 2-oxoglutarate, glutarate, alanine, and glutamate. The arginine and proline metabolism pathway was enriched among reactions regulated by N-methyl-L-arginine and putrescine; the arginine biosynthesis pathway among reactions regulated by arginine; the glycerophospholipid metabolism pathway among reactions regulated by CDP-ethanolamine; the glycine, serine, and threonine metabolism pathway among reactions regulated by serine; the purine metabolism pathway among reactions regulated by AMP; the pyrimidine metabolism pathway among reactions regulated by deoxyuridine and thymidine; and the sphingolipid metabolism pathway among reactions regulated by sphingosine. However, this analysis did not yield sufficiently informative insights into the regulatory relationships among biomolecules in AD, so we did not include these results in the revised manuscript.
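
A one-sided Fisher's exact test for over-representation is equivalent to the upper tail of a hypergeometric distribution. The sketch below assumes the universe is the set of reactions in the network and a pathway is the set of reactions annotated to it; the function name and numbers are hypothetical. `scipy.stats.fisher_exact(table, alternative="greater")` on the corresponding 2x2 table should give the same p-value.

```python
import math


def fisher_enrichment_p(k, n_hits, pathway_size, universe_size):
    """One-sided Fisher's exact test for over-representation.

    2x2 table:          in pathway         not in pathway
        regulated       k                  n_hits - k
        not regulated   pathway_size - k   universe_size - n_hits - pathway_size + k
    The p-value is P(X >= k) for X ~ Hypergeometric(universe_size,
    pathway_size, n_hits).
    """
    upper = min(pathway_size, n_hits)
    tail = sum(
        math.comb(pathway_size, x) * math.comb(universe_size - pathway_size, n_hits - x)
        for x in range(k, upper + 1)
    )
    return tail / math.comb(universe_size, n_hits)


# Hypothetical example: 100 reactions in the network, 10 annotated to a pathway,
# 8 regulated by a given metabolite, 5 of which fall in the pathway.
print(fisher_enrichment_p(5, 8, 10, 100))
```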
  
  (10) (Suggestion) Figure 3 shows that most differential signal in AD points to lower energy production due to the combination of differentially expressed metabolites and enzymes, but we are not given much context about the strength of these among all the differential signals. We would suggest including volcano plots where the features of interest, i.e. DE enzymes and metabolites, are colored differently (or a similar figure).
  
  We thank the reviewer for this constructive suggestion. To provide better context regarding the importance of the differential signals, we have added volcano plots for mRNAs, proteins, and metabolites in Figure S4A, B, and C.
  
  (11) (Suggestion) The PPI network could be better leveraged to understand metabolic changes in AD. If nodes are grouped into subnetworks (e.g. by Louvain / Leiden clustering) and tested for pathway enrichment, could you find functional subnetworks of coordinately up- and down- regulated metabolic enzymes? This could yield some pathways of interest beyond the energy metabolism pathways already highlighted.
  
  We appreciate the reviewer’s suggestion to utilize the PPI network for subnetwork analysis. However, it is important to note that the proteomic dataset analyzed in this study is derived from the original work of Johnson et al. (2020). In that paper, the authors already performed a Weighted Gene Co-expression Network Analysis (WGCNA) across several datasets to identify co-expressed modules and functional pathways.
  
  Given this, we reasoned that applying additional clustering methods to the same dataset would be unlikely to yield significant biological insights beyond these established findings.
  
  Minor comments
  
  (1) "All genes" and "all metabolites" should not be the background for the proteomic and metabolic pathway enrichment analysis by Metascape and MetaboAnalyst. The background should be limited to the proteins and metabolites that were measured.
  
  We fully agree with the reviewer that using “all genes” or “all metabolites” as a background is not suitable for enrichment analyses. As suggested, we have revised the enrichment analyses using the measured proteins and metabolites as the background in both Metascape and MetaboAnalyst (Fig. S4D).
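
Why the background matters can be illustrated with a hypergeometric over-representation test in which both the hit list and the pathway are restricted to the measured features. This is a sketch with hypothetical names and toy data, not the internals of Metascape or MetaboAnalyst. Against a genome-sized universe the same overlap looks far more significant, which is why a measured-feature background is the more appropriate choice.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    tail = sum(math.comb(K, x) * math.comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


def enrichment_with_measured_background(hits, pathway, measured):
    """Over-representation p-value using only measured features as the universe.

    Both the hit list and the pathway are intersected with the measured set, so
    pathway members that were never measured cannot inflate the significance.
    """
    universe = set(measured)
    hits = set(hits) & universe
    pathway = set(pathway) & universe
    overlap = len(hits & pathway)
    return hypergeom_sf(overlap, len(universe), len(pathway), len(hits))


# Toy data: 20 measured features; the pathway has 10 annotated members, 5 measured.
measured = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4", "x1", "x2", "x3", "x4", "x5"]
hits = ["g0", "g1", "g2", "g10"]
p_measured = enrichment_with_measured_background(hits, pathway, measured)
# Same overlap against a hypothetical genome-wide universe of 20,000 genes.
p_genome = hypergeom_sf(3, 20000, len(pathway), len(hits))
```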
  
  (2) Highlight the metabolic enzymes in Fig S2B. Calculate a separate correlation coefficient for the enzymes extracted in the energy metabolism analysis from Fig 3.
  
  We appreciate the reviewer’s suggestion to refine the correlation analysis. As requested, we have revised Fig. S2D to explicitly highlight the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3. We calculated a separate correlation coefficient for this subset (Pearson correlation coefficient = 0.86, p-value = 1.5e-7).
  
  (3) Use a multiple hypothesis adjusted p-value or q-value in Figure S3.
  
  We agree with the reviewer regarding the necessity of correcting for multiple comparisons. Accordingly, we have revised Fig. S4D using q-values.
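
Assuming the q-values here are Benjamini-Hochberg adjusted p-values (Storey's q-value is a related but different estimator), the step-up adjustment can be sketched as follows; the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk down from the largest p-value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


print(bh_qvalues([0.01, 0.04, 0.03, 0.2]))
```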
  
  (4) Describe the methods used to calculate the logFC values from the validation dataset.
  
  We have revised the Methods to include a detailed description of the procedure used to calculate the log2FC values for the validation datasets (pg. 21, lines 13-15).
  
  (5) It is difficult to read Figure 3. We would recommend really emphasizing to the reader to refer to Fig S7B as a "key" to this figure. The description of the red/blue arrows and nodes in the methods section (pg. 24, lines 21-36, pg 25, lines 1-4) were also helpful, but very lengthy. We recommend putting an abridged version of this description into the Fig S7 figure legend.
  
  We appreciate the feedback regarding the readability of Fig. 3. As recommended, we have revised the manuscript to explicitly direct readers to Fig. S8B as an essential “key” for interpreting the network visualization (pg. 8, line 28). Furthermore, we have added an abridged description of the network elements to the legend of Fig. S8B.
  
  (6) The S7 figure legend should refer to panels A and B, not E and F.
  
  We apologize for this oversight. We have corrected the legend of Fig. S8.
  
  (7) (Suggestion) Are any of the differentially expressed metabolites allosteric regulators of the DE transcription factors? This could be interesting to discuss.
  
  We appreciate the reviewer’s insightful suggestion about the potential allosteric regulation of the DETFs by DEMs. We conducted an extensive literature search to identify any reports related to this perspective. However, to the best of our knowledge, no such direct interactions have been reported to date.
  
  Reviewer #1 (Significance):
  
  The study's strength lies in leveraging three omics modalities across large patient cohorts (n ~ 150-240) to identify coherent signals between transcriptomics, proteomics, and metabolomics in postmortem DLPFC tissue. It was encouraging to see that the main result, showing downregulation of TCA, oxidative phosphorylation, and ketone body metabolism, emerged from consistent signals across both proteomics and metabolomics. This result was consistent with previous findings in other models cited by the authors [4,5] and other studies [6,7] demonstrating deficiency in energy-producing pathways in AD.
  
  Another strength of the study is the application of thoughtful methodology to connect differentially expressed proteins and metabolites via an intermediate data layer of metabolic reactions. The authors leverage the KEGG and BRENDA databases and apply sound logic to estimate the effects of enzyme level and metabolite level on pathway activity, with metabolites serving as substrate, product, or allosteric regulator for reactions. This trans-omic network methodology was developed in previous studies cited by the authors [8,9].
  
  However, as written, this study is limited in its contribution of new knowledge to the AD research field. The main conclusion (energy production is down in AD, due to regulatory disruption of energy metabolism) is not strongly supported (see comments 1, 3, and 4 for elaboration). The evidence could be improved by orthogonal approaches: further experimentation, further integration of external datasets, causal modeling, or flux modeling. Alternatively, even in the absence of new experimental and computational approaches, the story could be made more complete by further leveraging the trans-omic network to provide insights into (a) the regulation of energy metabolism; and (b) the impacts of key disrupted metabolites (see comments 7-9).
  
  The study is also limited in demonstrating the power of these methodologies to provide integrative insights. As mentioned above, the integration of enzyme levels and metabolite levels is clearly useful (Figure 3). In contrast, the utility of the mRNA and transcription factor layers was not evident. The study did not appear to improve or expand upon trans-omic network methodology described in the previous works. Finally, the various analyses (analyzing the trans-omic network for nodes with the highest degree centrality, the PPI analysis, and viewing the energy metabolism pathways in the network) provided disparate results that were only tenuously connected in the discussion section.
  
  Reviewer #2 (Evidence, reproducibility and clarity):
  
  Summary
  
  This manuscript integrates public transcriptomic, proteomic, and metabolomic datasets from ROSMAP DLPFC samples to construct a multi-layer metabolic trans-omic network in Alzheimer's disease. By linking transcription factors, enzyme mRNAs, proteins, metabolic reactions, and metabolites, the authors report coordinated downregulation of the TCA cycle, oxidative phosphorylation, and ketone body metabolism, along with mixed regulatory signals in glycolysis/gluconeogenesis. They interpret these patterns as indicative of broad energetic dysfunction and alterations in amino-acid/nitrogen metabolism in AD. While the framework is conceptually appealing, much of the analysis remains descriptive, and several biological interpretations extend beyond what the data can robustly support. The reliance on bulk tissue without accounting for cell-type composition, limited covariate adjustment, and the absence of validation or sensitivity analyses reduce confidence in the mechanistic conclusions. Overall, the study provides a preliminary systems-level overview, but additional rigor is needed before the proposed trans-omic regulatory insights can be considered convincing.
  
  Major Comments
  
  (1) Interpretation requires more cautious phrasing, and validation is essential. The manuscript frequently asserts that specific pathways are "inhibited" or that energetic deficits are "compensated," but these conclusions extend beyond what the descriptive, bulk-level data can support. Because no metabolic flux, causality, or direct functional measurements are included, the results should be framed as putative regulatory shifts, not confirmed impairments. Critically, key claims about pathway inhibition would require flux modeling, perturbation analyses, or experimental validation to be convincing. Without such validation, the mechanistic interpretations remain speculative.
  
  We thank the reviewer for this crucial comment. We fully agree that, given the descriptive and bulk-level nature of our analysis, mechanistic interpretations must be made with caution. In the absence of direct metabolic flux measurements or experimental validation, our findings should be interpreted as putative regulatory shifts rather than confirmed functional impairments. Accordingly, we have revised the manuscript to temper mechanistic claims. We have replaced definitive statements with more speculative phrasing (e.g., “Our analysis revealed a putative coordinated downregulation …” instead of “Our analysis revealed a coordinated downregulation …” in Abstract section; “we demonstrate the systems-level view of the potential dysregulated energy production …” instead of “we demonstrate the systems-level view of the dysregulated energy production …” in pg. 10, lines 25-26).
  
  (2) Although the authors acknowledge this in the limitations, bulk-level differences may primarily reflect altered proportions of neurons, astrocytes, microglia, and oligodendrocytes rather than true within-cell-type regulation. Incorporating a cell-type deconvolution or performing a sensitivity analysis would substantially improve interpretability. This issue also impacts the trans-omic network: if the molecules included originate from different cell types, the inferred regulatory relationships may not reflect true intracellular processes.
  
  We appreciate the reviewer’s point that bulk-level differences can reflect altered proportions of different brain cell types, which in turn can affect the inferred trans-omic network. To assess changes in cell type proportions in the samples used in our study, we additionally used public single-cell transcriptomic datasets obtained from DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, the proportion of inhibitory neurons was smaller and the proportion of oligodendrocytes was larger in the AD group than in the CT group. In the proteomic data, we did not observe any statistically significant changes in cell type proportions between the two groups. In the metabolomic data, the proportion of inhibitory neurons was smaller in the AD group than in the CT group (pg. 6, lines 8-11).
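
The per-subject proportions and the group comparison can be sketched as below. The response does not state which statistical test was used, so the exact two-sided permutation test on the difference in group means is an assumption chosen for illustration and is feasible only for very small groups; the function names and toy values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def cell_type_proportions(counts):
    """Convert one subject's per-cell-type nucleus counts into fractions."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every reassignment of the pooled values, so it is only
    feasible for small groups; larger studies would sample permutations or
    use a rank-based test instead.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


# Hypothetical inhibitory-neuron fractions per subject.
ct = [0.20, 0.22, 0.21]
ad = [0.15, 0.14, 0.16]
print(exact_permutation_p(ct, ad))  # 0.1
```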
  
  (3) Differential analysis covariates. For the differential expression analyses, only gender and PMI were included as covariates. Additional variables, such as age at death, RIN, neuropathological measures, and comorbidities, can strongly influence molecular profiles and should be considered to ensure that the observed differences reflect AD-related biology rather than confounding pathological or technical factors.
  
  We appreciate the reviewer’s comment regarding the covariates included in the differential analyses of our study. We did not include other variables, such as age at death and RIN, because these data were not available for each sample. We therefore referred to the original research articles from which the proteomic and metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age, and no metabolites were significantly associated with sex, BMI, or education. Regarding the proteomic dataset, the original article included age at death, PMI, and sex as covariates, although these variables were not found to strongly influence the data (Extended Data Fig. 2 in Johnson et al., 2020).
  
  (4) Network stability and sample non-overlap. Proteomic, transcriptomic, and metabolomic data come from partially overlapping individuals. The authors should test whether the reconstructed network is robust to: different significance thresholds, restricting analyses to overlapping samples and alternative definitions of AD vs control.
  
  We appreciate the reviewer’s comment on trans-omic network stability. In our study, the number of individuals for whom all omic modalities were measured was relatively small (n=25 in CT and n=35 in AD). This limited overlap reduces statistical power and can affect the downstream network construction. We have acknowledged this limitation in the revised manuscript and clarified that the reconstructed networks should be interpreted with caution regarding reproducibility and generalizability (pg. 13, lines 13-23).
  
  Minor Comments
  
  (1) Some TF enrichment and regulatory inferences lack explicit mention of multiple-testing correction.
  
  We apologize for the lack of clarity in our original description. We have corrected for multiple testing in the TF inference and revised the Methods section to explicitly describe the correction method and the threshold applied (pg. 23, lines 23-24).
  
  (2) The limitations section is strong but should explicitly discuss the influence of postmortem interval on metabolite levels.
  
  We appreciate the reviewer’s comment about the effect of postmortem interval on changes in metabolite levels. Accordingly, we have added the description of this perspective in our revised manuscript (pg. 13, lines 1-5).
  
  Reviewer #2 (Significance):
  
  The study extends a trans-omic integration framework, originally applied to metabolic disease, into the context of Alzheimer's pathology. Although the biological findings largely confirm known alterations in mitochondrial and energy metabolism, the network-based approach offers a structured way to view cross-layer regulatory changes. Its main advance is conceptual rather than biological, providing a unified framework rather than uncovering fundamentally new mechanisms. This work will primarily interest researchers in neurodegeneration and systems biology, as well as computational groups developing multi-omics integration methods.
  
  Reviewer #3 (Evidence, reproducibility and clarity):
  
  This study leverages existing transcriptomic, metabolomic and proteomic datasets from prefrontal cortex (PFC) to assess metabolic dysregulation in Alzheimer's disease (AD). They found a downregulation of multiple metabolic pathways, including TCA cycle, oxidative phosphorylation, and ketone metabolism, that may explain bioenergetic alterations in AD.
  
  The study used matching ROSMAP omics datasets from the DLPFC that have allowed more robust data integration. However, the datasets are all generated using bulk tissue, which makes data interpretation difficult. For example, the AD changes they observed may be due to shifts in cell type proportion with disease (e.g. cell death, neuroinflammation). Did the authors account for any potential shifts in cell type proportion in their analysis?
  
  If the assumption is that the changes in AD are cell intrinsic, which cell types are likely to be impacted? Can the authors integrate any existing single-cell analysis to infer which cell types may be driving the signals they detect, and whether this accounts for some of the antagonistic regulatory effects that were detected?
  
  We thank the reviewer for their insightful comments. We agree that the use of bulk tissue datasets cannot account for cell-type heterogeneity. As noted in our Limitations section (pg. 12, lines 24-27), we recognize that previous studies have found that the Braak stage is correlated positively with microglia and astrocyte proportions and negatively with oligodendrocyte proportion (Hannon et al., 2024; Shireby et al., 2022). Regarding the integration of single-cell analysis, we have referenced recent snRNA-seq findings (Mathys et al., 2024) in our Limitations section (pg. 12, lines 28-32) to deconvolve our bulk signatures.
  
  Furthermore, in the revised manuscript, we used public single-cell transcriptomic datasets obtained from DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, the proportion of inhibitory neurons was smaller and the proportion of oligodendrocytes was larger in the AD group than in the CT group. In the proteomic data, we did not observe any statistically significant changes in cell type proportions between the two groups. In the metabolomic data, the proportion of inhibitory neurons was smaller in the AD group than in the CT group (pg. 6, lines 8-11).
  
  Reviewer #3 (Significance):
  
  The manuscript provides multimodal insight into metabolic dysregulation in AD in the PFC. Given that metabolic dysfunction is likely to play a major role in disease pathogenesis, this is a study of importance. However, the findings lack granularity at the cell type level, which limits the impact of the study.
  
  References
  
  (1) Baloni, P., Arnold, M., Buitrago, L., Nho, K., Moreno, H., Huynh, K., Brauner, B., Louie, G., Kueider-Paisley, A., Suhre, K., Saykin, A. J., Ekroos, K., Meikle, P. J., Hood, L., Price, N. D., Alzheimer’s Disease Metabolomics Consortium, Doraiswamy, P. M., Funk, C. C., Hernández, A. I., … Kaddurah-Daouk, R. (2022). Multi-Omic analyses characterize the ceramide/sphingomyelin pathway as a therapeutic target in Alzheimer’s disease. Communications Biology, 5(1), 1074.
  
  (2) Baloni, P., Funk, C. C., Yan, J., Yurkovich, J. T., Kueider-Paisley, A., Nho, K., Heinken, A., Jia, W., Mahmoudiandehkordi, S., Louie, G., Saykin, A. J., Arnold, M., Kastenmüller, G., Griffiths, W. J., Thiele, I., Alzheimer’s Disease Metabolomics Consortium, Kaddurah-Daouk, R., & Price, N. D. (2020). Metabolic Network Analysis Reveals Altered Bile Acid Synthesis and Metabolism in Alzheimer’s Disease. Cell Reports. Medicine, 1(8), 100138.
  
  (3) Batra, R., Arnold, M., Wörheide, M. A., Allen, M., Wang, X., Blach, C., Levey, A. I., Seyfried, N. T., Ertekin-Taner, N., Bennett, D. A., Kastenmüller, G., Kaddurah-Daouk, R. F., Krumsiek, J., & Alzheimer’s Disease Metabolomics Consortium (ADMC). (2023). The landscape of metabolic brain alterations in Alzheimer’s disease. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association, 19(3), 980–998.
  
  (4) Batra, R., Krumsiek, J., Wang, X., Allen, M., Blach, C., Kastenmüller, G., Arnold, M., Ertekin-Taner, N., Kaddurah-Daouk, R., & Alzheimer’s Disease Metabolomics Consortium (ADMC). (2024). Comparative brain metabolomics reveals shared and distinct metabolic alterations in Alzheimer’s disease and progressive supranuclear palsy. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association, 20(12), 8294–8307.
  
  (5) Cahill, K. M., Huo, Z., Tseng, G. C., Logan, R. W., & Seney, M. L. (2018). Improved identification of concordant and discordant gene expression signatures using an updated rank-rank hypergeometric overlap approach. Scientific Reports, 8(1), 9588.
  
  (6) Fröhlich, A. S., Gerstner, N., Gagliardi, M., Ködel, M., Yusupov, N., Matosin, N., Czamara, D., Sauer, S., Roeh, S., Murek, V., Chatzinakos, C., Daskalakis, N. P., Knauer-Arloth, J., Ziller, M. J., & Binder, E. B. (2024). Single-nucleus transcriptomic profiling of human orbitofrontal cortex reveals convergent effects of aging and psychiatric disease. Nature Neuroscience, 27(10), 2021–2032.
  
  (7) Green, G. S., Fujita, M., Yang, H.-S., Taga, M., Cain, A., McCabe, C., Comandante-Lou, N., White, C. C., Schmidtner, A. K., Zeng, L., Sigalov, A., Wang, Y., Regev, A., Klein, H.-U., Menon, V., Bennett, D. A., Habib, N., & De Jager, P. L. (2024). Cellular communities reveal trajectories of brain ageing and Alzheimer’s disease. Nature, 633(8030), 634–645.
  
  (8) Hannon, E., Dempster, E. L., Davies, J. P., Chioza, B., Blake, G. E. T., Burrage, J., Policicchio, S., Franklin, A., Walker, E. M., Bamford, R. A., Schalkwyk, L. C., & Mill, J. (2024). Quantifying the proportion of different cell types in the human cortex using DNA methylation profiles. BMC Biology, 22(1), 17.
  
  (9) Johnson, E. C. B., Carter, E. K., Dammer, E. B., Duong, D. M., Gerasimov, E. S., Liu, Y., Liu, J., Betarbet, R., Ping, L., Yin, L., Serrano, G. E., Beach, T. G., Peng, J., De Jager, P. L., Haroutunian, V., Zhang, B., Gaiteri, C., Bennett, D. A., Gearing, M., … Seyfried, N. T. (2022). Large-scale deep multi-layer analysis of Alzheimer’s disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nature Neuroscience, 25(2), 213–225.
  
  (10) Johnson, E. C. B., Dammer, E. B., Duong, D. M., Ping, L., Zhou, M., Yin, L., Higginbotham, L. A., Guajardo, A., White, B., Troncoso, J. C., Thambisetty, M., Montine, T. J., Lee, E. B., Trojanowski, J. Q., Beach, T. G., Reiman, E. M., Haroutunian, V., Wang, M., Schadt, E., … Seyfried, N. T. (2020). Large-scale proteomic analysis of Alzheimer’s disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nature Medicine, 26(5), 769–780.
  
  (11) Maitra, M., Mitsuhashi, H., Rahimian, R., Chawla, A., Yang, J., Fiori, L. M., Davoli, M. A., Perlman, K., Aouabed, Z., Mash, D. C., Suderman, M., Mechawar, N., Turecki, G., & Nagy, C. (2023). Cell type specific transcriptomic differences in depression show similar patterns between males and females but implicate distinct cell types and genes. Nature Communications, 14(1), 2912.
  
  (12) Mathys, H., Boix, C. A., Akay, L. A., Xia, Z., Davila-Velderrain, J., Ng, A. P., Jiang, X., Abdelhady, G., Galani, K., Mantero, J., Band, N., James, B. T., Babu, S., Galiana-Melendez, F., Louderback, K., Prokopenko, D., Tanzi, R. E., Bennett, D. A., Tsai, L.-H., & Kellis, M. (2024). Single-cell multiregion dissection of Alzheimer’s disease. Nature, 632(8026), 858–868.
  
  (13) Novotny, B. C., Fernandez, M. V., Wang, C., Budde, J. P., Bergmann, K., Eteleeb, A. M., Bradley, J., Webster, C., Ebl, C., Norton, J., Gentsch, J., Dube, U., Wang, F., Morris, J. C., Bateman, R. J., Perrin, R. J., McDade, E., Xiong, C., Chhatwal, J., … Harari, O. (2023). Metabolomic and lipidomic signatures in autosomal dominant and late-onset Alzheimer’s disease brains. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association, 19(5), 1785–1799.
  
  (14) Plaisier, S. B., Taschereau, R., Wong, J. A., & Graeber, T. G. (2010). Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures. Nucleic Acids Research, 38(17), e169.
  
  (15) Shireby, G., Dempster, E. L., Policicchio, S., Smith, R. G., Pishva, E., Chioza, B., Davies, J. P., Burrage, J., Lunnon, K., Seiler Vellame, D., Love, S., Thomas, A., Brookes, K., Morgan, K., Francis, P., Hannon, E., & Mill, J. (2022). DNA methylation signatures of Alzheimer’s disease neuropathology in the cortex are primarily driven by variation in non-neuronal cell-types. Nature Communications, 13(1), 5620.
  
  (16) Tasaki, S., Xu, J., Avey, D. R., Johnson, L., Petyuk, V. A., Dawe, R. J., Bennett, D. A., Wang, Y., & Gaiteri, C. (2022). Inferring protein expression changes from mRNA in Alzheimer’s dementia using deep neural networks. Nature Communications, 13(1), 655.
  
  (17) Varma, V. R., Wang, Y., An, Y., Varma, S., Bilgel, M., Doshi, J., Legido-Quigley, C., Delgado, J. C., Oommen, A. M., Roberts, J. A., Wong, D. F., Davatzikos, C., Resnick, S. M., Troncoso, J. C., Pletnikova, O., O’Brien, R., Hak, E., Baak, B. N., Pfeiffer, R., … Thambisetty, M. (2021). Bile acid synthesis, modulation, and dementia: A metabolomic, transcriptomic, and pharmacoepidemiologic study. PLoS Medicine, 18(5), e1003615.
  
  (18) Wan, Y.-W., Al-Ouran, R., Mangleburg, C. G., Perumal, T. M., Lee, T. V., Allison, K., Swarup, V., Funk, C. C., Gaiteri, C., Allen, M., Wang, M., Neuner, S. M., Kaczorowski, C. C., Philip, V. M., Howell, G. R., Martini-Stoica, H., Zheng, H., Mei, H., Zhong, X., … Logsdon, B. A. (2020). Meta-Analysis of the Alzheimer’s Disease Human Brain Transcriptome and Functional Dissection in Mouse Models. Cell Reports, 32(2), 107908.
  
biorxiv.org/content/10.1101/2025.09.26.678758v3
www.medrxiv.org www.medrxiv.org

The age and sex dynamics of heterosexual HIV transmission in Zambia: an HPTN 071 (PopART) phylogenetic and modelling study

1. Public_Reviews 13 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This important study provides evidence for our understanding of HIV transmission dynamics by age and sex in Zambia during the PopART trial; by combining phylogenetic and individual-based mathematical modelling (IBM), it adds depth to the epidemiological literature and may inform more strategic allocation of HIV prevention resources in sub-Saharan Africa. The authors employ two complementary and well-established methodologies (phylogenetics and IBM), and this dual approach is a notable strength. However, the evidence supporting key conclusions is incomplete, with several claims insufficiently substantiated by the data presented. Improvements in data presentation (e.g., quantification of qualitative statements, statistical estimates, and clearer description of results) would substantially strengthen the paper.
  
  We thank the editor and reviewers for their positive comments. We have revised the manuscript in response to the points raised, as described below.
  
  First of all, we would like to summarise what we have changed regarding the presentation of summary statistics throughout the text. We agree that many of the statements in the original submission tended towards being qualitative. This resulted from our reluctance to present two separate estimates, with different ways of quantifying uncertainty, in the text. The phylogenetic results are summarised as a mean and confidence interval, whereas the IBM results require a measure of centrality (mean or median) and the highest density interval (HDI) of a summary statistic (e.g. the mean age gap) as it varies over the posterior. These are not directly comparable. We now present both where appropriate, with a cautionary note about the difference between CIs and HDIs (lines 257-260).
  
  We were also somewhat arbitrary about where we chose to summarise the posterior in the IBM or look at the best-fitting single simulation, and where we presented the mean as opposed to the median. We have considerably overhauled what is presented in this revision:
  
  (1) We always present the posterior summary unless the level of detail is such that summarising uncertainty over the posterior is not feasible (e.g. in figures 3, 4 and 5). In the latter case we still use the best-fitting IBM replicate.
  
  (2) In the main text we always present the mean. For the phylogenetics the summary statistics are the mean and confidence interval. For the IBM this is the posterior mean, and 95% HDI, of the mean of a particular statistic as calculated in each of the 1000 IBM replicates. For example, each replicate will have its own distribution of male source ages, which has a mean value. These means also vary over the posterior; their mean is calculated, as well as the 95% HDI to represent posterior uncertainty. This “mean of means” may be a slightly confusing piece of terminology at first glance, but it allows us to properly capture posterior uncertainty in a way we mostly avoided in the first submission.
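  The “mean of means” summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each IBM replicate is available as a plain list of per-transmission values (e.g. male source ages), and it computes an empirical HDI as the narrowest window covering at least 95% of the sorted per-replicate means. The function names `hdi` and `mean_of_means` are hypothetical.

```python
import math
import statistics


def hdi(samples, mass=0.95):
    """Empirical highest density interval: the narrowest window of sorted
    samples that contains at least `mass` of them (hypothetical helper)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the window must cover
    # Slide a window of k consecutive sorted values and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def mean_of_means(replicates, mass=0.95):
    """Posterior mean and HDI of a per-replicate mean (hypothetical helper).

    `replicates` holds one list of values (e.g. male source ages at
    transmission) per IBM replicate drawn from the posterior.
    """
    per_replicate = [statistics.fmean(r) for r in replicates]  # one mean per replicate
    return statistics.fmean(per_replicate), hdi(per_replicate, mass)
```

  For example, `mean_of_means([[1, 3], [5, 7]])` returns `(4.0, (2.0, 6.0))`: the two replicate means are 2 and 6, their mean is 4, and with only two replicates the 95% window has to span both.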
  
  One result of (1) above is a change to figure 6. It is now summarised over the posterior, with the result that time trends that were previously not evident become clear. This changes our conclusions slightly (lines 529-537), but it should be noted that the magnitudes of the trends remain small.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript describes the results of phylogenetic and epidemiological modeling of the PopART community cohorts in Zambia. The current manuscript draft is methodologically strong, but needs revision to strengthen the take-home messages. As written, there are many possible take-away conclusions. For example, the agreement between IBM and phylogenetic analysis is noteworthy and provides a methodological focus. The revealed age patterns of transmission could be a focus. The effects of the PopART intervention and the consequences of a 1-year disruption could be a focus. It is important, though, that any main messages summarized by the authors are substantiated by the evidence provided and do not extrapolate beyond the data that have been generated. I recommend that the authors think deeply about what the most important, well-supported messages are and reframe the discussion and abstract accordingly.
  
  We have rewritten the abstract and made changes to the discussion in order to centre our message on the contribution of particular demographic groups to transmission, and how, with that contribution revealed, such groups can be selected for specialised interventions.
  
  Strengths/weaknesses by section:
  
  (1) ABSTRACT
  
  The Abstract summarizes qualitative findings nicely, but the authors should incorporate quantitative results for all of the qualitative findings statements.
  
  The abstract has been extensively revised and now contains quantitative estimates throughout, from both methodologies where appropriate.
  
  The ending claim is not substantiated by the modeling scenarios that have been run: "targeted interventions for demographic groups such as under-35 men may be the key to finally ending HIV." It is straightforward to run this specific scenario in the model to determine whether or not this is true.
  
  Our modelling framework is not set up to model the “last mile” of HIV elimination, notably as it has no component for MSM or FSW transmission, and we do not feel that we could confidently present results regarding it. As a result, this statement has been greatly softened in the new abstract (lines 75-78).
  
  The authors should add confidence intervals to the quantitative metrics, such as the 93.8% and 62.1% incidence reduction.
  
  These have been added.
  
  (2) RESULTS
  
  The authors should check the Results section for any qualitative claims not substantiated by the analyses performed, and ensure the corresponding analyses are presented to support the claims.
  
  The Results and Methods describe the model's implementation of the PopART intervention differently. The Methods describes it as including VMMC, TB, and STI services, while the Results only mentions intensified HIV testing and linkage.
  
  This is a slight misreading of the text. That paragraph in the Methods is describing the trial itself, not the modelling framework.
  
  A limitation of the model is that HIV disease progression is based on the ATHENA cohort in the Netherlands, which is a different HIV subtype (B) than the one in the research setting (C). The model should be configured using subtype C progression data, which have been published, or at least a sensitivity analysis should be conducted with respect to disease progression assumptions.
  
  The available literature does not suggest a significant difference in progression between subtypes B and C, and we have added text and citations to this effect (lines 699-701).
  
  In Table 2, the authors should consider adding a p-value to establish whether or not IBM and phylogenetics estimates are different.
  
  We have done this; the appropriate test was a posterior predictive check. See lines 261-263, 575-579 and 805-814.
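  A posterior predictive check of this kind can be sketched as below. The details are assumptions for illustration, not the authors' procedure: the observed statistic is taken to be the mean of a per-pair quantity such as the age gap, each posterior replicate is resampled with replacement to the observed number of pairs so that simulated means carry comparable sampling noise, and the two-sided p-value is twice the smaller tail proportion, capped at 1. The function name `ppc_pvalue` is hypothetical.

```python
import random
import statistics


def ppc_pvalue(observed, replicates, seed=0):
    """Two-sided posterior predictive p-value for the mean of a statistic.

    Hypothetical sketch, not the authors' implementation.
    observed:   per-pair values from the phylogenetic pairs (e.g. age gaps).
    replicates: one list of simulated per-transmission values per posterior
                replicate of the IBM.
    Each replicate is resampled (with replacement) to the observed number of
    pairs so that simulated means carry comparable sampling noise.
    """
    rng = random.Random(seed)
    n = len(observed)
    obs_mean = statistics.fmean(observed)
    sim_means = [statistics.fmean(rng.choices(rep, k=n)) for rep in replicates]
    upper = sum(m >= obs_mean for m in sim_means) / len(sim_means)
    lower = sum(m <= obs_mean for m in sim_means) / len(sim_means)
    # Twice the smaller tail, capped at 1 (ties count towards both tails).
    return min(1.0, 2 * min(upper, lower))
```

  A small p-value indicates that the observed phylogenetic mean would be unusual under the IBM posterior; matching the sample size, rather than comparing against the full simulated population, keeps the two estimates on a like-for-like footing.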
  
  (3) DISCUSSION
  
  The literature review and comparison of study results to previously published phylogenetic studies is very nice. The authors could strengthen this by providing quantitative estimates with CIs for a more scientific comparison of the study results vs. prior studies, perhaps as a table or figure.
  
  We have expanded the discussion on this point (lines 504-527). We considered adding a table, but the existing literature that directly answers the questions we ask is quite limited and fragmentary. For example, Monod et al. do not present a complete treatment of age gaps. The literature using regression analyses to identify predictors of HIV prevalence or incidence related to partner age is extensive, but those results are not directly comparable to ours.
  
  The authors state that due to "the narrow geographical catchment area... The results should not be automatically extrapolated to apply to other SSA settings." The authors should exercise this caution when comparing the results to studies in South Africa and elsewhere.
  
  We have made more explicit acknowledgements of these limitations (lines 598-600).
  
  There are many other limitations to the analysis, including some mentioned above, that are not acknowledged. The authors should think carefully about what the most important limitations are and acknowledge them honestly at the end of the Discussion section.
  
  The limitations paragraph has been revised (lines 598-605).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors analyzed PopART data to better characterize the age and sex-specific heterosexual HIV transmission dynamics in Zambia, with the goal of allocating resources.
  
  Strengths:
  
  Important analysis to home in on the key driver of HIV transmission in Zambia, which hopefully can be used to tune prevention efforts to maximize effect while limiting required resources. Two analytic approaches were used, and while the phylogenetic data were markedly more limited, they mirrored the simulated epidemic. The authors did a nice job reviewing the limitations of the data and the analyses. The authors did a nice job of providing analyses to support their goals and hypothesis, and this work may have more impact now that resources in SSA for HIV prevention and treatment may become more scarce.
  
  Weaknesses:
  
  To increase the impact and utility of this work, it would be helpful to parse the analysis just a bit further to estimate the roles of undiagnosed vs. diagnosed and untreated subpopulations in this transmission. PopART is a multifaceted intervention, but the cost, effort, and approach to reengagement in care vs. testing/treatment can be quite different.
  
  We have now provided stratified results by diagnosed and non-diagnosed status of the source, as well as an overall summary of the proportion of undiagnosed sources by age and sex. See lines 305-310, 539-547, and table 3.
  
  Recommendations for the authors:
  
  Reviewing Editor:
  
  We commend you for conducting a rigorous and comprehensive study titled "The age and sex dynamics of heterosexual HIV transmission in Zambia: an HPTN 071 (PopART) phylogenetic and modelling study" that significantly advances the understanding of HIV transmission dynamics in sub-Saharan Africa. The study utilizes an innovative dual-methodology approach integrating individual-based mathematical modelling (IBM) and pathogen phylogenetics to characterize heterosexual HIV transmission patterns by age and sex during the PopART trial in Zambia.
  
  This manuscript reports on HIV transmission dynamics in Zambia using data from the PopART study, combining individual-based modelling and phylogenetic analysis. The use of two independent methodologies enhances confidence in the consistency of the findings and enables robust cross-validation. The work addresses an important topic in HIV prevention, particularly in settings where resources may become more constrained, and offers insight into potential demographic targets for intervention.
  
  However, several aspects of the manuscript limit its current impact. The main take-home messages are diffuse and not clearly presented. Some conclusions in the abstract and discussion appear to go beyond the scope of the presented data. For instance, the claim that targeting under-35 men may be key to ending HIV is not directly tested in the modelling scenarios and should be reframed or removed unless supported by new analyses. Furthermore, important quantitative details, such as confidence intervals, p-values, and precise age group estimates, are lacking in key sections (e.g., the Abstract and Results).
  
  The authors are encouraged to clearly identify and communicate their central findings, ensure all claims are fully supported by their analyses, and make the data more accessible to readers by adding detailed, quantitative summaries where needed.
  
  The following are our recommendations to the Authors:
  
  (1) Clarify Study Objectives and Central Messages
  
  Reframe the abstract and discussion to highlight a clear, well-supported set of main findings.
  
  Avoid overgeneralized or unsubstantiated claims, especially those not directly tested by your model (e.g., the effectiveness of targeting under-35 men).
  
  As stated above, we have revised this text accordingly.
  
  (2) Support Qualitative Claims with Quantitative Data
  
  Provide numerical results, including effect sizes and confidence intervals, wherever qualitative trends are mentioned.
  
  For example, restate: "The largest gaps for female recipients were among the youngest" as "... in the age group XX-YY with OR = Z.Z (95% CI: A.A-B.B)."
  
  As mentioned at the top of the review, we have overhauled the treatment of summary statistics extensively, and now give confidence or highest density intervals throughout the text.
  
  (3) Improve the Results Section
  
  Check that all claims are supported by the analyses, and ensure figure references are accurate.
  
  The statements that went beyond what was supported, notably about ending the epidemic by targeting young men, have been removed. The typo in table references has been fixed.
  
  Annotate Figure 6 with trendline coefficients and p-values where applicable.
  
  The takeaway message of figure 6 has now changed: rather than no trend, we now see a minor one.
  
  Revise Figure 4 for clarity or consider replacing it with a tabular format.
  
  We would prefer to keep the current figure 4, as we have not found any clearer way to illustrate the patterns, which are the consequence of the phenomenon observed in figure 5. We have put more explicit descriptive text in the discussion, linking the two figures (lines 470-476).
  
  (4) Address Potential Bias and Model Assumptions More Rigorously
  
  Explain sampling bias in IBM and phylogenetics (e.g., how the 355 high-confidence phylogenetic pairs were selected).
  
  The reviewer comment regarding the 355 pairs was based on a misapprehension; we used all the pairs we found using the phyloscanner pipeline. There are no sampling bias issues involved in the IBM, as every individual in the simulations is considered. Appendix 2 includes some sensitivity analysis results for changes to the procedure used to find the 355 pairs.
  
  Discuss how the use of subtype B disease progression data from the ATHENA cohort may impact results in a subtype C setting. A sensitivity analysis would strengthen this.
  
  Subtype B progression data was used in the absence of any appropriate data from subtype C, but the literature does not suggest any major difference between the two (lines 699-701).
  
  (5) Include More Detail on Undiagnosed Populations and ART Effects
  
  Estimate the roles of undiagnosed and untreated subpopulations in driving transmission.
  
  As mentioned above, this analysis has been added.
  
  Clarify mechanistically how ART might influence age gaps in transmission dynamics.
  
  This is now clarified in the introduction (lines 127-129).
  
  (6) General Improvements
  
  Provide p-values where comparisons are made (e.g., in Table 2).
  
  Use consistent terminology and definitions across Methods and Results.
  
  Add more discussion on limitations, especially regarding generalizability to other SSA settings.
  
  All of these have been inserted as previously mentioned.
  
  By addressing these points, the manuscript would present a more coherent narrative and a stronger, evidence-based contribution to the field. We appreciate you all for your fantastic effort and hope you will reflect the feedback in your final paper.
  
  Reviewer #1 (Recommendations for the authors):
  
  Thank you for the opportunity to review this interesting manuscript.
  
  In the public review, I recommended that the authors incorporate quantitative results for all of the qualitative findings statements. As one example, I would recommend that "We found the largest gaps for female recipients were among the youngest of those recipients" be re-written as "The largest gaps for female recipients were in the age group XXX-YYY with OR=ZZZ (XXX-YYY)", i.e. with effect sizes such as odds ratios and specific outcome definitions, including ages. To give one more example: "immediate increase in the average age at transmission of both sources and recipients" could be rephrased as "increase in the average age at transmission by XXX (YYY-ZZZ) years for sources and XXX (YYY-ZZZ) for recipients over [TIME PERIOD]."
  
  We hope the revisions we have made to the statistical presentation are satisfactory as a response to this request.
  
  Again in the public review, I recommended checking the Results section for any qualitative claims not substantiated by the analyses performed, and ensuring the corresponding analyses are presented to support the claims. An example is: "Trends are minor or non-existent in the former two variables." - please annotate Figure 6 (assuming the authors meant to reference Figure 6 and not 7 here?) to show over what period trendlines were fit and provide the coefficient and CI. To support the stated claim even more strongly, a p-value might be apt with a null hypothesis of a slope of zero.
  
  Please check the numbering on all figure references in the text, as some appear to be misnumbered. E.g., where the text refers to Figure 7, I believe the authors meant to reference Figure 6.
  
  The change to how we handled the statistics has changed the message of figure 6 (which is now figure 7) and rendered this somewhat moot. We have checked that all figure and table references are now correct.
  
  Figure 3 is very nice, but flipping the axes on one panel would make the panels easier to compare, and adding some statistics would then help assess whether the patterns are the same or different when a man vs. a woman is the source.
  
  We have flipped the axes here.
  
  Figure 4 was too complicated for me. I could not follow the Sankey flows because there is too much going on and overlapping. Consider revising to make it easier to digest... perhaps to table format?
  
  As mentioned above, we would prefer to keep this figure, but we have situated it better in the text.
  
  Reviewer #2 (Recommendations for the authors):
  
  A few points that would improve the clarity and the strength of the manuscript:
  
  (1) There is a need to clarify further how the IBM and phylogenetic data do not suffer from sampling bias. For example:
  
  Line 205: What proportion of the transmissions modeled in the IBM were from Zambia?
  
  All of them. We confined the analysis of the IBM to the Zambian communities from which phylogenetic data was acquired (lines 755-758).
  
  Line 217: What proportion of the phylogenetic pairs (cherries) suggesting transmission were the 355 that had high confidence in directionality? How do these pairs compare to the others?
  
  There was no identification of “cherries” involved in picking these pairs; the phyloscanner procedure does not use that step. We confined our analysis solely to the pairs for which we did identify a direction of transmission; that is the 355. Appendix 2 includes a sensitivity analysis involving varying the parameters by which these were identified.
  
  (2) I appreciate the authors noting that MSM transmissions are unlikely to be playing a role in this cohort, as noted in previous work by the group. However, systematic undersampling of men is common in other study cohorts of HIV. While the MSM and heterosexual networks may be relatively distinct, undersampled men who are bridging the networks could impact the estimates. Can the authors use the time to diagnosis analysis (HIV phyloTSI) to estimate rates of undiagnosed men and women?
  
  We feel that this is beyond the scope of this work. The phylogenetics dataset in its totality could be used for this purpose (although it is probably highly biased towards undiagnosed individuals due to the considerable majority of samples coming from the healthcare facilities). However, we concentrate here solely on the subset involved in our probable transmission pairs, which is fairly small. Extending the scope to an exploration of the full dataset would seem like a separate study, which we do have plans to do.
  
  We have used the IBM for this question instead (lines 303-321). However, as MSM transmission was not modelled, it is also not ideal for answering this question. Ultimately we feel that the way these studies were implemented makes them unsatisfactory tools for answering the MSM question, important as it is.
  
  (3) Expanding on the point above, in other settings, transmission to young men has been associated with partnerships with older men, and if these young men then transmitted to young women, would we see a similar effect as noted in these models (assuming the young men were less well sampled).
  
  Our previous work (Hall et al., 2024) suggested no excess of identified male-male pairs in the phylogenetics dataset, an excess that might have indicated cryptic male-to-male transmission. The age disparities would have been worth exploring had such an excess been found, but its absence curtails this line of analysis.
  
  (4) Related to the point above, is there an estimate of the populations (age and sex) that are undiagnosed in the IBM model? Can this be teased out, i.e. is transmission from men to women more likely secondary to lack of diagnosis, or to lack of engagement in care?
  
  We have explored results by diagnostic status as it pertains to age and sex, but we feel that moving on to a more general exploration of the role of diagnosis and lack of engagement in care is again going beyond the scope of what is already a long paper.
  
  (5) I'm still not fully clear as to why ART might affect age gaps. Can this be explained in more detail?
  
  See lines 127-129.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2025.05.02.25326873v3
www.biorxiv.org

4D Single-Cell Spatial Transcriptomics Reveals Dynamic Morphogenetic Gradients and Regenerative Domains in Planarians

1. GigaScience 10 Jul 2026
  
  in GigaScience
  
  Abstract
  
  Regeneration relies on precise spatiotemporal gene expression and cellular responses to establish tissue identity and body patterning. Using high-resolution Stereo-seq (715 nm) on 353 sections from 16 whole animals at 8 regeneration timepoints, we constructed a 4D spatiotemporal transcriptomic map of planarian regeneration. Our analysis captured 36 refined cell types from 3,508,004 segmented cells, enabling genome-wide transcriptional imputation of gene expression dynamics across body axes at cellular, tissue, and organismal scales. We identified dynamic positional gradients and distinct spatially distributed cell types during regeneration, including an injury-induced Anterior Regenerative Zone (ARZ). The ARZ exhibited enriched positional signals in epidermal, muscle, and neural cells and was regulated by Mediator 8, which is crucial for polarity remodeling and blastema formation. This study provides a comprehensive spatial molecular and cellular map of regenerative processes, highlighting injury-induced spatial domains and key regulatory factors in planarian regeneration. We also provide an interactive web portal, offering a valuable resource for exploring and analyzing regeneration mechanisms in a spatiotemporal context.
  
  This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag064), which carries out single-anonymized peer review. These reviews are published under a CC-BY 4.0 license and were as follows:
  
  Reviewer 2:
  
  In the manuscript '4D single-cell spatial transcriptomics reveals dynamic morphogenetic gradients and regenerative domains in planarians,' Han and colleagues generate a truly stunning spatial transcriptomics dataset of planarian regeneration from the species Schmidtea mediterranea. The authors' dataset includes whole 3D reconstructions of two regenerating planarian fragments at 8 different timepoints during regeneration, a fantastic accomplishment and resource of broad interest to the regenerative biology community. The authors' analysis of the dataset includes characterization of spatially biased genes (SBGs) and exploration of an anterior regenerative zone (ARZ) and the role of the gene med8 in its regulation. While the authors' dataset is remarkable and their analysis of spatially biased genes and med8 function is interesting, I'm not yet convinced that their conclusions are fully tested by the included experiments. In addition, I think that the authors have not included sufficient quality control metrics for their spatial dataset, which makes determining the limitations or caveats of their analysis and conclusions more difficult. However, my concerns could be addressed by additional analysis and minor experiments, or by softening the conclusions of the authors to include alternative models. I've detailed the areas of analysis/discussion that I believe require improvement below:
  
  Major Criticisms:
  
  1. Stereo-seq resolution and capture efficiency: The authors assert that their spatial approach has high enough resolution to resolve cell types, and they claim to have characterized 36 cell types in their abstract. However, the 'cell type' in their dataset that they choose to focus on, Clu.31, has gene markers expressed in three different cell types that have been shown to be distinct in the literature and in prior planarian atlases. The authors should analyze gene expression signatures of other Stereo-seq 'cell types' to determine if they also show mixed expression signatures. In addition, I am curious if Stereo-seq is more likely to capture highly expressed genes (like those expressed in parenchymal cell types) than more lowly expressed genes (like the transcription factors expressed in stem cells). If it exists, this bias could influence annotation of cell types in highly heterogeneous regions of the worm like the parenchyma or parapharyngeal region. Finally, there is very little QC data in the supplementary materials (size/volume of segmented cells, UMIs and features per cell, variability in features/UMIs per section, per replicate, and per cell type, etc.). I think this analysis would be highly valuable for the reader to interpret the data and the 36 identified 'cell types'.
  
  2. Dynamics of spatially biased genes: The authors' analysis of the dynamics of spatially biased genes (SBGs) is very interesting, but the 'oscillations' the authors referred to were not clear to me in the data across all, or even most, of the pattern clusters in Figure 2A. In general, it seemed more like the pattern cluster was 'noisy' or broader before stabilizing in its final location. In addition, the PCA analysis in Figure 2B seems to show that the Intact and 14dpa transcriptomes are very similar, but the 0h, 12h, and 36h timepoints are very distinct from the 3, 5, 7, and 10 day fragments. This would suggest that early wound-response gene expression is highly distinct from (even opposed to) the gene expression programs active late in regeneration. More exploration of this idea, as well as clarified language on exactly what the authors mean by 'oscillations' and which gene groups follow this pattern, would greatly improve this section and better support the authors' conclusions.
  
  3. The Cellular/Functional identity of Clu.31: The authors state throughout the manuscript that Clu.31 (the ARZ) is an injury-induced anterior state enriched for SBGs and regulating polarity establishment. However, it is also possible that this spatial state represents the anterior peripheral nervous system (numerous sensory neurons and surface epithelial cells that help sense mechanical and chemical cues). SBGs could be enriched because this combination of cell types is only present in the anterior of the animal. Indeed, the authors show that the ARZ is localized to the anterior in intact animals in the absence of an injury (Figure 3), and enriched genes (S4Aii) strongly indicate that Clu.31 contains gabrg+ mechanosensory neurons. If Clu.31 is regenerating nervous system, this would also explain its ventral bias and expression of tgs-1 and other nb2 genes, since nb2 neoblasts have been suggested to be both an amputation-responsive neoblast subset (Zeng et al. Cell) and a neural progenitor state (Raz et al. Cell Stem Cell). Clarifying how the composition of the tri-lineage region changes during regeneration may help distinguish whether Clu.31 is truly an injury-induced region or the regenerating sensory nervous system. For example, it is known that agat-1+ cells are transcriptionally responsive and enriched at the wound site at 2-4 days post amputation, but less so at later timepoints (Benham-Pyle et al. Nature Cell Biology, Kent et al. Developmental Biology). This shift in composition should be observable in Clu.31, since it contains agat+ epidermal cells. Such a shift in composition, or the identification of a regeneration-specific marker expressed in Clu.31, would add support to the authors' conclusions.
  Regardless of the outcome of these experiments/analyses, the discussion and interpretation of the data could be modified to address the hypothesis that Clu.31 represents the cellular neighborhood created when the peripheral nervous system intercalates with the anterior DV boundary epithelium and body wall muscle, which needs to be regenerated in amputated worms. As is, the comparison to the apical epithelial cap considered in the discussion (Line 438) may be premature.
  
  4. Med8 function: Med8 produces a clear phenotype in the authors' experiments, and their data indicate that it is required for ARZ formation. However, I am not sure that the authors' data support the claim that Med8 is directly regulating blastema and PCG expression, as opposed to regeneration of the nervous system (which is highly interconnected with formation of the anterior pole and the size of the anterior blastema) and stem cell function more broadly. The fact that Med8 RNAi also leads to head degeneration in intact worms (Figure S6F) strongly suggests a more fundamental defect in neural differentiation or stem cell function. The strongest evidence presented by the authors supporting a broader function in polarity establishment is the disruption of posterior Wnt expression (Figure 5F and G), but these in situs are single representative images with no quantitation and could also be explained by a stem cell defect. Additional data could be provided (e.g. visualization of wound-induced gene expression, quantitation of anterior or posterior stem cell numbers and proliferation rates at 2dpa) to support regulation of PCGs or blastema formation. The authors could also leverage their single cell sequencing to determine if Med8 RNAi impacts neural progenitor abundance more than other progenitor cell types. Together, these experiments would determine whether Med8 is important for amputation-induced blastema formation and polarity re-establishment or for stem cell function and neural differentiation more broadly.
  
  Minor Criticism/Feedback:
  
  1. In Figure 1I, the authors show DEGs enriched in each cluster/region. In the blastema regions, I was surprised by the number of DEGs at each time point. It appears that there are ~10K upregulated and ~10K downregulated DEGs by the later time points, which suggests that 2/3 of the transcriptome is differentially expressed. The authors should clarify in the text or methods what cutoff they used for the DEGs and how significant the DEGs in this figure are.
  2. For readability, I really think that all figures should be on a white background.
  3. How do gene expression profiles from Stereo-seq compare to bulk RNA-seq at similar timepoints?
  4. It is very interesting that there are some cell types that appear to contract and then expand during regeneration (Cluster 0, 23) or that aggregate/become more targeted during regeneration (pharynx pouch, cluster 29). Molecular differences between early and late cells within these cell types would be particularly interesting for understanding different phases of regeneration, but this may be beyond the scope of the current study.
  5. The authors frequently reference Han et al. submitted, but this manuscript would need to be pre-printed or published in order for this work to reference it.
  6. The Y axis of Figure 2E should be labeled.

Annotators

GigaScience

URL

biorxiv.org/content/10.64898/2026.02.18.706529v1
www.biorxiv.org

Congenital aphantasia reveals frontotemporal and cingulate structural alterations underlying conscious access to imagery

1. Public_Reviews 10 Jul 2026
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this paper, the authors provide a systematic investigation of structural brain differences associated with congenital aphantasia (self-reported lifelong absence of voluntary visual imagery). Specifically, the authors analysed a structural neuroimaging dataset involving 18 individuals with aphantasia and 18 visualizers to test two competing hypotheses: (1) that aphantasia reflects alterations in visual pathways and early visual cortex, and (2) that it instead reflects differences in higher-order frontotemporal and cingulate systems. To test these hypotheses, the authors employed multiple analysis approaches (e.g., cortical morphometry, tractometry, graph-theoretic network analysis).
  
  They report structural differences between the two groups in frontotemporal and cingulate systems. In contrast, they found no reliable group differences in early visual cortex or major visual tracts. On this basis, they propose that aphantasia is primarily associated with differences in higher-order systems supporting integration and conscious access to internally generated representations, rather than with deficits in sensory visual representations themselves.
  
  Strengths:
  
  (1) The present work addresses an important gap in the mental imagery literature, providing a systematic investigation of structural neuroimaging differences in congenital aphantasia. By showing that structural differences between aphantasics and visualizers are mainly concentrated in frontotemporal and cingulate systems (rather than in visual cortex), it takes an important step toward a better understanding of individual differences in mental imagery and provides a set of candidate regions for future mechanistic work.
  
  (2) A key strength of the study is the multimodal approach employed to address the main research question, integrating tractometry, functional region-of-interest (fROI)-based tractography, graph-theoretic network analysis, and surface-based cortical morphometry, which provide a converging assessment of structural differences between aphantasics and visualizers.
  
  (3) The complementary use of Bayesian analyses alongside NHST to assess evidence for null results is a further strength of this work.
  
  Weaknesses:
  
  (1) A weakness of this work is related to aspects of the framing and, in particular, what can be confidently inferred from the results. The framing of existing accounts of aphantasia in the Introduction appears limited in that it reduces the views on aphantasia to two options (sensory strength account versus conscious access account) without acknowledging a third distinct position, namely that aphantasia reflects a specific deficit in the voluntary generation of imagery (Milton et al., 2021; Zeman et al., 2015, 2020; Whiteley, 2021; Cavedon-Taylor, 2022). Like the conscious access account, the view that aphantasia involves a deficit in the generation of sensory representation also speaks against the hypothesis of reduced sensory strength of internally generated representations. This third view could be acknowledged/discussed as it also maps quite well onto the presented results.
  
  (2) Relatedly, I think the main weakness of the paper concerns the interpretation of results being restricted to a lack of "conscious access". The paper frames its findings as mainly evidence for a conscious access failure, the view that visual representations are generated by aphantasics but cannot be consciously accessed. However, the structural findings are equally consistent with a voluntary generation failure, especially since the same higher-order regions examined can also be implicated in the top-down generation and control of imagery. The authors themselves initially define aphantasia as "lifelong absence of voluntary visual imagery". Given the nature of structural imaging data (as opposed to functional data), it is not possible with the present study to distinguish between a lack of generation versus a lack of conscious access. As such, examining this alternative interpretation appears appropriate, and it would considerably strengthen the paper. Structural MRI alone is not sufficient to dissociate imagery generation from conscious access, as these are fundamentally functional questions.
  
  (3) There is some inconsistency and a lack of clarity around the specific choice of regions/networks, which could be better motivated and explained. E.g., the "core imagery network" analysed in the white-matter connections analysis was derived from a previous 7T study (with which the sample partially overlaps) and is not necessarily the network most commonly associated with visual imagery in the literature (e.g., see Dijkstra et al., 2019; Pearson, 2019). It is, for instance, unclear why V1 was examined in the cortical thickness analysis but not in the previous one, given that both analyses are related to the visual pathway hypothesis. Related to this, in the graph-theoretic analysis, the rationale for network selection is inconsistently established in the Introduction. The attention and salience networks do have some grounding in the Introduction through the mention of specific regions such as FEF and anterior insula, though these are discussed as individual regions rather than as networks. However, the default mode network receives no motivation in the Introduction. More explicit elaboration on these choices would be appropriate.
  
  (4) The interpretation provided in the Discussion tends to oversimplify what is in fact a heterogeneous and rich set of structural findings into a relatively coherent mechanistic account. The observed differences are spatially and directionally variable across tracts, cortical regions, and metrics: e.g., FA is reduced in the UF and posterior interparietal corpus callosum but increased in the dorsal cingulum; cortical thickness is reduced in aPFC but increased in medial temporal regions, and so forth. The Discussion acknowledges this in part (e.g., proposing increased dorsal cingulum FA as potentially compensatory) but does not address the directional heterogeneity systematically. The authors could discuss more explicitly what the opposing directions of effects mean for their overall interpretation. Relatedly, some parts of the Discussion link specific structural findings to specific imagery processes in ways that go beyond what the current data can support. The authors could more clearly distinguish between what the structural data show and what functional interpretations are taken from prior work.
  
  We will add two recent in-press Cortex papers to the Discussion. One provides lesion-based double-dissociation evidence against V1 as a necessary causal substrate of visual imagery. The other shows that aphantasic individuals can display visualizer-like oculomotor patterns during mental map exploration despite reporting little or no imagery vividness. Together, these studies help clarify our interpretation of our null V1 findings and structural effects in higher-order brain regions, which are consistent with aphantasia involving altered integration or access rather than a primary V1-dependent imagery deficit.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper addresses whether congenital aphantasia reflects an alteration of visual representations themselves, or rather of the systems that allow internally generated representations to reach conscious experience.
  
  Strengths:
  
  The study is novel and ambitious. The authors combine several complementary structural MRI approaches in a rare and well-characterised population, and the convergence of the findings toward frontotemporal and cingulate systems, with relative sparing of early visual cortex and major visual pathways, is particularly interesting because it could affect the way visual imagery is modelled and tested experimentally and clinically.
  
  Weaknesses:
  
  Overall, I found the manuscript conceptually and methodologically strong. My main concern regards the interpretation of the anatomical findings, rather than the findings per se. The authors discuss their results within a rich cognitive framework. However, the current dataset does not appear to include independent behavioural or neuropsychological measures that would allow the proposed cognitive interpretation to be tested in the same participants. As a result, the manuscript sometimes moves quite rapidly from 'these structural differences involve systems associated with higher-order control, salience, conscious access' to 'these structural differences may explain the cognitive mechanisms of aphantasia'. I agree that this is the most interesting interpretation, and probably the right one to explore. Although plausible, it remains indirect. The authors already acknowledge this point when discussing memory, affective control, and semantic processing. However, the same logic should be extended to the interpretation of the full set of findings. For example, if the salience/anterior insula findings are interpreted in relation to access to internally generated representations, it would be useful to know whether aphantasic participants also differ behaviourally on tasks tapping interoception or related aspects of internal monitoring. I appreciate that collecting additional behavioural data may not be feasible at this stage, especially given the difficulty of recruiting participants with such a specific manifestation. However, I think it should be acknowledged more explicitly in a dedicated limitation paragraph.
  
  We thank the reviewer for this thoughtful and constructive comment. Lack of introspective report of voluntary imagery is arguably the defining signature of aphantasia. This motivated us to interpret our anatomical findings primarily in the broader cognitive context of higher-order control, internal monitoring, and conscious access in aphantasia. We expect that a reliable behavioural test measuring imagery sensitivity and accessibility would allow us to link these findings directly to individual imagery ability. However, to the best of our knowledge, no such test of imagery currently exists. Instead, our findings point to plausible structural signatures, or brain regions, that may be related to conscious imagery, which motivates future studies to examine their direct or causal roles. We agree with the reviewer that future studies should test the relationship between these anatomical structures and the accessibility of internal representations, together with related aspects of internal monitoring. We will therefore add a dedicated paragraph discussing the plausible cognitive mechanisms during the revision.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors investigate the structural brain basis of congenital aphantasia, a condition characterised by a lifelong absence of voluntary mental imagery. They test two competing accounts: one predicting structural differences in early visual pathways, the other predicting differences in higher-order frontotemporal and cingulate systems. To do this, they combine four complementary structural imaging approaches: white-matter microstructure profiling along anatomically defined tracts, tractography seeded from functional regions of interest, whole-brain structural network analysis, and cortical thickness mapping. The main finding is that white-matter differences are selective for frontotemporal and cingulate pathways and absent in early visual pathways, which the authors interpret as support for the higher-order account.
  
  Strengths:
  
  The multi-modal design is a genuine strength: running four independent analyses increases the chance of detecting real effects and of identifying false positives that appear in only one stream. The statistical choices within each analysis are appropriate. Permutation-based correction with a threshold-free method is well-suited to the tract-level comparisons. The use of Bayes factors to quantify evidence for null results, rather than simply reporting non-significant tests, is particularly valuable here, since the absence of visual pathway differences is central to the argument. The robustness checks across multiple brain parcellations for the network analysis strengthen confidence in those findings.
  
  Weaknesses:
  
  The main limitation concerns the relationship between two of the analysis streams. The measure used to weight structural connections in the network analysis is calibrated to match fiber density estimates derived from the same diffusion signal that drives the white-matter microstructure differences. If the two groups differ in tissue organisation in certain pathways (which the microstructure analysis suggests they do), that difference will feed into both measures. The authors should acknowledge this dependency when discussing convergence across analyses.
  
  More broadly, the imaging metrics used throughout (measures of fiber organisation and weighted connection counts) reflect what the diffusion model captures from the tissue and cannot be directly read as measures of axon number or connection strength. This is a known limitation of the field, but it is relevant to the strength of structural claims made in this paper.
  
  The network analysis is presented without comparison to a null network. Without this, it is hard to know whether the node-level differences reflect specific network topology or simply follow from overall differences in connectivity weight or density between groups.
  
  The study runs four separate discovery analyses on the same 36 participants, each corrected within itself but with no control across analysis streams. At 18 participants per group, this is exploratory work. Some of the language used in the abstract and discussion, like "first comprehensive characterization" and "selective structural phenotype", reads as more definitive than the data support at this sample size. Framing the results as hypotheses to be replicated would make the paper stronger.
  
  The paper frames the results as distinguishing between two competing accounts. The positive evidence for the higher-order account is clear. The absence of differences in visual pathways is a different kind of result: it means such differences were not detected in this sample, not that visual pathways are uninvolved. The discussion at times moves toward that stronger conclusion, which the data do not support.
  
  The cortical thickness analysis finds one cluster in the predicted direction, while the other analyses each return multiple effects. One cluster in a whole-brain search with 18 participants per group is not strong evidence and should not be presented as equivalent to the other results.
  
  Effect sizes are reported without confidence intervals throughout. With 18 participants per group, the uncertainty around those estimates is large, and confidence intervals would give readers a more accurate sense of what can be concluded.
  
  We are grateful to the Reviewer for the constructive and thoughtful assessment of our manuscript. In response to the reviewer’s comments, we will revise the manuscript to clarify the dependency between diffusion-derived analysis streams, to state more explicitly the biological limits of diffusion MRI metrics, to add a null-network sensitivity analysis for the clustering coefficient findings, to include confidence intervals for reported effect sizes, and to temper the interpretation of the cortical thickness result. We will also revise the Abstract and Discussion to better reflect the exploratory nature of the study and to frame the findings as hypotheses requiring replication in larger independent samples. We believe that these revisions will make the manuscript more balanced, transparent, and appropriately cautious, while preserving the central conclusion that congenital aphantasia is associated with structural differences centered on higher-order frontotemporal and cingulate systems.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.22.713248v2
www.biorxiv.org

Behavioral Signatures of Post-Decisional Attention in Preferential Choice

1. Public_Reviews 10 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  It is unclear to what extent the model's success relies on the way non-decision time is formalised in the model. In the proposed PDG model, non-decision time is decomposed into separate visual encoding, saccadic execution, and manual execution components. Several values (assumed or recovered) do not match known physiological or behavioural ranges. This is a common issue in the literature, and the authors may want to address it in light of broader work discussing what non-decision time consists of in both manual and saccadic actions (e.g., Bompas et al., 2024, Non decision time: the Higgs boson of decision, Psychological Review).
  
  In particular, the "saccadic execution" parameter appears far too long and too variable to reflect merely execution; instead, it likely includes decisional components. This would make more sense since manual and saccadic planning essentially rely on distinct brain areas, hence it seems unrealistic that crossing a single threshold would trigger both manual and saccadic execution. Similarly, recovered manual non-decision times are substantially longer (though not more variable) than expected motor execution durations for button presses. These patterns suggest that parts of what the model treats as non-decision time are likely decisional in nature, although perhaps related to "action decision" rather than the "value-based decision" of interest to the authors. To what extent these two processes neatly follow each other or overlap could be usefully considered.
  
  We have added a paragraph to the Discussion explaining how our model’s estimates of sensory and motor latencies relate to corresponding values inferred from physiology or behavioral manipulations (e.g., Bompas et al., 2024). Specifically, we write:
  
  “The key assumption of the PDG model is that there is a delay between the moment a choice is internally committed and the moment it is externally reported with a key press. Because eye movements are typically faster than manual responses (τₑ < τₘ in our simulations), this delay creates a window during which gaze can already be directed toward the covertly chosen item before the response is formally registered. We do not interpret these non-decision latencies as irreducible physiological minima for moving the eyes or pressing a button (Bompas et al., 2025). Rather, they are inferred indirectly by fitting an additive non-decision-time parameter to the behavioral data, which we decompose into a sensory delay (τₛ) and a manual execution delay (τₘ). Values of τₑ are then chosen so that the model reproduces the observed magnitude of the behavioral effects. This estimation procedure has important limitations. Some participants show relatively “flat” chronometric functions: response times vary little with value despite otherwise normal psychometric performance. Such patterns likely reflect processes not explicitly represented in the model, including procrastination, reduced motivation, task-unrelated thought, or noise in item ratings. Within a drift-diffusion framework, however, these cases are accommodated by assigning a long non-decision time together with a short evidence-accumulation period (Table S1). Consequently, some estimated non-decision times are substantially longer than would be expected if they represented only sensory and motor delays. A further limitation is conceptual. We model non-decision time as occurring either before or after evidence accumulation, whereas in reality decisional and non-decisional components are likely temporally interleaved (Graziano et al., 2011). This simplification may also inflate the recovered latency estimates.
With these caveats in mind, sensory and oculomotor delays on the order of 300 ms remain broadly plausible, although they likely lie near the upper end of a realistic range. The estimated eye-movement latency is especially long. For instance, in monkeys trained to report simple perceptual decisions with a saccade, roughly 100 ms elapses between the threshold-crossing signal in parietal cortex (or the superior colliculus) and the executed eye movement (Roitman and Shadlen, 2002; Stine et al., 2023). Crucially, however, varying the assumed non-decision latencies across a reasonable range does not alter the qualitative predictions of the model (Fig. 8).”
  
  Further, we have added a parameter sensitivity analysis. Importantly, although the magnitude of the predicted effects depends on the non-decision latencies, their qualitative pattern does not (new Figure 8). Specifically, (i) the increasing tendency to look at the ultimately chosen item as time elapses (new Fig. 8A), (ii) the lack of an interaction between the last-fixation bias and overall value (Fig. 8B), and (iii) the absence of an effect of choice consistency on Δdwell (Fig. 8C) are all independent of τₑ.
  
  Reviewer #2 (Public review):
  
  The paper focuses on analyzing the Krajbich 2010 data, but shows that the second effect replicates in many other datasets. A more principled approach, in which both effects are analyzed and presented for all datasets, would be more convincing. The results should then be shown together for clarity/readability.
  
  Following this suggestion (and the reviewer’s elaboration in the private comments to the authors), we have substantially restructured the manuscript. Both aDDM predictions are now presented together (new Fig. 2), and Figs. 3–4 test these predictions across multiple food-choice datasets. In doing so, we no longer treat the data from Krajbich et al. (2010) separately, and we extend the analysis of the last-fixation–choice association (MELFB) to additional datasets. We note that the same datasets could not be used in both Figs. 3 and 4, as some lack information on the final fixation required for the MELFB analysis. Nevertheless, results are highly consistent across datasets and align with findings from a recent study by Ting & Gluth (2025), which independently identified and examined one of our key predictions; this work is now cited in the revised manuscript. Finally, to reduce redundancy, we have consolidated all aDDM variants and optimal models into a single figure (new Fig. 10).
  
  Similarly, it would be nice to show to what extent the models' predictions depend (or do not depend) on using the best-fitting parameter values (are there any parameter settings under which the two effects are not predicted?).
  
  The key predictions of the model depend on the difference between the manual (τₘ) and eye-movement-related (τₑ) latencies. We have now added a parameter-sensitivity analysis to show how the model predictions depend on this difference. The new analysis shows that while the quantitative predictions do depend on the precise latency values, the results are qualitatively similar across values of τₑ (new Figure 8).
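  To make the latency argument concrete, here is a minimal sketch of the mechanism described in the PDG account: a drift-diffusion trial runs until covert commitment, after which separate eye and manual latencies elapse. This is an illustration only, not the authors' implementation. The Gaussian latency distributions clipped at zero, the drift scaling `k`, the bound, the time step, and the manual-latency values (`mu_m = 0.6`, `sd_m = 0.1`) are all hypothetical. Only τₛ = 0.3 s, μₑ = 0.35 and σₑ = 0.11 come from the parameter values quoted in the reviews. The quantity of interest is the post-commitment window τₘ − τₑ: when it is positive, gaze can reach the covertly chosen item before the key press is registered.

```python
import numpy as np


def simulate_pdg(n_trials, value_diff, k=1.0, bound=1.0, dt=0.001,
                 tau_s=0.3, mu_e=0.35, sd_e=0.11, mu_m=0.6, sd_m=0.1, seed=0):
    """Hypothetical sketch: DDM commitment followed by separate eye and manual delays.

    Returns (choices, rts, windows):
      choices -- 1 if the upper bound (item A) was reached, else 0
      rts     -- reported response time: tau_s + decision time + tau_m
      windows -- tau_m - tau_e; a positive value means gaze can land on the
                 chosen item before the key press is registered
    """
    rng = np.random.default_rng(seed)
    choices = np.empty(n_trials, dtype=int)
    rts = np.empty(n_trials)
    windows = np.empty(n_trials)
    for i in range(n_trials):
        x, t = 0.0, 0.0
        # Euler simulation of evidence accumulation (unit noise variance)
        # until one bound is crossed: the covert commitment.
        while abs(x) < bound:
            x += k * value_diff * dt + np.sqrt(dt) * rng.standard_normal()
            t += dt
        # Post-commitment latencies, assumed Gaussian and clipped at zero.
        tau_e = max(rng.normal(mu_e, sd_e), 0.0)  # eye movement to chosen item
        tau_m = max(rng.normal(mu_m, sd_m), 0.0)  # manual key press
        choices[i] = 1 if x > 0 else 0
        # tau_s is a fixed sensory delay before accumulation starts.
        rts[i] = tau_s + t + tau_m
        windows[i] = tau_m - tau_e
    return choices, rts, windows
```

  For example, calling `simulate_pdg(500, value_diff=1.0)` and computing `np.mean(windows > 0)` gives the fraction of simulated trials in which the gaze shift precedes the response. Lowering `mu_m` below `mu_e` makes that fraction collapse toward zero, which illustrates why the predictions hinge on the difference between τₘ and τₑ rather than on either latency alone.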
  
  Reviewer #3 (Public review):
  
  There was limited discussion about why one might allocate attention post-decision. I would have appreciated more discussion on the potential functional consequences or implications of post-decision gaze.
  
  Thank you for this suggestion. We added a new paragraph to the Discussion (paragraph #2), where we argue that it is sensible for a decision maker to direct gaze to the chosen item once a covert choice commitment has been made, as the benefits of attending to a stimulus do not end with the decision itself. Specifically, we now write:
  
  “Instead, these observations are better explained by a post-decision account of the gaze-choice association, that is, one in which gaze shifts to the selected item after a covert commitment to a choice. We argue that directing gaze to the chosen item after a covert choice commitment is sensible, as the benefits of attending to a stimulus do not end with the decision itself. In naturalistic settings, for instance, selecting a food item is typically followed by the action of reaching toward it, where visual attention supports spatial localization and motor planning for the upcoming action. Although participants in our computerized task did not physically act on their choices, these sensorimotor processes are likely highly automatized and may still be engaged by default, even when not strictly required. Beyond motor preparation, post-decisional attention may also serve additional functions, such as facilitating sensory anticipation of the reward, supporting metacognitive evaluation of the decision, and contributing to value updating for future choices. From this perspective, a degree of attentional “stickiness”, whereby the chosen item remains preferentially attended after commitment, could emerge as an effectively optimal policy once these post-decisional processes are taken into account. Moreover, a specific feature of the task design may further reinforce this tendency: in the snacks paradigm, the unchosen item typically disappears from the screen immediately after a response is registered. It is therefore plausible that directing gaze to the chosen item after commitment partly reflects anticipation of the imminent disappearance of the unchosen option. To disentangle these mechanisms, it would be interesting for future work to test whether this attentional bias persists when the chosen item, rather than the unchosen one, is the stimulus that disappears upon response.”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Major Comments:
  
  (1) Framing of the modelling approach
  
  The manuscript would benefit from acknowledging the known limitations of DDM-based frameworks, especially given that the entire study is conducted within these constraints. The introduction highlights successes of the DDM, but the manuscript does not mention any of its conceptual or empirical limitations.
  
  We are unsure which specific limitations the reviewer has in mind, but we have added a paragraph to the Discussion mentioning some limitations, such as the inflation of the non-decision times and the difficulty of interpreting the fitted parameters (Paragraph #5 of Discussion: “The key assumption of the PDG model is that there is...”).
  
  (2) Dependence on non-decision time assumptions
  
  The alternative model's explanatory power appears to rely heavily on assumptions regarding the decomposition of non-decision time: fixed visual encoding (τₛ = 0.3 s), manual non-decision time (τₘ; two free parameters), and saccadic execution (τₑ; fixed parameters μₑ = 0.35, σₑ = 0.11).
  
  - τₑ is substantially longer and more variable than typical saccadic execution times, suggesting it likely incorporates decisional components.
  
  - Estimated τₘ values are approximately twice as long as known manual execution durations.
  
  - σnd is more plausible, implying that variability is captured correctly but mean durations are not.
  
  Together, these points raise the possibility that portions of what the model treats as non-decision time are in fact part of a (action) decision process. Only then does it make sense to assume that Tm is usually larger than Te. If Tm and Te were truly execution delays, then Tm would always be larger than Te.
  
  You may find it helpful to consider the framework in Bompas et al. Psych Review (2024), which discusses in detail what non-decision time is likely to comprise across effectors.
  
  Thank you. We have added (i) a sensitivity analysis showing that our results are robust to changes in the specific value used for the eye-movement-related latencies (new Fig. 8), and (ii) a new paragraph in the Discussion addressing the mismatch between our parameter estimates and the manual and saccadic execution times (Paragraph #5 of Discussion: “The key assumption of the PDG model is that there is...”).
  
  (3) Code availability.
  
  The authors should consider sharing all relevant code and data publicly.
  
  We agree. We now share the code and data on GitHub and indicate so in the revised manuscript.
  
  Minor Comments:
  
  (1) Lines 74-77. These are not worded as predictions but as questions; one tests predictions, but answers questions. I feel it would be clearer to stick to predictions (like in the abstract), and the introduction could benefit from explaining these predictions in a bit more detail (I found it difficult to get my head around these predictions from the intro text only).
  
  We rewrote the section in the introduction where we provide a gist of the model predictions (last paragraph of Introduction). We agree with the reviewer that the previous explanation was not clear.
  
  (2) It is confusing that panel B appears to the left of panel A in Figure 2.
  
  We agree. We have restructured the manuscript (following the suggestion of another reviewer), and now Figure 2 has changed and the panels follow a more logical order.
  
  (3) Figure 3C - remove MATLAB toggles.
  
  Yes, thanks.
  
  (4) Figure 5A shows the proportion of left choices, but the text and legend refer to right choices.
  
  Good catch, thank you.
  
  Reviewer #2 (Recommendations for the authors):
  
  This may appear self-serving, but the authors seem to be unaware of some highly relevant work from our group. Most importantly, in a recent publication (Ting & Gluth, 2024, JEP General), we have already looked at the dependency of the last- (or final-) fixation bias on overall value in value-based (VB) and perceptual (P) decisions. In VB, we found a negative effect; in P we did not find a significant effect. This is largely consistent with the current results, showing a negative but not significant trend. Another relevant work is Gluth et al. (2020, Nat Hum Behav), where we extended the aDDM by assuming that the probability to fixate on an option is a function of the accumulated evidence for that option. It would be interesting to know whether this assumption changes the predictions of the aDDM. Finally, we just published a new theory on how people search for information to make efficient value-based decisions (Gluth et al., in press, Psychol Rev; https://osf.io/preprints/psyarxiv/3qzak_v2). Although this theory focuses on multi-attribute choices, it can be applied to "simple" choices, too (by assuming that there is only one attribute = value). Interestingly, while the model also mispredicts a (slight) increase of the last-fixation bias with overall value, it correctly predicts the independence of the dwell-time advantage effect from choice consistency as well as the small increase of the effect with RT (attached here is a figure to show this: [https://elife-rp.msubmit.net/elife-rp_files/2026/01/22/00149589/00/149589_0_attach_9_477122.pdf], and the match with the empirical data shown in Figure 3B and 12 is striking). In general, the model shares many features of the Callaway and Jang models, but does not need to assume a biased value prior, which the authors suggest is responsible for the misprediction of the second effect. I leave it up to the authors to discuss this new theory, but I wanted to point this out.
  
  Thank you for pointing this out; these are all relevant points and studies.
  
  We now note that the first of our predictions has recently been identified and tested by Ting and Gluth (2025).
  
  We also considered extending the manuscript with a variant of the model proposed by Gluth et al. (Psychological Review, 2026). In fact, we attempted to fit this model to the Krajbich et al. (2010) dataset under the assumption that the duration of each sampling epoch is a free parameter. We find this model very interesting. However, in our current implementation it appears to make the same qualitative prediction as the aDDM, namely that ΔDwell depends on choice consistency (see Author response image 1).
  
  Given this, we have decided not to include these results in the manuscript. It remains possible that, with further development (particularly a more realistic specification of fixation durations, e.g., allowing them to depend on value), the model could account for the full set of observed effects. We think this would be best addressed in a separate study.
  
  That said, we do find the model promising, as it provides a better account than most of the alternative models we explored for the patterns shown in panels D, H, and I.
  
  Author response image 1.
  
  Fits of a variant of the MACS model (Gluth et al. 2026) to the data of Krajbich et al. (2010).
  
  The paper would benefit substantially from restructuring. The aDDM's predictions are provided first, together with the empirical data, and then the optimal models are discussed. But Figure 2 shows all of this together. Later, the new (PDG) model is elaborated, and its predictions are shown. Towards the end of the results, variations of the aDDM and combinations of aDDM and PDG are shown in a series of figures (8-11), followed by a last figure showing one of the tested effects in other datasets. All of this feels pretty much thrown together without a clear structure. For instance, the aDDM and the optimal models could be described together (or the optimal models get a separate figure). The additive variants could be described earlier. And some figures could be put into the supplement. And the empirical results of the different studies could be shown together.
  
  We fully agree with this suggestion. We have now restructured the manuscript along the lines proposed by the reviewer (see the more detailed explanation of the restructuring in our response to the public comments).
  
  I strongly suggest avoiding the term "influence" in the y-axis of Figure 2, upper row, as it implies causality. Similarly, in line 182, the term "causal influence" is used in the context of the Callaway model, but as far as I know, this is not what the model assumes.
  
  We replaced the y-axis label with “Association of last dwell with choice (β)”.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Figure 2 - Panel labels for A and B are reversed?
  
  We have restructured the manuscript (following the suggestion of another reviewer), and now Figure 2 has changed.
  
  (2) Does 3C include a .pdf screenshot?
  
  Thank you, it’s a Matlab bug on Mac. I guess they want us to switch to Python -:)
  
  (3) Figure 4 - It would be helpful if the green line were defined in the figure legend.
  
  Added
  
  (4) The effect size in 5B looks much more dramatic than in 2B(A?) - Is this for one example subject as opposed to all subjects? Please clarify what is different about the data.
  
  We are no longer showing the psychometric functions in Figure 2.
  
  (5) Line 252 - they say they compared the probability of choosing the right item (Fig. 5B), but the y-labels of that figure are all p(choose left).
  
  Yes, corrected now.
  
  (6) In general, they reference the subpanels of Figure 5 out of order, which causes the reader to jump around. They might consider reordering the panels of the figure so they follow the ordering of descriptions in the text.
  
  We agree and have rearranged the figure panels to follow the order of the descriptions in the text.
  

Tags: AuthorResponse
Annotators: Public_Reviews
URL: biorxiv.org/content/10.64898/2026.01.10.698805v2

www.biorxiv.org

Sensory adaptation and pupil-linked arousal support flexible evidence accumulation during perceptual decision making

1. Public_Reviews 10 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  (1) Alternative mechanisms for performance differences.
  
  The authors assume that the difference in performance between the low-switch (LS) and high-switch (HS) frequency conditions is explained by a change in the "leakiness" of integration. However, several other mechanisms could potentially explain this effect:
  
  (1) Temporal Uncertainty: Integration might start later in the HS condition, leading to lower performance.
  
  (2) Reduced Efficiency: Integration could be less efficient in the HS condition (i.e., lower signal-to-noise ratio) without a change in the leak parameter itself.
  
  (3) Evidence Contamination: Motion information from the adapting stimulus in the HS condition may be integrated rather than ignored, which might be the case since the transition from the adapting to the test stimulus is not externally cued.
  
  To distinguish between these alternatives, I suggest two possible analyses. First, a formal model comparison could be performed, though I acknowledge this may be inconclusive in the absence of response-time data. Second, an analysis of motion energy kernels could be revealing; the leak hypothesis makes the specific prediction that for long test stimuli, early samples should contribute more to the choice in the LS condition than in the HS condition, relative to late samples.
  
  We thank the reviewer for raising these important points. We agree that we cannot definitively identify the algorithmic underpinnings of the behavioral effects we report and have made substantial revisions to the manuscript to be clearer about what is supported and what is speculative in our claims. Most importantly, we agree that we do not know whether the context-dependent differences in how accuracy depends on viewing time are based on adjustments to a leak or to something else (e.g., a saturating non-linearity, as we identified in Glaze et al., 2015, that is separate from the leak itself), which we cannot resolve with this dataset, even with more formal model comparisons. We therefore:
  
  Changed the wording throughout the manuscript to refer to changes in leakiness as just one of several possible sources of the behavioral differences. We also added this point to the list of “limitations” (and possible future directions, including using motion-energy kernels, which would require us to use lower-coherence test stimuli) in the Discussion (L487-493).
  
  Added a new figure panel (Fig. 2D), a new Extended Data figure (Extended Data Fig. 3), and additional explanatory text (L168-175) that collectively describe the behavior in more detail, including quantifying a “crossover” dynamic similar to what we reported previously (Glaze et al., 2015).
  
  Added new explanations (L152-163) and analyses (Extended Data Fig. 9) indicating that the monkeys used some information from the end of the adapting stimulus to inform their decisions, which accounts for the patterns of choices at the shortest viewing durations.
  
  Indicated that the context-dependent differences in the slopes of the psychometric functions (and complementary analyses based on “raw” accuracy measures as a function of binned viewing duration) rule out the temporal uncertainty and evidence contamination explanations, but are consistent with effects on the temporal dynamics of the decision process (L175-179).
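  The kernel prediction raised by Reviewer #1 can be illustrated with a minimal sketch, assuming a discrete-time leaky accumulator x[t+1] = (1 - leak) * x[t] + e[t] (a textbook simplification, not the model used in the manuscript). Under this assumption, a sample at step t enters the final decision variable with weight (1 - leak)^(T - 1 - t), so a larger leak down-weights early samples relative to late ones. The function names and leak values below are hypothetical.

```python
def leaky_kernel(n_samples: int, leak: float) -> list[float]:
    """Weight of each evidence sample in the final value of a leaky accumulator.

    Assumes x[t+1] = (1 - leak) * x[t] + e[t] with x[0] = 0, so by the end
    sample t has been multiplied by (1 - leak) ** (n_samples - 1 - t).
    """
    if not 0.0 <= leak < 1.0:
        raise ValueError("leak must be in [0, 1)")
    return [(1.0 - leak) ** (n_samples - 1 - t) for t in range(n_samples)]


def early_to_late_ratio(kernel: list[float]) -> float:
    """Mean weight of the first half of samples divided by that of the second half."""
    if len(kernel) < 2:
        raise ValueError("kernel needs at least two samples")
    half = len(kernel) // 2
    early = sum(kernel[:half]) / half
    late = sum(kernel[half:]) / (len(kernel) - half)
    return early / late


# Hypothetical leak values standing in for a less leaky (LS) and a leakier (HS) context.
ls_kernel = leaky_kernel(20, leak=0.02)
hs_kernel = leaky_kernel(20, leak=0.15)
print(early_to_late_ratio(ls_kernel), early_to_late_ratio(hs_kernel))
```

  With zero leak every sample receives weight 1 (perfect integration); as the leak grows, the early-to-late ratio falls below 1, which is the pattern a motion-energy kernel analysis would look for under the leak hypothesis.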
  
  (2) Independence of neural and pupil-linked signals.
  
  The authors take the lack of session-wise correlation between context-dependent contributions from neural and pupil terms as evidence that these two signals provide independent contributions to the behavioral effect. However, could this lack of correlation simply be a result of high variability or noise in these estimates? The data shown in Figure 7B suggests that measurements are very noisy, which might obscure a potential relationship.
  
  We agree that the lack of session-wise correlation between neural and pupil terms cannot be taken as definitive evidence of independence. We have both softened the language around the claim (L368) and added a sentence to the Discussion (L464-468) acknowledging that this lack of correlation may reflect underlying noise and/or variability rather than true independence of the underlying mechanisms.
  
  Reviewer #1 (Recommendations for the authors):
  
  (3) The neural data analyses rely fundamentally on "switch" trials (Figures 3-5). It might be informative to also examine "non-switch" trials to see if there are specific neural markers indicating the exact moment the motion stimulus becomes behaviorally relevant. Given that this may fall outside the primary focus of the paper, it is up to the authors whether to pursue this line of inquiry.
  
  We thank the reviewer for this suggestion. We agree and have added new analyses of data from non-switch trials (Extended Data Fig. 9), which show some effects of stimulus information from the adapting epoch on the monkeys’ choices, as we detail below in response to related comments from the other reviewers.
  
  Reviewer #2 (Public review):
  
  Aspects of the behavioral analysis would benefit from a tighter connection between theoretical claims about evidence accumulation and the empirical features of the psychometric functions. For example, the rightward shifts observed across adapting conditions are interpreted as consistent with a reset of accumulation on switch trials, but similar patterns could also arise from failures to detect the test stimulus on a subset of trials, leading responses to default to the final adaptor direction. Likewise, changes in psychometric slope and asymptote are attributed to differences in evidence accumulation without explicit modelling or consideration of alternative explanations.
  
  Clarifying how specific features of the psychometric functions map onto distinct components of the decision process will strengthen the link between the theoretical framework and the behavioral data.
  
  We agree and have made substantial revisions to address these important points. Specifically, we added a new figure panel (Fig. 2D), new Extended Data Figures (3 and 9), and several lines of explanatory text (L152-179) that collectively describe the behavior in more detail, including clarifying that: 1) for the shortest viewing durations, the monkeys’ decisions were informed by information from the adapting stimulus, which accounts for generally lower accuracy on LSF (longer exposure to the final adapting direction, thus more accumulated evidence for that direction before processing the switch) vs. HSF (shorter exposure to the final adapting direction, thus less accumulated evidence for that direction before processing the switch) switch trials; and 2) as viewing duration increased, the rate of rise of accuracy versus viewing duration was higher for LSF vs. HSF trials, implying differences in the process of evidence accumulation. As detailed in our response to a similar comment from Reviewer 1, above, we are now careful to temper our claims about the specific computational basis (e.g., a leak or other form of nonlinearity) for these differences.
  
  We also de-emphasized our treatment of the asymptotes of the psychometric functions. In principle, these regimes could give insights into leakiness (which can limit the total amount of information that can be accumulated) and lapses (which are measured at the asymptotes). In practice, however, the long-duration trials that constitute the asymptotes were relatively undersampled (to promote the unpredictability of stimulus offset, which we believed was the more important consideration when designing the experiment), yielding unreliable estimates.
  
  A slight concern is the lack of a consistent analytical approach for relating behavioral changes to neural and pupil-linked measures. Different sections of the manuscript rely on different behavioral metrics, such as differences in accuracy within a selected stimulus-duration range (e.g., Figure 5C) or psychometric slope differences (Figure 6C), without clear justification for these choices. The analytical approach likewise varies between simple correlational analyses (Figure 5C, Figure 6C), pseudo-experimental group comparisons (Figures 5D, E), and the inclusion of neural or pupil terms in the behavioral psychometric regression model (Figure 7B). While each metric and approach may be defensible in isolation, adopting a more consistent framework will help convince readers that the reported effects are robust and not contingent on the selective choice of metric or analysis.
  
  We thank the reviewer for this thoughtful critique and agree that the rationale for our choice of behavioral metrics and analytical approaches could be stated more clearly. We have added text to the relevant sections of the Results (L247-251) clarifying these choices. In particular:
  
  The neural analyses (Figures 3D-E, Figure 4, Figure 5D-E) focused on preferred-motion switch trials, because: 1) low switch-frequency non-switch trials provide an additional 800 ms of exposure to the final adapting-stimulus motion direction relative to high switch-frequency non-switch trials, which confounds comparisons of context-dependent evidence encoding between conditions, and 2) MT neurons exhibit minimal responses to null motion (although note that we also included analyses based on ROC area, which is computed from both preferred- and null-motion switch trials, to account for possible contributions of null-motion responses; Figure 5A-C). Thus, to ensure a meaningful comparison between neural and behavioral measures, we used behavioral accuracy on switch trials as the relevant metric in Figure 5C-E, rather than psychometric slope, which is estimated across both switch and non-switch trials.
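  ROC area, as used here, summarizes how well an ideal observer could discriminate preferred- from null-motion responses from spike counts alone. A minimal sketch, assuming per-trial spike counts as input (the counts and the function name are made up for illustration), computes it through the standard equivalence with the Mann-Whitney U statistic: the probability that a randomly drawn preferred-motion count exceeds a randomly drawn null-motion count, with ties counted as one half.

```python
def roc_area(preferred: list[float], null: list[float]) -> float:
    """Area under the ROC curve for separating two response distributions.

    Equivalent to P(pref > null) + 0.5 * P(pref == null) over all
    preferred/null pairs, i.e. the Mann-Whitney U statistic divided by n1 * n2.
    """
    if not preferred or not null:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for p in preferred:
        for q in null:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(preferred) * len(null))


# Hypothetical spike counts from preferred- and null-motion switch trials.
pref_counts = [12, 15, 9, 14, 11]
null_counts = [4, 6, 9, 3, 5]
print(roc_area(pref_counts, null_counts))
```

  A value of 0.5 means the two distributions are indistinguishable and 1.0 means perfect separation; the pairwise loop is quadratic but keeps the tie handling explicit.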
  
  The pupil analyses (Figure 6) focused on a time window preceding test-stimulus onset, representing the arousal state around when the decision process started, and included both switch and non-switch trials. Thus, for these analyses we used psychometric slope, which is estimated across both switch and non-switch trials.
  
  We used several different analyses to compare and contrast the neural-behavioral and pupil-behavioral relationships because they provide complementary and useful insights. The correlational analyses in Figures 5C and 6C characterize session-level relationships between neural/pupil signals and behavior. The group comparisons in Figures 5D–E provide a complementary visualization of the same relationship. The model-based approach in Figure 7 then allows direct quantification of the trial-wise contributions of each signal to behavior within a common framework. Importantly, the conclusions drawn from each approach converge on the same interpretation, which we believe speaks to the robustness of the reported effects.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Figure 2 legend. Description of 'running average (5-trial window)' is unclear - presumably this is a running average in stimulus space rather than across trials.
  
  We thank the reviewer for flagging this ambiguity. We have updated the legend (L136-137) to clarify that the running average is computed across trials sorted by test-stimulus duration.
  
  (2) L158. Difficult to establish an asymptotic performance level for HSF conditions within the stimulus duration range tested.
  
  We have removed the reference to asymptotic performance and replaced it with a discussion of performance on longer-duration switch trials in the context of the newly added Figure 2D.
  
  (3) L515 Equation 1. While this is a standard formulation of lapse rate in psychometric functions, the construction here in terms of switch probability is not standard. Given the task and training, it seems more likely that on lapse trials, the animal will respond according to the last adapted direction (rather than randomly switch/stay with equal probability).
  
  We thank the reviewer for this point. We agree that it is possible that on at least some of the “lapse” trials the monkeys may respond according to the final adapting-stimulus direction rather than choosing randomly. However, we cannot distinguish those alternatives using this task design. We include a statement to this effect in Methods (L569-571).
  
  To explore the idea further, we refit the behavioral data using separate upper and lower asymptotes corresponding to lapse rates on switch and non-switch trials, respectively. Across monkeys, there were no significant differences between upper and lower lapse rates for either low (Wilcoxon signed-rank test for equal medians: p = 0.15, Cohen's d = -0.13) or high switch-frequency (p = 0.07, Cohen's d = -0.16) conditions. So, at the very least, there was no evidence for lapse-like errors driven by switch- (or non-switch-) specific defaults to the final adapting direction.
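  The refit described above can be sketched as a logistic psychometric function with separate lower and upper asymptotes. This is a minimal illustration, assuming a generic signed stimulus variable x, a logistic core with midpoint alpha and spread beta, and free lower and upper lapse terms; it is not the manuscript's Equation 1, whose exact form is not reproduced here, and the parameter names are hypothetical.

```python
import math


def psychometric(x: float, alpha: float, beta: float,
                 lapse_lower: float, lapse_upper: float) -> float:
    """Choice probability with separate lower and upper lapse rates.

    The curve rises from lapse_lower (as x -> -inf) to 1 - lapse_upper
    (as x -> +inf); alpha sets the midpoint and beta the spread.
    """
    if lapse_lower + lapse_upper >= 1.0:
        raise ValueError("lapse rates must sum to less than 1")
    z = (x - alpha) / beta
    # Numerically stable logistic: avoid overflow in exp for large |z|.
    if z >= 0:
        core = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        core = ez / (1.0 + ez)
    return lapse_lower + (1.0 - lapse_lower - lapse_upper) * core
```

  Under this sketch, fitting lapse_lower and lapse_upper separately per session and then comparing them across sessions (for example, with a paired signed-rank test) corresponds to the kind of comparison reported in the response.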
  
  (4) L256. Statistical significance of attenuation is not directly tested here.
  
  We have replaced "were attenuated" with "we did not identify any reliable context-stability differences" (L297) to accurately reflect what was directly tested without implying a statistical comparison between groups of sessions that was not performed.
  
  (5) L429. Does the increase in explanatory power warrant the increased complexity of the model here?
  
  We thank the reviewer for raising this important point. We used Tjur's pseudo-R² because it does not increase by default with added model complexity, making it more conservative than other R² measures in this respect. Tjur's pseudo-R² is a coefficient of discrimination, and as such its value increases only when additional terms improve the model's ability to separate predicted probabilities across response outcomes. Thus, the observed increases in explanatory power when adding neural or pupil terms reflect real improvements in discriminability rather than an artifact of model complexity. We have added a brief clarification of this point to the Methods (L662-664).
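  Tjur's coefficient of discrimination is the difference between the mean predicted probability of an outcome on trials where it occurred and on trials where it did not. A minimal sketch, assuming binary choices coded 0/1 and model-predicted probabilities as inputs (the values and function name are illustrative):

```python
def tjur_r2(outcomes: list[int], predicted: list[float]) -> float:
    """Tjur's coefficient of discrimination (pseudo-R^2) for a binary outcome.

    D = mean(predicted | outcome == 1) - mean(predicted | outcome == 0).
    """
    if len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must have the same length")
    ones = [p for y, p in zip(outcomes, predicted) if y == 1]
    zeros = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not ones or not zeros:
        raise ValueError("both outcome classes must be present")
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


# Illustrative choices (1 = one response, 0 = the other) and model probabilities.
choices = [1, 1, 0, 0]
probs = [0.9, 0.7, 0.2, 0.4]
print(tjur_r2(choices, probs))
```

  A model that predicts the same probability for both outcome classes scores 0, and one that assigns probability 1 to every observed outcome scores 1.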
  
  Reviewer #3 (Public review):
  
  The task design may not be optimal. While the amount of time the monkey is exposed to each motion direction during the adapting stimulus is matched, it's hard to know if the reduced MT responses to the test stimulus are truly due to the greater frequency of switches during the HSF adapting stimulus or because the monkeys have been exposed to more repetitions of the stimulus. It's increased sensory adaptation in either case, but it makes it problematic to interpret this as temporal context-dependent adaptation specifically. I think this could potentially be partially addressed by an analysis that is in the paper, but could potentially be emphasized/fleshed out more, specifically the results shown in Figure 4D that seem to show that most of the reduction in neural response for adapting units occurs between the first and second stimuli.
  
  The reviewer raises an important point. The number of stimulus repetitions and switch frequency are confounded in the experimental design, making it difficult to attribute context-dependent differences in MT responses to the temporal pattern of switches rather than to accumulated repetitions. We also note, as the reviewer acknowledges, the observed differences reflect sensory adaptation either way. Figure 4D does offer relevant evidence, suggesting that a majority of the change in neural response occurred with just one stimulus repetition. This finding complicates an interpretation where adaptation scales with the number of stimulus repetitions. We have added several lines to the Results about these points (L231-233).
  
  The pupillometric analysis seems to be an indirect way of assessing whether the accumulator itself might be modulated by temporal context, but the link could be made clearer. The authors show that context-dependent behavior is related to pupil size, which is related to arousal/neuromodulation, but it would be helpful to have some idea of what neural mechanisms underlying adaptive decision-making are actually impacted by this neuromodulation. Lacking neural data to address this question (e.g., from a brain region proposed to be involved in the accumulation process), at least more discussion of this would be helpful. Essentially, I'm unsure of how to interpret the pupil results: the argument that temporal context affects instantaneous evidence encoding in MT that then drives the accumulator is very clear, but I am a bit confused about what, mechanistically, I should think about the effect of neuromodulation doing.
  
  We thank the reviewer for this thoughtful comment and agree that the mechanistic interpretation of the pupil results could be made clearer. We acknowledge that we cannot directly identify the neural mechanisms underlying the arousal-related contributions to adaptive evidence accumulation from pupil data alone, given that pupil size is an indirect and imperfect proxy for neural (e.g., LC-NE system) activity. However, we can offer some informed conjecture and have added to the Discussion (L469-482) in an effort to elaborate on possible mechanisms.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Abstract could be retooled - does not emphasize the pupillometry/arousal results very much, and they are presented more as a control than an independent result.
  
  We agree and have revised the Abstract accordingly.
  
  (2) Do all neural/pupil analyses use only switch trials? Sometimes the figure captions do specify only switch trials, but not everywhere. It would be helpful to specify either in the Methods or at the beginning of each figure caption that all subplots show switch trial results. Also, if you do always use switch trials, it would be useful to see in the Supplement how the non-switch trial results differ from switch trials. It seems like they may in interesting ways based on the behavioral results (supporting a reset of evidence accumulation on switch but not non-switch trials).
  
  We thank the reviewer for flagging these important points. We have added a justification for switch trials (L186-190) as well as clarification about which trial types were used for which analyses (L246-249) and information about trial types to relevant figure captions. We have also added a new Extended Data figure (Extended Data Fig. 9) examining relationships between neural activity and behavior on non-switch trials. As inferred by the reviewer, behavior on non-switch trials is consistent with the use of information from the adapting stimulus.
  
  (3) In Figure 3C, 5B, etc, when computing firing rate for the test stimulus (50-500 ms), are differently sized windows used to compute the rate for different test stimulus durations (since some will be <500 ms)? Or are only trials where the test stimulus duration is > 500 ms used for this analysis?
  
  We thank the reviewer for raising this point. To clarify, the 50–500 ms window does not reflect a fixed window applicable for all trials. Rather, neural activity from 50 ms after test-stimulus onset through test-stimulus offset was included for each trial, with 500 ms serving as the upper bound for trials with longer durations (> 500 ms). We have clarified this in the Methods (L607-610) to avoid ambiguity.
  
  (4) I think it might be better to be consistent with the time windows used for analysis; specifically, to choose either the 50-500 ms window used in Figures 3, 4, and 5B, or the 200–400 ms window used for the remaining analyses in Figure 5.
  
  We agree that using the same window for all of the analyses would improve consistency, but using different windows provides advantages that we believe take precedence, which we now describe in more detail. The broader 50–500 ms window used for Figures 3, 4, and 5B was chosen to characterize MT neural activity over a relatively large time window, ensuring that every trial contributes to each estimate. Because test-stimulus durations were drawn from a truncated exponential distribution (100–1200 ms), restricting these analyses to the 200–400 ms window would have excluded a substantial proportion of trials (those with durations <200 ms), although it would yield similar figures and conclusions. The narrower window used in subsequent analyses allows us to focus on the conditions that exhibited the biggest modulations of neural activity when comparing them to behavior.
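  To see why a 200–400 ms window would drop many trials, consider a minimal sketch of a truncated exponential duration distribution on 100–1200 ms. The bounds come from the text; the time constant tau is a hypothetical value chosen only for illustration, since the experiment's value is not stated here. The code draws durations by inverting the CDF and compares the empirical and analytic fractions below 200 ms.

```python
import math
import random

LOW_MS, HIGH_MS = 100.0, 1200.0  # bounds stated in the response


def truncated_exp_cdf(t: float, tau: float,
                      low: float = LOW_MS, high: float = HIGH_MS) -> float:
    """CDF of an exponential with time constant tau, truncated to [low, high]."""
    if t <= low:
        return 0.0
    if t >= high:
        return 1.0
    return (1.0 - math.exp(-(t - low) / tau)) / (1.0 - math.exp(-(high - low) / tau))


def sample_truncated_exp(tau: float, rng: random.Random,
                         low: float = LOW_MS, high: float = HIGH_MS) -> float:
    """Draw one duration by inverting the truncated-exponential CDF."""
    u = rng.random()
    mass = 1.0 - math.exp(-(high - low) / tau)
    return low - tau * math.log(1.0 - u * mass)


tau_ms = 300.0  # hypothetical time constant, for illustration only
rng = random.Random(0)
draws = [sample_truncated_exp(tau_ms, rng) for _ in range(10_000)]
print(truncated_exp_cdf(200.0, tau_ms))            # analytic fraction below 200 ms
print(sum(d < 200.0 for d in draws) / len(draws))  # empirical fraction
```

  Under this hypothetical tau, a little under 30% of trials would end before 200 ms; the actual fraction depends on the time constant used in the experiment.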
  
  (5) Similarly, provide justification for using only trials ending 375-600 ms after test stimulus onset for the behavioral correlations. It seems reasonable to choose a subset of test stimulus durations where the monkeys' behavior is greater than chance but less than ceiling, but it would be good to specify this so that it doesn't seem arbitrary.
  
  We agree and have added text to make this important point (L249-251).
  

Tags: AuthorResponse
Annotators: Public_Reviews
URL: biorxiv.org/content/10.64898/2026.02.03.703553v2

www.biorxiv.org

Decoding spine nanostructure in cultured neurons derived from mouse models of neuropsychiatric disorder reveals a schizophrenia-linked role for Ecrg4

1. Public_Reviews 10 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #2 (Public review):
  
  Okabe and colleagues build on a super-resolution-based technique they have previously developed in cultured hippocampal neurons, improving the pipeline and using it to analyze spine nanostructure differences across 8 different mouse lines with mutations in autism or schizophrenia (Sz) risk genes/pathways. It is a worthy goal to try to use multiple models to examine potential convergent (or not) phenotypes, and the authors have made a good selection of models. They identify some key differences between the autism versus the Sz risk gene models, primarily that dendritic spines are smaller in Sz models and (mostly) larger in autism risk gene models. They then focus on three models (2 Sz - 22q11.2 deletion, Setd1a; 1 ASD - Nlgn3) for time-lapse imaging of spine dynamics, and together with computational modelling provide a mechanistic rationale for the smaller spines in Sz risk models. Bulk RNA sequencing of all 8 model cultures identifies several differentially expressed genes which they go on to test in cultures, finding that Ecrg4 is upregulated in several Sz models and its misexpression recapitulates spine dynamics changes seen in the Sz mutants, while knockdown rescues spine dynamics changes in the Sz mutants. Overall, these have the potential to be very interesting findings and useful for the field. My major concerns from the initial manuscript, especially regarding cherry picking and circularity have been addressed with revised analytical approaches. I have some remaining minor comments.
  
  (1) The comparison between two wild-type samples versus wild-type-mutant samples is helpful - I think this could be added to the manuscript.
  
  As suggested, we added the figure comparing two wild-type samples against wild-type-mutant samples as Supplementary Figure 2.
  
  (2) For results of time-lapse imaging - please spell out in the results section the direction of change (lines 270 - 277).
  
  As suggested, we added the direction of change (an increase in the turnover rate) to the text (page 12, lines 270-271).
  
  (3) Using linear mixed effect models for statistical analysis is a significant improvement. While a sample size (n) of mice = 3 is not ideal, I think given the multiple different mouse lines used and intensity of analysis, this is probably the best that can be done, although further validation in larger samples eventually is to be hoped for.
  
  We thank the reviewer for recognizing the effort required to collect data across multiple mouse lines.
  
  (4) The revised text is much improved, but I still think the authors should be upfront somewhere in the text that the schizophrenia-associated genes can only confer biased risk for schizophrenia (and that the clinical phenotype can also include autism). As I said before, I think this is the best we can do and I agree with their choices, but it is important not to overstate the link. The differences they see make it clear that these are still relevant distinctions.
  
  As suggested by the reviewer, we further modified the discussion related to the comparison between ASD- and schizophrenia-associated mouse models (pages 23-24, lines 508-522).
  
  “The nanoscale features of dendritic spines in mouse models of Nlgn3^R451C/(y or R451C), Syngap1^+/−, POGZ^Q1038R/+, and 15q11-13^dup/+, which we classified as being related to ASD, are highly heterogeneous. This heterogeneity may reflect the broad clinical spectrum of ASD, which ranges from mild impairments in social skills to severe intellectual disability. Accordingly, these four mouse models may represent distinct subgroups characterized by different degrees or forms of hippocampal dysfunction. Notably, among the ASD-related models, 15q11-13^dup/+ showed population-level spine properties closer to those found in the 22q11.2^del/+ and Setd1a^+/− mouse models. Although we classified 22q11.2^del/+ and Setd1a^+/− as schizophrenia-related models, both 22q11.2 deletion syndrome and Setd1a haploinsufficiency in humans are also associated with ASD, suggesting substantial overlap in the genetic risk factors underlying ASD and schizophrenia. Further systematic analyses linking rare genetic variants to synaptic phenotypes in mouse models may provide important insights into the mechanisms underlying both shared and disorder-specific synaptic alterations in neurodevelopmental and psychiatric disorders.”
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) I would suggest that it might be preferable to use the word 'neuropsychiatric' rather than 'mental' in the title.
  
  As suggested, we modified the manuscript title.
  
  (2) I think it would be clearer to say that DEGs are listed if present 'in three or more models' rather than >2 (I appreciate the latter is mathematically clear, but can easily be read as 2 or more if reading fast). This is changed in the figure legend, but I suggest it is also changed in the main text (line 352-3)
  
  As suggested, we changed the main text to incorporate "in three or more models" (page 16, line 352).
  
  (3) Please add to Methods (line 557) that 'control cultures were prepared from littermate embryos....'
  
  As suggested, we added the phrase "control cultures were prepared from littermate embryos" (page 26, line 559).
  
  (4) Sorry to add something, but please could the authors add a definition of how they calculate spine turnover (and add units to the y axis of Figure 5A-C)?
  
  As suggested, we modified the y-axis of Figure 5A-C (% as unit) and added the method of calculating spine turnover rate in the text (page 36, lines 808-811).
  

Tags: AuthorResponse
Annotators: Public_Reviews
URL: biorxiv.org/content/10.1101/2025.09.10.675343v3

www.biorxiv.org

High-resolution imaging of presynaptic ER networks in Atlastin mutants

1. EMBOpress 09 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  Reviewer #1:
  
  Major comments:
  
  Lines 103-116 (first paragraph of the results section) describe mainly published data that is more suitable for the introduction section. It is annoying to refer to different published articles in the Results section to strengthen the results instead of showing them. The same goes for paragraphs two and three. Why mention those data in the Results section if they are already published and known?
  
  We have reorganized this material by moving some background information to the Introduction. Our intention was not to incorporate published data to strengthen our results, but rather to provide essential context for interpreting our findings. We have therefore left some of this foundational information in the results section to create a clear narrative flow, enabling readers to understand the basis for our experimental design and interpretations without needing to recall details from earlier paragraphs in the Introduction. For example, we considered it crucial to restate the earlier report of the BiP:sfGFP:HDEL phenotype in Atlastin mutants, since our results supporting luminal ER protein displacement contradict the previous fragmentation model.
  
  The following concept was in line 103 in the Results section, and is now in the introduction in lines 82-92: "Conventional light microscopy, commonly used in studies of neuronal ER structure, lacks the resolution necessary to visualize individual ER tubules in small structures, such as presynaptic terminals. The ER is highly sensitive to fixation, and live imaging experiments in neurons in vivo have been conducted on upright microscopes using water dipping objectives with a typical axial resolution limit of >300 nm, which cannot distinguish the densely packed ER tubules at presynaptic terminals (3,8,21,28,39-42). Electron microscopy offers higher resolution, but cannot be used in live samples and has typically been limited to thin 2D sampling (in which it is difficult to distinguish ER cross-sections from synaptic vesicles) (8,20,22)."
  
  Figure legends (in all figures): The number of experimental repeats must be mentioned in the figure legends.
  
  This information is provided in Supplementary Table 1, which contains detailed information about the genotype, statistical analysis, and number of larvae and NMJs analyzed. If the journal requires this information in figure legends, we can move it.
  
  The way the figures are labeled is worrisome; supplementary figures are not ordered numerically.
  
  We will be happy to rename supplementary figures according to journal guidelines.
  
  The tubule extension in Figure 2D is not convincing. Is there a movie showing those changes? Better images are needed. It is essential to show which supplemental movie corresponds to which panel.
  
  We have now included a corresponding video of the same neuron used as an example of tubule extension. We also added another frame to the figure to provide further information on the tubule event we captured. (Figure 2D, Movie S10)
  
  This is unnecessary in the results section: "To investigate the relationship between ER structure and function at synapses, we examined mutants of Atlastin, a GTPase that regulates ER tubule fusion. Drosophila has a single homolog while mammals have three Atlastin homologs, with Atlastin-1 enriched in the brain (Rismanchi et al., 2008)."
  
  This information was moved to the introduction.
  
  "This reduction in ER membrane marker intensity has also been observed in other HSP mutants, suggesting this is a common feature of ER shaping mutants and could indicate changes in ER membrane composition, integrity, or tubule thickness (Perez-Moreno et al., 2023)." This comparison is important and should be shown in the same settings as for the Atlastin mutant rather than referring to published data.
  
  We agree with the reviewer that it is important to determine whether other ER-shaping proteins, besides Atlastin, also show a decrease in tdTomato:Sec61b to support our claim that this could be a common feature among ER-shaping mutants. To do this, we examined mutants of another ER-shaping protein, Reticulon 1, which regulates membrane bending and stabilization in ER tubules. These loss-of-function mutants were a gift from Dr. Cahir O'Kane at the University of Cambridge and were used in his lab's Pérez-Moreno et al., 2023 publication. We found that in our hands tdTomato:Sec61b levels were reduced in Reticulon 1 mutants, consistent with the results reported by Pérez-Moreno et al. (2023). These results are in Figure 3E-F. We also examined the synaptic distribution of the luminal ER marker, BiP:sfGFP:HDEL, in Reticulon 1 mutants to see if it is displaced to the cytosol. Notably, it remained ER-associated, unlike in Atlastin mutants. These results are in Figure 6F-G, results lines 267-270, and discussion lines 542-545.
  
  Is the diffuse distribution of the luminal ER marker in Figure 6F due to mislocalization, or to reflux of protein that was first localized to the ER and then returned to the cytosol, as was previously shown for the ER-to-cytosol signaling (ERCYS) mechanism? Could you assess the localization of other ER luminal proteins biochemically? It is highly recommended to examine the localization of another soluble ER protein in the Atlastin mutant without overexpression, which can cause artifacts.
  
  ER stressors can induce ERCYS, in which some luminal proteins, including PDIA3, DNAJB11, ERp29, and an eroGFP reporter, reflux by 30-70% to the cytoplasm without subsequent degradation (unlike in ER-associated degradation, ERAD). This phenomenon has previously been observed only in yeast and in glioblastoma tumor cells from mice and humans. We believe that our work provides the first suggestion that this may occur in neurons, and particularly in a neurological disease model.
  
  We do not believe that the reflux phenotype for BiP:sfGFP:HDEL is due to its overexpression, for two reasons: (1) we observe reflux in our neuronal Atlastin knockdown experiments, even when BiP:sfGFP:HDEL levels are artificially reduced because the GAL4 is titrated between the RNAi and the reporter (Figure 7A), and (2) BiP:sfGFP:HDEL overexpression somewhat suppresses endogenous BiP upregulation (Figure 10; see Reviewer 1.10), arguing that the transgene does not induce ER stress. We included a new "limitations of the study" section to be transparent about the caveats of the BiP:sfGFP:HDEL reporter (lines 639-664).
  
  Identifying potential endogenous neuronal ERCYS substrates in our in vivo preparation poses several challenges. First, biochemical approaches, such as fractionation, are not possible in our complex in vivo sample because neuronal ER proteins would mix with ER from other tissues upon homogenization. Second, detecting endogenous proteins with antibodies requires fixation and permeabilization, which notoriously disrupts ER structure and even causes our reporter BiP:sfGFP:HDEL to collapse from a smooth distribution, as visualized by live imaging and FRAP, to a punctate distribution. Third, using antibodies rather than neuronally restricted transgenes makes it challenging to determine whether the signal originates from the neuron or from dense ER structures in the surrounding muscle. Fourth, some ER luminal proteins can displace as little as 30% in the ERCYS examples cited above, and the sensitivity of our imaging assays may limit our ability to detect these small changes. Finally, the limited availability of tagged transgenes and antibodies specific to Drosophila luminal ER proteins (see next paragraph) poses additional challenges. These limitations highlight the need for future studies to develop novel tools and techniques to more definitively test whether we are indeed observing ERCYS. We have included a paragraph on these future challenges in our discussion in lines 639-664. Identifying endogenous targets of ERCYS in fly neurons is a worthwhile goal, but beyond the scope of the current study. These next steps will particularly benefit from identifying the machinery involved in the reflux of our BiP:sfGFP:HDEL reporter.
  
  Tools we tested: We investigated several options: (1) a tagged PDI transgene (a gift from Karen Hibbard), which was not detectable at presynaptic terminals, (2) a tagged BiP (FlyORF; F000956) that did not localize to the ER, and (3) full-length endogenous BiP detected by antibody staining. We did not detect obvious reflux of endogenous BiP to the cytoplasm (Figure 9), with the caveat that in fixed samples, the BiP signal was not tightly co-localized with the ER marker even under control conditions. However, we did use this antibody to detect an increase in BiP in Atlastin mutant presynaptic terminals, indicating ER stress (see Reviewer 1.10).
  
  Though we have not identified endogenous targets, we believe that our studies with the exogenous reporter will be of great interest to the field, as they clarify the previously reported Atlastin phenotype and provide the first report of a new defect in an animal model of a human disease.
  
  In comparison to Summerville et al. (2016) in Figure 7, the experiment was not done in the same way. It is important to keep the same settings for comparison.
  
  In Figure 7D-E, we compare the distribution of BiP:sfGFP:HDEL in cell bodies, axons, and muscles between controls and Atlastin mutants. To clarify the experimental approach relative to Summerville et al. (2016): while both our studies examined the same cellular compartments (cell bodies, axons and nerve terminals) using the BiP:sfGFP:HDEL reporter, we employed super-resolution Airyscan microscopy. This enhanced resolution was critical for definitively demonstrating that this is a functional rather than a structural phenotype and that ER displacement is progressive, and repeating this experiment at lower resolution as previously reported does not provide any new information. We identified two distinct distribution phenotypes in Atlastin mutants expressing BiP:sfGFP:HDEL, which were not described in the Summerville et al., 2016 paper. From our manuscript (lines 249-251): "We identified two distinct ER network phenotypes in Atlastin mutants expressing BiP:sfGFP:HDEL: "Partial loss" NMJs retained both diffuse signal and identifiable ER network structures, while "Complete loss" NMJs showed no visible ER network structures. Note that the "Complete loss" phenotype in Atlastin mutants reflects the absence of detectable luminal marker signal in organized ER structures, but not the complete absence of ER membranes, as demonstrated by our ER membrane marker tdTomato:Sec61β results."
  
  Does the Atlastin mutant induce the unfolded protein response and stress within the ER? It is necessary to look for UPR markers in those settings. It was shown previously that ER stress leads to protein reflux from the ER to the cytosol. Is there a difference in the ER stress markers in the presynaptic terminal?
  
  The reviewer suggested that Atlastin mutant synapses may exhibit ER stress. To address this, we examined levels of the ER chaperone BiP, a well-established ER stress marker whose expression increases during UPR activation. We first validated that our BiP antibody can detect changes in ER stress by feeding control larvae with 50mM DTT for 24 hours. These results are in the new Figure 10A. Note that we were unable to test sensitivity to ER stress in this way in Atlastin mutant larvae because they did not consume the DTT-treated food, as assessed by blue food coloring in the larvae's guts.
  
  Using this antibody, we measured baseline BiP levels at NMJs of Atlastin mutants on normal food, and found they were slightly increased compared to controls. We conclude from these experiments that Atlastin mutant synapses have mild ER stress. Notably however, Atlastin mutants co-expressing UAS-BiP:sfGFP:HDEL or UAS-tdTomato:Sec61b did not show significantly increased endogenous BiP levels, suggesting that transgene expression at least partly suppresses the mild ER stress response, even though there is extensive cytosolic displacement. These results argue (1) that the mild ER stress in Atl mutants does not strictly correlate with the reflux phenotype, and (2) that the reflux phenotype is not an artifact of overexpression-induced stress. These results are described on lines 430-436 in the results section and shown in Figure 10B-E, and their implications discussed on lines 585-598.
  
  We also explored another strategy to detect ER stress by assessing eIF2α phosphorylation, a key event in the Unfolded Protein Response (UPR) pathway. We obtained a phospho-eIF2α antibody (Cell Signaling; #3597) that was reported to work in Drosophila. However, when we tested this antibody by Western blot, we were unable to detect a band at the expected molecular weight for phosphorylated eIF2α, even in positive-control samples treated with DTT to induce ER stress. We therefore concluded that this antibody is not suitable for reliably detecting ER stress in our experimental system. The failure of this antibody highlights the challenges of finding robust tools to measure ER stress in Drosophila.
  
  It is important to add biochemical experiments to show that no fragmentation of the ER membrane occurred. It can be simply demonstrated by looking at the redox state of the ER, which would change if it were mixed with the reducing cytosol. Moreover, this can be shown by using an ER-targeted redox-sensitive fluorescent protein that is tethered to the ER membrane to follow changes in the redox state of the ER.
  
  The reviewer asked us to test whether the redox state of the ER is disrupted, which could indicate exchange between the cytosol and ER due to membrane rupture. As noted above, biochemical approaches such as fractionation are not possible in this in vivo sample. We attempted to address this concern by creating a UAS-Sec61β:roGFP construct, using the roGFP sequence from Igbaria et al. (2019) to monitor the ER lumen redox environment in Atlastin mutants. Since Sec61β is membrane-tethered, it should remain in the ER and not undergo reflux, making it an ideal sensor for detecting any mixing between the reducing cytosolic environment and the oxidizing ER lumen that would occur if membrane fragmentation and/or ruptures were present. We tested this approach in wild-type Drosophila S2 cells and used the Gal4-UAS binary expression system to co-express Actin-Gal4 (to drive expression of UAS constructs), UAS-Sec61β:roGFP (redox sensor), and UAS-BiP:Halo:HDEL (as a control reporter insensitive to DTT treatment).
  
  Our experiments showed no detectable changes in the fluorescent properties of UAS-Sec61β:roGFP following a 30-min treatment with 10 mM DTT compared to the DMSO vehicle control, including no increase in 405-nm excitation fluorescence and no change in the 488-nm/405-nm excitation ratio. These results suggest either that the roGFP sensor requires further optimization for sensitivity in this cellular system or that additional controls and calibration steps are needed to establish the dynamic range of the assay. We believe this experiment falls beyond the scope of the current study, given the extensive optimization required. However, it represents an important future direction for testing membrane fragmentation as a mechanism underlying the phenotypes observed in Atlastin mutants. The possibility of ER integrity defects is mentioned in the discussion on lines 547-559.
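
  The ratiometric readout and calibration mentioned above can be sketched as follows. This is a generic illustration, not the analysis pipeline used here: it assumes background-subtracted mean intensities from matched regions of interest under 488-nm and 405-nm excitation, and the function names and intensity values are hypothetical. For roGFP-family sensors, reduction raises the 488-nm-excited signal relative to the 405-nm-excited signal, so a more reducing environment gives a higher 488/405 ratio; the dynamic range is the ratio under fully reducing conditions divided by the ratio under fully oxidizing conditions.

```python
def excitation_ratio(i488, i405, bg488=0.0, bg405=0.0):
    """Background-subtracted 488/405 excitation ratio for one region of interest."""
    s488 = i488 - bg488
    s405 = i405 - bg405
    if s488 <= 0 or s405 <= 0:
        raise ValueError("signal must exceed background in both channels")
    return s488 / s405


def dynamic_range(ratio_reduced, ratio_oxidized):
    """Fold difference between fully reduced and fully oxidized calibration ratios."""
    return ratio_reduced / ratio_oxidized


def fold_change(ratio_treated, ratio_vehicle):
    """Treatment ratio relative to vehicle; a value near 1.0 means no detectable shift."""
    return ratio_treated / ratio_vehicle


# Invented intensities (arbitrary units), for illustration only.
vehicle = excitation_ratio(900.0, 700.0, bg488=100.0, bg405=100.0)
treated = excitation_ratio(1100.0, 600.0, bg488=100.0, bg405=100.0)
print(vehicle, treated, fold_change(treated, vehicle))
```

  Without calibration samples at both extremes, a fold change near 1.0 cannot distinguish "no redox change" from "a sensor with too little dynamic range to report one", which is the ambiguity described above.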
  
  Minor comments:
  
  It is important to call figures by order. Figure 2C is called before 2A-B. Figure 2B is called before Figure 2A.
  
  The revised manuscript has all figures in order of appearance in the text.
  
  Figure legends (Figure 2): "The same control dataset used in E-G was used in Figure 5 and Figure 5_Supplement." Why is this relevant?
  
  We wanted to be transparent about reusing the same control dataset across multiple figures to avoid any appearance of data duplication. This notation clarifies that, although the data appear in different contexts (Figures 2 and 5; the revised version no longer contains a Figure 5_Supplement), they represent the same biological samples analyzed for different parameters, ensuring readers understand that these are not independent datasets.
  
  Figure 4F is cited before Figures 4D-E, which are not cited.
  
  We revised our manuscript and reorganized Figure 4 to ensure that all figure panels are referenced in sequential order and that panels 4D-E, which were previously not cited in the text, are now properly referenced when discussing their corresponding results.
  
  Figure 5B is called before the previous ones. Same for Figure 5A supplement.
  
  We referenced Figure 5A in lines 211-212, which precedes our discussion of Figure 5B. To clarify the figure order, we removed the early references to Figures 2D-G and Movies 7-14, which were mentioned only to indicate that we were analyzing the same dataset in different ways.
  
  The revised manuscript has all figures in order of appearance in the text.
  
  Referees cross-commenting
  
  I agree with the comments raised by Reviewers 2 and 3. Most importantly, these data should be validated by genetic rescue. Moreover, it is essential to know the source of the luminal marker displaced to the cytosol: is it mislocalization, or is it reflux of pre-existing protein to the cytosol after insertion into the ER? The other reviewers and I also recommend testing the endogenous protein rather than relying on overexpression.
  
  We have addressed these points in our responses to the following reviewer questions:
  
  Genetic rescue: Please see our responses to Reviewer 1/Question #10 and Reviewer 2/Question #1.
  
  Source of displaced luminal marker: We provide some evidence addressing this in our response to Reviewer 3/Question #1.
  
  Endogenous protein localization: We have examined this and detailed our findings in our responses to Reviewer 1/Question #7 and Reviewer 2/Question #6.
  
  Reviewer #1 (Significance (Required)):
  
  General assessment: This interesting paper shows that proteins can escape the ER under special conditions. However, the authors need more evidence to show this and should rely less on the overexpression system, especially BiP-GFP, which can cause proteostasis stress within the ER. Advance: The explanations of the results are oversimplified, and some points and complexities of the study need to be addressed further to make the most of them; these are often among the more interesting concepts in the paper. I think many points can be addressed in the text if the authors report clearly and concisely. At the same time, additional experiments would turn this paper from an observational one into a very interesting mechanistic one. This paper builds on previously published articles from the group and other groups, and it is a nice progression. However, as mentioned, this paper depends primarily on published data, and the novelty is somewhat lost among all the comparisons to other published data instead of being emphasized. Without a substantial mechanistic improvement, the paper would remain observational.
  
  Audience: The microscopy tools can be a great addition for researchers in the field who monitor protein trafficking, especially cell biologists (basic research).
  
  My expertise: ER homeostasis, protein trafficking, cell biology
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Summary The endoplasmic reticulum (ER) is a continuous organelle that extends throughout neurons to regulate fundamental processes. The analysis of ER dynamics at synaptic terminals is limited by the challenge of imaging these structures at high resolution. In this manuscript, the authors use super-resolution (~170 nm) live imaging and a combination of membrane and luminal ER markers at the Drosophila larval NMJ, an important model synapse, to investigate dynamic ER architecture in vivo. They report a detailed characterization of the presynaptic ER organization and dynamics at wild-type and GTPase Atlastin mutant NMJs. Their analysis using the ER membrane marker tdTomato:Sec61b reveals the presence of an intact ER network in Atlastin mutants. This contrasts with the apparent ER fragmentation phenotype previously reported and replicated here when using a luminal marker. Their findings instead point to the progressive displacement of luminal proteins to the cytosol in Atlastin mutants specifically at synapses. The authors propose that the disruption of ER protein dynamics at synapses is a compartment-specific ER stress response. The manuscript is well written, results are clearly presented, and experiments are technically rigorous.
  
  Major comments
  
  The baseline ER phenotypes in Atlastin mutants are mild with complete loss of ER network only observed in terminal boutons. This interesting and unexpected result should be further confirmed by genetic rescue. The authors can use a UAS rescue line previously reported in PMID: 19341724.
  
  We tested the UAS-Atl-myc rescue line and unfortunately found that even in wild-type neurons, overexpression of Atlastin produced strong ER organization defects that precluded the rescue experiment. Instead, to confirm the cell autonomy of the phenotype and to test it with an independent tool, we performed a presynaptic knockdown of Atlastin by RNAi and found that BiP:sfGFP:HDEL is displaced, as observed in the Atlastin null mutant. These results are now shown in Figure 7A-C.
  
  Lines 204-7: It's not clear how a greater coefficient of variation indicates that the marker is more concentrated in subsynaptic structures or what is meant by 'subsynaptic structures.'
  
  We added the following text to explain, in lines 181-183: "A higher CoV indicates an uneven distribution of tdTomato:Sec61β within the presynaptic terminal, with some areas showing higher concentrations than others (in contrast to the uniform, diffuse signal expected from fragmentation)." To avoid confusion with postsynaptic structures called the subsynaptic reticulum, we have removed the term "subsynaptic". The intended meaning is distinct structures found within the presynaptic terminal.
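
  To illustrate why a higher CoV points to an uneven, structured distribution, here is a minimal sketch that assumes the CoV is the standard deviation divided by the mean of pixel intensities within a presynaptic-terminal mask. Whether the population or sample standard deviation is used is not stated here; the sketch assumes the population form, and the pixel values are invented.

```python
import statistics


def coefficient_of_variation(intensities):
    """Population standard deviation divided by the mean of masked pixel intensities."""
    mean = statistics.fmean(intensities)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    return statistics.pstdev(intensities) / mean


# Invented values with the same mean (100) but different spatial organization.
diffuse = [100, 100, 100, 100, 100, 100]    # uniform signal, as expected from fragmentation
networked = [20, 180, 20, 180, 20, 180]     # signal concentrated in discrete structures

print(coefficient_of_variation(diffuse))    # 0.0
print(coefficient_of_variation(networked))  # 0.8
```

  Because the CoV is normalized by the mean, it compares how evenly the marker is distributed independently of overall expression level.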
  
  There's a mistake in Figure 6C and the associated text. The summed percentage of the three phenotypic categories adds up to 110% for Atlastin mutants.
  
  The reviewer noted that the summed percentage of the three phenotypic categories in Figure 6C adds up to 110% for Atlastin mutants, which appears to be a mathematical error. However, this is not an error, but rather a reflection of our quantification methodology, in which a single bouton can exhibit more than one type of ER dynamics per movie recorded. Our quantification counts each phenotype independently, so boutons displaying multiple phenotypes contribute to more than one category. This approach provides a more comprehensive view of the range of ER dynamics present in Atlastin mutants, as restricting the analysis to mutually exclusive categories would underrepresent the complexity of the phenotypes observed. To make this point clear, we made the following change to the text in lines 257-259: "We note that the sum of these percentages exceeds 100% because one NMJ exhibited multiple phenotypes: one branch had a complete loss, while the other branch had no phenotype. These phenotypes were counted separately."
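
  The counting scheme described above can be sketched as follows, assuming each NMJ is scored for every phenotype it displays and each category is expressed as a percentage of all NMJs imaged. The ten NMJs and their labels are hypothetical; they show how a single NMJ with two differently affected branches pushes the total to 110%.

```python
from collections import Counter


def phenotype_percentages(scored_nmjs):
    """Percentage of NMJs showing each phenotype; one NMJ may count toward several."""
    n = len(scored_nmjs)
    counts = Counter(label for labels in scored_nmjs for label in set(labels))
    return {label: 100 * count / n for label, count in counts.items()}


# Hypothetical scoring of 10 NMJs; the last one has two branches scored differently.
nmjs = (
    [("no phenotype",)] * 4
    + [("partial loss",)] * 3
    + [("complete loss",)] * 2
    + [("complete loss", "no phenotype")]
)

pct = phenotype_percentages(nmjs)
print(pct)                # per-category percentages
print(sum(pct.values()))  # 110.0
```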
  
  Figure 8: the ER looks fragmented in 1st instar controls and mutants. The authors should address this difference from more mature NMJs.
  
  We would like to clarify that the bulk of experiments in this manuscript (including all ER dynamics, luminal marker redistribution, and membrane marker analyses discussed throughout the Results) were performed in 3rd instar larvae, which are more mature larval NMJ preparations standard in the field. Figure 8 was included specifically to test whether the Atlastin mutant phenotype we describe throughout the paper is also detectable at an earlier developmental stage, not to replace or reinterpret our primary findings.
  
  Regarding the specific observation that the ER appears more fragmented in Figure 7F-H relative to the more mature NMJs shown elsewhere: this fragmentation, observed similarly in both control and Atlastin mutant 1st instar larvae, likely reflects technical challenges associated with dissecting these smaller, more delicate early-stage specimens rather than a genotype-specific effect. Because fragmentation occurred similarly in both genotypes, we could still reliably assess the redistribution of BiP:sfGFP:HDEL as our primary phenotypic readout in this experiment. We have added the following text (lines 306-309) to clarify this point: "Note that in 1st instar larvae, both normal networks in controls and residual networks in Atlastin mutants appeared more fragmented than in 3rd instar preparations, likely due to the technical challenges of dissecting these smaller, more delicate specimens. Since ER fragmentation occurred similarly in both genotypes, we could still reliably assess the redistribution of BiP:sfGFP:HDEL as our primary phenotypic readout."
  
  The images in figure 9B do not seem representative of the quantification in Figure 9D. Specifically, the partial loss Atlastin NMJ appears to have recovered as fully as the complete loss Atlastin NMJ.
  
  The images showed FRAP recovery across the entire bouton, but we photobleached only a small region within each bouton and quantified only this region. We have now added outlines to clearly delineate the specific FRAP regions analyzed in each image, which clarifies that the partial-loss Atlastin NMJ showed less recovery within the bleached region than across the bouton as a whole. We have also reordered the figures to more clearly convey our message (Figure 9 is now Figure 8).
  
  We also made a few changes to the paragraph on lines 347-350 to clarify our experimental reasoning: "We photobleached en passant boutons using a defined region of 6.8 x 7.8 microns (dashed box in Figure 8D) to ensure that BiP:sfGFP:HDEL could recover from the ER networks surrounding the FRAP region (Movies S20-S23)."
  
  We also added this sentence to the figure legends of Figure 8: "The dashed boxes in (D) indicate areas that were photobleached and analyzed for recovery quantification in (E-F)."
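
  For readers less familiar with FRAP quantification, one common way to express recovery within a bleached region is to correct for acquisition photobleaching using an unbleached reference region, then rescale so the pre-bleach mean is 1.0 and the first post-bleach frame is 0.0. This is a generic sketch under those assumptions, with invented numbers; it is not necessarily the normalization used in the manuscript.

```python
def normalized_frap(roi, reference, bleach_index):
    """Normalize a FRAP trace so the pre-bleach mean is 1.0 and the first
    post-bleach frame is 0.0, after correcting for acquisition photobleaching.

    roi: mean intensity in the bleached region, one value per frame.
    reference: mean intensity in an unbleached region over the same frames.
    bleach_index: index of the first frame acquired after the bleach.
    """
    if len(roi) != len(reference):
        raise ValueError("roi and reference must have the same length")
    if not 1 <= bleach_index < len(roi):
        raise ValueError("need at least one pre-bleach and one post-bleach frame")
    # Scale each frame by how much the reference has faded relative to pre-bleach.
    ref_pre = sum(reference[:bleach_index]) / bleach_index
    corrected = [r * ref_pre / ref for r, ref in zip(roi, reference)]
    pre = sum(corrected[:bleach_index]) / bleach_index
    post = corrected[bleach_index]
    return [(c - post) / (pre - post) for c in corrected]


# Invented example: two pre-bleach frames, bleach, then partial recovery.
trace = normalized_frap([100, 100, 20, 40, 60], [200] * 5, bleach_index=2)
print(trace)  # [1.0, 1.0, 0.0, 0.25, 0.5]
```

  Restricting the region of interest to the bleached box, rather than the whole bouton, is what allows a partially recovered region to be distinguished from a bouton that looks fully recovered overall.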
  
  Optional: An overexpressed luminal marker is displaced to the cytoplasm in Atlastin mutants. It would be interesting to know and increase the significance of the findings if the same is true of endogenous luminal proteins under biological stress conditions.
  
  As noted in our response to Reviewer #1, who suggested that Atlastin mutant synapses may exhibit ER stress, we examined levels of the ER chaperone BiP, a well-established ER stress marker whose expression increases during UPR activation. We first validated that our BiP antibody can detect changes in ER stress by feeding control larvae with 50 mM DTT for 24 hours (Figure 10A). We were unable to perform this experiment in Atlastin mutant larvae because they did not consume the DTT-treated food, as assessed by blue food coloring in the larvae's guts. In the future, it will be of interest to establish a protocol to examine Atlastin mutants by feeding or treating larval fillets with DTT.
  
  We measured BiP levels at NMJs of Atlastin mutants and found that they were slightly increased compared to controls. Atlastin mutants co-expressing UAS-BiP:sfGFP:HDEL or UAS-tdTomato:Sec61b did not show significantly increased endogenous BiP levels, suggesting that transgene expression suppresses the mild ER stress response. We conclude from these experiments that Atlastin mutant synapses have mild ER stress. These results are in Figure 10B-E.
  
  Optional: Applying this approach in stimulated conditions (high potassium, increased temperature) might reveal a greater activity-dependent role for Atlastin at synaptic terminals.
  
  This is a very interesting idea, as we have only examined synapses at rest. However, this is beyond the scope of this paper.
  
  Minor Comments
  
  Line 16: Atlastin should be italicized.
  
  Thank you for catching this typo. We have fixed it.
  
  Figure 5A: Based on the relative intensities, it appears that control and mutant images are not contrast matched but this isn't stated.
  
  Thank you for catching this omission. We added to the figure legend: "Control and Atlastin mutant images are not contrast matched."
  
  Line 822: The number of static Atlastin mutant boutons used for analysis is missing.
  
  Thank you for catching this omission. We have fixed this supplementary table.
  
  Figure 9: The blue arrows are not annotated in the figure legend.
  
  Thank you for catching this omission. We have fixed this figure legend.
  
  Reviewer #2 (Significance (Required)):
  
  Atlastin is linked to Hereditary Spastic Paraplegia (HSP) and this study changes our understanding of the compartment-specific impacts of its loss. This study reveals the importance of using both membrane and luminal ER markers to accurately interpret phenotypes as well as the importance of considering compartment-specific effects on ER. These findings represent significant mechanistic and conceptual advances. The lack of genetic rescue is a limitation and adding an investigation of an endogenous luminal protein under basal and stress conditions would add significantly to our understanding of Atlastin dysfunction in HSP. Notably, the in vivo imaging approach introduced here can be adapted broadly for live imaging of Drosophila larvae. Thus, this work will be of interest to both neuronal cell biologists and the wider Drosophila community. This review is based on our expertise in neuronal cell biology.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  In this manuscript, the authors investigate the structural dynamics of the endoplasmic reticulum (ER) in Drosophila neurons and examine the role of the ER-shaping protein Atlastin in ER morphology. Their discovery of the neuromuscular junction (NMJ)-specific contribution of Atlastin to ER integrity is intriguing and may provide valuable insights into the pathological mechanisms underlying Atlastin mutations associated with hereditary spastic paraplegia (HSP) and hereditary sensory neuropathy. The key observation that an ER protein shows aberrant cytoplasmic localisation in mutant cells appears convincing. Although the characterisation of this phenomenon remains at the level of a primary observation, with its mechanism unclarified, establishing this new and unexpected functional rather than structural Atl effect is important and useful for the field. The observation that the ER is structurally preserved in this mutant, despite the complete absence of Atl, is also extremely useful.
  
  It is unclear whether the cytoplasmic localisation affects only the exogenous overexpressed ER marker or whether endogenous protein would also appear in the cytoplasm; the authors should consider adding immunostaining data to test this.
  
  The authors offer speculations on potential reasons for the cytoplasmic localisation of the ER marker, suggesting that relocation specifically at the cell periphery combined with slow clearance there is the most likely explanation (it remains unclear what stops the marker from spreading through the entire cell). They suggest that a decrease in cotranslational translocation is unlikely, as this would result in somatic accumulation of the marker. However, if clearance in the periphery is less efficient than in the soma, the accumulation there might reflect compromised translocation. Any clarifying experiments, if practical, that directly demonstrate how ER proteins relocate to the cytoplasm in atl mutants would help to better understand the phenomenon. For example, would proteasomal inhibition make the marker accumulate more across the cell? The authors also suggest links to ER stress. Would stress induction phenocopy the mutant?
  
  Reviewer #3 asked whether defective proteasomal clearance underlies the cytosolic accumulation of BiP:sfGFP:HDEL in Atlastin mutants. We addressed this directly. First, proteasome function appears intact in the mutants: baseline ubiquitinated protein levels (FK1 antibody) were comparable between control and Atlastin mutants, and MG132 treatment produced a similar increase in ubiquitination in both genotypes, confirming both antibody specificity and normal proteasome activity. We then examined BiP:sfGFP:HDEL directly. In controls, MG132 caused the marker to accumulate at axons and presynaptic terminals, showing that it is normally cleared from these compartments by the proteasome. Critically, this accumulated marker remained associated with intact ER networks: MG132 did not induce diffuse cytosolic BiP:sfGFP:HDEL in any compartment (cell bodies, axons, or presynaptic terminals), even where levels rose substantially. Thus, blocking proteasomal clearance raises ER-localized marker but does not generate the cytosolic pool seen in Atlastin mutants, indicating that impaired clearance is not sufficient to cause the displacement phenotype. We separately noted that BiP:sfGFP:HDEL was already elevated in Atlastin mutant axons without MG132, paralleling the axonal tdTomato:Sec61β accumulation in Figure 4, consistent with reduced baseline clearance specifically in mutant axons, but this does not lead to cytosolic displacement. This experiment is now shown in Figure 11, described in Results (lines 445-475), and discussed in lines 576-581.
  
  Minor comments:
  
  Line 146:
  
  "fast dynamics (
  
  Thank you for catching this mistake. We have corrected it.
  
  Fig. 2D: The data representation in the "Tubule displacement" image is unclear. The ER tubule indicated by the red arrow does not seem to change over time (it appears static). The time-0 stamp appears behind the image.
  
  Thank you for catching the typo. We have fixed it. Additionally, we added black arrows to highlight a tubule that is not moving, allowing the reader to compare it with the moving tubule. We also included videos of all types of ER tubule dynamics so that the reader can also examine the raw data (Movies S9-11).
  
  Line 157-158 (and relevant method sections):
  
  The definition of static and dynamic boutons is ambiguous. The authors should describe this point in more detail, including how long they observed each structure to define the changes in ER tubule dynamics.
  
  We provide in the methods (lines 779-791) a detailed explanation of how we categorized boutons as dynamic or static. In addition, we added the following to explain in the results section how we defined static vs dynamic:
  
  Old sentence: We qualitatively categorized boutons as "static" if we observed no change in ER network structure or "dynamic" if we observed at least one change.
  
  New sentence in lines 143-147: "We imaged boutons for 40 sec at 0.92 sec intervals to capture ER dynamics over this observation period. Boutons were qualitatively categorized as "static" if we observed no detectable changes in ER network structure throughout the entire 40 sec imaging session, or "dynamic" if we observed at least one of the three defined dynamic events during this time window."
  
  Fig. 2E: It is unclear what n=75 and n=29 represent. Are these the numbers of en passant and terminal boutons subjected to qualitative analysis?
  
  We removed these n values from the figure and added this information to the Supplementary Table 1, which contains detailed information about the genotype, statistical analysis, and number of larvae and NMJs analyzed.
  
  Fig. 2: It is unclear what the qualitative analysis represents. Are the points pooled from different experiments?
  
  The data in Fig. 2 E-F come from movies acquired in the same experiment. The number of independent animals and NMJs imaged is described in Table 1.
  
  Line 231: Regarding "...we found a small but significant reduction in dynamic boutons in Atlastin mutants (76%), ...", how do the authors assess significance? If the proportion of static/dynamic ER in boutons was obtained from multiple experiments, it should be presented, e.g., as mean ± standard deviation; otherwise, the authors should clarify that the proportion is representative of x independent experiments.
  
  The videos used for this figure were acquired from a single experiment. We use a chi-square test to determine significance relative to the "expected" distribution of dynamics types from controls, as these are categorical rather than continuous data (see PMID 31145670). Information regarding genotype, statistical analysis and number of larvae and NMJs can also be found in Supplementary Table 1.
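  As an illustration of the goodness-of-fit approach described above, the sketch below computes a Pearson chi-square statistic for static versus dynamic bouton counts against proportions taken from controls. The counts are hypothetical placeholders, not values from the study, and the function names are ours. With only two categories the test has one degree of freedom, so the p-value has a closed form (erfc of the square root of half the statistic), which avoids needing a statistics library.

```python
import math


def chi_square_gof(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic.

    observed: counts per category in mutants, e.g. [static, dynamic].
    expected_props: proportions per category taken from controls (sum to 1).
    """
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    stat = 0.0
    for obs, p in zip(observed, expected_props):
        expected = n * p  # expected count if mutants matched controls
        stat += (obs - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value for one degree of freedom (two categories).

    With 1 df the statistic is a squared standard normal, so
    P(X > stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts for illustration only (not data from the study).
observed = [60, 40]          # static, dynamic boutons in mutants
control_props = [0.5, 0.5]   # static, dynamic proportions in controls
stat = chi_square_gof(observed, control_props)
print(stat, round(p_value_df1(stat), 4))  # 4.0 0.0455
```

  Treating the control proportions as fixed ignores their own sampling error; a 2×2 contingency-table test is a common alternative when both groups are sampled, and the chi-square approximation assumes expected counts of roughly five or more per category.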
  
  Line 267-269 and Fig. 6B: The authors' conclusion that "Complete loss of ER network structure in NMJ of BiP:sfGFP:HDEL overexpressing Atl mutant" seems to be based on the lack of signal from the luminal marker, which may be undetectable due to changes in tubular volume or marker loss to the cytoplasm, as suggested by the authors, while the membranous ER structure is intact. It would be useful to discuss this point and potentially add an ER membrane-stained control.
  
  We agree with the reviewer that Atlastin mutants categorized as 'complete loss mutants' do not actually lack ER at synapses. We think this is an important point so we added the following to the results in lines 251-254: "Note that the "Complete loss" phenotype in Atlastin mutants reflects the absence of detectable luminal marker signal in organized ER structures, not the complete absence of ER membranes, as demonstrated by our ER membrane marker tdTomato:Sec61β results."
  
  We attempted to co-label the ER membrane and ER lumen, but these crosses yielded very few live larvae (in either controls or Atlastin mutants), and those that survived had severely deformed NMJs. We added Figure 6-Supplement showing the results of this experiment, and described them on lines 270-273.
  
  Fig. 6C: In Atl mutant, why does the total of the proportion exceed 100% (10 + 45 + 55)?
  
  The reviewer noted that the summed percentage of the three phenotypic categories in Figure 6C adds up to 110% for Atlastin mutants. This is not an error, but rather a reflection of our quantification methodology because a single bouton can exhibit more than one type of ER dynamics per movie recorded. Our quantification counts each phenotype independently, so boutons displaying multiple phenotypes contribute to more than one category. This approach provides a more comprehensive view of the range of ER dynamics present in Atlastin mutants, as restricting the analysis to mutually exclusive categories would underrepresent the complexity of the phenotypes observed. To make this point clear, we made the following change to the text in lines 257-259: "We note that the sum of these percentages exceeds 100% because one NMJ exhibited multiple phenotypes: one branch had a complete loss, while the other branch had no phenotype. These phenotypes were counted separately."
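  The arithmetic behind percentages that exceed 100% can be sketched as follows, assuming each NMJ is recorded as the set of phenotypes it shows and each phenotype is counted independently. The NMJ list and category labels are hypothetical and only mirror the situation described above (one NMJ with a complete-loss branch and a branch with no phenotype).

```python
from collections import Counter


def phenotype_percentages(units):
    """Percent of units showing each phenotype, counted independently.

    units: list of sets; each set holds every phenotype seen in one unit.
    A unit with two phenotypes contributes to both categories, so the
    percentages can sum to more than 100.
    """
    counts = Counter()
    for phenotypes in units:
        counts.update(phenotypes)  # +1 for every phenotype in this unit
    n = len(units)
    return {ph: 100.0 * c / n for ph, c in counts.items()}


# Hypothetical NMJs for illustration only (not data from the study).
nmjs = [
    {"complete loss", "no phenotype"},  # one branch each
    {"partial loss"},
    {"no phenotype"},
    {"partial loss"},
]
pct = phenotype_percentages(nmjs)
print(pct)
print(sum(pct.values()))  # 125.0
```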
  
  Fig. 9C, line 342-344: In the FRAP experiment using CD8, the Partial loss Atl mutant seems to show slower recovery than the control. There also seems to be a mismatch in the triangle symbols for the Partial loss Atl mutant between the legend and the plot (one is filled and the other is empty). This should be clarified.
  
  Thank you for catching this mistake. We have fixed the figure.
  
  Fig. 10 is a clever way to verify the cytoplasmic localization of the ER marker; however, its description and annotation can be improved, and it would be stronger if the four curves in panel F (mutant and control, with the trap and under normal conditions) were shown together.
  
  The reviewer suggested merging our graphs but we believe that keeping them separate is clearer.
  
  Line 495: Drosophila have ReepA and ReepB, but not Reep1-4. If the authors discuss speculation based on their observations in Drosophila, the gene names should be consistent within the same species, and the authors should explain which Drosophila genes correspond to the mammalian ones.
  
  We made the following changes to address the reviewer's concern about gene nomenclature consistency (lines 502-506): "These ER-derived vesicles are likely to involve ReepA and ReepB, the Drosophila orthologs of mammalian REEP1-4, which regulate ER vesicle formation in mammalian cells (67). Notably, while overexpression of Atlastin can regulate REEP vesicle fusion in mammalian systems (67), it is not essential for vesicle formation, suggesting similar regulatory relationships may exist between Atlastin and Reep genes in Drosophila."
  
  Line 548: should UPR be Unfolded Protein Response?
  
  Thank you for catching the typo. We have fixed it.
  
  Reviewer #3 (Significance (Required)):
  
  This study advances the understanding of how ER morphogens affect neuronal cells specifically, the lack of which limits researchers' ability to comprehend the neuronal pathologies associated with ER structure-function. The observation that the lack of a key structural protein causes aberrant localisation of ER content should be of great interest to cell and neuronal biologists and to researchers of the associated diseases, and it shows the field a new direction. Although mechanistic details remain to be unraveled, it constitutes a fundamental, conceptual advance.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2023.09.01.555994
www.biorxiv.org www.biorxiv.org

Viral commitment to infection depends on host metabolism

1
1. Public_Reviews 09 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  In the wild, bacteria can be found in a wide range of metabolic states, including states in which they are resource-limited. Because phages heavily rely on the infected cell's molecular machinery to replicate, it is natural to wonder how phage-bacteria interactions depend on the metabolic state of the cell. In this work, Marantos et al. investigate specifically how the rate of infection of 5 different phages changes between cells grown in energy-rich conditions and cells grown in energy-depleted conditions. Their results clearly show that 4 out of the 5 phages studied display a significant reduction in infection rate in cells that are energetically depleted and provide a potential explanation for this observation by looking into the mechanisms that these phages use to irreversibly infect their host cells.
  
  The work also tries to explain the observation using a mathematical/mechanistic model that describes infection as a sequence of two steps, in which a phage first binds to a cell receptor, from which it can potentially unbind, and then irreversibly infects by injecting its genome. While the model is sensible from a mechanistic perspective, the experimental evidence for how each of the model's rates is affected by the cell's metabolic state is weak, as only ratios of these rates can be inferred from the data.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors investigate the dependence of phage adsorption rates on host metabolic state, using 5 coliphages that differ in their infection cycles and host receptors. They find that four of the 5 phages showed significantly reduced infection under low metabolic states, with phages that generally have weaker adsorption being more strongly affected by low metabolism. The authors complement their findings with a 2-step infection model where phages can disengage from their hosts after initial adsorption. The paper illustrates the power of standardized experimental protocols for quantitative trait comparisons and highlights the dependence of phage infection success on host physiology.
  
  Strengths:
  
  The paper is well written and clearly structured.
  
  The experiments are well-designed, and particularly commendable is the diligent use of control scenarios to allow for quantitative comparison between phages. This standardized protocol will be valuable for the entire phage community.
  
  The authors convincingly show the impact of host physiology on phage adsorption success. This dependence has so far mainly been considered for intracellular phage replication, and the paper shows that host physiology has to be taken into account at all steps of phage infection.
  
  Weaknesses:
  
  There are some concerns about the experimental setup and which conclusions can be drawn from it:
  
  Before phage infection, bacterial cultures are grown to exponential phase, washed, and then resuspended with glucose or arsenate-azide for 10 min. It is, however, questionable whether 10 minutes is enough to realistically simulate high and low metabolic states. Ten minutes seems quite short to go from exponential growth to a low metabolic state, given the transcriptional memory of previous environments. It seems more likely that the population will be quite heterogeneous, with cells in various states of transition towards low metabolic states.
  
  While we agree with the reviewer that during metabolic transitions there may be a period in which the population is heterogeneous, with cells in different stages of transition toward a low metabolic state, the 10-minute treatment used here was chosen based on prior work showing that arsenate–azide rapidly inhibits cellular energy metabolism and is sufficient to eliminate the hyperdiffusion of the λ receptor (Winther et al., Biophysical Journal 2009, http://dx.doi.org/10.1016/j.bpj.2009.06.027). We have also corrected the DOI for this reference in the manuscript. Furthermore, the ATP pool of log-phase E. coli turns over several times per second (Holms et al., Arch. Mikrobiol. 1972, http://dx.doi.org/10.1007/BF00425016). We therefore assumed the bacteria were energy-depleted after 10 minutes. We have clarified this point in the revised manuscript.
  
  Given that arsenate and azide inhibit cellular metabolism, i.e., have antimicrobial effects, cells might not just downregulate metabolism but also activate a stress response, which could cause some of the observed effects on phage adsorption. Therefore, the 'low metabolic state' of the cells in this paper could mean that cells are starved, stressed, or both.
  
  The reviewer is correct. We don’t exclude indirect effects. However, as nutrients were removed from the bacteria by washing and energy metabolism was inhibited by the addition of arsenate and azide, we assumed a stress response requiring biosynthesis would be unlikely to occur.
  
  The abundance of receptors could change between the high and low metabolic media conditions and contribute to the observed differences in adsorption, while the authors seem to assume in their model that the initial adsorption rate always remains the same.
  
  We do not think that the observed differences in adsorption are explained by a change in receptor abundance. In a previous study using the same experimental protocol as in the present work, phage λ was compared to the metabolically insensitive mutant λh (Brown et al., PNAS 2022, http://dx.doi.org/10.1073/pnas.2106005119). If the lower adsorption in the low-metabolic condition were caused by a reduced number of receptors, then λh should also have shown a lower adsorption rate under the same condition. Instead, no measurable effect on λh adsorption rate was observed. We therefore conclude that the effect is not explained by changes in receptor number on the timescale of the experiment. We have clarified this point in the revised manuscript.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Marantos et al. showed that for some coliphages, the energetic state of the bacterial host cell has a strong impact on whether phage infection is initiated. The authors drew this conclusion from the observation that there are more free phages remaining in the medium after infection of arsenate-azide-treated cells as compared to after infection of untreated cells. These data were analyzed and reported both as ratios of the treated vs. untreated conditions and using a mass-action kinetic model of phage-cell collision in the infection mixture. The data supported the findings that for four phages infecting Escherichia coli bacteria, namely, phages λ, ɸ80, M13, and T6, the phages are less likely to initiate infection if the host bacteria are energy-depleted. However, for phage T5, the authors found that its infection propensity is not impacted.
  
  Strengths:
  
  The data presented by the authors clearly supported the principal conclusion of the study ("Viral commitment to infection depends on host metabolism"). The five phages chosen by the authors represent different viral lifestyles and infection mechanisms, highlighting the potential applicability to other Escherichia coli phages. Finally, the authors successfully used a classic mass-action model of phage-cell collision to interpret their data. The simplicity of their experimental assay, combined with the use of this mathematical model, offers other investigators who study phage-bacterial interactions in other contexts a potentially useful toolkit to examine infection in general, and specifically, the dependence of phage infection on the host's metabolic state.
  
  Weaknesses:
  
  (1) The authors isolated and measured the numbers of free phages in the medium after infection of bacteria under different treatments. These measurements were analyzed in two different ways: (1) simply as ratios (corrected/normalized using different controls), and (2) fitted using a simple mathematical model. I have concerns regarding both analyses.
  
  (1.1) For the first method, having different time points at which the sample of each phage is collected critically complicates data interpretation. As one incubates the phage-bacteria mixture for a longer time, more infection occurs, and the number of phages collected from the mixture decreases. Therefore, the different incubation times defeat the goal of "a systematic and quantitative comparison across different phages [...]", as the authors themselves acknowledge. Conceivably, the authors could have used the shortest measurement time for all phages (i.e., 10 minutes, as for phage λ). Alternatively, the authors could have applied a systematic criterion such as half (or any other fraction) of the latent period of each phage, which would still "maximize the incubation period while ensuring that manipulations were completed before the first infection cycle concluded". In my view, the seemingly arbitrary measurement time for each phage renders the entire first analysis very challenging to interpret. It also goes against the authors' proposition that the protocol was "standardized" or "consistent". It is not clear what the readers are supposed to take away from this first analysis, or rather, which evidence, finding, or conclusion the manuscript would lose if the authors only presented the modeling-based analysis.
  
  (1.2) The second method of analysis sought to remove the dependence of the measurements on time. I completely agree with this goal, and the findings extracted from this analysis significantly contributed to the merits of this manuscript. However, the authors achieved this goal using a single time point for each phage to calculate the infection rate (η). As shown in Figure S3, each of the phage depletion curves is anchored by only one data point (note that P(t)/P(0) = 1 at t = 0 is assumed, not measured). This goes against the typical way this collision model is used in the literature, where a time series is measured and used to fit the model (e.g., DOI 10.1007/978-1-60327-164-6_18, or more recently, PMID 39700139). This practice in the current manuscript reduced the robustness of the inferred η values. This problem is exacerbated by assumptions used by the authors in formulating this model. For instance, the authors used a constant value for the bacterial concentration, B, because "bacterial growth and lysis were negligible" (lines 135-136). However, considering that the bacteria were cultured at 37 °C in a very rich medium (first in YT broth, then in 2% glucose), the measurement times of 20, 30, and 55 minutes most likely span one or a few generations of bacterial growth and division.
  
  Related note: I suggest that one of the panels in Figure S3 should be moved to the main text, since it is critical to the second method of analysis.
  
  We would like to clarify that the manuscript does not present two separate methods, but rather one method presented in two steps: a first step with results that are directly tied to the experimental measurements and show whether the effect is present for each phage, followed by a second, analytical step that makes the results comparable across phages.
  
  The first step presents the ratios because they directly reflect the measurements performed in the experiment and allow the reader to see the effect of the metabolic state for each phage in contrast to its control. We agree that these ratios are time-dependent and therefore not suitable for quantitative comparison between phages. Their purpose is to illustrate the experimental outcome and to show that the effect is present (or absent) on a per-phage basis, not to compare magnitudes across phages.
  
  We then follow this with the second step, allowing the reader to follow the logic of the analysis. The analytical step that follows does not represent a second method, but a continuation of the same analysis. Here, we remove the time-dependence specifically in order to make comparison of the effect across phages possible, by connecting our results to standard measures such as the adsorption rate η. Importantly, P(0) is measured for every phage in every experiment. The only modeling assumption used (a standard one in the field) is the exponential form for the decay in free phage number, which naturally yields P(t)/P(0) = 1 at t = 0.
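  Under the exponential model referred to above, η follows from a single measured pair P(0), P(t). A minimal sketch, assuming a constant bacterial concentration B and the form P(t) = P(0)·exp(-ηBt); the function name and example numbers are hypothetical, not taken from the manuscript.

```python
import math


def adsorption_rate(p0, pt, bacteria, t):
    """Estimate eta from one time point.

    Assumes P(t) = P(0) * exp(-eta * B * t) with B constant, so
    eta = -ln(P(t) / P(0)) / (B * t).
    p0, pt: free-phage titres at time 0 and time t (same units)
    bacteria: bacterial concentration B (cells/mL)
    t: incubation time (min); eta is then in mL/min
    """
    if not (0 < pt <= p0):
        raise ValueError("expected 0 < P(t) <= P(0)")
    return -math.log(pt / p0) / (bacteria * t)


# Hypothetical values for illustration only.
eta = adsorption_rate(p0=1e6, pt=1e6 * math.exp(-2.0), bacteria=1e8, t=10.0)
print(eta)
```

  Because only one time point enters the estimate, the exponential form itself is assumed rather than tested; a time series of P(t) would be needed to check it.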
  
  Regarding the reviewer’s concern that bacterial growth may not have been negligible over the relevant time window, we note that recent work on rich-to-minimal growth lags in E. coli reports substantial delays before growth resumes after nutrient downshift. One 2023 study (Wu et al., Nature Microbiology 2023, https://doi.org/10.1038/s41564-022-01310-w) considering wild-type E. coli shows in Fig. 2c a lag of up to about 2 hours after a shift from MOPS minimal medium with 0.2% glucose plus 18 amino acids to the same medium without amino acids. Another 2023 study (Zhu and Dai, Nature Communications 2023, https://doi.org/10.1038/s41467-023-36254-0) examining both rel+ and rel− strains reports a growth lag of about 49 minutes for rel+ and more than 5 hours for the relA deletion strain. While these conditions are not identical to ours, they support the general point that growth does not immediately resume after such shifts. We therefore think it is unlikely that, following transfer from YT, the cells underwent one or a few full generations during the time window of our adsorption measurements.
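  To gauge how much unaccounted growth would matter, one can compute the factor by which a constant-B estimate of η would overestimate the true η if B instead grew exponentially after a lag. This is a sketch under stated assumptions: the doubling time and lag are hypothetical inputs, not values measured in this study, and the lag-then-exponential growth curve is an idealization.

```python
import math


def growth_bias_factor(t, doubling_time, lag=0.0):
    """Overestimation factor of a constant-B eta estimate.

    Assumes B(s) = B0 for s < lag and B0 * 2**((s - lag) / doubling_time)
    afterwards, so that ln(P(t)/P(0)) = -eta * integral_0^t B(s) ds.
    Returns integral_0^t B(s) ds / (B0 * t); 1.0 means no bias.
    """
    if t <= 0 or doubling_time <= 0:
        raise ValueError("t and doubling_time must be positive")
    if t <= lag:
        return 1.0  # no growth within the incubation window
    grow = t - lag
    # Integral in units of B0: flat part during the lag, then exponential part.
    integral = lag + doubling_time / math.log(2) * (2 ** (grow / doubling_time) - 1)
    return integral / t


# Hypothetical scenarios: 55 min incubation, 30 min doubling time.
for lag in (0.0, 30.0, 55.0):
    print(lag, round(growth_bias_factor(55.0, 30.0, lag=lag), 3))
```

  For example, with no lag and a doubling time equal to the incubation time, the factor is 1/ln 2 (about 1.44), whereas a lag at least as long as the incubation time gives a factor of exactly 1.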
  
  On the related note: Following the comments of all reviewers on Figure S3, we have decided to remove it to avoid confusion.
  
  (2) The data were able to distinguish between phages that successfully infected bacteria and those that remained free in the medium, and the authors appropriately interpreted the data as such throughout the Results section. However, in the Discussion (starting from the very first sentence, line 172), the authors used terms such as "adsorption" and "entry" interchangeably (for example, see the three sentences in lines 310-313, for "viral entry efficiency is shaped by [...]", then "adsorption kinetics modeling"). I do not see how the authors' data could distinguish between adsorption (the phage particles attaching to the outside of the cell) and entry (the phage DNA being injected into the cell). Conceivably, any phage particles that irreversibly attach to a cell but do not yet inject their genome into the cell would still be removed from the medium and therefore not quantified. Another example: in lines 189-191, the authors interpreted that "[...] when the bacterium is in a low metabolic state, the phage does not bind irreversibly to the host", but how do the authors eliminate the case of no phage binding (i.e., the reversible step) to begin with?
  
  We agree with the reviewer that our use of the terms adsorption, entry, and infection should have been more careful. Our experiment can only identify the irreversible commitment of phage to a host cell. We have therefore revised the text to refer consistently to phage commitment.
  
  Similarly, in lines 283-293, how do the authors delineate whether energy depletion would increase the k_off term or decrease the k_inj term, because either would result in more free phages in the medium as observed in the data? I believe that the writing of the Discussion, as it stands now, is doing a disservice to the conclusions presented in the Results section.
  
  We thank the reviewer for this important point. We agree that the model would work either by k_off or k_inj being dependent on the host metabolic state, and that our original wording was therefore too restrictive. The data do not distinguish between these possibilities; they only constrain the ratio k_off/k_inj. In the revised text, we therefore formulate the argument in terms of this ratio: if energy depletion leads to reduced commitment, this can arise either because k_off increases, because k_inj decreases, or because both change, as long as k_off/k_inj becomes larger in the inactive case. Put differently, what matters is not which individual rate changes, but that the balance between leaving and committing shifts in a way that disfavors commitment to inactive cells. This also leads to the trade-off now discussed in the revised manuscript: efficient commitment to active hosts requires a small k_off/k_inj, whereas strong discrimination against inactive hosts requires this ratio to become significantly larger in the inactive case. Depending on whether this is achieved through changes in k_off or k_inj, the cost of discrimination appears either as slower commitment or as additional energy dissipation. We agree that the previous wording overstated the mechanistic interpretation, and we have revised the Discussion accordingly to bring it in line with what the Results actually support. Based on the comments from all reviewers, we have also revised the terminology throughout the manuscript: instead of error correction, we now refer to this as a discrimination process, and we replaced k_inj by k_com to reflect that our assay resolves irreversible phage commitment rather than DNA injection specifically.
  
  (3) The authors presented an argument that performing infection of all five phages in the same condition is an advantage, allowing for comparison across different phages. While this goal is a completely valid one, it is difficult to reconcile that with the fact that different phages require different optimal conditions for successful infection. For instance, phage T5 famously requires Ca2+ for successful infection into the host bacterium (and later successful replication); see PMID 13174489. However, all infections were performed in TMG, which lacks Ca2+. Perhaps the absence of T5 dependence on the host metabolism is because the infection condition used by the authors was not optimal for T5 to begin with? Similar arguments could be made for other phages.
  
  Our study alone cannot eliminate that possibility. However, we have cited multiple previous studies, for example references citing Braun et al., showing that T5 remains insensitive to the host metabolic state under different buffer conditions. We therefore believe it is unlikely that the lack of metabolic dependence we observe for T5 is simply due to suboptimal infection conditions.
  
  (4) Whereas the manuscript examined five coliphages, only phage T5 and phage λ were discussed extensively. I believe some discussion points for these two phages need clarification.
  
  We focused our discussion on the phages T5, λ and φ80 because these are the phages for which similar effects have been reported previously in the literature. This allowed us to connect our findings directly to existing work and to discuss mechanistic hypotheses in a meaningful comparative framework. For the remaining phages, to our knowledge no prior studies have examined their behavior under comparable metabolic conditions, and therefore a similarly detailed discussion would have been speculative. Nevertheless, all five phages are treated equally in the presentation of the experimental results and in the quantitative comparison of adsorption rates.
  
  (4.1) Phage T5: The data obtained by the authors show that the infection rate of phage T5 is not impacted by the metabolic state of the host cell. Considering that the authors used the terms "infection", "adsorption", and "entry" interchangeably to refer to the irreversible commitment of a phage to a host cell (see point 2), this discussion regarding phage T5 lacks one critical literature context: DNA entry of phage T5 is known to occur in two phases (first-step transfer and second-step transfer). Critically, the second step can only occur if phage proteins encoded by the phage DNA transferred in the first step are expressed (see PMID 10577483 and the cited papers therein). In that context, metabolic poisoning of the host bacteria should have impeded T5 infection. The authors should comment on this point.
  
  As the reviewer pointed out, our usage of the terms infection, adsorption, and entry should have been more careful. Our experiment can only identify irreversible commitment of phage to a host cell. For T5, we expect that this irreversible commitment already occurs upon first-step transfer of phage DNA. As a result, even if second-step transfer is impeded under metabolic poisoning, our method would not resolve that effect. We have added this clarification to the revised manuscript.
  
  (4.2) Phage λ: The experiment using phage λ in this current study shares many resemblances to that in Brown et al. 2022. That feature alone is not a problem, but at many places in the text, the writing is ambiguous as to whether it is discussing the results in Brown et al. 2022 or in the current manuscript. I am giving three examples below, but this is not exhaustive: (i) Lines 67-69, there is no Brown et al. 2022 reference immediately after "a mutant phage variant (λh) could bypass this dependency [...]" (not just in the previous sentence); (ii) Line 228 should clearly say "Our previous findings suggested that phage λ is capable of [...]", since it concerns Brown et al., 2022, not the current study; and (iii) Lines 245-246, there is no Brown et al., 2022 reference immediately after "we observed that a mutant variant [...] even energy-depleted host" (without a reference, it reads like the authors "observed" that finding in this current manuscript).
  
  The reviewer is right. In those places, the text was ambiguous as to whether it referred to the present study or to Brown et al. (2022). We have now inserted the reference at the relevant points and revised the wording where needed to make this distinction explicit.
  
  Also, regarding phage λ: The discussion between line 230 and line 249 is very interesting, but since it concerns the differences between λ PaPa and Ur-λ, the authors should consider mentioning and discussing a very relevant recent study, PMCID: PMC6312755.
  
  We agree that the study by Guan et al. is very relevant and interesting. However, our point in this part of the Discussion is only to clarify that we used λ PaPa and not the originally isolated λ strain. We have therefore limited the discussion here to that distinction.
  
  (5) Control experiments, or references to prior studies, are needed to support that the As/Az treatment at this concentration and duration (at least 10 minutes) is sufficient to deplete the metabolic state of the cell. For instance, this can be shown by impeded or null cell growth, arrested motility (using a standard swimming assay), or a fluorescent reporter for the energetic state of the cell.
  
  The 10-minute treatment used here was chosen based on prior work showing that arsenate–azide rapidly inhibits cellular energy metabolism and is sufficient to eliminate the hyperdiffusion of the λ receptor (Winther et al., Biophysical Journal 2009, http://dx.doi.org/10.1016/j.bpj.2009.06.027), where the effect was assessed by monitoring the rate of movement of the λ receptor on the bacterial surface. We have clarified this point in the revised manuscript.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  As mentioned earlier, I found the paper interesting; it addresses an important and significant knowledge gap.
  
  My biggest concern is about the interpretation of the experimental data in light of the two-step model. In particular, around line 286, it is stated "k_inj is more sensitive to metabolic state than k_off". Assuming k does not depend on metabolic state, which is a fair assumption, the equation for eta only depends on the ratio between k_inj and k_off and not on the individual parameters separately. Consequently, there is no way of saying which one of the two is more affected by metabolic state, unless the model already assumes that k_off is not influenced by metabolic state. The results could equally be explained by k_inj decreasing in metabolically depleted cells, or k_off increasing in such cells. If this is an assumption of the model, this should be clearly stated and not reported as a consequence of the data, as it is at the moment. Also, how does this mathematical model connect to the fitting function used in Figure 2b?
  
  We thank the reviewer for this important point. We agree that the model would work either by k_off or k_inj being dependent on the host metabolic state, and that our original wording was therefore too restrictive. The data do not distinguish between these possibilities; they only constrain the ratio k_off/k_inj. In the revised text, we therefore formulate the argument in terms of this ratio: discrimination requires that k_off/k_inj be larger for inactive hosts than for active hosts, such that commitment is specifically reduced in the inactive case. Put differently, what matters is not which individual rate changes, but that the balance between leaving and committing shifts in a way that disfavors commitment to inactive cells. This introduces a trade-off: efficient commitment to active hosts requires a small k_off/k_inj, whereas strong discrimination requires this ratio to become significantly larger for inactive hosts. If this is achieved through changes in k_off, discrimination comes at the cost of slower commitment by allowing more time to leave; if it is achieved through changes in k_inj, it can preserve fast commitment to active hosts but requires additional energy dissipation in order to actively modulate commitment. We have therefore revised the text accordingly to frame the argument in terms of this trade-off, rather than attributing the effect specifically to k_inj. Based on the comments from all reviewers, we have also revised the terminology throughout the manuscript: instead of error correction, we now refer to this as a discrimination process, and we replaced k_inj by k_com to reflect that our assay resolves irreversible phage commitment rather than DNA injection specifically.
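  The non-identifiability discussed above can be made concrete with a two-step model in which a phage binds reversibly with rate constant k, then either leaves (k_off) or commits (k_com). A minimal sketch, assuming η = k·k_com/(k_com + k_off), i.e., the binding rate times the probability that a bound phage commits; this specific form is our assumption, chosen because it depends on k_off and k_com only through their ratio, as described above. The example values are arbitrary.

```python
def effective_rate(k, k_off, k_com):
    """Effective commitment rate in an assumed two-step model.

    A bound phage leaves with rate k_off or commits with rate k_com, so the
    fraction of binding events ending in commitment is k_com / (k_com + k_off)
    and eta = k / (1 + k_off / k_com).
    """
    return k * k_com / (k_com + k_off)


k = 1.0  # binding rate constant, arbitrary units, assumed unchanged
active = effective_rate(k, k_off=0.1, k_com=1.0)
# Raising k_off tenfold or lowering k_com tenfold gives the same ratio
# k_off/k_com, hence the same eta: measuring eta cannot tell them apart.
inactive_koff = effective_rate(k, k_off=1.0, k_com=1.0)
inactive_kcom = effective_rate(k, k_off=0.1, k_com=0.1)
print(active, inactive_koff, inactive_kcom)
```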
  
  I have a related experimental criticism. The kinetic model presented assumes an exponential decay of free phage, which is a commonly used assumption in the phage literature. Given that the phage types used in this study lyse relatively slowly, it would be good to actually see adsorption curves, in which free phage is measured at different time points between inoculation and lysis. These data would not only provide useful evidence for the kinetic model but should also replace what is now in Figure S3, which consists of fitting one experimental point with one line. As it currently stands, Figure S3 is not useful and is actually misleading.
  
  We appreciate the reviewer’s point. We agree that adsorption curves, in which free phage is measured at different time points between inoculation and lysis, would provide a stronger basis for evaluating the kinetic model. However, we do not have the resources to perform these additional experiments within the scope of the present study. Following the comments of all reviewers on this point, we have therefore decided to remove Figure S3 to avoid confusion.
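
If such adsorption curves were collected, the exponential model could be tested directly. Assuming the common form P(t) = P(0) exp(−ηBt), ln P is linear in t with slope −ηB. The minimal sketch below makes three further assumptions:

- the cell density B stays constant;
- no lysis occurs within the sampling window;
- the input values are synthetic and noise-free, generated from the model itself rather than taken from the study.

`fit_adsorption_rate` is a hypothetical helper.

```python
import math


def fit_adsorption_rate(times_min, free_phage, cell_density):
    """Estimate eta from a free-phage time course by least squares on ln P(t).

    Assumes P(t) = P0 * exp(-eta * B * t) with constant cell density B,
    so ln P(t) is linear in t with slope -eta * B.
    """
    n = len(times_min)
    logs = [math.log(p) for p in free_phage]
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    return -(sxy / sxx) / cell_density


# Synthetic, noise-free time course generated from the model (not data).
B = 1e8          # cells/mL, hypothetical
eta_true = 2e-9  # mL/min, hypothetical
times = [0, 5, 10, 15, 20]
free = [1e6 * math.exp(-eta_true * B * t) for t in times]

eta_hat = fit_adsorption_rate(times, free, B)
print(f"{eta_hat:.3e}")  # 2.000e-09
```

With real plaque counts, systematic curvature in ln P versus t would suggest that the single-exponential assumption does not hold over the sampled window.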
  
  Finally, it is not clear to me why the quantity "Ratio" has been chosen to be presented in Figure 1, rather than the ratio of estimated adsorption rates eta'/eta, which is much more intuitive for a phage study and contains the same information. I would recommend switching to this choice, unless there is a clear rationale for why the quantity "Ratio" is more useful/effective. Showing eta'/eta would also increase the readability of Figure 1, as it would move the y-axis to a logarithmic scale and better visualize values around 1.
  
  We used “Ratio” in Figure 1 to illustrate the experimental design, controls, and measured quantities directly, as it more transparently reflects the data collected. In the second part of the analysis, where we compare time-independent adsorption rate estimates, we have presented the corresponding values of η′/η as suggested.
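
Under the exponential-decay assumption, Ratio and η′/η can both be computed from the same buffer-normalized titres. This shows how the two quantities are related, although they are not interchangeable. The sketch rests on two stated assumptions: equal cell density and incubation time in both conditions, and η′ denoting the energy-depleted condition. The function name and titres are hypothetical.

```python
import math


def ratio_and_relative_rate(p_cells_dep, p_buf_dep, p_cells_comp, p_buf_comp):
    """Return (Ratio, eta_dep / eta_comp) from free-phage titres.

    Assumes free phage decays as exp(-eta * B * t) with the same B and t in both
    conditions, so the buffer-normalised free fraction is f = exp(-eta * B * t).
    """
    f_dep = p_cells_dep / p_buf_dep
    f_comp = p_cells_comp / p_buf_comp
    if not (0 < f_dep < 1 and 0 < f_comp < 1):
        raise ValueError("free fractions must lie strictly between 0 and 1")
    ratio = f_dep / f_comp
    relative_rate = math.log(f_dep) / math.log(f_comp)  # eta_dep / eta_comp
    return ratio, relative_rate


# Hypothetical titres (PFU/mL): 50% of phage remain free with energy-depleted
# cells, 25% with energy-competent cells, relative to bacteria-free buffer.
ratio, rel = ratio_and_relative_rate(500, 1000, 250, 1000)
print(ratio, round(rel, 3))  # 2.0 0.5
```

In this model, doubling the incubation time squares both free fractions, to 0.25 and 0.0625. That changes Ratio from 2 to 4 but leaves η′/η at 0.5. This is one way to see why the adsorption-rate comparison is time-independent while Ratio is not.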
  
  Minor comments:
  
  (1) Introduction
  
  Line 31: "... such as nutrient limitation, fluctuating temperatures, and variable energy availability" - if drawing a distinction between energy availability and nutrient limitation, please make explicit what this distinction is. Energy availability seems like a natural consequence of nutrient availability.
  
  While energy and nutrient availability are often linked in E. coli, they represent distinct physiological constraints. Nutrient limitation refers to the lack of essential biosynthetic precursors such as nitrogen, phosphorus, or amino acids. Energy availability, in contrast, reflects the cell’s ability to generate ATP and reducing equivalents through metabolic processes. For example, under anaerobic conditions, E. coli may have ample nutrients but limited energy production due to the lower efficiency of fermentation compared to aerobic respiration. Thus, energy limitation can occur independently of nutrient limitation.
  
  (2) Results
  
  (a) Whole Section: Please label equations.
  
  All equations have now been labelled in the revised manuscript.
  
  (b) Lines 105 to 114: As stated in Major Comments, I think the clarity of the paper would be improved by introducing the relative adsorption rate here and dropping the concept of Ratio entirely. However, if the authors wish to use Ratio, I would recommend the following:
  
  Lines 105 to 109 are confusing to read because of the number of connectives: "... ratio of free viruses from permissive AND resistant hosts respectively TO the free viruses in buffer under energy-depleted AND energy competent conditions". This would be clearer if each quantity were given an algebraic symbol, and RP, RR, and Ratio were defined through formal algebra, rather than mixed mathematical and sentence notation.
  
  This section has been rewritten for clarity. We now introduce explicit algebraic symbols and define the quantities formally, which removes the ambiguity present in the sentence-only description while retaining the intended meaning.
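
The reviewer's request for explicit symbols could be met along the following lines. This is only a sketch of one possible notation. The symbol names, and the placement of the energy-depleted (arsenate + azide) condition in the numerator, are our reading of the description rather than the revised manuscript's equations. Here P^c_perm, P^c_res and P^c_buf denote the free-phage titres after incubation with permissive cells, with resistant cells, and in bacteria-free buffer, respectively, under energy condition c.

```latex
% c indexes the energy condition: comp (no inhibitor) or dep (arsenate + azide)
\[
R_P^{c} = \frac{P_{\mathrm{perm}}^{c}}{P_{\mathrm{buf}}^{c}}, \qquad
R_R^{c} = \frac{P_{\mathrm{res}}^{c}}{P_{\mathrm{buf}}^{c}}, \qquad
c \in \{\mathrm{comp}, \mathrm{dep}\}
\]
\[
\mathrm{Ratio}_P = \frac{R_P^{\mathrm{dep}}}{R_P^{\mathrm{comp}}}, \qquad
\mathrm{Ratio}_R = \frac{R_R^{\mathrm{dep}}}{R_R^{\mathrm{comp}}}
\]
```

With these definitions, Ratio_P = 2 would mean that, after normalizing to the bacteria-free buffer, twice as many free virions remain with energy-depleted permissive cells as with energy-competent ones, so fewer adsorbed to the depleted cells. Ratio_P = 0.5 would mean the opposite. For resistant cells, where virions adsorb under neither condition, Ratio_R is expected to be close to 1.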
  
  The chemical names "arsenate" and "azide" should appear in the body of the text before they appear abbreviated in an equation. Please state at this point that these are both metabolic inhibitors, as it is not immediately clear what role they play or why you are using them.
  
  The text has been updated to introduce arsenate and azide by name before the abbreviations are used, and we now explicitly note that they act as metabolic inhibitors.
  
  On line 114, the authors helpfully provide an interpretation of Ratio = 1. It would be useful to provide at the same time interpretations of Ratio >1 and <1, perhaps 2 and 0.5 specifically?
  
  We have added brief explanations illustrating the interpretation of Ratio values greater than and less than 1, including examples of 2 and 0.5.
  
  I would consider giving this quantity a more interpretable name than Ratio. This quantity represents how much a bacteriophage preferentially adsorbs to metabolically active cells, so perhaps "Selectivity" or "Adsorption Bias"?
  
  We intentionally retained the generic term “Ratio”, as this quantity reflects an intermediate experimental measure used to describe the process rather than a newly defined metric. Its purpose is to bridge the experimental observations and the subsequent quantification of effects on the adsorption rate (η).
  
  (c) Lines 117 to 122: the authors sometimes refer to ratios explicitly, "average ratio of around 1.6" and other times say e.g., "a greater than 3 times increase in viral particles". Using more consistent language (saying "Ratio" every time) would be clearer.
  
  We have standardized the terminology in this section and now refer to all fold-changes consistently using “Ratio” to avoid ambiguity.
  
  (d) Figure 1
  
  Phages λ and T6 look like they have ratios less than 1 for resistant cells? If this is true / if the ratio is statistically significantly below 1, please comment.
  
  The ratios for λ and T6 are not statistically different from 1. The apparent deviation is within the standard error of the mean. To make this clearer, we have added the corresponding p-values to Table S2 in the Supplementary Information.
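
To illustrate how a deviation from 1 might be assessed with three biological replicates, the sketch below computes three quantities: the mean, the standard error of the mean, and a two-sided one-sample t-test against 1. The choice of test is our assumption, since we do not know which test underlies the p-values in Table S2, and the replicate values are hypothetical. For n = 3 the t-distribution has 2 degrees of freedom, and its CDF has a closed form, so no statistics library beyond the standard one is needed.

```python
import math
import statistics


def t_test_against_one(replicates):
    """Two-sided one-sample t-test of H0: mean Ratio == 1 for exactly three replicates."""
    if len(replicates) != 3:
        raise ValueError("the closed-form p-value below assumes n = 3 (df = 2)")
    mean = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    t = (mean - 1.0) / sem
    # Student's t with df = 2: CDF(t) = 1/2 + t / (2 * sqrt(t**2 + 2)),
    # hence the two-sided p-value is 1 - |t| / sqrt(t**2 + 2).
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return mean, sem, t, p


mean, sem, t, p = t_test_against_one([0.90, 0.95, 1.05])  # hypothetical Ratios
print(round(mean, 3), round(sem, 3), round(t, 2), round(p, 2))  # 0.967 0.044 -0.76 0.53
```

For these hypothetical values, a mean Ratio of about 0.97 with p ≈ 0.53 would not be distinguishable from 1.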
  
  Ratios near 1 are difficult to distinguish from 1, especially in panels A and D. Using a logarithmic scale on the y-axis would make the plots more readable.
  
  Because the values in these panels are not statistically different from 1, changing to a logarithmic scale would not alter the interpretation. We therefore retained the current axis scaling to reflect that there is no meaningful deviation from 1 in these cases.
  
  The data corresponding to individual experiments have no error bars. Given that the number of free virions was determined by plaque assay, which carries an intrinsic sampling error, this uncertainty should be reflected in the plots.
  
  We thank the reviewer for this important comment. Because plaque assays have compound sources of stochastic variation, assigning a per-measurement error bar would risk implying false precision. For this reason, we present the values from each biological replicate directly, and the uncertainty is represented in the statistical summary across replicates. Specifically, for each phage and condition we show the three independent experimental measurements and report the mean along with the standard error of the mean. This approach allows us to represent biological variability without implying a precision that cannot be accurately quantified at the level of single plaque counts.
  
  Similarly, the average value does show error bars, but it is not stated what these error bars correspond to: standard error in the mean, standard deviation of the sample, or combined uncertainty?
  
  The caption has been updated to state that the error bars represent the standard error of the mean.
  
  The resistant bacteria seemed to have ratios close to 1 in all cases. Is this because very few virions adsorbed under both energy conditions?
  
  Resistance is commonly associated with the lack of a surface receptor for the phage (or, more generally, of an entry pathway). We use the resistant bacteria as a control for the effect of the conditions on adsorption. For resistant bacteria, the Ratio should be 1, since virions adsorb under neither energy condition. Any slight deviations from 1 should arise from sampling error or small heterogeneity in the population.
  
  (e) Figure 2
  
  Please comment on what the error bars here represent. Error bars in Figure 2A seem to permit negative (or at least zero) values of the relative adsorption rate for phages M13 and T6, possibly implying an overestimate of the error. If the values used to calculate the mean are far apart, showing them individually through a superimposed swarm plot might be clearer.
  
  This point is now addressed in the Supplementary Information, where we clarify how the error bars were calculated.
  
  (3) Discussion
  
  (a) Line 189: "high metabolic state" is imprecise. Say "energy-competent" to be consistent with earlier language.
  
  To maintain continuity with earlier terminology, we now include “energy-competent” in parentheses alongside “high metabolic state,” while retaining the original phrasing for readability.
  
  (b) Figure 3, population level
  
  Show adsorbed virions physically attached to bacteria, rather than removing them completely from the image, as currently, the implication is that at a high metabolic state, there are fewer virions total, not fewer virions remaining in solution because more are adsorbed. You could go as far as to add a third "after centrifuging" row, showing the adsorbed phages stuck in the pellet and the unadsorbed phages remaining in solution.
  
  Thank you for this suggestion. Figure 3 has been updated to depict adsorbed virions attached to bacterial cells, clarifying that the decrease represents adsorption rather than loss of total particles. This change improves the accuracy and interpretability of the schematic.
  
  (4) Methods and Materials
  
  (a) Figure 5
  
  The step "estimate cell numbers from OD" appears to follow incubating plates overnight. If the cells you are counting come from the pellet produced by centrifuging 3 steps prior, you could add a fork into the black line connecting the steps, with one branch corresponding to the supernatant and phages, and the other to the pellet and cells?
  
  Thank you for pointing this out. The order in the figure has been corrected: cell numbers are estimated from OD before overnight incubation. This resolves the confusion without the need for branching in the workflow diagram.
  
  (b) Line 332
  
  You allow as much time as possible for adsorption without the possibility of lysis. Did you determine the lysis times / latent periods of these phages through one-step-growth-curves, or use published results, in which case please cite? Having obtained the lysis time by either method, what fraction of the lysis time did you allow for adsorption? Also, please add supplementary tables with lysis times used for the different phages.
  
  We thank the reviewer for this comment. We have clarified in the text how incubation times were chosen and added the relevant citations. We did not use a common fixed fraction of the lysis time for all phages; instead, incubation times were chosen to allow sufficient time for adsorption but not for completion of the first lytic cycle. For λ, productive lytic development was blocked in the host background used, as in Brown et al., PNAS 2022, http://dx.doi.org/10.1073/pnas.2106005119. For ϕ80 and T5, we used published latent-period values as guides and verified their compatibility with our own system (De Paepe and Taddei, PLoS Biology 2006, http://dx.doi.org/10.1371/journal.pbio.0040193). M13 is a chronic filamentous phage and therefore does not have a standard lytic latent period; in our host–phage combination, phage release began only after more than 1 h. For T6, we relied primarily on the kinetics observed in our own system, since adsorption was unusually slow for this phage–host pair under our assay conditions. Literature reports describe shorter T6 latent periods under specific assay conditions (Foster and Johnson, Journal of General Physiology 1951, http://dx.doi.org/10.1085/jgp.34.5.529). This difference is consistent with published work showing that adsorption and infection kinetics can vary substantially with host background, surface structure, and experimental conditions (Heller and Braun, Journal of Bacteriology 1979, http://dx.doi.org/10.1128/jb.139.1.32-38.1979; Storms et al., Biochemical Engineering Journal 2012, http://dx.doi.org/10.1016/j.bej.2012.02.010).
  
  (5) Supplementary
  
  Figure S1
  
  This data is useful in understanding the main body of the paper, and I think this should form part of a main figure (possibly with the individual experimental data points superimposed over the bars). This could come before or as part of Figure 1?
  
  We thank the reviewer for this suggestion. We have explored including these data directly in the main figure but found that doing so substantially reduced the readability of the figure, as the underlying table is visually dense. For this reason, we chose to summarize the results in Figure 1 and present the detailed data separately in Figure S1 of the Supplementary Material, along with the Ratio analysis, which more effectively conveys the trends without overloading the main figure.
  
  Reviewer #2 (Recommendations for the authors):
  
  Minor comments:
  
  (1) L16-18: This sentence could be made more accessible as 'error correction' is not an intuitive term in the phage field.
  
  We have updated the overall theory section including the terminology. Instead of error correction, we now refer to it as a discrimination process.
  
  (2) L96-98: Does this potentially indicate a trade-off where evolution for stronger binding cannot evolve at the same time as responsiveness to metabolic activity?
  
  We agree that this sentence made a stronger evolutionary claim than our data support. Since we only tested four laboratory phages, we cannot conclude that there is an evolutionary trade-off between stronger binding and responsiveness to host metabolic activity. We have therefore removed this sentence to avoid making an unsupported evolutionary interpretation.
  
  (3) L102: What does 'post-cellular' mean?
  
  Postcellular supernatant is simply the liquid that remains after cells have been removed. During centrifugation, the cells pellet at the bottom, and the liquid above (which can contain viruses) is the postcellular supernatant.
  
  (4) L105-107: Worth splitting into two sentences, as it is a bit unclear whether ratios are built between permissive and resistant hosts, between buffers, or both.
  
  Thank you for the suggestion. We have rewritten this section into two sentences to clarify how the ratios are constructed, and we hope the revised wording improves readability.
  
  (5) L110-122: Figures S1 and S2 could be referenced here.
  
  References to Figures S1 and S2 have now been added in this section.
  
  (6) L137: As P(0) is the viral concentration in buffer, I am assuming that the phage lysate has been diluted in buffer and phages have been added to cultures from the same dilution tube to guarantee equal starting numbers, but I couldn't find this in the methods.
  
  This clarification has been added to the Methods and Media section of the Supplementary Information.
  
  (7) L243: It would be worth defining what 'hyperdiffusion' means.
  
  We have added a brief definition of “hyperdiffusion”.
  
  (8) L253-256: I do not entirely follow this explanation.
  
  We thank the referee for pointing out this lack of clarity. This was also raised by Reviewer #3. The point we intended to convey is that λ behaves differently toward E. coli LamB depending on whether it is on a living cell or isolated in buffer, but makes no such distinction for Shigella LamB, binding it in both contexts. More specifically, previous work showed that wild-type E. coli extracts could only inactivate λ in the presence of added solvents, whereas control extracts prepared similarly from Shigella did not require added solvent for λ inactivation. This observation is consistent with E. coli LamB requiring a specific state to irreversibly bind λ. We therefore meant to suggest that the capacity for metabolic-state sensing is not simply a function of phage identity, but also depends on receptor-specific properties that differ between the two bacterial species.
  
  We have rephrased it as follows: Notably, wild-type λ is inactivated by E. coli K-12 extracts only when solvents are added, whereas Shigella extracts inactivate λ without this requirement (Randall-Hazelbauer and Schwartz, J. Bacteriol. 1973; Schwartz, J. Mol. Biol. 1975; Schwartz and Le Minor, J. Virol. 1975). This suggests that E. coli LamB requires a specific state for irreversible binding, a conditionality absent in Shigella LamB, indicating that the capacity for metabolic-state sensing may depend on receptor-specific properties.
  
  (9) L284: Why is k_inj necessarily more sensitive to the metabolic state than k_off? Could membrane changes under stress increase k_off?
  
  We thank the reviewer for this important point. We agree that the model is compatible with either k_off or k_inj depending on the host metabolic state, and that our original wording was therefore too restrictive. The data do not distinguish between these possibilities; they only constrain the ratio k_off/k_inj. In the revised text, we therefore formulate the argument in terms of this ratio: reduced commitment in inactive cells can arise through an increase in k_off, a decrease in k_inj, or both, as long as k_off/k_inj becomes larger in the inactive case. What matters is therefore not which individual rate changes, but that the balance between leaving and committing shifts in a way that disfavors commitment to inactive cells. This also underlies the trade-off now discussed in the manuscript: efficient commitment to active hosts requires a small k_off/k_inj, whereas strong discrimination against inactive hosts requires this ratio to become much larger in the inactive case. We have revised the Discussion accordingly, so that it reflects only what the Results support. Based on the comments from all reviewers, we have also revised the terminology throughout the manuscript: instead of error correction, we now refer to this as a discrimination process, and we have replaced k_inj with k_com to reflect that our assay resolves irreversible phage commitment rather than DNA injection specifically.
  
  (10) Figure 1: There seems to be more variation between replicates in phage Lambda than in other phages. Is this caused by receptor number heterogeneity in the population?
  
  Unfortunately, we have no way to compare receptor number heterogeneity across the different phage receptors in our experiments. We therefore cannot conclude that the larger variation observed for phage λ is caused by receptor number heterogeneity in the population.
  
  (11) Figure S1: There seems to be a significant difference between phage Lambda viability in the two buffers - do the authors have an idea where this comes from?
  
  There is no difference in λ viability between the two buffers. The apparent difference in the figure is due to sampling variability.
  
  (12) Figure S3: Last sentence of the legend probably shouldn't say 'upper'.
  
  Following the suggestions of all reviewers, we have removed Figure S3, as it created more confusion than clarity.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) The text reads as incomplete in some places. Can the authors please provide clarifications on the following points?
  
  (1.1) Lines 235-256: How do the authors draw a conclusion that "a phage can detect host metabolic status" from a study that used purified LamB receptors (i.e., no live cells with any metabolism) extracted from two different bacterial species (i.e., not a difference in metabolic states)?
  
  We thank the referee for pointing out this lack of clarity. This was also raised by Reviewer #2. The point we intended to convey is that λ behaves differently toward E. coli LamB depending on whether it is on a living cell or isolated in buffer, but makes no such distinction for Shigella LamB, binding it in both contexts. More specifically, previous work showed that wild-type E. coli extracts could only inactivate λ in the presence of added solvents, whereas control extracts prepared similarly from Shigella did not require added solvent for λ inactivation. This observation is consistent with E. coli LamB requiring a specific state to irreversibly bind λ. We therefore meant to suggest that the capacity for metabolic-state sensing is not simply a function of phage identity, but also depends on receptor-specific properties that differ between the two bacterial species.
  
  We have rephrased it as follows: Notably, wild-type λ is inactivated by E. coli K-12 extracts only when solvents are added, whereas Shigella extracts inactivate λ without this requirement (Randall-Hazelbauer and Schwartz, J. Bacteriol. 1973; Schwartz, J. Mol. Biol. 1975; Schwartz and Le Minor, J. Virol. 1975). This suggests that E. coli LamB requires a specific state for irreversible binding, a conditionality absent in Shigella LamB, indicating that the capacity for metabolic-state sensing may depend on receptor-specific properties.
  
  (1.2) Line 270, in the abstract, and in the caption of Figure 4: The authors described the model using terms such as "an error-correction mechanism" or "standard error correction", but there is little explanation. Can the authors clarify what kind of "error" is discussed here, and how it is "corrected"? In the "standard error correction" model, what determines which method of correction is "standard"? If "error correction" is a standard term in phage-bacterial interaction modeling, please provide references.
  
  We agree with the reviewer that our use of the term error correction was not appropriate in this context. The proper term is discrimination process rather than error correction. We have now corrected this terminology throughout the manuscript and clarified the underlying logic in the relevant sections.
  
  (1.3) Line 301: The authors speculated that phage T5 is "better suited to ecological niches", but I am not sure how that is consistent with their data showing that T5 is more rampant, i.e., that it infects both energy-competent and energy-depleted cells, not just depleted cells. Why "niches", and why would T5 be better suited to environments "where energy-limited cells dominate", rather than to any environment?
  
  We agree that this point was not stated clearly enough. What we intended to convey is that T5 would be at a net disadvantage in a niche containing a mixture of energy-competent and energy-deficient hosts. We have updated the main text accordingly.
  
  (1.4) Line 303, and related to point 6.3. above: Phage λ can also infect and replicate in "starved bacterial cells" (shown in Kourilsky 1974 and Geng et al. 2024, both of which were cited in this manuscript). How do the authors reconcile these reports with the discussion point in line 303, and their data that only phage T5, but not λ, shows insensitivity to the host metabolic state?
  
  Our data do not imply that phage λ is unable to infect starved bacteria. As shown in Kourilsky (1974) and Geng et al. (2024), λ can indeed infect and replicate in nutrient-limited cells. Our results specifically indicate that λ infection under starvation proceeds with a reduced adsorption rate, while T5 maintains the same adsorption rate even when the host is starved. Thus, our conclusion is that T5 is insensitive to the host metabolic state at the level of adsorption, whereas λ is not. We acknowledge that the wording in line 303 may have unintentionally led to confusion, and we have revised this part of the text to avoid that.
  
  (2) The following comments relate to the text and figures in the manuscript. There are many places in the manuscript that could use fine proofreading and copy-editing for clarity and consistency. For example:
  
  (2.1) If I understand it correctly, the equation in between lines 109 and 110 should be clarified using terms such as "Free viral particles after mixing with bacteria in Arsenate and Azide" and "Free viral particles in bacteria-free buffer with Arsenate and Azide". As it stands, it is not clear which terms correspond to conditions where bacteria are present.
  
  The equation has been updated to explicitly indicate which terms refer to mixtures containing bacteria and which refer to bacteria-free controls, so that the correspondence between conditions is now clear.
  
  (2.2) Equations in between line 276 and 283, and elsewhere: Some concentration terms are enclosed in brackets ("[BP]"), while most are not.
  
  This notation has been clarified. We now use “[PB]” specifically to denote the transient phage–bacterium complex, distinguishing it from the product P⋅B. All other concentration terms are written without brackets for consistency.
  
  (2.3) Figure 4 and in equations: "BP" or "PB"?
  
  The notation has been made consistent throughout; we now use “PB” exclusively to denote the phage–bacterium complex.
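
For reference, the two-step scheme can be written with the notation settled in these responses: [PB] for the transient complex, P·B for the committed product, and k_com in place of k_inj. The mass-action rate equations below are our sketch rather than a transcription of the manuscript's equations. They assume that cells are in excess, so that B is approximately constant.

```latex
\[
P + B \;\underset{k_{\mathrm{off}}}{\overset{k}{\rightleftharpoons}}\; [PB]
\;\xrightarrow{\;k_{\mathrm{com}}\;}\; P\cdot B
\]
\[
\frac{dP}{dt} = -k\,P\,B + k_{\mathrm{off}}\,[PB], \qquad
\frac{d[PB]}{dt} = k\,P\,B - \left(k_{\mathrm{off}} + k_{\mathrm{com}}\right)[PB], \qquad
\frac{d(P\cdot B)}{dt} = k_{\mathrm{com}}\,[PB]
\]
```

The three right-hand sides sum to zero, so the total phage P + [PB] + P·B is conserved, as expected for a scheme without phage production.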
  
  (2.4) Line 284 and line 286: The "inj" in "k_inj" is sometimes italicized, sometimes not.
  
  The notation has been standardized so that k_inj is now formatted consistently throughout the manuscript, without italicizing “inj.” In addition, we have replaced k_inj with k_com to reflect that our assay resolves irreversible phage commitment rather than DNA injection specifically.
  
  (2.5) Figure 5: Was the step "Estimate cell numbers from OD" really performed on the next day after the experiment (i.e., >12 hours after infection and phage plating), not immediately after cell washing?
  
  Thank you for pointing this out. The figure has been updated to reflect the correct order of steps: cell numbers are estimated from OD immediately after washing, followed by overnight incubation of the plates.
  
  (2.6) Figure S1: As it stands now, the x-axis of each panel can be read either as "Permissive, Resistant bacteria, Buffer" (missing "bacteria" for the first pair of bars), or "Permissive (bacteria), Resistant (bacteria), Buffer (bacteria)" (extra "bacteria" for the last pair of bars).
  
  The intended interpretation is the second one (permissive bacteria, resistant bacteria, buffer).
  
  (2.7) Figure S3: The panel letters "A" and "B" are missing in the figure. Also, it is not clear why the legend for the five phages and the legend for the measurement times are not combined.
  
  Following the suggestions of all reviewers, we have removed Figure S3, as it created more confusion than clarity.
  
  (2.8) Strain table in the Methods and Materials: Please write genotypes with italicization, and consistently indicate mutations and deletions with the minus sign superscript or the Δ prefix. Also, for the S3222 strain: Is it really the entire Mal regulon mutated ("Mal-"), or just lamB-? In Brown et al. 2022, it was only the latter.
  
  Genotypes have been reformatted with consistent notation. For S3222, the correct designation is Mal-, as in the SI of Brown et al. 2022. In this case, Mal- is intended as a phenotypic designation rather than a specific genotype, and we have therefore formatted it accordingly, i.e. neither italicized nor written in lower case.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.30.651438v3
www.biorxiv.org

An overexpression platform reveals the functional diversity of human KRAB-Zinc Finger Proteins in maintaining cellular homeostasis

1. EMBOpress 09 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  General Statements
  
  We thank the reviewers for their thoughtful and constructive evaluations of our work. We are particularly encouraged that both recognize the value of this study as a scalable and systematic framework for the functional exploration of the human KZFP family and agree that the resource generated here will be of broad interest to the KZFP, transposable element, and genome regulation communities. Reviewer 1 explicitly notes that "the screening framework itself represents a potentially useful resource for prioritizing candidate KZFPs for downstream study" and that "the study may nonetheless serve as a useful starting point for future investigations into KZFP biology and transcriptional regulation." Reviewer 2 similarly emphasizes that "the authors provide an efficient and valuable screening platform that can identify promising candidates for further investigation" and that "the methodological advance represents the primary contribution of the work."
  
  We fully concur with these assessments. The principal goal of this study was not to elucidate the physiological roles of all or even a subset of individual KZFPs, but rather to provide a scalable framework that enables their systematic prioritization and generates experimentally testable hypotheses regarding their functions. To support our argument, we undertook some mechanistic analyses, but these were not intended to be complete or definitive. In that respect, we agree with the reviewers that the original manuscript does not always sufficiently distinguish candidate discovery from mechanistic validation. In the revised version, we will therefore more clearly frame the inducible K562 overexpression assay as a standardized and sensitive readout of regulatory potency rather than as a direct surrogate of physiological function. Within this framework, K562 fitness defects are interpreted as a quantitative measure of the extent to which ectopic KZFP expression perturbs transcriptional homeostasis in a controlled cellular context. The direct targets and transcriptional networks identified through our integrative analyses are presented as hypotheses to be tested in more physiologically relevant systems. Accordingly, the revised manuscript preserves the broad scope and resource aspect of the study while incorporating additional experimental validation, expanded methodological descriptions, and a more cautious interpretation of the proposed biological functions of the selected KZFPs.
  
  Although this document is submitted as a Revision Plan, we have already incorporated a substantial number of revisions into the transferred manuscript. In particular, we have implemented most of the presentation, methodological, and conceptual modifications requested by the reviewers, including clarification of the scope of the study, extensive revisions of the Results and Discussion, expanded Materials and Methods, and numerous figure and text corrections. These revisions are detailed in Section 3 ("Description of the revisions that have already been incorporated into the transferred manuscript").
  
  The remaining points requiring additional experimentation or more extensive analyses are described in Section 2 ("Description of the planned revisions").
  
  Description of the planned revisions
  
  Reviewer 1 Major comment 1
  
  “Finally, several aspects of the data presentation are currently difficult to reconcile. In Fig. 1D, the meaning of the purple category is unclear, and the percentage scaling on the x-axis is difficult to reconcile with the cumulative values displayed. For instance, the sum of all the bars would not reach 100%, as the values of the bars span percentages up to 4% at most (for 105 MYO KZFPs) according to this plot. Similarly, the reported numbers of TE-binding KZFPs in Fig. 1E-F and Fig. S1D appear internally inconsistent and should be clarified. Specifically, 53+14=67 KZFPs are reported to bind TEs in total, yet a larger number of KZFPs appears associated with individual TE families (e.g., 86 for LTR.ERV1). If the values shown correspond to percentages rather than absolute counts, this should be explicitly clarified in both the figure and legend. In addition, Fig. S1D appears inconsistent with the counts reported in Fig. 1E-F, as only 5 out of the 53 toxic KZFPs displayed in the plot show no enrichment for any of the highlighted TE families.”
  
  We thank the reviewer for this insightful comment, which has helped us identify areas where the presentation of our data can be substantially improved. We agree that the current presentation of the TE-binding analyses could be clearer and that revising these figures will improve both their readability and the overall consistency of the manuscript. In the revised manuscript, we will clarify the apparent inconsistencies in the presentation of the TE-binding KZFP analyses and revise the corresponding figures and legends accordingly. Importantly, these inconsistencies do not arise from errors in the underlying data but rather from an insufficient explanation of the statistical enrichment analyses and of the way the results are represented. We will therefore redesign the relevant figures and expand their legends to more clearly describe the analytical approach, the enrichment criteria, and the interpretation of the results. We believe that these revisions will improve the clarity, transparency, and internal consistency of the manuscript, allowing readers to interpret the TE-binding analyses more readily. Reviewer 1's minor comments were extremely useful in detecting errors, and we are grateful for them. All requested modifications (see below) have been incorporated into the manuscript.
  
  Reviewer 1 Major comment 2
  
  “Finally, while the proteomics results aimed at identifying SCAN-dependent interactors are of interest, several aspects of the experimental design and data analysis remain unclear. In particular, it is not specified whether the experiment was performed in biological replicates or as a single measurement. This is important, as it directly affects how the data can be interpreted and how stringent downstream filtering can be. In the Results section, the authors state that "we identified a set of SCAN-dependent interactors, i.e., proteins that co-immunoprecipitated with the full-length construct but were absent in controls and lost upon deletion of the SCAN domain," which suggests a relatively binary, "presence/absence" filtering strategy. However, this description does not specify whether any quantitative threshold (e.g., enrichment ratio) was applied when comparing full-length constructs to deletion mutants. In contrast, the Methods section states that "proteins lacking signal above background were excluded and proteins were additionally required to show stronger signal in at least one bait condition than in GFP controls based on heatmap clustering (see script)," which instead suggests that a threshold-based criterion was used to define enrichment relative to controls and deletion mutants. If this is the case, the exact criteria and thresholds used for filtering should be clearly stated and consistently reported between the Results and Methods sections. If replicate measurements were not performed, this should be explicitly acknowledged, as peptide-level variability may substantially influence the identification of high-confidence interactors, particularly if the applied cutoffs are not highly stringent.”
  
  We agree that a more detailed description of the experimental design and analysis strategy, together with additional validation, will strengthen the interpretation of the proteomic data. In the revised manuscript, we expanded the Results and Materials and Methods sections to provide a clearer and more quantitative description of the filtering strategy, including the enrichment criteria and thresholds used to define SCAN-dependent interactors. To further strengthen these findings, we propose to perform an independent biological replicate of the co-immunoprecipitation mass spectrometry experiment. This additional experiment will increase confidence in the identified SCAN-dependent interactors and further support the conclusions drawn from the proteomic analysis.
  
  Reviewer 1 Minor comments
  
  In Fig. 2A, readability could be improved by adjusting the layering of points, as the darker dots (in particular the red ones) are currently obscured by lighter ones. Alternatively, removing the outline of the points (which is not transparent) may also improve visibility, but in that case the legend for point size would need to be updated accordingly.
  
  Thank you for this helpful suggestion. We will revise Figure 2A to improve its readability by reworking the layering of the points in accordance with the reviewer's recommendation. We will also evaluate the point outlines and, if appropriate, remove them and update the point-size legend accordingly to ensure the figure is clear and easy to interpret.
  
  Reviewer 2 – Major comment
  
  “- The authors looked at available chromatin data in either K562 cells or HEK293 cells, which I think is a very good way of utilizing publicly available data. Since the authors showed that different KZFPs might be functionally relevant in different cell types/tissues, I was wondering if they checked if there is available ChIP Seq or CUT&RUN data in those specific cell types/tissues. If yes, that data should be included in the manuscript.”
  
  We agree that integrating KZFP binding data generated in biologically relevant cell types or tissues would further strengthen the proposed regulatory models. As described in the revised manuscript, we have already adopted this approach for ZNF43 by integrating chromatin landscape data from thymus and liver, where suitable datasets were available.
  
  To further address this point, we propose to systematically explore publicly available ChIP-seq, CUT&RUN, CUT&Tag, and related chromatin profiling datasets for the other KZFPs investigated in this study. Where suitable datasets are available, these analyses will be incorporated into the revised manuscript to further support the proposed tissue-specific regulatory models and provide additional biological context for the identified target genes.
  
  __3. Description of the revisions that have already been incorporated into the transferred manuscript__
  
  Reviewer 1 Major comment 1
  
  “The large-scale overexpression screen represents the foundation of the manuscript and provides a potentially valuable resource for prioritizing candidate KZFPs for downstream study. However, several aspects of the experimental setup and data presentation currently limit the interpretation of the reported proliferation defects. First, key details regarding the screening workflow remain unclear. While the Methods section describes the overall procedure, it is difficult to determine when cells were seeded relative to doxycycline induction, in which plate format the cells were maintained throughout the experiment, and whether medium exchange was performed during the 9-day assay. These points are particularly relevant given the use of suspension K562 cells (which can complicate medium exchange in a 96-well plate format and make long-term culture more difficult to control) and a metabolic viability readout (PrestoBlue), as differences in nutrient depletion or overgrowth could also influence the signal independently of reduced proliferation or toxicity. Additional clarification regarding seeding density, timing of induction, plate format, culture handling throughout the assay, and whether cell morphology/density was visually monitored would substantially improve interpretability and reproducibility. Second, it is unclear whether the observed proliferation phenotypes may be influenced by differences in transgene expression levels or integration effects. Were all constructs validated for comparable expression following induction? In the absence of such controls, it remains difficult to determine whether the reported phenotypes reflect specific KZFP activities or differences in overexpression efficiency. While it may not be possible to conclusively distinguish KZFP-specific effects from toxicity associated with high transgene expression levels, this limitation should at least be acknowledged. 
In addition, the possibility that some phenotypes may be influenced by transgene integration effects should also be considered. Unless independent transductions were validated for the KZFPs classified as toxic, it remains difficult to exclude integration-site-specific contributions to the observed proliferation defects. Third, the normalization strategy would benefit from additional clarification. In Fig. S1A, the LacZ control appears variably affected by doxycycline treatment across plates, whereas the GFP control appears more stable. Since normalization relies on the mean behavior of both controls within each batch and condition, the authors should clarify whether this variability could influence hit calling.”
  
  We agree that additional methodological details improve the clarity and reproducibility of the screening assay. Accordingly, we substantially expanded the Materials and Methods section to describe the experimental workflow, quality controls, data normalization, and hit-calling criteria. The revised paragraph is reproduced below.
  
  Arrayed overexpression screen
  
  To systematically assess the effect of human KZFP overexpression on cellular fitness, K562 cells were individually transduced with doxycycline-inducible lentiviral vectors encoding 366 human KZFPs. Lentiviral particles were produced as described above and used to transduce cells without MOI calculation. __Instead, a fixed volume of viral supernatant (200 µL per × 10⁴ cells in a 48-well plate containing 200 µL of RPMI) was used for all transductions to ensure comparable experimental conditions. Transduced cells were then selected with puromycin (1 µg/mL for 3 days). Following selection, cells were seeded at 20,000 cells per well in 24-well plates containing 1 mL of medium, in technical triplicate for each KZFP. Before doxycycline induction, cell survival after puromycin selection was visually assessed as a quality control metric for each KZFP construct (Supp Table 2).__ Doxycycline (1 µg/mL) was added immediately after cell seeding to induce expression of the HA-tagged KZFPs. At each time point, metabolic activity was measured using PrestoBlue™ reagent according to the manufacturer's instructions (10 µL reagent added to 100 µL culture medium, incubated for 3 h in 96-well plates). Absorbance was recorded at 570 nm and 600 nm using a plate reader (Hidex Sense Microplate Reader). GFP- and LacZ-expressing control wells were included on every plate to account for plate-to-plate and batch-to-batch variability. Peripheral wells were filled with culture medium to minimize evaporation-induced edge effects. Cells were maintained in RPMI supplemented with 10% fetal bovine serum (FBS) and 1× penicillin–streptomycin, and were split 1:10 every three days, with aspiration of the upper medium, throughout the assay while maintaining doxycycline at 1 µg/mL. Cell proliferation was assessed after 4, 7, and 9 days of induction, and the A570/A600 ratio was used as a surrogate measure of viable cell number and proliferative capacity.
For computational normalization, raw A570/A600 values were first background-corrected by subtracting the signal from medium-only controls and then normalized in two steps. First, each value was divided by the mean signal obtained from the GFP and LacZ control wells from the corresponding batch and induction condition to correct for inter-batch variability. Second, the resulting value was normalized to the corresponding −Dox condition for the same KZFP and time point to correct for seeding variability, yielding a relative proliferation score that reflects the effect of KZFP induction. KZFPs with a normalized proliferation score ≤ 0.85 at day 9 were arbitrarily classified as proliferation-impairing hits in this screening framework.
  
  After doxycycline induction, dot blot analysis using anti-HA and anti-actin antibodies was systematically performed to assess KZFP expression and sample loading, respectively (Supplementary File DotBlot.pdf). The HA and actin signals following doxycycline induction (HA_Dox and Actin_Dox, respectively) were visually scored from the dot blots (Supp Table 2).
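  To make the two-step normalization described above concrete, the following is a minimal Python sketch, not the analysis script used for the screen. It assumes that technical triplicates are averaged before the ratios are taken, that the control signal is the mean of the GFP and LacZ control means for a given batch and induction condition, and that background is the mean of the medium-only wells; the function names and well values are hypothetical.

```python
from statistics import mean

# Day-9 cut-off stated in the Methods: scores <= 0.85 are called hits.
HIT_THRESHOLD = 0.85


def control_normalized(kzfp, gfp, lacz, blank):
    """Background-correct raw A570/A600 ratios, then divide by the control mean.

    All arguments are lists of raw A570/A600 ratios from one batch, one
    induction condition (+Dox or -Dox) and one time point.
    Assumption: technical replicates are averaged after background correction.
    """
    background = mean(blank)

    def corrected(wells):
        return mean(w - background for w in wells)

    # Assumed: mean of the GFP and LacZ control means for this batch/condition.
    control_mean = mean([corrected(gfp), corrected(lacz)])
    return corrected(kzfp) / control_mean


def proliferation_score(plus_dox, minus_dox):
    """Relative proliferation score: control-normalized +Dox over -Dox for one KZFP."""
    return control_normalized(**plus_dox) / control_normalized(**minus_dox)


def is_hit(score, threshold=HIT_THRESHOLD):
    """Classify a KZFP as proliferation-impairing at the stated cut-off."""
    return score <= threshold


# Hypothetical day-9 readings for a single KZFP and its plate controls.
controls = {"gfp": [1.10, 1.05, 1.15], "lacz": [1.00, 1.05, 0.95], "blank": [0.10, 0.10, 0.10]}
plus_dox = {"kzfp": [0.62, 0.60, 0.64], **controls}
minus_dox = {"kzfp": [1.05, 1.00, 1.10], **controls}

score = proliferation_score(plus_dox, minus_dox)
print(f"{score:.3f}", is_hit(score))  # prints: 0.547 True
```

Dividing by the −Dox value of the same KZFP cancels differences in seeding density between constructs, while the control step absorbs plate and batch effects; the order of the two divisions does not change the result as long as both use the same batch.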
  
  In addition, to strengthen the methodological description and address these concerns more directly, we will:
  
  1/ Include a supplementary table summarizing our experimental observations for each individual KZFP throughout the screening process (See preliminary Supp Table 2). -> See header here:
  
  2/ Perform and include dot blot analyses to assess and compare transgene expression levels across KZFP constructs (Supplementary File DotBlot.pdf). These files are still being generated: 303 of the 366 dot blots have been completed, and the remaining ones are in progress. Preliminary versions have already been submitted. -> See header of the .pdf here:
  
  In addition, we agree that a more explicit discussion of the limitations of our screening approach improves the interpretation of our findings. Accordingly, we expanded the Discussion to address the limitations associated with variable transgene integration, heterogeneous transgene expression, potential toxicity due to ectopic KZFP overexpression, and the use of K562 cells as a standardized rather than physiological cellular model.
  
  “Several methodological considerations should be taken into account when interpreting these results. As with any lentiviral overexpression screen, three potential sources of technical variability may influence the observed phenotypes: differences in transgene integration sites, heterogeneity in transgene expression levels, and non-specific toxicity resulting from ectopic overexpression. Variable integration sites are unlikely to represent a major source of bias in the present study because all analyses were performed on polyclonal populations of transduced cells rather than individual clones, thereby averaging integration-site effects across many independent events. In contrast, heterogeneity in transgene expression levels is expected, as the abundance of each KZFP depends not only on transduction efficiency but also on intrinsic differences in mRNA stability, translational efficiency, and protein stability. To minimize these sources of variability, all constructs underwent systematic quality control, including assessment of cell survival following puromycin selection and evaluation of transgene expression by HA dot blot after doxycycline induction. Although transgene expression levels varied across KZFPs (Supplementary File DotBlot.pdf), this variability showed no systematic relationship with the proliferation phenotypes, suggesting that differences in overexpression efficiency are unlikely to be the primary determinant of toxicity. Nevertheless, ectopic expression exposes cells to supraphysiological concentrations of KZFPs capable of generating non-physiological interactions or regulatory effects. Therefore, while the screening strategy is well suited for identifying candidate functional regulators, independent validation under endogenous expression conditions remains essential to confirm KZFP-specific functions.”
  
  Reviewer 1 Major comment 2
  
  “A central conceptual issue throughout the manuscript is that the downstream functional analyses of the selected KZFPs remain largely disconnected from the original screening phenotype. The four candidates were prioritized based on proliferation defects observed upon overexpression in K562 cells; however, the subsequent analyses (with the only exception being a more in-depth experimental analysis of ZNF498 in ciliogenesis, which stands out as comparatively more directly supported by experimental evidence) primarily rely on correlative expression patterns and KZFP ChIP-seq datasets to infer potential biological functions in unrelated cellular contexts. As a result, it remains unclear whether the proposed transcriptional programs are mechanistically linked to the proliferation phenotypes that motivated candidate selection in the first place. This issue is evident across multiple sections of the manuscript. For example, the proposed role of ZNF43 in regulating fatty acid metabolism and detoxification pathways is primarily inferred from tissue-level expression correlations. While these analyses focus on genes identified as potential ZNF43 targets, the underlying ChIP-seq datasets were themselves generated under ZNF43 overexpression conditions. Therefore, the current analyses do not establish whether ZNF43 regulates these pathways under physiological expression levels or within a relevant cellular context, nor how such regulation relates to the proliferation defect observed in K562 cells. Moreover, several proposed target genes remain substantially expressed in tissues where ZNF43 expression is not particularly low (e.g., kidney and heart muscle), suggesting that additional regulators are likely involved. 
Similarly, the proposed model of ZNF257-mediated regulation of MAGEA genes during spermatogenesis is intriguing but does not fully account for the expression behavior of all MAGEA family members, particularly MAGEA2B, which displays strong expression in spermatocytes despite high ZNF257 expression. This expression pattern should be acknowledged in the main text and reflected in Fig. 3K. In addition, the labels for MAGEA6 and MAGEA2B in Fig. 3C appear to be inverted. More broadly, the proposed regulatory model is difficult to reconcile with the generally restricted expression pattern of MAGEA genes across adult tissues, as their expression does not appear to consistently correlate with ZNF257 levels outside the germline context. Related concerns also apply to the analyses of ZNF498 and ZNF18, where the proposed functions in cilium formation and sperm maturation remain disconnected from the proliferation defects identified in the initial screen.”
  
  We agree that this comment raises an important conceptual point, and it has helped us clarify the scope of the study and the interpretation of our findings. In the revised manuscript, we explicitly distinguish hypothesis generation from mechanistic validation by clarifying that the proliferation phenotype observed in K562 cells reflects the regulatory potential of ectopically expressed KZFPs rather than their physiological functions. We also adopted a more cautious interpretation of the functional analyses, emphasizing that the proposed regulatory networks are hypothesis-generating and that individual KZFPs are unlikely to act as sole regulators. More broadly, we emphasize that the primary objective of this study is to establish a scalable screening platform for prioritizing KZFPs and identifying biologically relevant contexts for future investigation, rather than to provide a comprehensive functional characterization of individual KZFPs. Regarding ZNF257, we agree that the reviewer's observations highlight an important limitation of our proposed regulatory model. We therefore adopted a more nuanced interpretation, presenting ZNF257 as a contributor to, rather than the sole regulator of, the MAGEA transcriptional program, and explicitly discussing the exceptions identified by the reviewer.
  
  Modifications in the revised manuscript:
  
  1/
  
  “Integrative transcriptomic, chromatin and proteomic analyses reveal diverse mechanisms, including transposable element–linked repression (ZNF43), promoter-proximal regulation (ZNF257), and SCAN domain–dependent transcriptional activation (ZNF498/ZSCAN25 and ZNF18).”
  
  Is now:
  
  “Integrative transcriptomic, chromatin and proteomic analyses identify distinct regulatory properties and generate testable hypotheses regarding diverse mechanisms, including transposable element-associated repression (ZNF43), promoter-proximal regulation (ZNF257), and SCAN domain-dependent transcriptional activation (ZNF498/ZSCAN25 and ZNF18).”
  
  2/
  
  “Detailed follow-up of four such candidates, ZNF43, ZNF257, ZNF498 and ZNF18, revealed as hypothesized distinct modes of action, ranging from TE-linked transcriptional repression to promoter-proximal gene silencing and SCAN domain-mediated transcriptional activation. These findings reinforce the view that KZFPs, while often viewed as a homogeneous family of TE-repressive TFs, are rather functionally diverse regulators with wide-ranging impacts on human biology.”
  
  Is now:
  
  “Detailed follow-up of four such candidates, ZNF43, ZNF257, ZNF498 and ZNF18, identified distinct regulatory properties and generated hypotheses regarding their physiological functions. By integrating overexpression-induced transcriptional responses, chromatin occupancy, proteomic analyses and tissue-specific expression data, we propose candidate biological contexts in which these KZFPs may operate. These hypotheses now provide a framework for future mechanistic studies performed under physiological conditions. Together, these findings reinforce the view that KZFPs, while often viewed as a homogeneous family of TE-repressive transcription factors, comprise functionally diverse regulators with broad potential roles in human biology.”
  
  3/
  
  “We conclude from these data that ZNF43 regulates a transcriptional program related to fatty acid metabolism and detoxification, allowing for the preferential expression of its effectors in the liver (Fig. 2G). Interestingly, neither expression nor chromatin state followed the same pattern at the functionally unrelated DNAI4 locus, indicating that this gene is subjected to other dominant regulators.”
  
  Is now:
  
  “Together, these observations identify a small set of candidate ZNF43 target genes involved in fatty acid metabolism and detoxification and suggest that ZNF43 may contribute to the regulation of these transcriptional programmes in appropriate physiological contexts (Fig. 2G). However, these conclusions are derived from overexpression-based datasets and tissue-level expression analyses and should therefore be considered hypothesis-generating. Interestingly, neither expression nor chromatin state followed the same pattern at the functionally unrelated DNAI4 locus, indicating that additional regulatory mechanisms contribute to the control of this gene.”
  
  4/
  
  “It strongly suggests that ZNF257 contributes to initiating the transcriptional repression of these two MAGEA genes during early spermiogenesis, after which their silencing may be stabilized through stable epigenetic mechanisms such as DNA methylation.”
  
  Is now:
  
  “These observations suggest that ZNF257 may contribute to the initiation of transcriptional repression of a subset of MAGEA genes during the spermatogonia-to-spermatocyte transition, after which their silencing may be stabilized through epigenetic mechanisms such as DNA methylation.”
  
  5/
  
  “Together, these results identify ZNF498 as a transcriptional activator of gene modules controlling cytoskeleton-dependent processes and suggest that this TF may act as a regulator of neuronal cytoskeletal architecture, warranting investigation in relevant neural models.”
  
  Is now:
  
  “Together, these results indicate that ZNF498 functions as a transcriptional activator in our overexpression system and support the hypothesis that it contributes to transcriptional programmes controlling cytoskeleton-dependent processes in physiologically relevant neural contexts, warranting further investigation in dedicated neural models.”
  
  6/
  
  “The co-expression of ZNF18 and its target genes at the spermatid stage suggests that ZNF18 activates a transcriptional program supporting these processes.”
  
  Is now:
  
  “The co-expression of ZNF18 and its candidate target genes at the spermatid stage is consistent with the hypothesis that ZNF18 contributes to transcriptional programmes supporting these processes.”
  
  7/
  
  “The four KZFPs characterised here illustrate this diversity. ZNF43 represses a coherent set of genes involved in fatty acid metabolism and detoxification through binding to nearby LTR/ERV1 integrants, with its expression anticorrelating that of its targets: i.e., highly expressed in thymus and bone marrow, where these metabolic genes are silent, and lowly expressed in liver, where they are most active. This represents a clear example of host genomes coopting TE-derived sequences and shaping their regulatory activities in a cell-type specific manner by the differential expression of KZFPs. ZNF257, by contrast, acts as a promoter-proximal repressor whose targets show accelerated sequence evolution at their promoters, consistent with integration into a KZFP-orchestrated GRN through rapid promoter diversification, a feature previously described for KZFPs (Farmiloe et al., 2023). Its regulation of the MAGEA gene cluster exemplifies a distinct evolutionary mechanism: an ancestral intronic binding site, present in MAGEA6 gene body, before ZNF257 emerged, was propagated across the cluster through tandem duplication, enabling coordinated regulation of multiple paralogs. Temporal expression analysis during spermatogenesis further suggests that ZNF257 initiates MAGEA repression at the spermatogonia-to-spermatocyte transition, after which silencing may be maintained through epigenetic mechanisms such as DNA methylation. ZNF498 and ZNF18, both SCAN-containing KZFPs with variant KRAB domains, on the other hand acted as transcriptional activators. ZNF498 activates a programme centred on microtubule cytoskeleton organisation, as demonstrated by the disruption of ciliogenesis upon its overexpression, and both ZNF498 and its targets are broadly expressed in the central nervous system, particularly in excitatory neurons where microtubule dynamics are essential for axonal architecture. 
ZNF18 similarly activates genes involved in chromatin remodelling and cytoskeletal reorganisation at the spermatid stage, processes that are hallmarks of spermiogenesis. Together, these case studies demonstrate that even within a single screen, KZFPs with fundamentally different regulatory logics can be identified through a single unifying phenotype and then mechanistically dissected to uncover their unique properties.”
  
  Is now:
  
  “The four KZFPs characterized here illustrate the functional diversity that can be uncovered using this screening strategy. For ZNF43, integration of overexpression transcriptomics with ChIP-exo binding data identified a small set of candidate direct target genes located near LTR/ERV1 elements. Their tissue-specific expression patterns are consistent with the hypothesis that ZNF43 contributes to transcriptional programmes associated with fatty acid metabolism and detoxification, although these analyses, which rely on overexpression-derived datasets and tissue-wide correlations, do not establish physiological regulation or causality. Rather, they identify a candidate regulatory network whose functional relevance will require investigation in appropriate biological models. More generally, these observations support the concept that host genomes may exploit TE-derived regulatory sequences in a tissue-specific manner through differential KZFP expression, while recognizing that additional transcription factors almost certainly participate in controlling these gene expression programmes. Similarly, ZNF257 emerged as a promoter-associated transcriptional repressor in our overexpression system. Evolutionary analyses suggest that tandem duplication propagated an ancestral ZNF257-binding sequence across the MAGEA locus, generating the hypothesis that ZNF257 may contribute to coordinated regulation of this gene cluster during spermatogenesis. The temporal expression profiles of ZNF257 and the MAGEA genes are compatible with such a model but remain correlative and therefore require direct functional validation. ZNF498 and ZNF18, two SCAN-containing KZFPs with variant KRAB domains, displayed transcriptional activation rather than repression following overexpression. 
For ZNF498, the integration of transcriptomic analyses with expression profiling pointed to microtubule cytoskeleton organization as a candidate biological process, a prediction that was further supported experimentally by the marked impairment of ciliogenesis following ZNF498 overexpression in hTERT-RPE1 cells. This represents the strongest functional validation presented in this study and supports the biological relevance of the analytical framework developed here. For ZNF18, the co-expression of the KZFP and its candidate target genes during spermatogenesis is consistent with the hypothesis that it contributes to transcriptional programmes involved in chromatin remodelling and cytoskeletal reorganization during spermatid differentiation. Together, these case studies illustrate how a standardized overexpression screen can identify KZFPs with distinct regulatory properties and generate biologically coherent hypotheses regarding their physiological functions. Rather than establishing definitive functions for individual KZFPs, this framework prioritizes candidates, proposes relevant cellular contexts, and provides a foundation for future mechanistic studies performed under physiological conditions.”
  
  “In addition, interpretation of the SCAN-deletion experiments is complicated by the reduced expression levels of the deletion constructs relative to the corresponding full-length proteins, making it difficult to determine whether the observed proliferation phenotypes are pathway-specific or partially driven by differential expression.”
  
  We thank the reviewer for this important observation and agree that differences in expression levels between the full-length and ΔSCAN constructs could complicate the interpretation of the observed phenotypes. To address this concern, we quantitatively compared the expression of the full-length and ΔSCAN constructs at the protein level by western blotting and at the transcript level by RNA-seq, accounting for differences in transgene length. These results are now shown in Fig. S6C, D.
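  As an illustration of why transgene length matters in this comparison, the sketch below uses RPKM (reads per kilobase of transgene per million mapped reads), a standard length- and depth-normalization. Whether the RNA-seq quantification in Fig. S6D used RPKM, TPM or another scheme is an assumption here, and the read counts, transgene lengths and library sizes are hypothetical.

```python
def rpkm(read_count, transgene_length_bp, library_size):
    """Reads per kilobase of transgene per million mapped reads."""
    return read_count / (transgene_length_bp / 1_000) / (library_size / 1_000_000)


def delta_to_full_ratio(delta, full):
    """Length- and depth-normalized expression of a ΔSCAN construct relative to full length.

    `delta` and `full` are (read_count, transgene_length_bp, library_size) tuples.
    """
    return rpkm(*delta) / rpkm(*full)


# Hypothetical values: the ΔSCAN transgene is shorter, and a shorter transcript
# yields fewer reads at the same abundance, so raw counts alone would
# understate ΔSCAN expression relative to the full-length construct.
full_length = (12_000, 2_000, 20_000_000)
delta_scan = (10_000, 1_760, 20_000_000)

print(f"{rpkm(*full_length):.1f}", f"{delta_to_full_ratio(delta_scan, full_length):.3f}")
# prints: 300.0 0.947
```

With these made-up numbers, the raw count ratio (about 0.83) would suggest a larger expression gap than the length-corrected ratio (about 0.95).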
  
  The corresponding legend has been modified as follows:
  
  HA signal after OE of HA-tagged ZNF18, ZNF18∆SCAN, ZNF498, ZNF498∆SCAN or GFP in K562 cells. Actin as control.
  
  Quantification of ZNF18, ZNF18∆SCAN, ZNF498 and ZNF498∆SCAN.

It appears that the difference is small (…)

Reviewer 1 Minor comments
  
  “- In the Abstract and in the "Limitations of the study" section, the term "annotation" is used. It would be preferable to specify "functional characterization" instead of "annotation".
  
  Done as suggested by the reviewer.
  
  In the Introduction, there may be a minor citation confusion. Following the sentence: "Characterized by an N-terminal KRAB domain and a C-terminal tandem array of C2H2 zinc fingers, KZFPs primarily target transposable element (TE)-embedded sequences," the cited references are predominantly experimental studies supporting this statement. However, the inclusion of the review "Bruno, Mahgoub and Macfarlan, 2019" appears less appropriate in this context, as it does not directly present ChIP-seq data supporting this claim. More relevant primary studies from the same research area include "Wolf et al. 2020" and "Bruno et al. 2025.".
  
  Done as suggested by the reviewer.
  
  In Fig. 1A, "D10" appears inconsistent with the text and other figures (Fig. 1B, 1G, 1H), which refer to 9 days post-induction.
  
  Done as suggested by the reviewer.
  
  In Fig. S1, there may be a mismatch in the highlighted plate: the zoomed image appears to correspond to the first plate from the top. The correct plate should be highlighted for consistency.
  
  Done as suggested by the reviewer.
  
  In Fig. 1B, there is a typographical error ("K ZFPs" instead of "KZFPs").
  
  Done as suggested by the reviewer.
  
  In Fig. S1E, it is unclear what "other" refers to. Please clarify whether this represents the mean of all remaining KZFPs or a defined subset, ideally in the figure description.
  
  Done as suggested by the reviewer.
  
  In Fig. S2E, "SetDB1" should be corrected to "SETDB1".
  
  Done as suggested by the reviewer.
  
  In Fig. 3B, it is unclear what distinguishes the upper and lower "Diverse REs". A brief clarification in the figure legend would improve interpretability, particularly regarding the transposable element families included.
  
  Done as suggested by the reviewer.
  
  In Fig. S3C, the x-axis labels appear slightly misaligned and shifted to the right.
  
  Done as suggested by the reviewer.
  
  In Fig. 3C, the labels for MAGEA6 and MAGEA2B appear to be inverted.
  
  Done as suggested by the reviewer.
  
  In Fig. 3K, "MAGE3" should be corrected to "MAGEA3".
  
  Done as suggested by the reviewer.
  
  In the ZNF498 section, line 4, the punctuation should be corrected so that the period appears after the figure reference ("promoters (Fig. S1E).").
  
  Done as suggested by the reviewer.
  
  In the final sentence of the ZNF498 section, a noun appears to be missing after "cytoskeleton-dependent," possibly "processes".
  
  Done as suggested by the reviewer.
  
  In the last section of the Results and corresponding figures and their descriptions, "SCAN dependant" should be corrected to "SCAN-dependent".”
  
  Done as suggested by the reviewer.
  
  Major comments of the reviewer 2
  
  “- The authors chose four KZFPs to study in detail, but why they chose these 4 candidates is unclear to me. It would be nice to add a more detailed description of the process by which they chose the four candidates.”
  
  We agree that the rationale for selecting the four KZFPs should be presented more explicitly. Accordingly, we revised the manuscript to clarify the selection criteria.
  
  “However, a modest correlation was noted between the number of transcription start sites (TSS) bound by KZFPs and the drop in PrestoBlue signal induced by their overexpression (Fig. 1G), and SCAN-containing KZFPs (SKZFPs) tended to induce proliferation defects more frequently than family members lacking this domain (Fig. 1H).”
  
  Is now:
  
  “However, a modest correlation was noted between the number of transcription start sites (TSS) bound by KZFPs and the drop in PrestoBlue signal induced by their overexpression (Fig. 1G), and SCAN-containing KZFPs (SKZFPs) tended to induce proliferation defects more frequently than family members lacking this domain (Fig. 1H). These observations indicated that KZFPs affecting proliferation do not constitute a homogeneous functional group, prompting us to select representative candidates spanning the evolutionary, structural, and genomic diversity of the KZFP family for mechanistic characterization.”
  
  “- The materials and methods part of the manuscript is not detailed enough for other researchers to reproduce the study. They should add more details to both experiments and data analysis part of this section. Below I highlight some examples for the sake of clarity, but the authors should revise the whole materials and methods section and add more details keeping these examples in mind:
  
  The authors do not state the titer of lentiviral vectors they generate nor the MOI or amount of virus they use to transduce the cells
  
  In many cases, the specific software and software versions are not stated, e.g., for the analysis of the Gene Ontology Biological Processes
  
  It would be beneficial for the readers to get more details about the construct they used, for example a map of the plasmid.
  
  It is unclear how many cells were used for RNA extraction
  
  It is unclear which microscopes were used for imaging.
  
  The concentrations, product numbers and providers of the antibodies used for staining are not always stated.”
  
  We agree that the additional methodological details requested by the reviewer will improve the reproducibility and transparency of the study. Accordingly, we have expanded the Methods section to provide a more detailed description of the experimental procedures and data analysis workflow.
  
  “Lentiviral particles were produced in HEK293T cells by transient co-transfection of transfer, packaging and envelope plasmids. Cells were transfected at approximately 70–80% confluence using a standard lipid-based transfection reagent. Viral supernatants were collected 48 h after transfection, cleared by centrifugation, filtered through 0.22-µm membranes, and used fresh or stored appropriately until use. Recipient K562 or hTERT-RPE1 cells were transduced under conditions optimized for efficient gene delivery.”
  
  Is now:
  
  “Lentiviral particles were produced in HEK293T cells. 10⁵ cells were seeded in 24-well plates containing 1 mL DMEM the day before transfection. Cells were co-transfected individually with 0.15 µg of each plasmid encoding HA-tagged KZFPs (pTRE-KZFPX-HA-PGK-puro), 0.1 µg of the packaging plasmid (pR8.74) and 0.07 µg of the envelope plasmid (pMD2G) using TransIT®-LT1 Transfection Reagent (MIR 2306), according to the manufacturer's instructions. Viral supernatants were harvested 24 h after transfection, clarified by centrifugation, filtered through 0.45-µm filters and used immediately.”
  
  “Coding sequences were cloned into doxycycline-inducible lentiviral transfer vectors designed to express N-terminally HA-tagged proteins.”
  
  Is now:
  
  “Coding sequences corresponding to 366 human KZFP open reading frames were codon-optimized for human expression and cloned into doxycycline-inducible lentiviral transfer vectors expressing C-terminal HA-tagged proteins under the control of a tetracycline-responsive promoter (pTRE-KZFPX-HA-PGK-puro). All expression constructs used in the primary overexpression screen have been deposited and are publicly available (De Tribolet et al., 2023). A schematic representation of the lentiviral expression cassette, including the promoter, HA tag, cloning site, antibiotic resistance cassette, and regulatory elements, is provided in the Supplementary file. Selected constructs encoding ZNF43, ZNF257, ZNF498 and ZNF18 were used for follow-up mechanistic studies. For SCAN-domain functional analyses, deletion constructs lacking the SCAN domain (ΔSCAN) were generated for ZNF18 and ZNF498 in the same lentiviral backbone. Deletions were generated by In-Fusion cloning using specific primers. PCR was performed with a high-fidelity polymerase, followed by gel purification and recombination with the linearized plasmid using the In-Fusion HD Cloning Kit (Takara Bio) according to the manufacturer’s protocol. The product was transformed into HB101 Escherichia coli cells, and colonies were screened by PCR. Positive clones were verified by Sanger sequencing, and confirmed plasmids were propagated and purified for further use.”
  
  “Total RNA was extracted...”
  
  Is now:
  
  “For each biological replicate, approximately 1 × 10⁶ K562 cells were harvested 72 h after doxycycline induction. Total RNA was extracted…”
  
  “Images were acquired by fluorescence microscopy under identical conditions across samples.”
  
  Is now:
  
  “Images were acquired using a Leica SP8 confocal microscope (Leica Biosystems) with an HC PL APO 63×/1.40 objective and a pinhole size of 1 AU, using identical acquisition settings for all conditions. Images were processed using Fiji/ImageJ (version 2.9.0) without nonlinear intensity adjustments.”
  
  “Cells were fixed and stained with antibodies against ciliary markers (ARL13B)”
  
  Is now:
  
  “Cells were fixed in 4% paraformaldehyde, permeabilized with 0.1% Triton X-100, blocked with 2% BSA, and incubated with rabbit anti-ARL13B (Proteintech, Cat. No. 17711-1-AP, 1:200) followed by Alexa Fluor 568-conjugated donkey anti-rabbit IgG (Thermo Fisher Scientific, Cat. No. A-10042, 1:1000). Nuclei were stained with Hoechst (1 µg/mL).”
  
  “- The authors mention that KZFPs are usually expressed at a low level in the K562 cell line they use, but there is no figure showing the expression level of KZFPs in this cell type. It would be important to see the baseline KZFP expression in these cells, the level of overexpression and compare it to the endogenous expression levels they show in different cell types/tissues, at least for the four candidates studied more in depth. This would help to understand whether this level of activity is something that could occur naturally in a physiologically relevant context.”
  
  We thank the reviewer for this insightful suggestion and fully agree that providing additional context regarding endogenous and ectopic KZFP expression levels will help readers better assess the physiological relevance of our findings. As suggested, we included data showing the baseline expression levels of the four selected KZFPs in K562 cells together with the expression levels achieved following doxycycline-induced overexpression. We also compared these values with publicly available transcriptomic data from cell lines. Importantly, only cell lines were assessed, as K562 cells are required as a ground-truth reference to estimate transgene expression. We modified Fig. S2, Fig. S3, Fig. S4 and Fig. S5 to add the results of these analyses. Here is ZNF43 as an example:
  
  With the following legend:
  
  “(C) Distribution of endogenous expression levels (in GFP control cells) of all expressed genes (light grey) and all KZFPs (dark grey) in K562 cells. The solid red line indicates endogenous ZNF43 expression in GFP control cells, whereas the dashed red line indicates the corrected transgene expression following doxycycline induction.
  
  (D) Endogenous ZNF43 expression across Human Protein Atlas cell lines (https://www.proteinatlas.org/about/download#cell_line), following normalization to the local RNA-seq dataset. K562 cells are highlighted in red. The dashed red line indicates the corrected transgene level measured following doxycycline-induced overexpression of ZNF43 in K562 cells.”
  
  We also modified the Results section:
  
  “ZNF43 is a ~43-million-year-old KZFP with a canonical TRIM28-recruiting KRAB domain and 19 zinc fingers that preferentially recognize an LTR/ERV1-embedded sequence (Fig. S1F). We first verified that ZNF43 overexpression impaired the growth of K562 cells (Fig. S2A, B). Endogenous ZNF43 expression was readily detectable in K562 cells and across human cell lines (Fig. S2C, D). Following doxycycline induction, transcript abundance markedly increased and exceeded the highest endogenous expression level observed among the analyzed cell lines (Fig. S2C, D).”
  
  We also updated the Methods section:
  
  “Quantification of endogenous and transgene expression levels
  
  Endogenous KZFP expression in K562 cells was estimated from GFP control RNA-seq samples using normalized mean expression values obtained from the differential expression analyses. For ZNF18, whose transgene sequence is identical to the endogenous coding sequence (i.e., not codon-optimized), transgene-derived expression was estimated directly by subtracting the endogenous transcript abundance measured in GFP controls from the total transcript abundance measured following doxycycline induction (OE − GFP). For ZNF43, ZNF257 and ZNF498, the overexpression constructs were synthesized using codon-optimized coding sequences. RNA-seq reads were therefore additionally aligned against the codon-optimized transgene reference sequences to specifically quantify exogenous transcripts without interference from endogenous reads. Because these codon-specific counts are generated through an independent alignment strategy, they are not directly comparable to the endogenous RNA-seq expression values. To calibrate these measurements, a scaling factor was derived from the ZNF18 dataset by comparing the codon-specific read counts with the transgene abundance estimated from the differential expression analysis (OE − GFP). This empirically determined correction factor was subsequently applied to all codon-optimized constructs, thereby expressing transgene abundance on the same scale as the endogenous RNA-seq measurements. Corrected transgene expression values were then used for all downstream comparisons. To compare endogenous expression across physiological contexts, publicly available RNA-seq datasets from the Human Protein Atlas (cell lines) were downloaded and normalized to the local RNA-seq scale. A normalization factor was calculated from the median expression ratio of KZFPs detected in both the Human Protein Atlas K562 dataset and the local K562 GFP control RNA-seq dataset, and subsequently applied uniformly to all Human Protein Atlas datasets. 
This normalization enabled direct comparison of endogenous expression across biological contexts with the corrected transgene expression values. Global KZFP expression was calculated as the median normalized expression of all annotated KZFPs within each biological context. For the four KZFPs selected for detailed characterization, endogenous expression across Human Protein Atlas cell lines was compared with corrected transgene expression following doxycycline induction. Expression distributions of all genes and KZFPs were visualized using ranked expression plots and density histograms. All analyses were performed in R using the tidyverse package.”
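  The two calibration steps described in these Methods reduce to ratio computations. The Python sketch below illustrates them under stated assumptions; it is not the authors' R/tidyverse code. All function names and toy values are hypothetical. It assumes the HPA factor is the median of local/HPA ratios (so multiplying HPA values maps them onto the local scale) and that a KZFP counts as "detected" when its value is above zero in both K562 datasets.

```python
from __future__ import annotations

from statistics import median


def transgene_scaling_factor(znf18_oe: float, znf18_gfp: float,
                             znf18_codon_counts: float) -> float:
    """Hypothetical helper: factor mapping codon-specific counts to the RNA-seq scale.

    ZNF18 is the calibrator: its transgene abundance is estimated directly
    as OE - GFP, then divided by its codon-specific read count.
    """
    if znf18_codon_counts <= 0:
        raise ValueError("codon-specific counts must be positive")
    return (znf18_oe - znf18_gfp) / znf18_codon_counts


def corrected_transgene(codon_counts: float, factor: float) -> float:
    """Express a codon-optimized construct's counts on the endogenous scale."""
    return codon_counts * factor


def hpa_normalization_factor(local_k562: dict[str, float],
                             hpa_k562: dict[str, float],
                             kzfps: set[str]) -> float:
    """Median local/HPA ratio over KZFPs detected (> 0) in both K562 datasets."""
    ratios = [
        local_k562[g] / hpa_k562[g]
        for g in kzfps
        if local_k562.get(g, 0.0) > 0 and hpa_k562.get(g, 0.0) > 0
    ]
    if not ratios:
        raise ValueError("no KZFP detected in both datasets")
    return median(ratios)


def normalize_hpa(sample: dict[str, float], factor: float) -> dict[str, float]:
    """Apply one factor uniformly to every gene of an HPA cell line."""
    return {gene: value * factor for gene, value in sample.items()}


def global_kzfp_expression(sample: dict[str, float], kzfps: set[str]) -> float:
    """Median normalized expression of the KZFPs present in one sample."""
    return median(sample[g] for g in kzfps if g in sample)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    f = transgene_scaling_factor(500.0, 20.0, 240.0)
    print(f)                              # 2.0
    print(corrected_transgene(300.0, f))  # 600.0
```

  Because the ZNF18-derived factor is a single empirical constant applied to all three codon-optimized constructs (ZNF43, ZNF257, ZNF498), any error in the ZNF18 estimate propagates equally to each of them.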
  
  We fully acknowledge that the overexpression system used in this study was primarily designed as a discovery platform to identify candidate functions, targets, and interaction partners of KZFPs that are otherwise expressed at lower levels in K562 cells. As the reviewer correctly points out, determining whether these regulatory effects occur at endogenous expression levels in physiologically relevant cellular contexts represents an important next step. We have therefore also clarified this in the “Limitations of the study” paragraph:
  
  “To better place our experimental system into a physiological context, we compared endogenous KZFP expression in K562 cells with publicly available transcriptomic datasets from the Human Protein Atlas. These analyses showed that K562 cells do not exhibit unusually low global KZFP expression compared with other human cell lines. However, consistent with the restricted expression patterns of this protein family, KZFPs as a whole are expressed at substantially lower levels than the average human gene. For the four KZFPs characterized in detail, doxycycline induction produced transcript levels that exceeded the highest endogenous expression observed across the analyzed human cell lines. Accordingly, the overexpression system used in this study was not designed to recapitulate physiological expression levels but rather to maximize the identification of candidate target genes, interacting partners, and regulatory pathways for KZFPs that are otherwise expressed at low endogenous levels. Consequently, the molecular interactions identified here should be considered as hypotheses requiring validation under endogenous expression conditions in physiologically relevant cellular models.”
  
  “- RNA seq analysis: It is unclear how many cells were used in the RNA seq analysis, I would like to ask the authors to clarify that. Moreover, from my understanding the RNA seq analysis was done on day 3, while the Presto Blue analysis was done on days 4, 7 and 9. I would like to kindly ask the authors to motivate their choice for the day of the RNA sequencing analysis.”
  
  We agree that this information required clarification. The Methods section has been revised to specify the number of cells used for RNA-seq library preparation and to explain the rationale for performing RNA-seq after 3 days of doxycycline induction, before measurable proliferation defects emerge, in order to capture primary transcriptional responses to KZFP overexpression. The corresponding modification has also been added to the Results section when introducing the RNA-seq analyses.
  
  “For transcriptome profiling, K562 cells expressing the indicated inducible constructs were treated with doxycycline for 72 h before harvest. Total RNA was extracted using the NucleoSpin RNA plus kit (Macherey-Nagel) according to the manufacturer’s recommendations. RNA quantity and purity were assessed by spectrophotometry, and RNA integrity was evaluated before library preparation.”
  
  Is now:
  
  “For transcriptome profiling, 1 × 10⁶ K562 cells expressing the indicated inducible constructs were treated with doxycycline for 72 h before harvest. RNA was collected after 3 days of induction to capture the primary transcriptional responses to KZFP overexpression before substantial differences in proliferation became apparent. This early time point was chosen to minimize secondary transcriptional changes resulting from altered cell growth, cell-cycle distribution, or cellular stress, which become detectable in the proliferation assays performed after 4, 7, and 9 days of induction. Total RNA was extracted using the NucleoSpin RNA plus kit (Macherey-Nagel) according to the manufacturer’s recommendations. RNA quantity and purity were assessed by spectrophotometry, and RNA integrity was evaluated before library preparation.”
  
  “We then profiled the transcriptome of K562 cells overexpressing ZNF43 by deep RNA sequencing (RNA-seq)”
  
  Is now:
  
  “We then profiled the transcriptome of K562 cells overexpressing ZNF43 by deep RNA sequencing (RNA-seq) after 3 days of doxycycline induction, a time point selected to capture primary transcriptional responses before the onset of measurable proliferation defects.”
  
  Minor comments of the reviewer 2
  
  “- Figure S1D is not mentioned in the text before figure S1E. The order of the panels should be changed in the figure.
  
  Done as suggested by the reviewer.
  
  "We selected genes that were downregulated upon ZNF43 overexpression and harboured a ZNF43 binding site within 10kb of their TSS (Fig. 1A)". Don't the authors mean Fig. 2A?
  
  Done as suggested by the reviewer.
  
  In Figure 4D, the GO terms cannot be read, as the sentences seem to be cut.
  
  Done as suggested by the reviewer.
  
  All figures and figure legends need to be revised. In some cases, the letter size is too small, or the legend and explanation of colours is missing. Please see some examples below: Fig. S6C, Fig 6C, Fig S4C, Fig S5C (letter size too small) Fig S6G, Fig 4E (label/scale is missing)”
  
  Font sizes were homogenized to Arial 6 by default, as requested by most journal guidelines.
  
  Description of analyses that authors prefer not to carry out
  
  We think that by proceeding as described above we will have addressed all major conceptual issues raised by the reviewers.
  
  PeerReviewed
2. EMBOpress 09 Jul 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  Referee #2
  
  Evidence, reproducibility and clarity
  
  Summary
  
  Foley et al. establish a scalable framework to probe KZFP function. They performed an arrayed inducible overexpression screen of 366 human KZFPs in K562 cells. This screen, together with the analysis of transcriptomic data and available chromatin and proteomic datasets, revealed that KZFPs regulate many different mechanisms, highlighting the functional diversity of KZFPs. Understanding this functional diversity is a very interesting, timely and relevant question, but it is also challenging to study. Therefore, the approach the authors develop is promising. While the quality of the experiments and data analysis is high, the weakness I see in the manuscript is the lack of major biological insights in relevant model systems. Please see my detailed comment in the significance part.
  
  Major comments
  
  The authors chose four KZFPs to study in detail, but why they chose these 4 candidates is unclear to me. It would be nice to add a more detailed description of the process by which they chose the four candidates.
  
  The materials and methods part of the manuscript is not detailed enough for other researchers to reproduce the study. They should add more details to both experiments and data analysis part of this section. Below I highlight some examples for the sake of clarity, but the authors should revise the whole materials and methods section and add more details keeping these examples in mind:
  
  The authors do not state the titer of lentiviral vectors they generate nor the MOI or amount of virus they use to transduce the cells
  
  In many cases, the specific software and software versions are not stated, e.g., for the analysis of the Gene Ontology Biological Processes
  
  It would be beneficial for the readers to get more details about the construct they used, for example a map of the plasmid.
  
  It is unclear how many cells were used for RNA extraction
  
  It is unclear which microscopes were used for imaging.
  
  The concentrations, product numbers and providers of the antibodies used for staining are not always stated.
  
  The authors looked at available chromatin data in either K562 cells or HEK293 cells, which I think is a very good way of utilizing publicly available data. Since the authors showed that different KZFPs might be functionally relevant in different cell types/tissues, I was wondering whether they checked if ChIP-seq or CUT&RUN data are available for those specific cell types/tissues. If so, these data should be included in the manuscript.
  
  The authors mention that KZFPs are usually expressed at a low level in the K562 cell line they use, but there is no figure showing the expression level of KZFPs in this cell type. It would be important to see the baseline KZFP expression in these cells, the level of overexpression and compare it to the endogenous expression levels they show in different cell types/tissues, at least for the four candidates studied more in depth. This would help to understand whether this level of activity is something that could occur naturally in a physiologically relevant context.
  
  RNA seq analysis: It is unclear how many cells were used in the RNA seq analysis, I would like to ask the authors to clarify that. Moreover, from my understanding the RNA seq analysis was done on day 3, while the Presto Blue analysis was done on days 4, 7 and 10. I would like to kindly ask the authors to motivate their choice for the day of the RNA sequencing analysis.
  
  Minor comments
  
  Figure S1D is not mentioned in the text before figure S1E. The order of the panels should be changed in the figure.
  
  "We selected genes that were downregulated upon ZNF43 overexpression and harboured a ZNF43 binding site within 10kb of their TSS (Fig. 1A)". Don't the authors mean Fig. 2A?
  
  In Figure 4D, the GO terms cannot be read, as the sentences seem to be cut.
  
  All figures and figure legends need to be revised. In some cases, the letter size is too small, or the legend and explanation of colours is missing. Please see some examples below: Fig. S6C, Fig 6C, Fig S4C, Fig S5C (letter size too small) Fig S6G, Fig 4E (label/scale is missing)
  
  Significance
  
  Understanding the diverse roles of KZFPs is an important and interesting research question. However, studying KZFPs is challenging, as many KZFP-mediated effects appear to be highly cell type- and tissue-specific. This complexity is also highlighted by the findings of the current manuscript.
  
  A major strength of this study is the development of a scalable system that enables the simultaneous investigation of the entire KZFP family. Performing such analyses on an individual basis would be extremely time-consuming. Therefore, the authors provide an efficient and valuable screening platform that can identify promising candidates for further investigation. In this regard, the methodological advance represents the primary contribution of the work.
  
  At the same time, the study lacks a clear biological conclusion. While the screen identifies KZFPs with potential functional effects, it would substantially increase the impact of the manuscript if the authors selected at least one candidate for in-depth characterization in a biologically relevant cellular context. The current study is still of high quality and importance without these experiments, but such follow-up analyses would greatly strengthen the biological significance of the findings.
  
  Another limitation is that the experiments were performed in a cell type in which many of the investigated KZFPs are not normally expressed. As a result, the forced overexpression strategy may not accurately reflect physiological conditions and could potentially generate false-positive results. This concern is particularly relevant in light of the authors' statement that "KZFPs with sufficient regulatory potency to perturb cellular fitness outside of their normal setting are strong candidates for playing important roles within it." While this may indeed be true for some KZFPs, it is also possible that certain observed phenotypes simply arise from ectopic expression in an inappropriate cellular environment.
  
  More generally, the observation that KZFPs can have functions beyond TE repression is already established in the literature. Therefore, the manuscript provides limited new biological insight into this concept. The authors could potentially strengthen the novelty of the study by placing greater emphasis on specific KZFP subfamilies, such as SCAN-containing zinc finger proteins, which represent a novel direction and have been implicated in non-canonical regulatory roles.
  
  PeerReviewed
Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.04.20.718945
www.biorxiv.org

Primate Hippocampus Reveals Distinct Rules for Associative Synaptic Plasticity

2
1. Public_Reviews 09 Jul 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript, the authors have undertaken an investigation of differences between two mammalian species, the brown rat and the crab-eating macaque, in the mechanisms supporting a well-established model of long-term Hebbian synaptic plasticity, Schaffer collateral-to-CA1 long-term potentiation (LTP) in the hippocampus. LTP has been long-studied and deeply characterised due to its potential importance in modeling a strong candidate process for the central mechanism of learning and memory. LTP was first discovered in lagomorphs (rabbits), but has since been much more widely studied in rodents (mostly rats and mice), and there has been some complementary work revealing LTP in non-human primates and even in humans, revealing largely overlapping canonical mechanisms of induction, expression, and maintenance. More specifically, this study puts a particular focus on the fascinating associative features of this form of lasting synapse-specific modification, in which a synaptic input can be stimulated with a relatively weak induction protocol that will not produce lasting plasticity on its own, but can undergo lasting LTP if paired with stronger stimulation on a separate synaptic input to the same neuron. This associativity mechanism is particularly attractive within the Hebbian synaptic plasticity framework as it provides a candidate mechanism for associative forms of learning in which stimulus-stimulus, stimulus-reward, stimulus-punishment, or action-outcome associations are formed. A particularly attractive feature of this associative LTP is that there can also be a substantial time-lag between the strong stimulation of one pathway and the weaker stimulation of the other synaptic input, which only undergoes lasting LTP by hijacking the proteins synthesized as a result of strong stimulation elsewhere. 
This observation has led to the famous tagging and capture hypothesis as an explanation of how such synapse-specific change can be achieved on both stimulated inputs but not on other synaptic inputs, given the potential requirement for cell-wide protein synthesis. This theory, for which there is very strong experimental evidence, posits that a protein tag is left at synapses that have been stimulated with sufficient vigor in recent history, serving as a key mechanism to ensure that those weakly stimulated synapses will undergo change when a larger-scale LTP event occurs due to stronger stimulation elsewhere within a relevant time window. Again, this idea is attractive as it can explain how we might form associations between events that occur slightly separated in time. The manuscript goes on to show that an induction protocol that is particularly physiologically relevant, theta burst stimulation, produces this tag and capture associative effect in ex vivo slices of Macaque hippocampus, much more readily than in side-by-side ex vivo slices of rat hippocampus. Moreover, the manuscript delves into the importance of well-characterised LTP maintenance mechanisms, including PKMzeta and BDNF, which are key factors that ensure that altered synaptic change is maintained for long periods of time despite substantial molecular turnover in the neuron. The observation in this manuscript is that a degree of redundancy for these mechanisms exists in the primate species but not the rodent species, as both mechanisms need to be inhibited to return LTP to baseline in the Macaque, but only one needs to be inhibited to have that effect in the rat. A major emphasis of this study is that there may be a step-wise difference in associative learning mechanisms between rodents and primates that may contribute to their differing cognitive capacities, although I believe a lot more evidence would be required to reach that conclusion.
  
  Strengths:
  
  The strengths of this study are that it is technically very proficient and is from a laboratory that has a long history of seminal work on synaptic tagging and capture. The cross-species comparison, particularly involving non-human primates, is also very hard to achieve, and a major strength here is the side-by-side comparison of slices from rat and monkeys. Further strengths of the study are the use of a number of experimental strategies, including both observation and intervention, to demonstrate differential involvement of LTP maintenance mechanisms. A final major strength is conceptual, as it is undoubtedly useful not only to identify shared mechanisms of plasticity between commonly used model organisms and either humans or much more closely related species such as old world monkeys, but also to reveal differences that have the potential to contribute to differences in memory/cognition.
  
  Weaknesses:
  
  The findings of this study are a very useful building block for understanding how generalisable mechanisms of LTP are. However, arriving at really substantial conclusions from these findings is challenging, as there are a number of variables that are unaccounted for in this study that may explain the differences that have been observed between rats and monkeys. One example of a potential confound to these interpretations is that rats are nocturnal/crepuscular animals, and macaques are diurnal animals. Thus, to undertake a like-for-like comparison, it would be necessary for the rats to be on a reversed light-dark cycle to ensure that the wake cycle of the rat (dark) is being compared with the wake cycle of the monkey (light). It is possible that the authors have done this, but it is not mentioned in the methods section. The reason this is important is that there is a substantial body of work indicating that different mechanisms are at play in hippocampal LTP during wake and sleep. Transcripts and proteins related to synaptic function are dramatically differentially regulated during sleep-wake cycles, and phosphorylation states of key proteins involved in plasticity are also altered. Moreover, synaptic tagging and capture are specifically disrupted by sleep deprivation. Perhaps the authors have already considered this factor and appropriately reversed the light-dark cycle of their rat subjects, in which case a clarification in the manuscript would be useful. Nevertheless, I have used this as an example because there is a variety of potential confounds that may explain the difference between SC-CA1 TBS LTP in rats and monkeys, e.g., circadian rhythms, degree of enrichment, natural light vs indoor lighting, diet, degree of inbreeding, strain, etc. 
Thus, to make strong conclusions about the potential for differences in plasticity rules/mechanisms and how those may contribute to differences in cognition, I think it would be necessary to compare a wider variety of species, including a good representation of each order (e.g., nocturnal rats and diurnal squirrels, new and old world primates) and not just a single exemplar. I understand, of course, that this is really pushing the boundaries of practicality, but I see no other way to make a strong conclusion or to generalise to mechanisms or properties of plasticity in rodents vs primates. Thus, while I believe the manuscript presents really admirable work, I am not sure the findings are at all easy to interpret.
  
  Review 3
2. Public_Reviews 09 Jul 2026
  
  in eLife
  
  Author response:
  
  eLife Assessment
  
  This is a potentially important study comparing LTP mechanisms between primates and rodents. The experimental methods have some possible confounds, and the power (replicates) and design of the statistical methods could be strengthened, hence the support for the central claims of species differences is currently incomplete.
  
  We thank the Editor and the Reviewers for taking the time to carefully review our manuscript and for providing constructive comments and suggestions, as well as the opportunity to revise our work.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This is an important paper examining LTP induced by theta-burst stimulation in hippocampal slices from macaques and rats. While both species show theta-burst-late-LTP, only the non-human primate theta-burst-late-LTP showed synaptic tagging and capture that converts early-LTP into late-LTP in an independent synaptic pathway.
  
  Strengths:
  
  Synaptic tagging is a fundamental feature of repeated 100 Hz-tetanus-induced LTP, whereas theta-burst induction is arguably more physiologically relevant. Thus, synaptic tagging during theta-burst may differ in the two species, a distinction that may prove important in the mechanisms underlying the cognitive differences between the species.
  
  Weaknesses:
  
  Bursts repeated at the frequency (~5 Hz) of the endogenous theta rhythm induce strong LTP, primarily because this frequency disables feed-forward inhibition and allows sufficient postsynaptic depolarization to activate voltage-sensitive NMDA receptors. Therefore, the species differences may be due to differences in inhibition, rather than in molecular mechanisms of maintenance. One way to assess the relative strengths of this early induction mechanism in rats and macaques is to examine the "depolarization envelope" during the sequential bursts, which may be determined from the recordings already obtained. (Larson and Munkácsy, Theta-burst LTP, Brain Res 2015 Sep 24:1621:38-50. doi: 10.1016/j.brainres.2014.10.034)
  
  Another issue is that the PKMzeta-antisense oligodeoxynucleotides block the synthesis of the kinase. However, Mei F, Nagappan G, Ke Y, Sacktor TC, Lu B (2011; BDNF Facilitates L-LTP Maintenance in the Absence of Protein Synthesis through PKMzeta. PLoS ONE 6(6):e21568) provided evidence that BDNF and theta-burst stimulation can act to increase PKMzeta by a protein synthesis-independent mechanism, presumably through decreased degradation. Therefore, the absence of an effect of the PKMzeta-antisense does not exclude the possibility that persistently increased PKMzeta is the mechanism of theta-burst-late-LTP maintenance in rats or macaques. This issue is worth discussing.
  
  We sincerely thank the reviewer for the positive evaluation of our study and for highlighting the significance of examining synaptic tagging and capture following theta-burst stimulation (TBS) in rodents and non-human primates.
  
  We agree that TBS is a physiologically relevant induction paradigm and that differences in inhibitory circuit dynamics may also contribute to the species-specific effects observed in our study. As highlighted by Larson and Munkácsy (2015), repeated bursts delivered at theta frequency (~5 Hz) can transiently suppress feed-forward inhibition through GABAB receptor-mediated mechanisms, thereby enhancing postsynaptic depolarization and facilitating NMDA receptor activation. We therefore agree that species differences in inhibitory regulation and burst-evoked depolarization may contribute to the distinct expression of synaptic tagging and capture observed between rats and non-human primates.
  
  We further agree that analysis of the “depolarization envelope” during sequential bursts may provide additional insight into the relative strengths of early induction mechanisms. We will therefore perform these analyses using the existing recordings and compare the depolarization envelope between rodents and NHPs in the revised manuscript. Following the reviewer’s suggestion, we will expand the Discussion section to acknowledge the potential contribution of inhibitory circuit dynamics and depolarization envelope differences during sequential bursts.
  
  Importantly, however, we believe that differences in downstream molecular maintenance mechanisms also contribute to these species-specific effects. In support of this, our molecular analyses revealed enhanced recruitment of plasticity-related proteins and transcriptional pathways in NHP hippocampus following TBS, including increased expression of BDNF and PKCζ. These findings suggest that both induction-related network properties and downstream molecular stabilization mechanisms may collectively contribute to the enhanced associative plasticity observed in NHPs.
  
  We also thank the reviewer for the important point regarding PKMζ antisense experiments and the study by Mei et al. (2011). We agree that the absence of an effect of PKMζ antisense oligodeoxynucleotides does not necessarily exclude a role for persistently elevated PKMζ in the maintenance of theta-burst late-LTP. As demonstrated by Mei et al., BDNF together with theta-burst stimulation can maintain late-LTP in the absence of protein synthesis, potentially through stabilization of PKMζ protein levels by reducing degradation rather than through de novo synthesis. However, these findings are not directly comparable to our study, since our experiments involved theta-burst stimulation alone without exogenous BDNF application. Interestingly, our results suggest species-specific differences in the interaction between BDNF and PKMζ signaling pathways. In rats, TrkB/Fc-mediated blockade of BDNF impaired TBS-LTP maintenance, whereas PKMζ inhibition alone had no significant effect. In contrast, in NHP hippocampal slices, inhibition of either BDNF signaling or PKMζ alone failed to abolish late-LTP, whereas simultaneous inhibition of both pathways disrupted LTP maintenance.
  
  These findings suggest that endogenous BDNF signaling and PKMζ may operate through partially redundant or compensatory mechanisms, particularly in the primate hippocampus. Therefore, although our findings indicate that de novo PKMζ synthesis may not be strictly required under the present experimental conditions, we cannot fully exclude the possibility that protein synthesis-independent stabilization or maintenance of PKMζ contributes to theta-burst late-LTP maintenance in rodents or NHPs. We will clarify this point in the revised Discussion section.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study compares theta-burst stimulation (TBS)-induced synaptic plasticity in hippocampal CA1 slices from rats and non-human primates (Macaca fascicularis). The authors report that while TBS induces persistent LTP in both species, only primate hippocampal slices exhibit synaptic tagging and capture (STC) under these conditions. They further show increased BDNF and PKMζ expression following TBS in primates and propose that a redundant BDNF/PKMζ signaling architecture supports persistent plasticity in primates, whereas rodent TBS-LTP depends primarily on BDNF. The work aims to identify species-specific specializations in associative plasticity with implications for translational neuroscience.
  
  Strengths:
  
  The topic is potentially important because direct comparisons of hippocampal plasticity mechanisms between rodents and primates are rare.
  
  Weaknesses:
  
  (1) Limited biological replication in the primate experiments
  
  The manuscript's strongest claims rely on data obtained from 36 slices from 7 monkeys, qPCR analyses with n=3 biological replicates, and Western blot analyses with n=3 biological replicates. The effective sample size for species-level conclusions is therefore not large. The manuscript frequently treats slices as independent observations while drawing conclusions about species differences. This is particularly problematic for electrophysiological experiments because multiple slices appear to originate from the same animals. The statistical unit should be the animal, not the slice, unless nested analyses are performed.
  
  The authors should (1) report the number of animals contributing to each experiment, (2) provide animal-level analyses, (3) use mixed-effects or hierarchical models where appropriate, and (4) clarify whether multiple slices from the same monkey contributed to the same experimental condition. Without these analyses, the evidence for species-specific mechanisms remains weaker than presented.
  
  We thank the reviewer for this important and thoughtful comment regarding statistical interpretation and biological replication. We agree that, particularly for electrophysiological experiments where multiple slices may originate from the same animal, the effective sample size for species-level conclusions should be considered at the animal level rather than solely at the slice level.
  
  In the revised manuscript, we will clearly indicate the number of biological replicates (animals) together with the number of slices contributing to each electrophysiological experiment, as well as the biological replicates used for qPCR and Western blot analyses. We will also clarify whether multiple slices from the same NHP/rat contributed to the same experimental condition. These details will be incorporated into the figures and figure legends wherever appropriate.
  
  In addition, we will perform animal-level analyses by averaging slice responses within each animal prior to statistical comparison and, where appropriate, apply hierarchical or mixed-effects statistical models to account for the nested structure of slices within animals.
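
  As an illustration of the animal-level step described above, the following minimal Python sketch (not the authors' analysis code) collapses slice-level values into one mean per animal, so that the animal rather than the slice becomes the unit of replication. The function name animal_level_means, the animal IDs, and the numbers are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict
from statistics import mean


def animal_level_means(slices):
    """Collapse slice-level measurements to one value per animal.

    `slices` is an iterable of (animal_id, value) pairs, for example a
    normalised fEPSP slope at the end of each recording (hypothetical).
    """
    by_animal = defaultdict(list)
    for animal_id, value in slices:
        by_animal[animal_id].append(value)
    # One summary value per animal: the animal, not the slice, becomes
    # the unit of replication for any subsequent between-species test.
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical numbers for illustration only (not data from the study):
# five slices contributed by three animals.
slices = [
    ("M1", 152.0), ("M1", 148.0),
    ("M2", 131.0), ("M2", 139.0),
    ("M3", 160.0),
]
per_animal = animal_level_means(slices)
print(len(slices), "slices ->", len(per_animal), "animals")
print(per_animal)
```

  Averaging in this way discards within-animal variability. A mixed-effects model with a random intercept per animal (for example, mixedlm in the statsmodels package for Python) keeps the slice-level data while still accounting for slices being nested within animals.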
  
  We acknowledge that the number of non-human primates (NHPs) available for this study was inherently limited because of the substantial ethical, logistical, financial, and technical challenges associated with primate electrophysiology and tissue collection. Consequently, achieving sample sizes comparable to rodent studies is often not feasible in NHP research. Nevertheless, to further strengthen the biological robustness of the findings, we are currently in the process of obtaining additional NHP brain samples and plan to repeat key experiments in an additional 3-4 animals. We believe these revisions and additional experiments will substantially strengthen the statistical rigor and overall interpretation of the study.
  
  (2) The central STC conclusion requires stronger controls
  
  The most important result is that TBS supports STC in primates but not rats (Figures 1F-G). However, several alternative explanations are not excluded. For example, only a single interval (30 min) between TBS and WTET is examined. Classical STC studies characterize tag duration, PRP availability window, and temporal asymmetry. The current work does not determine whether primates exhibit longer tag persistence, increased PRP synthesis, altered capture efficiency, or merely a shifted temporal window. A temporal series (e.g., ±15, ±30, ±60, ±90 min) would substantially strengthen the mechanistic interpretation.
  
  We thank the reviewer for this insightful comment regarding the mechanistic interpretation of the STC findings. In the present study, we selected the 30 min interval based on well-established classical STC paradigms in rodents, where this interval reliably falls within the effective tagging and capture window. Using this experimentally validated interval allowed us to directly compare whether TBS is sufficient to support STC in primates versus rats under equivalent experimental conditions. Accordingly, the primary objective of this study was to determine whether TBS-induced STC varies across species, rather than to comprehensively define the temporal dynamics of the tagging window.
  
  We agree, however, that the current experiments do not distinguish whether the primate-specific effect reflects prolonged tag persistence, enhanced plasticity-related protein (PRP) synthesis, altered capture efficiency, or a shifted temporal window. Addressing these possibilities would indeed require systematic temporal interval analyses (e.g., ±15, ±30, ±60, and ±90 min), which represent important future directions. Such experiments are particularly challenging in non-human primates: primate tissue and the resources required for large-scale electrophysiological studies remain limited, and a full temporal series is currently beyond our experimental capacity because of substantial ethical, logistical, financial, and technical constraints.
  
  Nevertheless, we fully agree with the reviewer that these experiments are important for advancing the mechanistic interpretation of the findings. Similar temporal analyses have recently proven informative in our rodent studies (Chong YS, Ang SR, Sajikumar S. Commun Biol. 2025;8:553). Importantly, we are currently in the process of obtaining additional non-human primate samples and plan to extend the present work by examining an additional 60 min temporal interval to further characterize the temporal properties of synaptic tagging and capture in non-human primates.
  
  (3) Species differences may reflect tissue quality or preparation differences
  
  The manuscript compares 5-7 week-old rats with 5-7 year-old monkeys. These are very different developmental stages. Moreover, euthanasia methods, extraction procedures, and post-mortem handling are different. These factors can affect BDNF expression, protein synthesis, LTP magnitude, and transcriptional responses. The authors should discuss these caveats more explicitly.
  
  We thank the reviewer for raising this important and insightful point. We agree that differences in developmental stage between the experimental groups represent an important consideration when interpreting potential species-dependent effects. In the present study, rat experiments were performed in 5-7 week-old animals, whereas non-human primate (NHP) tissues were obtained from 5-7-year-old monkeys. This difference largely reflects the practical, ethical, and logistical constraints associated with NHP research and tissue availability. We acknowledge that these ages are not developmentally equivalent and that maturation state may influence BDNF signaling, protein synthesis capacity, synaptic plasticity thresholds, and transcriptional responses relevant to late-LTP and STC mechanisms.
  
  We also recognize that differences in euthanasia procedures, tissue extraction, slice preparation, and postmortem handling between rodent and primate tissues may influence tissue physiology and electrophysiological properties. Although extensive care was taken to optimize tissue viability and maintain stable recordings within each species, these variables cannot be completely excluded as contributing factors to the observed differences.
  
  Accordingly, we will revise the Discussion section to more explicitly acknowledge these limitations and clarify that our findings support potential species-dependent differences under the present experimental conditions, rather than definitive intrinsic species-specific mechanisms. Nevertheless, despite the inherent challenges associated with NHP electrophysiological studies, we believe that the present findings provide an important initial framework for understanding the translational relevance of synaptic tagging and capture mechanisms across species.
  
  (4) Statistical reporting is incomplete
  
  Many comparisons report exactly the same values, Wilcoxon p = 0.0313 and U-test p = 0.0022, across numerous experiments. This suggests very small sample sizes and discrete nonparametric distributions. The manuscript should report exact n values for each comparison, effect sizes, and confidence intervals.
  
  Second, many genes and proteins are tested. No correction for multiple testing is described. The authors should state whether corrections were applied, and if not, justify this choice.
  
  We thank the reviewer for this important comment regarding statistical reporting and interpretation. We agree that the repeated occurrence of identical exact p-values in several nonparametric analyses reflects the relatively small sample sizes and the discrete nature of the statistical distributions. This issue is particularly relevant for the NHP experiments, where biological replication is inherently limited because of the substantial ethical, logistical, financial, and technical challenges associated with obtaining and processing primate tissue.
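
  The link between these repeated p-values and small samples can be checked arithmetically. Assuming exact two-sided tests with no ties or zero differences (an assumption, since the software and settings used are not stated here), the smallest attainable p-value is 2/2^n for the Wilcoxon signed-rank test with n pairs and 2/C(n1 + n2, n1) for the Mann-Whitney U test. With six pairs, 2/64 = 0.03125, which rounds to 0.0313; with six observations per group, 2/924 ≈ 0.00216, which rounds to 0.0022. The reported values are therefore consistent with the most extreme outcome attainable at n = 6, where no smaller p-value is possible however large the effect. The function names in this minimal Python sketch are illustrative.

```python
from math import comb


def min_p_wilcoxon_signed_rank(n: int) -> float:
    """Smallest two-sided exact p-value of the Wilcoxon signed-rank test
    with n non-zero, untied paired differences."""
    # Under H0 each of the 2**n sign patterns is equally likely; only the
    # two most extreme ones (all positive or all negative) attain the bound.
    return 2 / 2**n


def min_p_mann_whitney(n1: int, n2: int) -> float:
    """Smallest two-sided exact p-value of the Mann-Whitney U test with
    group sizes n1 and n2 (complete separation of the groups, no ties)."""
    # Under H0 all comb(n1 + n2, n1) assignments of ranks to groups are
    # equally likely; U = 0 and U = n1 * n2 each arise from one assignment.
    return 2 / comb(n1 + n2, n1)


for n in range(4, 9):
    print(
        f"n = {n}: Wilcoxon p_min = {min_p_wilcoxon_signed_rank(n):.5f}, "
        f"Mann-Whitney ({n} vs {n}) p_min = {min_p_mann_whitney(n, n):.5f}"
    )
```

  Under the same assumptions, a Wilcoxon test with five pairs cannot reach p < 0.05 at all (2/32 = 0.0625), which is one reason why exact n values and effect sizes matter when interpreting these comparisons.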
  
  In the revised manuscript, we will provide exact n values for all comparisons, including the number of biological replicates (animals) and slices where applicable. We will also include additional statistical details, including effect sizes and confidence intervals where appropriate, to improve transparency and facilitate interpretation of the reported findings. Furthermore, we are currently in the process of obtaining additional NHP samples and will attempt to include more biological replicates in the revised version to further strengthen the robustness of the analyses.
  
  We also agree that the issue of multiple testing should be addressed more explicitly, particularly because multiple genes and proteins were examined. In the revised manuscript, we will clearly state the statistical correction methods applied for multiple comparisons where appropriate. For analyses in which corrections were not applied, we will provide justification, noting that several experiments were based on hypothesis-driven candidate targets rather than exploratory large-scale screening analyses. These statistical considerations will be clarified in the Methods and Results sections.
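
  For readers less familiar with such corrections, the sketch below shows one widely used option, the Benjamini-Hochberg procedure, which controls the false discovery rate across a family of tests (for example, several candidate genes measured by qPCR). It is a generic Python illustration, not the procedure used in the manuscript; the function name and the p-values are hypothetical. Holm's step-down method is a stricter alternative that controls the family-wise error rate instead.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the adjusted values stay
    # monotone: each one is at most the adjusted value of the next rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four candidate targets (illustration only).
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

  In practice, statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh") in Python performs the same adjustment.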
  
  (5) Interpretation and significance
  
  The study addresses an important and understudied question: whether associative synaptic plasticity mechanisms differ between rodents and primates. The finding that TBS can support STC in the primate hippocampus is potentially novel and impactful. However, the mechanistic evidence remains incomplete, the molecular analyses are underpowered, and several key controls are missing. At present, the data support the conclusion that under the specific experimental conditions tested, TBS-induced plasticity in primate hippocampal slices exhibits greater associative persistence than in rat slices.
  
  The stronger claims regarding evolutionary specialization, fundamentally distinct plasticity rules, altered STC thresholds, and redundant BDNF/PKMζ architecture require additional experimental support.
  
  We thank the reviewer for this thoughtful and balanced assessment of our work. We agree that the present data primarily support the conclusion that, under the specific experimental conditions examined, TBS-induced plasticity in primate hippocampal slices exhibits greater associative persistence than that observed in rat slices. We also agree that broader interpretations regarding evolutionary specialization, fundamentally distinct plasticity rules, altered STC thresholds, and potentially redundant BDNF/PKMζ-related mechanisms require additional mechanistic investigation and experimental validation.
  
  Accordingly, we will moderate these interpretations throughout the revised manuscript and clearly state that these conclusions remain preliminary. We will further emphasize that additional experiments, including increased biological replication, expanded temporal analyses, and further mechanistic investigations, will be necessary to more conclusively define the basis of the observed species-dependent differences. Within our current experimental capacity, we are actively working to obtain additional non-human primate samples and plan to incorporate additional biological replicates and key follow-up experiments in the revised version to further strengthen the robustness of the findings.
  
  At the same time, we believe the present study provides an important initial contribution to an understudied area by directly examining synaptic tagging and capture mechanisms in the primate hippocampus. Given the limited availability of non-human primate electrophysiological data in the field, these findings may offer a valuable framework for future studies investigating the translational and evolutionary relevance of associative synaptic plasticity mechanisms across species.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript, the authors have undertaken an investigation of differences between two mammalian species, the brown rat and the crab-eating macaque, in the mechanisms supporting a well-established model of long-term Hebbian synaptic plasticity, Schaffer collateral to CA1 long-term potentiation (LTP) in the hippocampus. LTP has been long-studied and deeply characterised due to its potential importance in modeling a strong candidate process for the central mechanism of learning and memory. LTP was first discovered in lagomorphs (rabbits), but has since been much more widely studied in rodents (mostly rats and mice), and there has been some complementary work demonstrating LTP in non-human primates and even in humans, with largely overlapping canonical mechanisms of induction, expression, and maintenance. More specifically, this study puts a particular focus on the fascinating associative features of this form of lasting synapse-specific modification, in which a synaptic input can be stimulated with a relatively weak induction protocol that will not produce lasting plasticity on its own, but can undergo lasting LTP if paired with stronger stimulation on a separate synaptic input to the same neuron. This associativity mechanism is particularly attractive within the Hebbian synaptic plasticity framework as it provides a candidate mechanism for associative forms of learning in which stimulus-stimulus, stimulus-reward, stimulus-punishment, or action-outcome associations are formed. A particularly attractive feature of this associative LTP is that there can also be a substantial time-lag between the strong stimulation of one pathway and the weaker stimulation of the other synaptic input, which only undergoes lasting LTP by hijacking the proteins synthesized as a result of strong stimulation elsewhere. 
This observation has led to the famous tagging and capture hypothesis as an explanation of how such synapse-specific change can be achieved on both stimulated inputs but not on other synaptic inputs, given the potential requirement for cell-wide protein synthesis. This theory, for which there is very strong experimental evidence, posits that a protein tag is left at synapses that have been stimulated with sufficient vigor in recent history, serving as a key mechanism to ensure that those weakly stimulated synapses will undergo change when a larger-scale LTP event occurs due to stronger stimulation elsewhere within a relevant time window. Again, this idea is attractive as it can explain how we might form associations between events that occur slightly separated in time. The manuscript goes on to show that an induction protocol that is particularly physiologically relevant, theta burst stimulation, produces this tag and capture associative effect in ex vivo slices of macaque hippocampus, much more readily than in side-by-side ex vivo slices of rat hippocampus. Moreover, the manuscript delves into the importance of well-characterised LTP maintenance mechanisms, including PKMzeta and BDNF, which are key factors ensuring that synaptic changes are maintained for long periods despite substantial molecular turnover in the neuron. The observation in this manuscript is that a degree of redundancy for these mechanisms exists in the primate species but not the rodent species, as both mechanisms need to be inhibited to return LTP to baseline in the macaque, but only one needs to be inhibited to have that effect in the rat. A major emphasis of this study is that there may be a step-wise difference in associative learning mechanisms between rodents and primates that may contribute to their differing cognitive capacities, although I believe a lot more evidence would be required to reach that conclusion.
  
  Strengths:
  
  The strengths of this study are that it is technically very proficient and is from a laboratory that has a long history of seminal work on synaptic tagging and capture. The cross-species comparison, particularly involving non-human primates, is also very hard to achieve, and a major strength here is the side-by-side comparison of slices from rat and monkeys. Further strengths of the study are the use of a number of experimental strategies, including both observation and intervention, to demonstrate differential involvement of LTP maintenance mechanisms. A final major strength is conceptual, as it is undoubtedly useful not only to identify shared mechanisms of plasticity between commonly used model organisms and either humans or much more closely related species such as old world monkeys, but also to reveal differences that have the potential to contribute to differences in memory/cognition.
  
  Weaknesses:
  
  The findings of this study are a very useful building block for understanding how generalisable mechanisms of LTP are. However, arriving at really substantial conclusions from these findings is challenging, as there are a number of variables that are unaccounted for in this study that may explain the differences that have been observed between rats and monkeys. One example of a potential confound to these interpretations is that rats are nocturnal/crepuscular animals, and macaques are diurnal animals. Thus, to undertake a like-for-like comparison, it would be necessary for the rats to be on a reversed light-dark cycle to ensure that the wake cycle of the rat (dark) is being compared with the wake cycle of the monkey (light). It is possible that the authors have done this, but it is not mentioned in the methods section. The reason this is important is that there is a substantial body of work indicating that different mechanisms are at play in hippocampal LTP during wake and sleep. Transcripts and proteins related to synaptic function are dramatically differentially regulated during sleep-wake cycles, and phosphorylation states of key proteins involved in plasticity are also altered. Moreover, synaptic tagging and capture are specifically disrupted by sleep deprivation. Perhaps the authors have already considered this factor and appropriately reversed the light-dark cycle of their rat subjects, in which case a clarification in the manuscript would be useful. Nevertheless, I have used this as an example because there is a variety of potential confounds that may explain the difference between SC-CA1 TBS LTP in rats and monkeys, e.g., circadian rhythms, degree of enrichment, natural light vs indoor lighting, diet, degree of inbreeding, strain, etc. 
Thus, to make strong conclusions about the potential for differences in plasticity rules/mechanisms and how those may contribute to differences in cognition, I think it would be necessary to compare a wider variety of species, including a good representation of each order (e.g., nocturnal rats and diurnal squirrels, new and old world primates) and not just a single exemplar. I understand, of course, that this is really pushing the boundaries of practicality, but I see no other way to make a strong conclusion or to generalise to mechanisms or properties of plasticity in rodents vs primates. Thus, while I believe the manuscript presents really admirable work, I am not sure the findings are at all easy to interpret.
  
  We thank the reviewer for this thoughtful and insightful comment, as well as for the encouraging appreciation of our long-duration plasticity recordings and associative plasticity experiments, which are both technically demanding and time-intensive. We fully agree that interpretation of cross-species differences in synaptic plasticity requires careful consideration of multiple biological and environmental variables, including circadian state, enrichment conditions, strain differences, diet, lighting conditions, and species-specific behavioral ecology.
  
  Regarding the specific concern related to circadian phase and sleep-wake state, the reviewer raises an important point. Rats are nocturnal animals, whereas macaques are diurnal, and hippocampal plasticity mechanisms are known to be influenced by circadian rhythms and sleep-dependent regulation of synaptic proteins and signaling pathways. Previous studies have demonstrated modulation of LTP, synaptic tagging and capture, and protein synthesis in rats across normal sleep-wake cycles. We therefore agree that these factors may influence plasticity outcomes and should be carefully considered in comparative studies.
  
  Studies have further shown that theta frequency is highly sensitive to sleep-related manipulations. Specifically, theta frequency decreases immediately after sleep, remains elevated during sleep deprivation, and rapidly declines following recovery sleep. In aged animals, these effects appear comparatively attenuated, suggesting reduced sleep-dependent modulation of theta dynamics with aging. Therefore, disruption of normal circadian or sleep-wake patterns may significantly alter theta activity and associated plasticity mechanisms within a species and may not accurately reflect physiological baseline states (Utku Kaya et al., 2026).
  
  In our experiments, recordings from rats and macaques were performed during their respective active phases under standardized laboratory housing conditions, and we will further clarify these details in the revised Methods section. Nevertheless, we acknowledge that circadian state and related physiological variables cannot be completely excluded as contributing factors to the observed differences between species.
  
  More broadly, we agree with the reviewer that the present study does not permit definitive conclusions regarding universal “rodent versus primate” rules of synaptic plasticity. Our intention was not to propose a generalized dichotomy between rodents and primates, but rather to report that, under the experimental conditions used here, SC-CA1 TBS-LTP and associated synaptic tagging mechanisms differed between rats and macaques. We agree that broader evolutionary or cognitive interpretations would require systematic comparative analyses across multiple species, including both nocturnal and diurnal rodents as well as diverse primate species. Such studies would provide a stronger framework for distinguishing conserved versus species-specific mechanisms of plasticity.
  
  At the same time, we believe the present findings remain important because they provide one of the first direct experimental comparisons of SC-CA1 TBS-LTP-associated plasticity mechanisms between rodents and non-human primates under controlled ex vivo conditions. Although these findings should be interpreted cautiously, the observed differences raise the possibility that certain metaplastic or protein synthesis-dependent mechanisms may not be fully conserved across species. Accordingly, we will revise the Discussion section to better emphasize the exploratory and comparative nature of the study, while explicitly acknowledging the limitations and potential confounding factors highlighted by the reviewer.
  
  AuthorResponse

Tags

AuthorResponse

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.05.18.725835v1
www.biorxiv.org

Estimating the replicability of Brazilian biomedical science

1. Public_Reviews 09 Jul 2026
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This article describes a very ambitious metascience project aimed at testing the reproducibility of a corpus of publications conducted in Brazil. The strength of the approach lies in its systematic, multicenter replication design. The authors focus on three commonly used experimental paradigms in biology: the MTT assay, RT-PCR, and the elevated plus maze.
  
  The effort is commendable and reveals a rather low rate of reproducibility, in line with findings from fields considered less reproducible in the life sciences, such as cancer biology.
  
  Strengths:
  
  The study is supported by a substantial dataset, incorporating multiple independent replication attempts and the use of stringent, well-defined protocols, which strengthens confidence in the overall conclusions.
  
  We thank the reviewer for the comments.
  
  Weaknesses:
  
  (1) Being neither an expert in metascience nor in statistics, I cannot fully judge the methodological aspects of the article or its extensive supplementary material. I will therefore focus my comments on readability. I found the manuscript difficult to digest. The authors should improve readability if they wish to reach a broad audience of experimental biologists. In particular, they should simplify the description of protocols and highlight the key findings more clearly, using accessible language. See specific points below.
  
  We can try to simplify the description of protocols at specific points, for example by providing an overarching description of the study design at the beginning of the Methods rather than citing our previous eLife paper (Amaral et al., 2019), as suggested below. The methods are indeed quite extensive, but this may be inevitable in a large-scale project such as this, and we note that Reviewer #2 thought that part of the supplementary material should be incorporated back into the main text, which is a suggestion in the opposite direction. It may thus be hard to strike a balance between readability and comprehensibility that addresses both reviewers’ opinions.
  
  (2) The article appears to oscillate between:
  
  (i) a description of the approach and the inherent challenges of such a multicenter replication program
  
  (ii) an estimation of reproducibility.
  
  These could potentially form two separate articles: one aimed at a broad audience emphasizing key results, and another focused on methodological aspects for a more specific metascience audience. The Results section currently contains redundancies and is difficult to follow for non-experts in statistics. I also find it challenging to extract the main findings.
  
  There is a bit of redundancy between tables and text, but this was intentional to make both of them self-explanatory. We also think stating the results in the text can allow us to make each of the replication criteria clearer, a concern that was also mentioned by the reviewer.
  
  As for requiring particular expertise in statistics for understanding, we mostly disagree. The main results (Tables 1 and 2, Figure 2) are expressed as percentages, and the only statistical concepts needed for interpreting these results are understanding prediction and confidence intervals. For this, we could provide a bit more guidance on their interpretation in the Methods section. Beyond that, most of the secondary results (e.g. Figure 3 and Figure 4) involve linear correlations, which is about as simple as statistical analysis gets.
  
  Of the results presented in the main manuscript, only Table 3 contains anything beyond percentages and correlations. We do agree that the meaning of each ratio in this table could be more clearly described, but there are essentially no expert-level statistics involved in their calculations.
  
  Other than that, the main statistical issue is the ideal way to aggregate the results from different replications, for which we used different strategies for robustness purposes. However, all of these results are already in the supplementary material, so we don’t feel they interfere too much with the readability of the main manuscript.
  
  A possible improvement would be to include an initial section clearly describing the protocol (replication of a single experiment, across several labs, for three types of assays), followed by a concise presentation of the main results regarding reproducibility in Brazilian science with subsections.
  
  This is indeed a good idea, and we plan to include an initial overarching description of the project in the Methods section of the revised manuscript.
  
  Methodological details could be moved either to a Supplementary Information or to a more specific article, while being summarized in the Discussion.
  
  Again, this is the opposite of what was suggested by Reviewer #2, so we would rather keep the Methods section more or less at its current level of detail.
  
  (3) This study evaluates the reproducibility of a single experiment from each article, taken out of its broader context. While this provides an estimate of reproducibility, it does not directly contribute to resolving uncertainties within a specific field. This may represent a limitation compared to other reproducibility projects that attempt to replicate multiple key claims within a given study (e.g., in cancer biology or Drosophila immunity). I found that a weakness is that it does not play a role in cleaning a field of wrong statements.
  
  The reviewer is correct in his interpretation. Evaluating the main findings of articles or cleaning a field of wrong statements was never a goal of our study (and we were clear about this from the start). Our aim with the project was metascientific (i.e. evaluate the reproducibility of biomedical experiments with a set of common methods) rather than driven by a particular interest in the findings themselves. This is reflected by our choice of selecting experiments from a random sample of articles from multiple fields, rather than filtering by area of interest or importance. It also underlies our choice to evaluate experiments rather than claims, as this was more statistically tractable and potentially more objective as a meta-research goal.
  
  To be clear, we don’t feel this approach is inherently better or worse than evaluating claims in the literature, as in the Drosophila immunity article case (i.e. Westlake et al., 2026), which is also an important goal. They are merely approaches that answer different questions. Ultimately, we probably made our choice based on (a) our expertise/interest in meta-research rather than in the fields the replications stemmed from and (b) an attempt to engage Brazilian researchers in the project in a way that was non-confrontational and minimized backlash from their peers. We feel this was valuable for many of the lessons learned, although it also meant learning less about the research findings in question.
  
  Even though this was not a goal of the study, there is some knowledge obtained about the findings that is indeed largely absent from the current manuscript. We do not feel the current format allows for much discussion of 45 different findings, but we do have plans to address these in future articles (as outlined in our response to point 5). In the meantime, qualitative descriptions of each experiment can be found at https://osf.io/w5z9a. This is already mentioned in the Methods but could be reiterated in the results as well.
  
  (4) The observation that external observers can predict which experiments are likely to be reproducible is interesting and should be more clearly emphasized.
  
  We did not go too deep into that finding because we are publishing a separate article focused on the prediction project, which should look into factors that correlate with prediction accuracy, both at the level of predictors (e.g. research field, career level) and of individual predictions (e.g. information taken into account for each answer). We also feel that, given the multiplicity of predictors in the prediction analyses, these findings are a bit tentative, as the strongest predictors may be subject to effect size inflation from the “winner’s curse” effect (as outlined by Reviewer #2). We can try to emphasize it a little more in the discussion (although it already merits a whole paragraph on pages 23-24), but we feel we would be able to discuss it more critically in a follow-up article.
  
  (5) The manuscript frequently refers to future publications. It would be helpful to clarify what is included in the present article versus what is deferred to subsequent papers.
  
  Indeed, some of our results did not fit this overarching analysis and were left for future publications. One of them is already available as a preprint, while the others are currently in preparation. Specifically, other results from the project will be spread across five different articles:
  
  (a) A narrative article focused on challenges and lessons learned with the project, already published as a preprint at https://osf.io/preprints/metaarxiv/8y3tg_v1 (Amaral et al., 2026).
  
  (b) An article analyzing the prediction survey and markets results in detail (following the pre-analysis plan detailed in https://osf.io/6av7k/files/pjhgd and adding some exploratory analyses on prediction rationales).
  
  (c) Three articles describing the results of specific experiments with each experimental method (MTT, PCR, elevated plus maze) along with a discussion of aspects inherent to the method that seem to influence reproducibility.
  
  We can add this information more explicitly to the Methods section, including the links to the papers that have already been published at the time the manuscript is revised.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is an important contribution to science, not only because large-scale replication studies remain rare despite their value, but also because this one focuses on research that was underrepresented in previous large-scale efforts. The findings reveal concerningly low replicability in this field, pointing to a problem that warrants immediate attention. Particularly noteworthy is the study's sampling strategy: by randomly selecting experiments from a wide range of publications based on methods, rather than filtering by research area, importance, or citation counts, the authors have produced results that are potentially more representative of the broader literature than those of previous large-scale replication projects in this and other fields. Overall, this is a fantastic contribution that I will be recommending and using in all my open science talks, and from which I have learned a great deal. Congratulations to the team!
  
  Thanks!
  
  Strengths:
  
  A study of this scale inevitably requires an enormous amount of work and methodological care, and this one is clearly both robust and thoughtfully designed. I want to particularly acknowledge the considerable efforts the authors have made to ensure the robustness of their findings. The use of multiple approaches to estimate replicability, combined with a substantial battery of sensitivity analyses, including a multiverse approach on top of everything else, clearly reflects the authors' genuine commitment to understanding their results and the limits of their conclusions. The transparency and sharing of all protocols, materials, and challenges and limitations encountered is also outstanding.
  
  We once more thank the reviewer for the compliments.
  
  Weaknesses:
  
  There were several instances during my reading of the methodology where I felt the authors relied too heavily on the external supplementary materials, at the expense of basic detail in the main manuscript. I appreciate how overwhelming it can feel to integrate more into an already substantial paper, but without some minimum integration, the reading experience and overall comprehension are too often compromised, at times posing more questions than answers. And it is unrealistic to expect most readers to engage with the extensive supplementary materials provided. Please see the comments below for specific suggestions.
  
  We do acknowledge that the article currently includes a lot of supplementary material. This includes both supplementary figures/tables relating to the paper and many supplementary methods files (mostly hosted at the Open Science Framework). However, we also note that this is already a rather long paper as it stands and that Reviewer #1 has made the opposite suggestion of simplifying it. Thus, it may be hard to strike a balance that will suit all preferences, and we feel that maybe our attempt has landed somewhere in the middle of both reviewers’ ideal versions of the paper.
  
  Additionally, I found the discussion rather underdeveloped. There is relatively little engagement with the broader literature, not only with replicability studies from other fields, but more generally with relevant meta-research work on publication bias, blinding, risk of bias, citation practices, etc. Some of the most novel and interesting findings in the paper also receive less attention than they deserve, and the discussion at times reads as a repetition of the results section rather than a critical engagement with them. I would encourage the authors to engage more deeply here, as the study clearly has much more to say. Doing so would further highlight why this study is important for the answers it provides and the questions it can spur. Again, please see the comments below for specific suggestions.
  
  We can try to engage with some of the above-mentioned literature in more depth, in particular with replication studies from other fields (some of which appeared after our preprint, e.g. Tyner et al., 2026) and with the risk of bias and transparency literature (e.g. Serghiou et al., 2021). That said, we note once more that the article (and the Discussion section) is already quite long, and that analyzing each of these articles in depth is likely to be unfeasible.
  
  Specific suggestions:
  
  Page 1, abstract: "while t values for replications were positively correlated with researcher predictions about replicability, and negatively correlated with the rate of publications by the original article's last author" - This raises the question: why t values and not effect sizes, p values, or something else? Update after reading the study: although the authors used others, they seem to place more emphasis on t values, which is not well explained. Without a clear explanation, it just left me wondering why, given that effect sizes would, in principle, be more informative.
  
  Our original plan was to use p values as a predictor (see protocol at https://osf.io/9rnuj), but we later realized this was inadequate as it did not account for effect direction (i.e. significant effects in the opposite direction to the original may yield low p values, but this should not count as replication success). We thus switched to t values to be able to assign positive and negative signs depending on effect size direction. We note that, as we are using non-parametric Spearman coefficients (in which the absolute value of t correlates negatively with the p value), the two approaches are effectively equivalent when original and replication effects have the same direction. This change was accounted for and justified in our list of protocol deviations at https://osf.io/9hj7t.
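
  As a minimal sketch of the sign convention described above, the hypothetical helper below computes a t statistic for a replication and flips its sign so that positive values mean "same direction as the original effect". The use of Welch's t (rather than a pooled-variance t), the function names and the toy data are assumptions for illustration, not the project's actual analysis code.

```python
import math
from statistics import mean, stdev


def welch_t(treated, control):
    """Welch's t statistic for mean(treated) - mean(control)."""
    se = math.sqrt(stdev(treated) ** 2 / len(treated) + stdev(control) ** 2 / len(control))
    return (mean(treated) - mean(control)) / se


def aligned_t(treated, control, original_direction):
    """Signed t: positive when the replication goes the same way as the original.

    original_direction is +1 if the original study reported an increase
    and -1 if it reported a decrease (hypothetical convention).
    """
    if original_direction not in (1, -1):
        raise ValueError("original_direction must be +1 or -1")
    return original_direction * welch_t(treated, control)


# A replication that finds an increase when the original reported a decrease
# gets a negative aligned t, even though |t| is the same.
print(round(aligned_t([2, 3, 4], [1, 2, 3], original_direction=1), 2))   # 1.22
print(round(aligned_t([2, 3, 4], [1, 2, 3], original_direction=-1), 2))  # -1.22
```

  Because Spearman's ρ uses only ranks, the signed t keeps opposite-direction results at the bottom of the ranking, which an unsigned p value cannot do.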
  
  Effect size (in relative terms) is already being used in the second predictor in the analysis (i.e. effect size decrease), as our idea was to use one significance-based predictor and one effect size-based predictor to match what was done for the replication rates. We feel that using relative effects (e.g. response ratios) by themselves may not be as adequate, as for experimental methods with large coefficients of variation and/or low sample sizes (especially PCR ones), one can find large relative effects that are nevertheless far from statistical significance. This also makes relative effects not very commensurable between methods.
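
  To illustrate the point about relative effects, the sketch below uses made-up numbers: two hypothetical experiments share the same ratio of means (2, i.e. a log ratio of about 0.693) but differ in variability, and only the low-variability one is far from zero in t units. The data and the use of Welch's t are assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev


def log_ratio_of_means(treated, control):
    """Relative effect: ln(mean(treated) / mean(control))."""
    return math.log(mean(treated) / mean(control))


def welch_t(treated, control):
    """Welch's t statistic for mean(treated) - mean(control)."""
    se = math.sqrt(stdev(treated) ** 2 / len(treated) + stdev(control) ** 2 / len(control))
    return (mean(treated) - mean(control)) / se


# Hypothetical data: both experiments double the mean, with n = 3 per group.
precise = ([19.5, 20.0, 20.5], [9.5, 10.0, 10.5])  # low coefficient of variation
noisy = ([5.0, 20.0, 35.0], [2.0, 10.0, 18.0])     # high coefficient of variation

for label, (treated, control) in [("precise", precise), ("noisy", noisy)]:
    print(label, round(log_ratio_of_means(treated, control), 3), round(welch_t(treated, control), 2))
# precise 0.693 24.49
# noisy 0.693 1.02
```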
  
  We do believe there is a fair argument, however, to use standardized effect sizes (i.e. the difference measured in standard deviations rather than in standard errors of the mean) as an alternative to t values to measure significance/evidence strength. As some replications ended up underpowered, low t values may sometimes be due to insufficient statistical power/low sample size rather than replication failures. Using standardized effect sizes is not devoid of pitfalls (e.g. they can be quite variable when sample size is low), but it is worth doing as a robustness analysis.
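
  The difference between the two scales can be sketched as follows: a standardized mean difference such as Cohen's d divides the group difference by the pooled standard deviation, so it does not grow with sample size, whereas Student's t divides it by the standard error and therefore scales roughly with the square root of n. Hedges' g applies the usual small-sample correction factor J = 1 − 3/(4(n₁ + n₂) − 9). The data are hypothetical, and which of these (if any) would be adopted in the revised analysis is left open in the response.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    n1, n2 = len(a), len(b)
    return math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def cohens_d(a, b):
    """Mean difference in pooled-SD units."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def hedges_g(a, b):
    """Cohen's d with the common small-sample bias correction."""
    return cohens_d(a, b) * (1 - 3 / (4 * (len(a) + len(b)) - 9))


def student_t(a, b):
    """Pooled-variance t statistic, i.e. the difference in standard-error units."""
    return cohens_d(a, b) / math.sqrt(1 / len(a) + 1 / len(b))


small_a, small_b = [2, 3, 4], [1, 2, 3]
large_a, large_b = small_a * 4, small_b * 4  # same values, four times the sample size

for label, a, b in [("n=3", small_a, small_b), ("n=12", large_a, large_b)]:
    print(label, round(cohens_d(a, b), 2), round(hedges_g(a, b), 2), round(student_t(a, b), 2))
# d and g change only modestly between the two rows, while t more than doubles.
```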
  
  That said, there are a few statistical issues to be decided on how to calculate this (e.g. whether studies should be meta-analyzed using standardized mean differences rather than relative ones for this purpose, or whether an analog of the standardized effect size should be calculated for the log ratio of means). We would have to look more carefully into the multiple possibilities to decide on the best approach (and we do accept suggestions!).
  
  In the meantime, we note that running the prediction analysis using only experiments with ≥80% power yields a slightly higher correlation of t scores with researcher predictions (ρ = 0.49, p = 0.005), so we do not think that these underpowered experiments affect the trend too much. If anything, they could be masking a higher correlation between researcher predictions and replicability.
  
  Page 2, paragraph 2: "reproducibility (defined here as reaching the same results when analyzing a set of data)" - In my opinion, this definition is vague enough that it encompasses not only reproducibility (same data, same methods) but also robustness (same data, different methods), and I would therefore recommend providing a more precise definition. The same applies to replicability (different data, same methods), since the definition used does not highlight the importance of using the same methods, and thus also encompasses generalisability (different data, different methods). Explicitly clarifying these distinctions is particularly important as the field grows and the terms become increasingly mixed up and confusing.
  
  We agree that we should make the description more precise (e.g. “reaching the same results when analyzing a set of data in the same way” for reproducibility and “finding similar results with new data collected under similar conditions” for replicability). We will update these definitions in the revised manuscript.
  
  Page 2, paragraph 3: "All of these issues raise concerns about the replicability of published results - something that has not been evaluated systematically in the country" - I would suggest providing more information about why those factors may lead to expected lower replicability, ideally with a couple of sentences supported by references. As it stands, less experienced readers may not follow the argumentation and may consider it speculative.
  
  We would argue that the reader would be correct in this case: the argument is a bit speculative. It does go in the direction of what is generally accepted within the field (i.e. that publication pressure can lead to lower reproducibility for a range of factors), but we’re not sure this connection has been demonstrated empirically, except for indirect evidence (such as the lower reproducibility in papers stemming from top institutions and “trophy journals” in, the higher frequency of positive results in US states with more researchers in Fanelli, 2010, or the higher number of problematic images for highly productive researchers in some countries in Fanelli et al., 2022). We could cite this evidence in the introduction and make the speculated connection more explicit, perhaps adding modeling work as well (e.g. Ioannidis, 2005; Smaldino & McElreath, 2016) to explain why this could be the case. But essentially, our opinion is that the connection remains a speculation.
  
  Page 3, paragraph 2: "We then opened a public call for Brazilian labs that could replicate experiments using these methods and models, advertised by email, social media and lectures in conferences and institutions, to which 73 labs initially responded" - Since recruiting is an important component of this study, I would recommend providing additional details so the reader can better assess how comprehensive and unbiased the recruitment process was. AND Page 5, paragraph 2: Please provide more information about this open call: how was it advertised, where, and when? This is needed so that the reader can assess its comprehensiveness and potential biases. Even the link provided is not specific enough to understand the process, as it only states: "Calls were open to participants > 18 years old with current or previous experience in experimental research in any field and were advertised via e-mails, lectures and social media."
  
  We can offer a more detailed description of the recruitment process (e.g. number and distribution of lectures, social media strategy used, etc.), although we would rather do this in a supplementary document so as not to make the Methods section even lengthier. We note, however, that we never aimed to recruit a “representative sample” of labs from the country: we were busy enough trying to get enough labs for the project to happen, and aware that the call would be inevitably biased by our own communication capabilities and personal networks.
  
  That said, the response rates for different regions of Brazil do generally match the distribution of research labs and graduate programs within the country (with some distortions likely caused by our personal networks, such as the large number of labs in Rio de Janeiro state), and seem to indicate a rather wide dissemination of the call. One way to visualize this would be to present the distribution of corresponding articles from the original studies selected for the replication (or even from the whole sample of articles obtained for experimental selection) along with the distribution of labs at different stages of the project in Figure S3, which generally show similar patterns. This would actually lend support to our statement that “the population of labs that performed replications was largely similar to the one that produced the original results” in the discussion.
  
  Page 3, paragraph 2: "Based on the expertise of respondents and a feasibility analysis by the coordinating team, we selected 3 outcome assessment methods for replication" - Since this choice determined what was ultimately studied and who could participate, I would like to see more information to understand it: was it based on the most common expertise among respondents? How was feasibility defined and estimated?
  
  We tried to find the combination of methods that would maximize the number of labs that would be included in the project. This is explicitly stated in our Methods Selection document at https://osf.io/qxdjt, but could be stated more explicitly in the paper as well.
  
  Page 3, paragraph 3: How was the manual screening performed? Was it done by one or more people? Was there double-screening to ensure reliability of the screening protocol? Did the authors use a specific decision tree or tool? How were conflicts between observers resolved? Were any other validation steps taken to ensure reliability? The same comments apply to the data extraction (who, how many, validation, protocol, etc.).
  
  We initially used single screening by three different reviewers (see https://osf.io/6av7k/files/u5zdq for criteria), as we were merely looking for a sample of experiments; thus, comprehensive inclusion of all eligible studies was not a priority. After this initial screening step, inclusions were confirmed in a consensus meeting with the three reviewers involved.
  
  Data extraction was also done by a single individual, but the resulting data led to a protocol that was later checked by two reviewers who had access to the paper and were explicitly oriented to judge whether the protocol constituted a valid replication. Thus, discrepancies between what was in the paper and what was included in the protocol could potentially be flagged at these stages (as they were in many cases). We do note, however, that this is likely not as effective at preventing errors as having data extracted independently, as reviewers may overlook mistakes more easily when comparing two documents than when extracting data anew. We did find that some errors in extraction slipped by, such as an MTT experiment where treatment concentration was inadvertently changed from mM to μM in a particular protocol step; this was picked up and corrected by 2 out of the 3 labs, but not by the third one, leading the latter replication to be invalidated.
  
  Page 3, paragraph 3: As a non-expert, I would need more context about the expected average cost of experiments in this field; otherwise, I cannot assess how representative this sample is or whether potential biases may exist (e.g., cheaper experiments perhaps being expected to be less replicable than more expensive ones). Could expected costs also have affected the reduction in geographical coverage eventually observed in this study (Figure S3)?
  
  As stated in the manuscript, we initially capped experiments at a predicted cost of R$ 5,000 (around USD 1,336 at that time), considering reagent cost alone (as equipment and labor were provided by labs). Exclusion rates for that reason were 12/74 (16%) for MTT experiments, 36/132 (27%) for PCR ones and 4/40 (10%) for EPM ones. This is stated at
  
  This turned out to be an underestimation in many cases, especially as it did not account for pilot experiments, the need for repetition, etc.; thus, many experiments ended up costing considerably more than that ceiling. As we had included a contingency fund for those cases, which we expected would occur, we avoided removing experiments from the sample for this reason as much as possible. Nevertheless, one elevated plus maze experiment ended up not being replicated for cost reasons, as the necessary rat strain was provided by a single facility in the country, meaning that a large number of rats would have to be acquired and transported to all labs at a cost that we were not able to cover.
  
  As these costs were covered by the coordinating team, we do not feel that this is likely to underlie the reduction in geographical coverage. Other reasons related to lab structure could have led labs in less well-resourced regions to leave the project, but these probably have nothing to do with the experiments selected.
  
  That said, the cost cap does mean that the selection of experiments is not completely representative of the literature, but is enriched in relatively cheap and simple experiments that participating labs were able to perform (assessing this was our next step in selecting the final sample of experiments). Exclusion rates due to lack of lab expertise and/or infrastructure to perform the experiment were 21/56 (37%) for MTT experiments, 67/89 (75%) for PCR ones and 7/34 (21%) for EPM experiments.
  
  We will try adding some of this information to the flowchart in Figure 1, as we agree it provides more context on the representativeness of the selected experiments.
  
  Page 6, paragraph 2: "(on a scale of 1 to 5)" - Could you clarify whether 1 means no deviations and 5 means everything deviated? Is that how it was phrased to participants? Was there a threshold used by the coordinating team to decide how many deviations were acceptable? (I would briefly clarify all scales mentioned below to allow easier interpretation throughout.)
  
  The scale ranged from 1 (No relevant differences) to 5 (Very relevant differences that prevent considering the study as a direct replication). This scale was used for both the lab and the validation committee scores, and is described at https://osf.io/xgth2 (debriefing protocol) and https://osf.io/e3fjg (validation protocol).
  
  For the validation committee, we did use a threshold (any score of 4 or a sum of scores of 10 or more among 3 evaluators) to decide which replications had to be discussed before deciding on inclusion, as mentioned on Page 7 of the Methods. For the labs, we used no threshold: labs answered the protocol deviation question on this scale, but the decision of whether to consider the study a valid replication or not was not tied to this score.
  
  We can make both of these points (meaning of the scale and connection to lab’s decision to consider the replication valid) clearer in the Methods section.
  
  Page 6, paragraph 4: How were long-text answers (e.g., justifications) reviewed? Was this done manually by one or more members of the coordinating team, or using any text interpretation tool? What steps were taken to ensure the interpretation of these answers was as objective as possible?
  
  For the initial analysis of justifications, one reviewer read all answers and flagged those that seemed to concern reproducibility of the methods (e.g. “we replicated the protocol exactly as planned”) rather than results reproducibility (e.g. “effects went in the opposite direction”). We then reviewed these answers with the whole coordinating team to decide whether we should contact the lab and ask them to revise them. We can add this information to the Methods section.
  
  For classifications of the justification into categories (i.e. Table S7), justifications were classified by two independent reviewers based on categories created after an initial inspection of the data, and discrepancies were resolved by consensus. We can add this information to the table legend.
  
  Page 8, paragraph 1: "If issues were found, the lab and coordinating team reviewed them via email until the sources of errors were identified and corrected (see https://osf.io/58vsx for details)." - Could you please provide information about how often these disagreements arose and briefly explain their causes? I am struggling to understand why these discrepancies occurred and how frequently. Without more detail, the error rate presented in the next paragraph is a little concerning.
  
  After we extracted data from the lab spreadsheets and summarized the results by code, labs received the results by e-mail and were asked to fill in a form on whether the results were in agreement with what they had found (see details at https://osf.io/nfr6y). Discrepancies in the results of at least 1 experiment were noted by 36% of the 53 labs (out of 56) that responded. Many of these stemmed from the coordinating team misunderstanding issues such as group identity or experimental unit identification in the spreadsheet. Others had to do with different ways to perform calculations (e.g. relative gene expression or % time spent in open arms). In some cases, simple errors in data transcription or typos caused the discrepancy.
  
  We were also surprised (and concerned) by the number of experiments in which we later found data errors that were not detected by this process (18% of the total). Our best understanding of this is that not every lab checked the results with the necessary care, as some errors were quite obvious, such as experiments in which sample size was different or in which group labels were reversed. Ultimately, agreeing with a form that asks “did you find any discrepancies?” may have been treated as a box-ticking exercise with little attention, and was probably not the ideal way to check data, which led us to start reviewing results in live meetings afterwards. This is discussed in more detail in our challenges article (Amaral et al., 2026).
  
  Page 8, paragraph 4: Please provide the version of any package or software used throughout, and make sure to cite R appropriately (R Core Team XXX).
  
  R 4.5.1 was used for the analysis. We can add this information (which was present in the data repository in the R session info.txt file) and provide the R reference in the manuscript as well.
  
  In addition, did the authors calculate the log ratio of means (ROM/lnRR) using escalc()? If so, please report this.
  
  If not, I would recommend doing so, as escalc() implements recommended small-sample adjustments that produce slightly different values compared to a simple manual calculation of log(mean1/mean2).
  
  Yes, we did use the escalc() function for this calculation (for both the replications and the original effect sizes). We can mention this in the manuscript.
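
For readers who want to check the effect-size scale, the log ratio of means and its usual large-sample (delta-method) sampling variance can be sketched in Python as below. This is a minimal illustration, not the authors' analysis code: it assumes both group means are positive, and it is not guaranteed to match the output of metafor's escalc() (measure "ROM") exactly, whose documentation should be consulted for the precise estimator used.

```python
import math


def log_ratio_of_means(m1, sd1, n1, m2, sd2, n2):
    """Return (lnRR, sampling variance) for group 1 relative to group 2.

    lnRR = ln(m1 / m2); the variance is the standard delta-method
    approximation sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2).
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("lnRR is only defined for positive means")
    lnrr = math.log(m1 / m2)
    var = sd1 ** 2 / (n1 * m1 ** 2) + sd2 ** 2 / (n2 * m2 ** 2)
    return lnrr, var


# Hypothetical summary statistics for one experiment (not data from the study).
lnrr, var = log_ratio_of_means(m1=12.0, sd1=3.0, n1=8, m2=10.0, sd2=2.5, n2=8)
```

The same function can be applied to the original experiment and to each replication, so that both sides of the comparison are on the same scale.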
  
  Page 10, paragraph 1: "Coefficients of variation from the original study were compared to the mean coefficient of variation of its replications using Wilcoxon's signed rank test" - I wonder how these CVs were calculated - whether simply as SD/mean or using escalc() from the R package metafor, which includes a correction for small-sample size. This may affect the fairness of the comparison, particularly since CVs from original studies are expected to be slightly overestimated given their smaller sample sizes relative to the replications.
  
  We calculated the coefficients of variation as the pooled SD divided by the mean of both group means. The reviewer is correct about the possibility of small-sample effects in this case (which we were not aware of). We will thus look into implementing this via the escalc() function in the analysis of the revised manuscript.
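
The CV definition described above can be written out as follows. This is a minimal Python sketch that assumes the conventional degrees-of-freedom-weighted pooled SD (the response does not spell out the pooling formula) and applies no small-sample correction, which is precisely the limitation the reviewer points out.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Degrees-of-freedom-weighted pooled SD of two groups (assumed definition)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def experiment_cv(m1, sd1, n1, m2, sd2, n2):
    """Pooled SD divided by the mean of the two group means (uncorrected CV)."""
    return pooled_sd(sd1, n1, sd2, n2) / ((m1 + m2) / 2)


# Hypothetical control and treated groups (not data from the study).
cv = experiment_cv(m1=12.0, sd1=3.0, n1=8, m2=10.0, sd2=3.0, n2=8)
```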
  
  We also acknowledge that this could be a source of bias in the comparisons between original and replication CVs (albeit likely a minor one). That said, sample sizes are not always larger in the replications: for some experiments with large original effects, power calculations yielded lower sample sizes for individual replications, albeit infrequently. On average, though, replication sample sizes were indeed larger.
  
  I also have concerns about using the mean CV of all replications and comparing it to a single CV value, as this ignores the uncertainty around that mean.
  
  This is indeed the case; that said, the CV of the original experiment also has random error relative to the true population CV, and in that case there is no way to estimate its uncertainty, as we have a single measure of that parameter. So some uncertainty will inevitably be ignored in this comparison.
  
  We also note that we are looking for evidence of systematic CV inflation across all experiments (rather than for a statistically robust comparison between the CVs of any individual replication). For the sake of measuring this systematic inflation, the use of multiple experiments does allow us to estimate variability at the experiment level, which should incorporate the lower-level variability between individual replications when this is not explicitly modelled. Thus, we do not feel that our procedure introduced a systematic bias in the experiment-level analysis (although one could argue that it may lead to lower precision).
  
  An additional check could involve calculating the log coefficient of variation ratio (lnCVR; Nakagawa et al. 2015, Methods in Ecology and Evolution; implemented in escalc()) between the original CV and each replication CV, and running a random-effects (or multilevel) meta-analysis that accounts for shared-control non-independence. I believe this would provide a more robust approach, as it does not ignore the uncertainty around the mean CV of the replications - uncertainty that, if neglected, is expected to increase the likelihood of false positive findings. This concern would also apply to the subsequent analysis on absolute means.
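
A rough Python sketch of the suggested check is given below. It assumes the simplified lnCVR estimator with small-sample terms (following the form in Nakagawa et al., 2015, but omitting the terms for the correlation between means and SDs) and pools the values with a DerSimonian-Laird random-effects model. Both choices are illustrative assumptions. In particular, this simple model does not account for the non-independence created when several replications are compared with the same original experiment, which would require a multilevel model or a variance-covariance adjustment as the reviewer notes.

```python
import math


def ln_cvr(m_rep, sd_rep, n_rep, m_orig, sd_orig, n_orig):
    """lnCVR of a replication relative to the original, with small-sample terms.

    Simplified form (no mean-SD correlation terms):
      yi = ln(sd_r/m_r) + 1/(2(n_r-1)) - ln(sd_o/m_o) - 1/(2(n_o-1))
      vi = sd_r^2/(n_r m_r^2) + 1/(2(n_r-1)) + sd_o^2/(n_o m_o^2) + 1/(2(n_o-1))
    """
    def ln_cv(m, sd, n):
        return math.log(sd / m) + 1 / (2 * (n - 1))

    def var_ln_cv(m, sd, n):
        return sd ** 2 / (n * m ** 2) + 1 / (2 * (n - 1))

    yi = ln_cv(m_rep, sd_rep, n_rep) - ln_cv(m_orig, sd_orig, n_orig)
    vi = var_ln_cv(m_rep, sd_rep, n_rep) + var_ln_cv(m_orig, sd_orig, n_orig)
    return yi, vi


def dersimonian_laird(yis, vis):
    """Random-effects pooled estimate, its SE and tau^2 (DerSimonian-Laird)."""
    w = [1 / v for v in vis]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, yis)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, yis))
    df = len(yis) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in vis]
    sw_star = sum(w_star)
    estimate = sum(wi * yi for wi, yi in zip(w_star, yis)) / sw_star
    return estimate, math.sqrt(1 / sw_star), tau2


# Hypothetical: three replications compared with one original (mean, SD, n).
original = (10.0, 3.0, 6)
replications = [(11.0, 2.5, 10), (9.5, 2.0, 12), (10.5, 2.8, 9)]
pairs = [ln_cvr(*rep, *original) for rep in replications]
est, se, tau2 = dersimonian_laird([y for y, _ in pairs], [v for _, v in pairs])
```

A negative pooled lnCVR would indicate lower variability in the replications than in the original, which is the direction of systematic CV inflation discussed above.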
  
  We thank the reviewer for this suggestion, which indeed seems like an option in this case. We will look into this possibility, although we cannot guarantee at the moment that we will implement it, as we were not previously familiar with the method and will have to study it in more detail.
  
  Page 10, paragraph 2: The change in geographical distribution shown in Figure S3 appears rather striking, with western states disappearing step by step. Should the reader be concerned about the eventual geographical representability of the sample?
  
  Yes, but there are likely different reasons for that. Labs leaving after being included may reflect the greater difficulty faced by those in less privileged regions of Brazil (generally speaking, the northern and western regions) in persisting in the project. That said, most of the “disappearance” happens between registration and inclusion, which usually has to do with the labs not working with the methods that were ultimately included in the project. We also note that most of the states that lose representation were those that had a single lab to begin with, which may make the visual pattern more striking than the actual trend (states in the South/Southeast also lose labs, but don’t disappear from the map).
  
  We note again that we never planned to achieve geographical representativeness when recruiting the labs; on the contrary, we aimed to maximize the number of available labs to run the project. That said, we agree that this representativeness is important to assess when examining whether the population of labs is similar to the one that generated the original experiments (a claim that we do make in the discussion). To allow the reader to evaluate this, we plan to add a map to Figure S3 showing the Brazilian states where the original experiments came from (based on corresponding author affiliations), in which a similar bias towards the South and Southeast regions can be observed.
  
  Page 15, Figure 3A: I wonder whether adding 95% CIs calculated from the sampling variance of each ratio would improve interpretation and help readers appreciate the real differences between the dots (i.e., means) - along the lines of a forest plot.
  
  We agree that this would be useful information and can experiment with it, but our feeling is that the figure will likely become too noisy, as the 95% CIs frequently overlap. If this is indeed the case, a better option would be to add an explicit link to the forest plots for each individual experiment (https://osf.io/sx9gv) in the figure legend.
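
For reference, the intervals the reviewer describes are the usual normal-approximation (Wald) intervals built from each ratio's sampling variance. The Python sketch below is illustrative only: the function names are hypothetical, and the 1.96 multiplier assumes approximate normality of the log-scale estimates. The overlap helper is one simple way to count how often neighbouring intervals overlap before deciding whether the figure would become too noisy; overlapping intervals are not, by themselves, a test of whether two estimates differ.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_ci(estimate, variance, z=Z_95):
    """Return (lower, upper) for estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def to_ratio_scale(interval):
    """Exponentiate a log-scale interval (e.g. for lnRR) to the ratio scale."""
    return tuple(math.exp(bound) for bound in interval)


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]
```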
  
  Page 17, section "Predictors of replication success": It is unclear to me how the decision was made about which results from Figure 4 to present in the text. Intuitively, given that correlations were calculated for both t values and lnRR (and other metrics), I would have expected that whenever a result is highlighted in the text, the authors also report how it changes depending on the metric used - for example, the interesting result regarding the 5-year number of publications, whose correlation is notably lower when using lnRR (−0.31 vs. −0.18). Presenting this nuance in the text would reduce the risk of inadvertently giving the impression of cherry-picking.
  
  We selected the highest correlation values for each continuous outcome (t score and lnRR) and presented these separately in the text. This is a systematic way to perform the selection, but is obviously subject to the “winner’s curse” effect. We agree that adding both metrics for each predictor would be a fair way to keep this in perspective for the reader, but we would have to think about how to do this without sounding too confusing (as results for the two main outcomes are quite different).
  
  We do note, however, that the outcomes are indeed different and are expected to vary independently in some cases. For the correlation with predicted replication probabilities, for example, effects in opposite directions would be expected, as larger original effect sizes will likely lead to higher predicted probabilities, but also to a higher chance of effect size decrease. This low correlation between outcomes is probably something that should be pointed out and discussed in the revised manuscript.
  
  Page 23, paragraph 1: (this comment should have come during the first % reported, but only in the discussion I realized how important this would be for comparing estimates) I wonder whether the authors should calculate 95% confidence intervals for all their percentages (and those of Errington et al.) using the Wilson method via the function binom.confint() in R, which handles extreme proportions (0% or 100%) more gracefully. This would ensure that uncertainty around these percentages is not neglected and would aid interpretation when comparisons are made.
  
  We had given this some thought when writing the manuscript, but ultimately opted not to include confidence intervals for our replication percentages and to use the replication rates as descriptive measures only (as done in other replication studies such as Errington et al., 2021).
  
  Even though we aimed for our sample of original experiments to be as systematic as possible, it is ultimately constrained by many factors (the choice of methods, the particular expertise of the labs, etc.); thus, confidence intervals would represent the uncertainty around the replication rate of a very specific population of experiments, which is not directly comparable to those included in other replication efforts in any case.
  
  We will reconsider whether we should include confidence intervals for replication rates: although doing this for every replication rate in Table 1 and Table 2 may end up being too much information, it could probably be done at least for the replication rates of the main analysis in the text. We note that calculating confidence intervals for percentages is straightforward, requiring only the numbers that are in the tables; thus, any reader who wants to estimate the uncertainty of those rates should be able to do so easily.
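
As noted, such an interval only needs the counts in the tables. A minimal Python version of the Wilson score interval (the method the reviewer suggests, available in R through binom.confint()) is sketched below; the counts in the usage line are hypothetical, not figures from the manuscript.

```python
import math


def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    # Clip to [0, 1] to absorb floating-point error at 0% or 100%.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 0 of 10 replications successful.
low, high = wilson_ci(0, 10)
```

Unlike the simple Wald interval, the Wilson interval does not collapse to zero width at 0% or 100%, which is why it is often preferred for small counts.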
  
  We will also point out the uncertainty around the percentages mentioned in the discussion when comparing our replication rates with those of other studies, which we agree is an important issue to touch on.
  
  In addition, in the next sentence, the authors are comparing correlation coefficients, at least verbally; these could in principle be transformed into Pearson's r and assigned 95% confidence intervals following meta-analytic workflows, which would better allow us to assess whether these correlations are meaningfully larger or smaller, and help avoid potentially misleading arguments.
  
  Both correlations in that case are non-parametric (e.g. Spearman’s ρ), so they cannot be directly transformed into Pearson’s r without making assumptions about the distribution (which we would prefer to avoid given the very marked outlier in our own data). We can calculate a non-parametric confidence interval for our own correlation coefficient by resampling, but we will have to investigate whether this can be done using the available data from Errington et al. (2021) (which is probably the case if effect sizes for all experiments have been shared).
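
One way to obtain such a resampling interval is a percentile bootstrap over experiments, sketched below in Python using only the standard library. The sketch assumes that paired per-experiment values (e.g. original and replication effect sizes) are available; the number of resamples, the seed and the percentile method are arbitrary illustrative choices, and a very marked outlier will still make the interval unstable.

```python
import random


def _ranks(values):
    """1-based ranks, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return _pearson(_ranks(x), _ranks(y))


def bootstrap_spearman_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Point estimate and percentile bootstrap CI, resampling experiments."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r == r:  # skip NaN resamples (e.g. all draws identical)
            stats.append(r)
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return spearman(x, y), (lo, hi)
```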
  
  Page 24, paragraph 2: The following result is really interesting and I would love for the authors to expand on it a little. There must be other meta-research studies that, despite not studying replicability directly, have explored a similar predictor: "Other features of the original article were generally uncorrelated with replication outcome, although large rates of publications by the last author were associated with lower replicability, suggesting that incentivizing publication volume may be counterproductive for the reliability of results."
  
  It is indeed interesting, and seems to confirm an intuition that has long been present in the reproducibility field, but actually has little evidence to support it: if anything, there is evidence in the opposite direction in psychology (Youyou et al., 2023), although they looked at cumulative publication number, while we used number of publications in a fixed interval.
  
  We can expand a bit further on that finding. That said, the correlation is relatively weak, with a p value of 0.04; given the multiplicity of predictors, such a result would not be that unlikely to occur by chance. Thus, even though the relationship seems intuitive, we think it should be considered tentative at best and would refrain from discussing it in too much detail.
  
  Page 25, paragraph 1: I believe the authors could explore if there is evidence for "incorrect labeling of error bars (Cumming et al., 2007; Vaux, 2004)" by plotting log(SD) vs log(mean) across all original studies, and exploring if large outliers (i.e., points largely deviating from the positive regression) exist. That should provide some insights into whether some values reported as SD in the original studies were indeed SE, which I am assuming is what the authors of the study are referring to when they say "incorrect labelling of error bars" here.
  
  Yes, that is what we mean by “incorrect labeling of error bars” (as can be grasped from the cited references).
  
  We can perform this regression, which seems relatively straightforward. That said, another likely cause of outliers, at least for cell line studies, would be the use of different (and possibly inadequate) experimental units (e.g. error bars that represent technical replicates of the same measurement rather than truly independent experiments). We suspect that this may have an even greater effect in causing error bars not to express the same thing, and the regression will not help in differentiating the two causes.
  
  We should also note that different types of experiments may be expected to have very different SDs, so the regression is likely to have a lot of error associated with it. In particular, it’s probably worth doing separate regressions for each method, to account for the likely difference in CVs between animal and cell line experiments, for example. This could also help tease apart the two causes above, as the experimental unit problem mentioned above will likely only be observed for cell experiments.
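
A minimal Python sketch of the reviewer's check, with separate fits per method, is given below. The reasoning is that an SE mislabelled as an SD shrinks the reported value by a factor of about √n, so on the log scale it appears as a negative residual of roughly 0.5·ln(n). The residual threshold (k = 2.5 residual standard errors), the record layout and the example data are hypothetical choices, and a flagged point cannot by itself distinguish mislabelled error bars from inadequate experimental units.

```python
import math
from collections import defaultdict


def flag_sd_outliers(records, k=2.5):
    """Fit log(SD) ~ log(mean) separately per method; flag large residuals.

    records: sequence of (method, mean, sd) tuples.
    Returns (method, record_index, residual) for points whose residual
    exceeds k residual standard errors in absolute value.
    """
    groups = defaultdict(list)
    for idx, (method, mean, sd) in enumerate(records):
        if mean > 0 and sd > 0:  # logs require positive values
            groups[method].append((idx, math.log(mean), math.log(sd)))

    flagged = []
    for method, points in groups.items():
        n = len(points)
        if n < 3:  # need at least one residual degree of freedom
            continue
        xs = [p[1] for p in points]
        ys = [p[2] for p in points]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual SE
        for (idx, _, _), r in zip(points, resid):
            if s > 0 and abs(r) > k * s:
                flagged.append((method, idx, r))
    return flagged


# Hypothetical data: constant CV of 0.2, except one "SD" that is really an SE (n = 10).
records = [("cell line", 2.0 ** i, 0.2 * 2.0 ** i) for i in range(-10, 11)]
records[10] = ("cell line", 1.0, 0.2 / math.sqrt(10))
flagged = flag_sd_outliers(records)
```

Ordinary least squares is sensitive to the very outliers it is meant to find, so with few experiments per method a robust fit may be preferable.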
  
  Code: I could not engage with the data and code, but I would like to highlight that the organisation and clarity of the GitHub repository is of high quality.
  
  Thanks!
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors conducted a large-scale replication effort of lab-based biomedical experiments with an emphasis on the country of origin and who conducted the replication experiments. The authors aimed to understand this context in both the outcomes produced, but also in the approach. Finally, the authors aimed to conduct multi-lab replications to provide richer data from the replications. Overall, the authors find replication rates that are like other large-scale replication efforts in the biomedical space. The authors provide rich detail into the three experimental techniques that were the focus of this effort, potential moderators of replication success, and challenges in conducting replications and coordinating a large-scale crowd-sourced effort.
  
  Strengths:
  
  The paper is outstanding in being transparent and calibrated in how the results are presented. The authors were challenged by mundane aspects (e.g., difficulty with logistics), unexpected aspects (e.g., COVID pandemic), and very insightful aspects unique to conducting replications (e.g., experimental issues). The authors also provide variation in how they present the results, including confirmatory, multiverse, and exploratory analysis. A unique strength of this study is its rich, in-depth insight into the process and interpretation of conducting replications, including predicting replication success in the lab-based biomedical space.
  
  We thank the reviewer for the compliments. Again, a more extensive list of insights can be found in our challenges article (Amaral et al., 2026), which we will cite in the revised version.
  
  Weaknesses:
  
  The study has weaknesses that the authors acknowledge in their discussion, such as lower number of replications than originally planned that limited the intended effort to compare multiple experiments with multiple attempts against a single original experiment. Another weakness is the limited discussion connecting these findings to the Brazilian research ecosystem.
  
  We acknowledge the missing replications as a weakness, and we hope we have made that point clear in the discussion.
  
  Concerning the Brazilian research ecosystem, we could try to explore this in more detail in the introduction. In particular, we believe that a better understanding of the Brazilian academic system, including its regional disparities and the general composition of its workforce (which is largely composed of undergraduate and graduate students), can be useful in interpreting some of the findings.
  
  We can try to provide a bit more context at the end of the introduction (perhaps between the last 2 paragraphs, which would also address a point made by Reviewer #1), and also in different points of the discussion including those comparing replication rates with other studies or discussing infrastructural difficulties, some of which may be specific to the Brazilian context (such as difficulties in acquiring specific reagents or licenses). Still, we reiterate that, due to the lack of studies with comparable samples in other regions, we cannot tease apart the factors that are specific to Brazil from those affecting lab biology as a whole from the data alone.
  
  References:
  
  Amaral OB, Neves K, Wasilewska-Sampaio AP, Carneiro CF. 2019. The Brazilian Reproducibility Initiative. eLife 8:e41602. DOI: https://doi.org/10.7554/eLife.41602
  
  Amaral OB, Valério B, Carneiro CFD, Mota GPS, Neves K, Abreu M, Tan PB. 2026. Challenges for building up confirmatory science in lab biology: lessons learned from the Brazilian Reproducibility Initiative. MetaArXiv, DOI: https://doi.org/10.31222/osf.io/8y3tg_v1
  
  Errington TM, Mathur M, Soderberg CK, Denis A, Perfito N, Iorns E, Nosek BA. 2021. Investigating the replicability of preclinical cancer biology. eLife 10:e71601. DOI: https://doi.org/10.7554/eLife.71601
  
  Fanelli D. 2010. Do pressures to publish increase scientists’ bias? An empirical support from US states data. PLoS One 5:e10271. DOI: https://doi.org/10.1371/journal.pone.0010271
  
  Fanelli D, Schleicher M, Fang FC, Casadevall A, Bik EM. 2022. Do individual and institutional predictors of misconduct vary by country? Results of a matched-control analysis of problematic image duplications. PLoS One 17:e0255334. DOI: https://doi.org/10.1371/journal.pone.0255334
  
  Ioannidis JPA. 2005. Why Most Published Research Findings Are False. PLoS Medicine 2:e124. DOI: https://doi.org/10.1371/journal.pmed.0020124
  
  Serghiou S, Contopoulos-Ioannidis DG, Boyack KW, Riedel N, Wallach JD, Ioannidis JPA. 2021. Assessment of transparency indicators across the biomedical literature: How open is open? PLOS Biology 19:e3001107. DOI: https://doi.org/10.1371/journal.pbio.3001107
  
  Smaldino PE, McElreath R. 2016. The natural selection of bad science. R Soc Open Sci 3:160384. DOI: https://doi.org/10.1098/rsos.160384, PMID: 27703703
  
  Tyner AH, Abatayo AL, Daley M, Field S, Fox N, Haber NA, Hahn KM, Struhl MK, Mawhinney B, Miske O, Silverstein P, Soderberg CK, Stankov T, Abbasi A, Aberson CL, Aczel B, Adamkovič M, Albayrak N, Allen PJ, Andreychik M, Awtrey E, Axxe E, Azevedo F, Bader MD, Bago B, Bailey J, Bakker M, Banik G, Banks GC, Baskin E, Batruch A, Beatteay A, Behr SM, Berente N, Berry Z, Białkowski J, Bodroža B, Boeschoten L, Bognar M, Bokhove C, Bonfiglio D, Bouwman R, Brady TF, Braithwaite SR, Briceño Jiménez G, Brick C, Bricka T, Briker R, Brown AN, Brown GDA, van Aert RCM, Caldwell K, Capitan S, Capitán T, Chandler J, Charles T, Chartier CR, Chawdhary R, Cheng KJ, Chopik WJ, Clark B, Colvin VE, Comer CC, Costantini G, Coupé T, Cummins J, Czernatowicz-Kukuczka A, de Leeuw J, Dobolyi D, Druckman JN, Duan J, Dujmović M, Dunleavy DJ, Durkee PK, Emery C, Esterling KM, Evans TR, Fedor A, Fernández-Castilla B, Fiala N, Field JG, Fong N, Fonseca MA, Freeman ALJ, Freese J, Geiger SJ, Geng J, Getz LM, Geven LM, Gleibs IH, Gonzales DP, Gooty J, Gourdon-Kanhukamwe A, Greculescu C, Griffin SM, Grigoryan L, Grunow M, Gunby N, Hall B, Hanel PHP, Hannon EE, Harper S, Held MJ, Hickman L, Higgins NC, Hippel S, Hoeppner S, Hong S, Hostler TJ, Inzlicht M, Izydorczak K, Jaeger B, Jankowsky K, Jarke-Neuert J, Jensen M, Jokić B, Jolles D, Jolly P, Jones AM, Juanchich M, Kačmár P, Kapoor H, Keljanovic A, Koirala S, Kołczyńska M, Kouroupaki D, Kühnen U, Landgrave M, Larson MJ, Laulié L, Lawrence ACE, Le Forestier JM, Leahy KE, Lee S, Leslie J, Lewis SC, Limnios C, Lin H, Liu A-C, Lloyd JW, Ludvig EA, Lynott D, MacDonald J, Mallik P, Mallinson DJ, Marinazzo D, Martarelli CS, Matacotta J, McBride A, McHugh C, McMillan G, Méndez E, Metzger M, Michaelides MP, Michalak J, Micheli L, Miller JK, Milyavskaya M, Molden DC, Monjaras AG, Moreau D, Morrow A, Moya C, Mudrik L, Mulder LB, Munt KA, Nandi A, Nason K, Nast C, Nave G, Nax HH, Neubauer F, Nguyen PLL, Nichols AL, Nilsonne G, O’Boyle E, Oettinghaus J, Oh J, 
Oshana A, Ostermann T, Ostrowski RP, Oyebanjo A, Panczak R, Patrianakos J, Pavez I, Pavlov YG, Persson S, Perugini M, Peters K, Pieters C, Ponizovskiy V, Porter ND, Prenoveau JM, Purić D, Purol MF, Puthillam A, Quinn KA, Ramljak M, Reed WR, Ritchie M, Ritzau M, Roche SP, Rodela R, Röer JP, Ropovik I, Rothschild J, Saal J, Safadi H, Samaha J, Sanchez M, Sankaran S, Santos D, Sargent AC, Sauter M, Schmidt K, Schnabel L, Schroeder AN, Schuetz SW, Schuetze BA, Schulte-Mecklenbeck M, Schütz A, Sevigny EL, Shackleton E, Shafranek RM, Shaki S, Shakya S, Sirota M, Sisco MR, Sitnikov MM, Slevc LR, Smalarz L, Smith CT, Snyder JS, Sommet N, Sonmez F, Spellman BA, Stanulewicz-Buckley N, Stock G, Street CNH, Strømland E, Sundelin T, Syed M, Szabelska A, Szaszi B, Szumowska E, Tagat A, Täuber S, Tay L, Thapa S, Thatcher J, Tsaklakidou D, Tummers L, Turkovich E, Tutor MV, Urbanska K, van ’t Veer AE, van Assen M, van de Ven N, van den Goorbergh R, Vargo EJ, Vaughn LA, Vazire S, Vermeulen JM, Vo DTH, Volkman V, Wagenmakers E-J, Wagner D, Walasek L, Walter F, Warmelink L, Wei L, Weißflog MI, Weller N, Wichman AL, Wilbiks J, Williams JR, Wolfe K, Wort F, Wright R, Wulff JN, Xue X, Yan VX, Yang Y, Yoon S, Žeželj I, Zhang Y, Ziano I, Zogmaister C, Zupan Z, Zwaan RA, Nosek BA, Errington TM. 2026. Investigating the replicability of the social and behavioural sciences. Nature 652:143–150. DOI: https://doi.org/10.1038/s41586-025-10078-y
  
  Westlake H, David F, Tian Y, Krakovic K, Dolgikh A, Juravlev L, Bournonville TE de, Carboni A, Melcarne C, Shan T, Wang Y, Mu Y, Kotwal A, Pirko N, Boquete JP, Schüpfer F, Rommelaere S, Poidevin M, Liu Z, Kondo S, Ratnaparkhi GS, Chakrabarti S, Liu G, Masson F, Xiaoxue L, Hanson MA, Jiang H, Cara FD, Kurant E, Lemaitre B. 2026. Reproducibility of scientific claims in Drosophila immunity: A retrospective analysis of 400 publications. eLife 15. DOI: https://doi.org/10.7554/eLife.108404.1
  
  Youyou W, Yang Y, Uzzi B. 2023. A discipline-wide investigation of the replicability of Psychology papers over the past two decades. Proceedings of the National Academy of Sciences 120:e2208863120. DOI: https://doi.org/10.1073/pnas.2208863120
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.02.645026v5
www.biorxiv.org

Systematic identification of oscillatory gene expression in single cell types

2
1. EMBOpress 08 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This interesting manuscript uses single cell RNAseq of developing C. elegans larvae to identify temporal pulses or oscillations in gene expression within glia and many other epithelial cell types - mostly in genes related to cuticle synthesis or remodeling. It identifies different sets of genes that oscillate within different cell types, and identifies many apparent oscillatory genes that were missed in prior studies because they are expressed in smaller populations of cells (whereas bulk data mainly report on oscillations within the major hypodermis).
  
  A second major contribution of this manuscript is to pioneer analysis methods for detecting oscillatory gene expression in scRNAseq datasets. That said, it's important to state that the methods for estimating phase coherence, GAM, perplexity, etc. make sense to me intuitively but I can't assess the math and other details, which are outside of my expertise.
  
  Most of my comments are minor ones about suggested clarifications to the text or figures. Some may require additional analyses, but none should require additional data collection.
  
  The manuscript focuses much of its analysis on one specific glial cell type (ILso), yet the authors tell us almost nothing about this cell type or why they would care about it. It would be helpful to include just a little more background on glial biology and the epithelial-like characteristics of socket glia.
  
  We added the following to the second paragraph of Results:
  
  "To this end, we used C. elegans strains expressing GFP specifically in ILso glia or in all glia (grl-18pro::GFP or mir-228pro::GFP, respectively). In C. elegans, all glia are found in sense organs. Most sense organs consist of one or more sensory neurons – each of which is specialized to detect different types of stimuli – and exactly two glia, called the sheath and socket. The sheath and socket glia form an epithelial tube continuous with the skin, through which the ciliated dendritic endings of sensory neurons protrude to sense cues in the external environment. In prior work we found that, in some sense organs, the socket glia produce cuticle specializations around specific sensory neuron cilia, but how these are coordinated with general cuticle synthesis was unknown (Fung et al. 2023)."
  
  Many transcriptomic studies of epithelia (including the Purice et al study of adult glia) use single NUCLEI RNAseq rather than single cells because of the challenges in separating cells connected by tight junctions. In C. elegans there are also various epithelial syncytia to contend with. In text or Methods, the authors should comment on why they think cells were appropriate to look at in this instance, and whether there are certain cell types that were missed or could only be obtained as cell fragments based on that choice.
  
  We added the following to the Methods section:
  
  "Presumably, fine cellular projections such as axons or glial processes are lost during cell dissociation, leaving mainly cell soma with nuclei. There is a risk that some cell types could be undersampled in cell sorting, as compared to sorting isolated nuclei, due to differences in how readily they undergo dissociation. On the other hand, retention of cytoplasmic material in this approach may better represent the total mRNA complement of the cell."
  
  Related to above, the authors do not mention any detection or exclusion of likely doublets. Is there reason to think that doublets were not present in any substantial numbers? I'm not super concerned about this since doublets containing hyp7 fragments should have worked against them in detecting glia-specific oscillations, but I do think the issue should be addressed in the text or Methods.
  
  We added the following in Methods:
  
  "Ambient RNA was subtracted using SoupX (Young and Behjati 2020). Potential doublets were assessed using DoubletFinder (McGinnis et al. 2019), but no cells were excluded on this basis. "
  
  p. 4 "previously unappreciated local differences in cuticle patterning." This statement should be tempered since many stage- or tissue-specific differences in cuticle patterning have been described previously (including in papers from the Heiman lab and others that are cited here). This study uncovers many additional examples but it's not a completely new finding.
  
  We have revised this:
  
  "Surprisingly, most pulsatile genes are specific to small sets of cell types, suggesting that local differences in cuticle patterning are more widespread than previously recognized."
  
  Table 1 and text: the distinction between pulsatile and oscillatory should be explained more at the outset. These terms sometimes seem to be used interchangeably, but then Table 1 seems to make a distinction, not discussed until the final "limitations" section.
  
  We added the following definition to the Introduction:
  
  "Within a single larval stage, oscillatory genes display a characteristic sharp single peak of expression, and we define rigorous metrics for identifying this signature, which we call "pulsatile expression"."
  
  We also added a further clarification in the Results section under "De novo identification of pulsatile genes":
  
  "We reasoned that for individual genes, if gene expression in a given cell type were plotted as a function of pseudotime, oscillatory genes would display a distinct peak because they are expressed at a particular pseudotime (Fig. 4A). We refer to this transcriptional signature as "pulsatile" when viewed in a single developmental stage; genes with pulsatile expression are predicted to be oscillatory when viewed across all of larval development, but there may be important exceptions (see "Limitations of the study")."
  
  Figure 1 and Figure 3A,B. These UMAPs look very unusual, with no discernable individual dots. Is this just a resolution issue? Or, if relevant, please add info to legend and/or Methods explaining what data smoothing was done here to make them look this way and why.
  
  We have reduced the size of the dots (to point size 1 from point size 2) in the UMAPs in Fig. 1 and Fig. 3 to make individual dots more apparent. The noted effect is due only to the size of the dots; the UMAPs are plotted in the conventional way. The effect of different point sizes on the Fig. 3 UMAP is shown below [IMAGE CANNOT BE ATTACHED HERE]
  
  Figure 2C and Figure 6B. In the pseudotime plots, it would be natural for readers to assume that 0° is the beginning of the larval stage and 360° is the end, but that is not actually the way the Meeuse 2020 phase angles work; instead, the beginning of the larval stage falls around 160°. Please make sure this is made clear, especially when referring to "early and late groups" of TF targets. In Fig 6B, Early and Late categories appear reversed because of the way the data are plotted.
  
  We have replotted Fig. 6B using percent of larval stage progression rather than phase angles in degrees, with 0% corresponding to the peak of dpy-6 expression, to make the timing more intuitive. We have revised the description of the early and late groups in the Discussion.
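
The conversion described above amounts to re-zeroing a circular phase at a reference gene and rescaling it. A minimal Python sketch follows, assuming that one full 360° oscillation corresponds to one larval stage; the reference phase in the example is a hypothetical value, not the measured dpy-6 peak.

```python
def phase_to_percent(phase_deg, ref_peak_deg):
    """Convert a phase angle (degrees) to % progression through one larval stage.

    0% is placed at the reference gene's peak (e.g. dpy-6); the modulo step
    handles wrap-around, so phases just before the reference map near 100%.
    """
    return ((phase_deg - ref_peak_deg) % 360.0) / 360.0 * 100.0


# Hypothetical reference peak at 160 degrees.
examples = [phase_to_percent(p, 160.0) for p in (160.0, 340.0, 100.0)]
```

Plotting against this axis makes "early" and "late" follow the order of the larval stage rather than the arbitrary 0° origin of the phase estimates.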
  
  As Fig. 2C compares our data directly with the phases defined by Meeuse et al., we prefer to keep it consistent with that publication.
  
  Figure 3B and Figure 5D-G. The authors group many unidentified clusters into the catchall "skin" category but don't clearly define it in the main text. Table S2 suggests this category includes anterior and posterior skin cells but possibly also other cuticle-lined tubular epithelia that aren't properly referred to as skin (e.g. vulva cells, excretory socket or pore cells). It may also include things like rectum, buccal cavity, excretory duct. Please define your criteria for "skin" more precisely in the main text (any cuticle-lined cell type that is not glia?), and perhaps a more general term such as external epithelia would be more appropriate.
  
  We have changed this in the text to "skin-related cell types" to clarify that it includes hyp, seam, and some unidentified skin-related clusters (which may include some of the cell types you mention, for example the "skin_5" cluster may include vulval epithelia or their precursors as shown in Table S2).
  
  Also related to cluster assignments: please specify if "excretory" category includes canal, duct, pore, gland all together, or only a subset of these. Only the duct and pore are cuticle lined and therefore expected to have oscillatory matrix gene expression.
  
  We have changed this to "excretory cell" (or "exc cell") for clarity. We did not examine markers for the excretory duct, pore, or gland.
  
  Figure 5. This figure feels disjointed and could be broken up into two figures (panels A-C and panels D-G). The first 3 panels seem more related to Figures 3 & 4 - identifying which cell types have strong pulsatile gene expression - whereas the later panels get into the degree of cell type specificity in matrix gene expression.
  
  We appreciate the merit of this point and in fact we strongly considered splitting up this figure (in various ways) while writing. While we agree that the figure covers a lot of ground in this format, we feel that the subparts do not hold up as their own independent figures on equal footing with the other figures in the manuscript.
  
  Figure 5D-E. The very low degree of sharing is fascinating but could be an underestimate that depends on the thresholds chosen for calling a gene "pulsatile". It may be helpful to test a range of thresholds to see how much this matters. For those ~2,500 genes that appear pulsatile in just one cell type, are they called expressed but non-pulsatile in other cell types? That would seem odd to me biologically and most likely a threshold artifact.
  
  We have added the following caveat to the Results:
  
  "Put another way, 45% (2,390 of 5,268) of the genes we identified were expressed and pulsatile exclusively in a single cell type while only 17% (915 of 5,268) were pulsatile in five or more cell types (Fig. 5E). A potential caveat to this conclusion would be if some genes are not categorized as pulsatile in particular cell types due to lower expression (e.g., falling into Cluster 7 with high peak amplitude in one cell type, and Cluster 8 with low peak amplitude in other cell types; see Fig. 4B-C). However, if this occurs, it affects only a minority of cases: among genes categorized as pulsatile in only one cell type, 82% are not detected as expressed in any of the other oscillatory cell types, indicating that the apparent specificity most likely reflects cell-type-specific gene expression rather than thresholding effects."
  
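  The sharing statistics in the caveat above reduce to counting, for each gene, the number of cell types in which it is called pulsatile. Below is a minimal sketch under assumed input formats: dictionaries mapping cell-type names to sets of gene IDs, with the keys of the pulsatile dictionary taken as the "oscillatory cell types". The function names are hypothetical and do not come from the authors' pipeline.

```python
from collections import Counter


def _pulsatile_counts(pulsatile_by_cell_type):
    """Number of cell types in which each gene is called pulsatile."""
    return Counter(g for genes in pulsatile_by_cell_type.values() for g in genes)


def sharing_summary(pulsatile_by_cell_type):
    """Return (n_genes, n_pulsatile_in_one_type, n_pulsatile_in_five_or_more)."""
    counts = _pulsatile_counts(pulsatile_by_cell_type)
    n_exclusive = sum(1 for n in counts.values() if n == 1)
    n_five_or_more = sum(1 for n in counts.values() if n >= 5)
    return len(counts), n_exclusive, n_five_or_more


def exclusive_not_expressed_elsewhere(pulsatile_by_cell_type, expressed_by_cell_type):
    """Fraction of single-cell-type pulsatile genes that are not detected as
    expressed in any other oscillatory cell type."""
    counts = _pulsatile_counts(pulsatile_by_cell_type)
    # Map each single-cell-type pulsatile gene to the cell type where it pulses
    owner = {g: ct for ct, genes in pulsatile_by_cell_type.items()
             for g in genes if counts[g] == 1}
    if not owner:
        return 0.0
    clean = sum(
        1 for g, ct in owner.items()
        if not any(g in expressed_by_cell_type.get(other, set())
                   for other in pulsatile_by_cell_type if other != ct)
    )
    return clean / len(owner)
```

  With the counts reported in the text, 2,390/5,268 ≈ 45.4% and 915/5,268 ≈ 17.4%, which round to the quoted 45% and 17%.
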
  In Figure 5F and the Methods (p. 21), the authors analyze only 140 collagen genes and 38 ZP domain genes retrieved from InterPro, but there are at least 173 cuticle collagen genes and 43 ZP domain genes described in the literature. Therefore, their lists are incomplete and the Methods should say so.
  
  Thank you for pointing this out. We changed the gene list to the cuticular collagens listed in Teuscher (2019). This did not affect the figure in a major way. We retrieved 44 ZP domain genes from InterPro, which match the ones listed by Cohen (2019) with the addition of cutl-19.
  
  If most oscillatory gene expression is truly a function of the molt cycle, as suggested by the matrix gene families in Figure 5, then one might expect that most of the detected oscillatory genes would no longer be expressed in adults, or at least wouldn't appear "pulsatile" in adults. Is this true? There now are a variety of published adult data sets, including the Purice et al data on glia, that could be examined to address this.
  
  PCA of adult cells does not exhibit the circular structure necessary to assess pulsatile expression. Previous work showed that most oscillatory genes are not expressed in adults, as expected (Meeuse et al. 2020).
  
  **Referees cross-commenting**
  
  I agree with the other reviewers' critiques, including the point of Reviewer #2 that orthogonal confirmation methods (such as by imaging) could have been nice but are not necessary. The question of Reviewer #3 about tissue synchrony/asynchrony is a very important one but I am not confident it can be addressed with these types of data.
  
  Reviewer #1 (Significance (Required)):
  
  As a molting invertebrate, C. elegans must build and shed its protective cuticle at multiple times across its life cycle, and this requires temporal control of many genes involved in matrix structure and processing. Although temporal oscillations were already well documented from bulk RNAseq data, this manuscript extends those prior findings by showing that different sets of genes oscillate within different cell types (including sensory glia), and by identifying many apparent oscillatory genes that were missed in prior studies because they are expressed in smaller populations of cells (whereas bulk data mainly report on oscillations within the major hypodermis). These data about cell-type specific temporal programs and gene sets emphasize the exquisite specificity of apical matrix and will be broadly useful to researchers in the C. elegans community.
  
  A second major contribution of this manuscript is to pioneer analysis methods for detecting oscillatory gene expression in scRNAseq datasets, even where bulk temporal data may not exist. This will be valuable for others doing scRNAseq studies in nematodes, and also in other systems where cells may have molt cycle- or circadian-regulated oscillations.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  SECTION A - Evidence, reproducibility and clarity
  
  Summary:
  
  Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).
  
  The authors use single-cell sequencing (scRNA-Seq) of cells obtained from larval stages of C. elegans, primarily the L4 stage but also the L2. Worms are disrupted and individual cells are sorted by expression of fluorescent markers specific to glial cells, a cell type that is relatively rare in the population and of particular interest to the focus of the study. In this fashion, the samples were enriched for glial cells but, because the sorting is not perfect, also contain representations of other cell populations, including hypodermal (skin) and other epithelial cells, muscles, and neurons. 2D representation of the scRNA-Seq data in principal component (PC) space reveals sets of cells of the same cell type (for example glia or skin cells) arranged in roughly circular patterns, indicative of rhythmic gene expression in those cell types. Much of this circular PC behavior is shown to be driven by genes that had previously been shown, by bulk RNAseq of staged larvae, to undergo rhythmic gene expression in conjunction with the larval stages and the molts that punctuate them. Based on the previously published relative timing of expression of these cycling genes, and the pattern of peak expression of each gene in the scRNA-Seq 2D PC space, the authors could calculate a phase angle of expression of each gene relative to its peak in each cell, thereby calculate an average phase angle of all cycling genes for each cell, and place that metric in register with the roughly circular pattern of cell types in the 2D PC plot.
  
  The authors show that many of these rhythmic genes encode extracellular matrix (ECM) proteins or other proteins related to cuticle synthesis and assembly, or molting. Cell types exhibiting cyclic gene expression included skin, pharyngeal epithelial, as well as several types of glia, notably socket glia, which synthesize a specialized ECM that surrounds and protects sensory neurons.
  
  Finally, the authors analyze the patterns of cyclic gene expression in several cell types with respect to the expression of transcription factors (TFs) that are expressed in the cell type, including TFs that appear to likewise cycle, and whose predicted targets are enriched for cycling genes. From this computational analysis, the authors derive sets of hypothetical transcriptional regulatory circuits underlying phased expression of cycling genes.
  
  Major comments:
  
  Are the key conclusions convincing?
  
  1) Yes, the data support the conclusion that the authors' approach and methodology can take a list of genes known to cycle in expression level at larval stages and identify the cycling gene expression profiles of those genes in single cell sequencing datasets. It is also convincing that the authors' data analysis methods can identify cycling genes from the scRNA-Seq data that had not been previously identified as cycling from bulk RNAseq. Furthermore, the enrichment of genes encoding collagens and other ECM components is clear from the data.
  
  2) The above being said, it is noteworthy that the conclusions of the manuscript, including the sets of predicted novel cycling genes and the predicted transcription factor-target circuits, were not confirmed experimentally using independent samples or orthogonal methodology. I think it is OK for the authors to leave these predictions for later experimental confirmation, but it would be appropriate for the authors to discuss this caveat about the need for strategic experimental tests to confirm the more novel findings presented here, while at the same time pointing out predictions from their analysis that fit with previous experimental findings (for example, cases such as NHR-85 and NHR-23, where previous studies support that the relevant TF is involved in regulating molting-associated transcriptional activity).
  
  We have added the following sentence to "Limitations of the study":
  
  "Further, while our results are consistent with other studies (Meeuse et al. 2020; Gaidatzis et al. 2025) and successfully identify known regulators such as NHR-23 and NHR-85, it will be important in future work to test expression of the novel oscillatory genes and the roles of novel regulators we have predicted."
  
  3) There is an issue of concern that is perhaps about terminology, and not necessarily conceptual: Throughout the manuscript the authors variously use the terms "oscillatory", "transient", and "pulsatile" to refer to cyclic gene expression. It seems that each of these terms could have distinct meanings, based on their English usage: The term "oscillatory" gene expression would seem to be a general term for gene expression that varies in a regular, rhythmic fashion. "Transient" gene expression seems like a general term for ON/OFF dynamics, albeit not necessarily oscillatory. "Pulsatile" gene expression implies oscillatory dynamics where the rise and fall of gene expression is relatively abrupt, and might also imply ON/OFF dynamics (between zero and some positive value). These terms are used seemingly interchangeably in the early parts of the manuscript, and then later "pulsatile" is used increasingly, so the reader starts to wonder why. The authors should define these terms precisely and use the terminology deliberately and consistently.
  
  We have clarified the important point about oscillatory vs. pulsatile in the text. Please see our response to Reviewer 1, Point 5. Additionally, we have removed the use of "transient" except in the context of the phrase "transient aECM" that has been established in the literature.
  
  4) Related to the above, the authors should address how lowly-expressed genes behave in scRNA-Seq data, where the transcriptome is not fully sampled in each cell, and how that phenomenon could affect the apparent variation of gene expression within a population. My understanding is that if the expression level of a gene goes below some threshold percentage of the total transcriptome, it may not show up at all in the reads from that cell, even though the gene may still be expressed. Therefore, a gene can display apparent on/off behavior within a population of cells whilst the underlying variation in mRNA levels for that gene may be far less abrupt. How might this phenomenon affect the interpretation of a gene's dynamics as "pulsatile"?
  
  We added the following to clarify that sampling variation among cells was mitigated by applying a smoothing function based on each cell's five nearest neighbors in PCA space:
  
  "We then fitted the expression pattern of each gene with a generalized additive model (GAM) to obtain smoothed expression profiles. Because the GAM is fitted across many cells ordered along pseudotime, it captures the underlying expression trend even when individual cells show zero counts due to incomplete transcriptome sampling (e.g., Fig. 4A, black dots at y = 0)."
  
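  The point that pooling cells along pseudotime recovers a trend hidden by per-cell zeros can be illustrated with simulated data. This sketch is not the authors' method. The response describes a GAM with cyclic cubic splines; a circular moving average over phase bins is swapped in here so the example runs with the Python standard library only. The simulated gene, noise model (Poisson counts), bin count, and window width are all hypothetical. The `pulse_metrics` helper is a simple stand-in for the baseline and peak-amplitude metrics and does not reproduce the fit or shape metrics.

```python
import math
import random


def _poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n_cells=5000, peak=0.3, width=0.05, seed=0):
    """Hypothetical gene pulsing near `peak` on a [0, 1) cycle, sampled sparsely."""
    rng = random.Random(seed)
    phases, counts = [], []
    for _ in range(n_cells):
        p = rng.random()
        d = abs(p - peak)
        d = min(d, 1.0 - d)  # circular distance on [0, 1)
        lam = 0.05 + 2.0 * math.exp(-(d / width) ** 2)  # low baseline + pulse
        phases.append(p)
        counts.append(_poisson(rng, lam))
    return phases, counts


def circular_moving_average(phases, counts, n_bins=36, half_width=2):
    """Bin cells by phase, then average each bin with its neighbours,
    wrapping around so the first and last bins are adjacent."""
    sums, ns = [0.0] * n_bins, [0] * n_bins
    for p, c in zip(phases, counts):
        b = int(p * n_bins) % n_bins
        sums[b] += c
        ns[b] += 1
    smoothed = []
    for b in range(n_bins):
        window = [(b + k) % n_bins for k in range(-half_width, half_width + 1)]
        n = sum(ns[w] for w in window)
        smoothed.append(sum(sums[w] for w in window) / n if n else 0.0)
    return smoothed


def pulse_metrics(smoothed):
    """Return (baseline, peak amplitude, peak phase) of a smoothed cyclic profile."""
    baseline = min(smoothed)
    peak_bin = max(range(len(smoothed)), key=smoothed.__getitem__)
    return baseline, smoothed[peak_bin] - baseline, (peak_bin + 0.5) / len(smoothed)
```

  In this simulation most individual cells report zero counts, yet the smoothed profile still shows a low baseline and a clear peak near the simulated phase.
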
  As further described in Methods, our pipeline also incorporates several features that mitigate this valid concern:
  
  First, before fitting gene-level dynamics, we retain only genes detected in at least 20 cells and in at least 5% of cells of a given cell type (Methods). While this filter may exclude some genuinely low-expressed oscillating genes, it ensures that pulsatile calls are made on genes where expression is reliably measurable.
  
  Second, we apply two levels of smoothing. Prior to PCA, k-nearest-neighbor smoothing ensures that each cell's expression profile reflects a local average of transcriptionally similar cells rather than a single noisy measurement. When modeling gene expression along pseudotime, we fit a generalized additive model (GAM) with cyclic cubic splines, pooling information across many cells. The curves we score as pulsatile therefore reflect averaged expression across neighborhoods of cells, rather than raw per-cell counts subject to dropout.
  
  Critically, dropouts arising from incomplete transcriptome sampling are independent of pseudotime (e.g., see dnj-1 in Fig. S3A). Our pulsatility criterion explicitly requires a low baseline combined with a well-shaped, high-amplitude peak in a specific pseudotime window, which dropout noise alone cannot generate. Indeed, as shown in Fig. 4A, the method readily identifies peaks even when many individual cells have zero detected reads (black dots at y = 0), demonstrating that the smoothed fit recovers the underlying dynamics from sparse data.
  
  Finally, during development we also tested a logistic GAM that models the probability of detecting at least one read per cell, rather than read counts directly, which produced comparable results, though it saturated for highly expressed genes.
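
  The gene filter and the k-nearest-neighbor smoothing described above can be sketched in standard-library Python. The thresholds (20 cells, 5%) and k = 5 come from the text. Everything else is an assumption: detection is taken to mean a nonzero count, neighbors are found by Euclidean distance in whatever low-dimensional coordinates are supplied, and each cell's own profile is included in its average. Function names are hypothetical, and the brute-force neighbor search is only suitable for small examples.

```python
import math


def passes_detection_filter(gene_counts, min_cells=20, min_frac=0.05):
    """Keep a gene only if it is detected (count > 0) in at least `min_cells`
    cells AND in at least `min_frac` of the cells of this cell type."""
    n_detected = sum(1 for c in gene_counts if c > 0)
    return n_detected >= min_cells and n_detected >= min_frac * len(gene_counts)


def knn_smooth(expr, coords, k=5):
    """Replace each cell's expression vector by the mean over itself and its
    k nearest neighbors (Euclidean distance in `coords`).

    expr: list of per-cell expression vectors (same gene order in each).
    coords: list of per-cell coordinates, e.g. an embedding such as PCA.
    """
    smoothed = []
    for i, xi in enumerate(coords):
        # Rank all other cells by distance to cell i; ties broken by index
        others = sorted((math.dist(xi, xj), j) for j, xj in enumerate(coords) if j != i)
        idx = [i] + [j for _, j in others[:k]]
        smoothed.append([sum(expr[j][g] for j in idx) / len(idx)
                         for g in range(len(expr[i]))])
    return smoothed
```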
  
  Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?
  
  5) Page 12: "Taken together, our results suggest that cuticle formation is the main commonality among pulsatile genes, and that distinct cell types use very different gene expression programs during this process. Thus, while cuticle aECM is typically perceived as a single homogeneous meshwork, our results suggest that the cuticle is actually a patchwork matrix with different patterning and composition contributed by distinct cell types."
  
  It is not necessarily surprising that the cuticle made by skin cells could have a composition non-identical to the cuticle made by glial cells or pharyngeal cells. But describing the cuticle as a 'patchwork' elicits in the reader's mind an image of the skin of the animal (seam + Hyp) being a mosaic of distinct cuticle compositions. Is that what the authors intend to say? It would be interesting if there were differences in cuticle composition between skin cell types, and so it would be helpful if the authors could comment on how the transcript profiles compare for hypodermal seam cells vs multinucleate Hyp cells.
  
  We have expanded on this idea:
  
  "Taken together, our results suggest that cuticle formation is the main commonality among pulsatile genes, and that distinct cell types use very different gene expression programs during this process. Classical work showed that the cuticle exhibits regionalized specializations – for example, alae are present only over seam cells; annuli and struts are present over hyp7 but not near the nose; the vulval cuticle is thought to present structural or chemical signatures for recognition during mating; and the pharyngeal cuticle exhibits three short projections in the buccal cavity, sieve-like fingers between the metacorpus and isthmus, and grinder elements in the posterior bulb. However, the extent to which these structural differences correspond to distinct molecular composition was not known. Thus, while cuticle aECM is typically perceived as a single homogeneous meshwork, our results suggest that the cuticle is actually a patchwork matrix with different patterning and molecular composition contributed by distinct cell types."
  
  Would additional experiments be essential to support the claims of the paper?
  
  6) The data here are mostly from L4 stage larvae, with a possible (but unknown) contribution from L2 larvae. It would be helpful, in terms of broader understanding of their roles in larval progression, if some of the oscillatory genes identified here (especially the novel ones) were tested by orthogonal methodology (such as fluorescent protein tagging) for oscillatory expression at other stages. However, these experiments are arguably beyond the scope of this paper, and as long as the authors note the importance of such confirmatory experiments in their Discussion, I don't think that further experimentation is critical for this paper.
  
  We agree about the importance of these confirmatory experiments, and have added a comment in the Discussion (see response to Point 2 above).
  
  Are the data and the methods presented in such a way that they can be reproduced?
  
  7) In general, yes.
  
  Are the experiments adequately replicated and statistical analysis adequate?
  
  8) Yes.
  
  Minor comments:
  
  Specific experimental issues that are easily addressable.
  
  9) Page 4: Regarding the single cell sequencing approach, the authors should comment on the extent to which mRNAs are efficiently recovered from hypodermal syncytial cells (Hyp), which are multinucleate. Could the data from Hyp be chiefly from nuclear transcripts? If so, how might that affect the interpretation of the data?
  
  We have added a comment in Methods related to caveats of cell sorting vs. nuclei sorting (see response to Reviewer 1, Point 2). As the proportion of immature (unspliced) mRNA and reads corresponding to the mitochondrial genome are not noticeably different in the hypodermal cells than in other cell types, we do not think the data are chiefly from nuclear transcripts.
  
  10) It is confusing that Table S1 lists male-enriched samples that were apparently sequenced, but only hermaphrodite data were analyzed for the paper. To prevent confusion, the male samples should not be listed.
  
  We have clarified in the Methods that these samples are included in Table S1 because we wanted to share the datasets with the community:
  
  "(Supp. Table S1; note this table includes related samples that were not used in the present analysis but that are deposited in the Gene Expression Omnibus (GEO) repository as a public resource)"
  
  11) Page 5, bottom: The following analysis requires clarification (at least for this reader): "To test if such oscillatory gene expression is present in ILso glia, we computed the average phase of each cell (Fig. 2B). Specifically, for each cell, we computed a weighted circular average of the peak phases of oscillating genes (derived from the previous bulk RNA-Seq data), using the gene expression levels in that cell as weights."
  
  In reading this part of the main text, this reader struggled to understand how one can compute the phase angle for a given gene in a cell by comparing its level of expression in that cell to measurement of the level of that gene in previous bulk RNA-Seq data. Of course, there is far more to the analysis than that, which the Methods and Materials section on page 18 describes in more detail, where one learns that the level of each gene in each cell is scaled to its maximum expression across all the cells analyzed, and that the previous bulk sequence analysis is used simply to provide a phase angle for the gene's peak expression relative to an arbitrary framework (which corresponds to a larval stage, one assumes). The presentation of this analysis on Page 5 in the main text should be revised to include a full description of what was done so the reader can follow along and understand it without having to read the Methods section. Moreover, the Methods section treatment of this analysis is still not entirely clear; for example, certain variables (W, s, and c) are not defined. The presentation of the mathematics should be clarified so that the reader can understand the analysis without having to look up scTransform normalization.
  
  We have expanded and clarified this section:
  
  "To test if such oscillatory gene expression is present in ILso glia, we computed the average phase of each cell (Fig. 2B). Specifically, for each cell in our dataset, we considered its expression level of each of the 3,739 previously described oscillatory genes (Meeuse et al. 2020). To avoid biasing towards inherently highly-expressed genes (e.g., those encoding structural proteins), the expression of each gene in a given cell was scaled to its maximum expression across all cells. We then computed the average phase of each cell by taking the known phase for each gene (Meeuse et al. 2020) and calculating a weighted circular average, using the scaled expression of each gene in that cell as weights (Fig. 2B; each colored line represents one oscillatory gene with its angle representing its known phase and its length representing its scaled expression in that cell). This average results in a vector whose direction reflects the average phase of genes expressed in that cell, and whose length reflects how consistently the genes’ peak times align in that cell (Fig. 2B, black arrow)."
  
  In the Methods, we have moved the definitions of W, s, and c so that they precede the formula for the average angle.
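
  The per-cell average phase described above is a weighted circular mean. The following plain-Python sketch assumes phases in degrees. It takes W, s and c to be the sum of weights and the weighted sums of sines and cosines of the gene phases. These are the standard definitions and are assumed, not confirmed, to match those in the Methods. Dividing the resultant length by W gives a value between 0 (peak phases cancel out) and 1 (all expressed genes share one phase). That value plays the role of the vector length described in the quoted text. Function names are hypothetical.

```python
import math


def scale_by_gene_max(expr):
    """expr: list of cells, each a list of gene expression values.
    Divide each gene by its maximum across all cells (all-zero genes stay 0)."""
    n_genes = len(expr[0])
    maxima = [max(cell[g] for cell in expr) for g in range(n_genes)]
    return [[cell[g] / maxima[g] if maxima[g] > 0 else 0.0 for g in range(n_genes)]
            for cell in expr]


def cell_average_phase(weights, phases_deg):
    """Weighted circular mean of known gene peak phases for one cell.

    weights: scaled expression of each oscillatory gene in this cell.
    phases_deg: known peak phase of each gene, in degrees.
    Returns (mean phase in [0, 360), normalized resultant length in [0, 1]).
    """
    W = sum(weights)
    if W == 0:
        return float("nan"), 0.0
    s = sum(w * math.sin(math.radians(p)) for w, p in zip(weights, phases_deg))
    c = sum(w * math.cos(math.radians(p)) for w, p in zip(weights, phases_deg))
    mean_phase = math.degrees(math.atan2(s, c)) % 360.0  # direction of the vector
    resultant = math.hypot(s, c) / W  # consistency of peak phases in this cell
    return mean_phase, resultant
```

  Using `atan2` rather than a plain arithmetic mean is what makes phases of 350° and 10° average to about 0° instead of 180°.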
  
  Are prior studies referenced appropriately?
  
  12) yes
  
  Are the text and figures clear and accurate? - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?
  
  13) Figure S3 Panel A: What does the green line mean? Figure S3 Panel C: The use of the term "predictors" is confusing because on page 8, in the part of the narrative referring to Figure S3, the term used is "descriptors". Is that a meaningful switch in terminology?
  
  We have expanded and clarified the Supp. Fig. S3 legend. For simplicity, we now use the term "metrics" to refer to the parameters used for hierarchical clustering of pulsatile expression (peak amplitude, baseline, fit, shape). This replaces our previous uses of "predictors" and "descriptors".
  
  14) The legend to Figure S3 requires more details to enable the reader understand the Figure. The same critique applies to most of the Supplemental Figure legends, where more details are required to allow the reader to understand each Figure without having to refer back to the main text.
  
  Thank you for pointing this out. We have revised and expanded all of the Supplemental Figure legends.
  
  15) Page 19: "The cells were grouped by cell type independent of the stage of collection (L2 or L4), and each cell type was processed individually."
  
  Why were L2s and L4s pooled? How does this affect the analysis and/or the outcomes? Could there be confounding effects from pooling the samples that could affect the analysis or the conclusions?
  
  We added the following clarification in the main text:
  
  "Because we found that cells clustered together based on their cell type rather than developmental stage, L2 and L4 cells of the same cell type were pooled for all downstream analyses (see Methods)."
  
  as well as the following explanation in the Methods:
  
  "The cells corresponding to the same cell type at different stages were then merged for subsequent analysis. After annotation, cells of the same cell type from L2 and L4 datasets were pooled for downstream analysis, such that each cell type is represented as a single combined cluster across stages. This provides two advantages: it increases statistical power by increasing the number of cells, and it favors genes that are oscillating in both larval stages. Because L2 representation is more limited (Table S1), the pooled pseudotime is dominated by L4 dynamics, ensuring that L2 cells are anchored on the L4-defined trajectory."
  
  **Referees cross-commenting**
  
  There is substantial agreement amongst all three reviewers regarding the significance of the findings and that the conclusions are well enough supported by the data such that no additional experiments are required. We all recommend revisions to clarify or expand the description of the experiments and/or analysis. Many comments are reiterated by more than one Reviewer. I agree with all the other reviewers' critiques.
  
  Reviewer #2 (Significance (Required)):
  
  SECTION B - Significance
  
  Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.
  
  The finding of oscillatory and molting-related gene expression patterns in glial cells emphasizes the importance of molting-related ECM/cuticle production by these sensory-accessory cells and will serve as a platform for further studies and further understanding the structural and molecular basis of glial cell support functions, especially in the context of changing roles for sensory neurons during developmental progression.
  
  The methodology and data analysis of C. elegans scRNA-Seq data presented here offers several significant advances, especially since it had been known that thousands of genes cycle in rhythm with the C. elegans molting cycle, yet that was based on previous bulk sequencing, so it was not possible to resolve cell-type specific expression. This paper presents methods for analysis of cycling gene expression in specific cell types.
  
  The manuscript derives hypothetical TF-target regulatory interactions that are proposed to underlie cyclic gene expression in specific cell types. This is a significant resource for future work to explore and delineate upstream oscillator mechanisms, and answer questions such as, Is there a central oscillator for all of larval stage rhythmic gene expression? and, How are different genes expressed with different phases of the larval stages? etc.
  
  Place the work in the context of the existing literature (provide references, where appropriate).
  
  The authors cite important previous studies that used scRNA-Seq to profile gene expression in specific C. elegans cell types in various developmental and physiological settings, and previous studies that used bulk RNAseq to identify genes whose transcripts cycle along with the larval stages. This manuscript reports the first study to examine cyclic gene expression in C. elegans at the single-cell level.
  
  State what audience might be interested in and influenced by the reported findings.
  
  Moderately broad audience of biologists interested in biological oscillators; developmental biologists interested in gene regulatory control of developmental cell fate timing and reiterative developmental processes; neurobiologists interested in glial cell function in developmental contexts.
  
  Define your field of expertise with a few keywords to help the authors contextualize your point of view.
  
  C. elegans larval development; temporal control of cell fate progression.
  
  Are there any parts of the paper that you do not have sufficient expertise to evaluate?
  
  Honestly, some of the mathematical analysis is beyond my ability to judge whether the chosen approach is the best choice for the particular setting.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  This study describes both new scRNA-seq data from C. elegans, targeting glia/epidermal cell types and especially the ILso glial cell, and analytical approaches to identify periodically expressed genes in the dataset. Overall the data appear of high quality and so have value as a resource, and the analysis provides a substantial improvement in our understanding of how different cell types vary in their cyclic expression across the molt cycles. While I have made many suggestions, overall this is a very nice study as is and definitely seems likely to be an impactful publication.
  
  Major
  
  Figure 2:
  
  The dataset includes cells from multiple stages (L2 and L4 mentioned in the text, adult as well listed in Supplemental Table 1). There is a superficial display in Figure S1 which seems to imply that whether the same cell type clusters together across stages vs making stage specific clusters might be complex. But this isn't really discussed at all in the paper. For Figure 2 specifically it seems critical to know whether stages were pooled or separated for this analysis, and the question of whether the cyclic program varies across stages (at least for well sampled cells like ILso) is important.
  
  Please see our response to Reviewer 2, Point 15.
  
  Figure 3
  
  This approach overall is good for cells that cycle in a way their signature comes up in Meeuse but would it detect rare cell type cycles? Maybe the PCA space velocity approach could be a way to screen for cell types that cycle in a way that isn't detected in the whole organism time course data (or rule out the presence of large cycling gene sets)? For example "pharyngeal gland" seems to have a weak cycling signature using the Meeuse gene set (Fig. 3D) but fairly clear "circular UMAP" structure (Fig 3B).
  
  We added the following to emphasize that our rationale for developing the perplexity metric is to identify oscillatory cell types de novo, i.e. without relying on the Meeuse et al. dataset:
  
  "We hypothesized that this [perplexity] metric would distinguish between pulsatile cell types (corresponding to relatively high perplexity) and non-pulsatile cell types (corresponding to low perplexity), without relying on prior bulk annotations that may be insensitive to rare cell types."
  
  Consistent with the reviewer's intuition, this approach does identify cell types missed by the Meeuse-based local phase coherence analysis, specifically coelomocytes and PHsh glia, which have perplexity >30 but did not reach significance by phase coherence (Fig. 5C).
  
  Regarding the pharyngeal gland, this cell type has intermediate scores by both metrics and falls below our conservative thresholds (Fig. 5C). It is possible that it has a genuine but weak oscillatory program that our methods are underpowered to detect given the number of cells recovered for this cell type.
  
  We considered using RNA velocity, but we have not succeeded in developing a satisfying quantitative score; therefore, the perplexity metric serves this role in our current analysis.
  
  Do these data say anything about the question of whether all cell types in an organism are synchronized in the same phase as each other, or whether some might be systematically earlier or later in the cycle at a given time? It seems like if individual samples have enough stage bias (as an illustrative but made up example, if sample "230421_AM" has mostly early L4 while "230421_PM" has mostly late L4), then these data could be used to see if for example ILso cells tend to have earlier or later phases in the same sample compared to hyp cells. In my view, this is an important enough general question in the field to be worth addressing if the data are sufficient. And it could also provide an independent way to support/refute the presence of additional cycling cells (see Fig. 5 comments)
  
  This is an interesting idea, but unfortunately our population-level synchronization is not sufficiently precise, and each sample contains cells spanning nearly the full phase range (the accompanying figure is not reproduced here). Under these conditions, between-sample phase offsets are dominated by within-sample dispersion, and we cannot reliably estimate systematic phase differences between cell types.
  
  Figure 5
  
  This seems like a nice approach to address the earlier question about detecting cell-type-specific oscillations. However, results are reported only for the cell types previously identified as oscillatory. It seems important to report the potential cycling genes for the other cell types (PHsh, coelomocytes, maybe pharyngeal gland) so that their cycling status could be tested by others in the future.
  
  We revised the text to highlight that these gene lists are in Supp. Tables S3 and S4:
  
  "We limit our subsequent analyses to the 17 high-confidence cell types that appear oscillatory using both approaches (local phase coherence and perplexity), which together contain 5,268 pulsatile genes. Pulsatile gene lists for these and other cell types are provided in Supp. Tables S3 and S4 to facilitate independent assessment of their cycling status. A summary of pulsatile genes in these 17 cell types is shown in Table 1."
  
  Regarding the last section ("only 17% were pulsatile in five or more cell types", "only 10 genes were pulsatile in all 17 oscillatory cell types", etc.): thresholding a dataset like this can produce false negatives resulting from incomplete power (and cell-type-specific differences in power), which is a common source of technical non-overlap in this type of comparison. Indeed, it is notable that the highest overlap was with ILso (the specific sort target, likely to be especially well powered) and CEPso. There are various approaches to estimate not just confident overlap but also confident non-overlap, for example the "irreproducible discovery rate" (IDR) approach commonly used for ChIP-seq data. While the gene set enrichment results clearly show some cell type specificity, I'd suggest toning down the interpretation of the fractional overlap in the text if this can't be resolved.
  
  We toned down the interpretation of the fractional overlap:
  
  "To what extent are the same sets of pulsatile genes shared between cell types? To address this question, we examined the overlap between the pulsatile genes we identified in each cell type (Fig. 5D), noting that because power to detect pulsatile expression varies across cell types, the overlap values we report are likely to underestimate the true sharing between cell types.
  
  […]
  
  Put another way, 45% (2,390 of 5,268) of the genes we identified were detected as expressed and pulsatile exclusively in a single cell type while only 17% (915 of 5,268) were pulsatile in five or more cell types (Fig. 5E)."
  
  Minor
  
  Figure 1:
  
  I was a little unclear about the coloring in Fig 1C (are the colors by annotated tissue or something else like clusters?) - suggest specifying in the legend.
  
  We updated the legend: "UMAP of the same cells as in B, with each cell colored by its annotated tissue identity."
  
  More details on clustering and annotations approaches in the methods would be useful.
  
  We have substantially expanded the corresponding section.
  
  I have mixed feelings about the word "skin" in the figure panels - while more accessible to a broad audience, hypodermis or hyp subset labels (hyp 7 etc) might be more precise.
  
  We have changed many of these to "skin-related." We cannot use the anatomical terms because we cannot confidently distinguish, for example, hyp1 vs hyp2, due to the lack of known markers for each cell type. We therefore refer to skin-related cluster 3 as "skin 3," because calling it "hyp 3" would lead to confusion with the anatomical term.
  
  Table S2 would benefit from including the number of cells annotated with each cell type name
  
  We have added the number of cells per cell type to Supp. Table S2, with separate columns for L2, L4, and the total.
  
  Figure 2
  
  Fig. 2B is nice: it clearly shows the difference in expression of phase-specific genes in the two example cells and the conceptual framework for averaging. I was struck by the relatively broad range of phase values, though (for example, the bottom cell has highly expressed genes with phases ranging from ~100 degrees to ~280). This could reflect technical noise in the single cell data or imprecision in the phase calls in Meeuse. But there is also the interesting possibility of biological flexibility in the order/expression of phased genes at the single cell level. I am not sure whether there is an obvious way to address this or whether it is within the scope of this work, but it may at least be worth a mention.
  
  A parsimonious explanation for the broad range of phase values in a single cell is the shape of the peak: in the data from Meeuse et al., oscillatory genes do not generally display a sharp peak, but rather elevated expression over a span of ~3 h (out of a larval stage of ~8 h), which corresponds to expression over more than 100°. Indeed, the decentered genes in Fig. 2B are F53F4.2 and cutl-10, which have their peak expression at ~135° (26 h of larval development in the Meeuse dataset) but are still expressed at ~180° (28 h of larval development in the Meeuse dataset). Importantly, expression peaks tend to be roughly symmetric around the cell's true phase and therefore reduce the length of the phase vector but do not affect the average phase itself.
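  The symmetry argument above can be checked with a minimal Python sketch. It assumes, following the manuscript's description of Fig. 2B, that a cell's phase is a weighted circular average of gene peak phases (in degrees) with expression levels as weights; the function name and the phase and weight values are hypothetical illustrations, not data from the study.

```python
import math

def weighted_circular_mean(phases_deg, weights):
    """Return (mean phase in degrees in [0, 360), resultant length in [0, 1])."""
    total = sum(weights)
    # Average the unit vectors pointing at each gene's peak phase, weighted by expression.
    x = sum(w * math.cos(math.radians(p)) for p, w in zip(phases_deg, weights)) / total
    y = sum(w * math.sin(math.radians(p)) for p, w in zip(phases_deg, weights)) / total
    mean_phase = math.degrees(math.atan2(y, x)) % 360.0
    length = math.hypot(x, y)
    return mean_phase, length

# Hypothetical cell whose "true" phase is 135 degrees, expressing one sharply peaked gene.
sharp = weighted_circular_mean([135.0], [1.0])

# Same true phase, but broad peaks: genes peaking 45 degrees earlier and later are
# still expressed, symmetrically around 135 degrees.
broad = weighted_circular_mean([90.0, 135.0, 180.0], [0.5, 1.0, 0.5])

print(sharp, broad)
```

  With these values both cells have a mean phase of 135°, but the broad-peak cell has a shorter resultant vector (about 0.85 versus 1.0), which is the behavior described in the response.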
  
  Figure 3
  
  The classic Alter et al. SVD paper (https://www.pnas.org/doi/full/10.1073/pnas.97.18.10101) was the first application of SVD/PCA to genome-wide expression data and used periodic (cell cycle) expression as the main use case. The plots in Figures 2 and 3 are very similar to that approach, which used the relevant (~sin- and ~cos-correlated) principal components to define the phases of both samples and genes. I mention this mostly in case it is useful to see how they approached the question, and perhaps as a relevant citation.
  
  We added the citation.
  
  Figure 4
  
  Minor method clarification - how was DTW adapted to deal with circular data, specifically to identify cases where the peak is centered at pseudotime == 0/1? It seems from the figures that maybe some approach was used to center the raw data on the peak but I didn't see a description of how this was done (apologies if I missed it)
  
  We edited the Methods to make the connection with the previous section more explicit:
  
  "We used the trained model to predict expression of each gene along a grid of 128 regularly spaced pseudotime values, resulting in a smoothed expression profile. To facilitate comparison of profile shapes across genes with different peak times, we additionally produced a centered version of each profile. For each gene, we identified the pseudotime at which the uncentered profile reached its maximum, then circularly shifted the pseudotime values so that this maximum fell at the center of the range (pseudotime 0.5). A new GAM was fit on the shifted data as described above, and used to predict expression along the same regular grid. This yielded a centered, smoothed expression profile for each gene in which all genes have their peak at the center of the pseudotime axis. These centered profiles were used in the subsequent section to compute both the baseline and shape metrics of each gene.
  
  […]
  
  We then scaled the centered smoothed expression profile (defined in the previous section) by its maximum value. The dynamic time warping distance between the scaled, centered expression profile and an ideal sharp peak, defined as the density of a normal distribution with mean 0.5 and standard deviation 0.01, was computed with the dtw package."
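  The centering and peak-sharpness steps quoted above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: it omits the GAM refit after shifting, uses a textbook dynamic time warping recursion with an absolute-difference cost (the R dtw package's default step pattern and normalization may differ), and leaves the reference normal density unscaled because the quoted text does not say whether it was rescaled. All function names and the two example profiles are hypothetical.

```python
import math

N_GRID = 128                                  # grid size stated in the quoted Methods
GRID = [i / N_GRID for i in range(N_GRID)]    # circular axis: pseudotime 1.0 wraps to 0.0

def center_profile(profile):
    """Circularly rotate a profile so its maximum lands at the middle index (pseudotime 0.5)."""
    n = len(profile)
    peak = max(range(n), key=lambda i: profile[i])
    shift = n // 2 - peak
    # Output index j takes the value that sat at index j - shift (mod n).
    return [profile[(j - shift) % n] for j in range(n)]

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def dtw_distance(a, b):
    """Textbook DTW: cumulative cost with steps from the left, below, and diagonal."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def sharpness_distance(profile):
    """DTW distance between the centered, max-scaled profile and a sharp Gaussian peak."""
    centered = center_profile(profile)
    top = max(centered)
    scaled = [v / top for v in centered]
    ideal = [normal_pdf(t, 0.5, 0.01) for t in GRID]
    return dtw_distance(scaled, ideal)

def circular_gaussian(peak, width):
    """Hypothetical smoothed profile peaking at `peak` on the circular [0, 1) axis."""
    out = []
    for t in GRID:
        d = abs(t - peak)
        d = min(d, 1.0 - d)  # circular distance on the pseudotime axis
        out.append(math.exp(-0.5 * (d / width) ** 2))
    return out

narrow = circular_gaussian(0.95, 0.02)  # peak close to the 0/1 wrap-around point
broad = circular_gaussian(0.30, 0.15)

centered_narrow = center_profile(narrow)
d_narrow = sharpness_distance(narrow)
d_broad = sharpness_distance(broad)
print(round(d_narrow, 2), round(d_broad, 2))
```

  Centering by circular rotation is what lets a peak sitting at pseudotime ~0.95 be compared with one at ~0.30; the narrow profile should come out closer to the ideal peak (smaller distance) than the broad one.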
  
  The 2-PC view of ILso seems to align well with phase, but some of the other cell types (such as Seam in Figure 3C) are more complex - and also it seems possible that there could be cell types where the phase information is in e.g. PC2 and 3 instead of 1 and 2; how customizable is the approach and how dependent is it on a clean circular pattern in the PC plot?
  
  
  We have expanded the Discussion to include this point:
  
  "This could indicate either a genuine absence of oscillatory programs; the presence of oscillations driven by only a few genes that are insufficient to shape PCA structure; or oscillations that are present but reside in higher principal components because PCs 1-2 are dominated by other sources of cell-to-cell variation."
  
  By using an Elastic Principal Cycle (ElPiGraph) to fit pseudotime rather than relying on angle from the origin (as is common for this type of data), we accommodate trajectories within PCs 1-2 that deviate from perfect circularity, including elongated or asymmetric shapes such as in seam cells (Fig. 3C). However, when phase information resides in higher-order PCs, in the absence of an independent timing reference there is no principled way to identify which PCs carry oscillatory signal versus other gradients of cell-to-cell variation. Recovering oscillations in such cell types would therefore require complementary approaches, such as synchronized time-course sampling, rather than a modification of the current pipeline.
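  For contrast, the common angle-from-origin baseline mentioned above can be sketched in a few lines of Python. This is not ElPiGraph and not the authors' pipeline; it only illustrates, on hypothetical cells evenly spaced in true phase, why an elongated trajectory in PCs 1-2 distorts angle-based pseudotime. It assumes PC scores are mean-centered, so the origin is the centroid of the cells.

```python
import math

def angle_pseudotime(pc1, pc2):
    """Pseudotime in [0, 1) from each cell's polar angle around the PC1-PC2 origin."""
    return [(math.atan2(y, x) / (2.0 * math.pi)) % 1.0 for x, y in zip(pc1, pc2)]

def circular_error(a, b):
    """Distance between two pseudotimes on the circular [0, 1) axis."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

# Hypothetical cells evenly spaced in true phase along a closed trajectory.
N_CELLS = 100
TRUE_PHASE = [i / N_CELLS for i in range(N_CELLS)]

def ellipse(a, b):
    """Trajectory with semi-axes a (along PC1) and b (along PC2)."""
    xs = [a * math.cos(2.0 * math.pi * p) for p in TRUE_PHASE]
    ys = [b * math.sin(2.0 * math.pi * p) for p in TRUE_PHASE]
    return xs, ys

def max_error(a, b):
    estimated = angle_pseudotime(*ellipse(a, b))
    return max(circular_error(t, p) for t, p in zip(estimated, TRUE_PHASE))

circle_err = max_error(1.0, 1.0)   # perfectly circular trajectory
ellipse_err = max_error(4.0, 1.0)  # elongated trajectory, as in some cell types
print(circle_err, ellipse_err)
```

  On a circle the angle recovers the true phase, but on a 4:1 ellipse cell ordering is preserved while spacing is distorted by up to about a tenth of a cycle. A fitted principal cycle avoids assuming equal angular speed, although (as noted above) neither approach can tell which higher PCs carry oscillatory signal.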
  
  It would be useful to annotate the examples (Fig 4A, lower panels) with whether they were newly identified or known from the bulk time course. And consider a larger supplemental figure with a sampling of newly identified genes in a similar format across a range of amplitudes etc
  
  We added Supplementary Figure S4C with examples across a range of amplitudes.
  
  The examples are all relatively tight peaks (width

  We developed an approach to quantify the width of peaks in the Meeuse data (Methods); we display the distribution of peak width for genes expressed in ILso and seam cells in the new Supplementary Figure S4A. Our approach did not display a systematic bias to detect narrow or wide peaks. We added the following in the Results:
  
  “More generally, our classification captures a range of expression profile morphologies without apparent bias (Supp. Fig. S4).”
  
  Figure 5
  
  There are important caveats in the interpretation of perplexity. For example, a cell type that oscillates but with the vast majority of genes expressed uniformly or at one specific phase would get a low perplexity, while a cell type with multiple distinct states that don't cycle (this may be why body muscle has a modestly elevated score) might achieve high perplexity. Worth addressing at some point.
  
  We added these caveats:
  
  "Potential caveats are that some non-oscillatory cell types might have high perplexity, for example if there are other sources of complex transcriptional heterogeneity among cells, while some oscillatory cell types might have low perplexity, for example if oscillating genes do not dominate the PCA structure."
  
  Fig 5C raises the question of how power (for each of these metrics) relates to number of cycling genes in a cell type and the density of its sampling across time (for example is glia 4 just a poorly sampled cell type, or is it qualitatively different in what fraction of its transcriptome is cycling?). Just recoloring this plot by number of single cells per annotation might touch on this, or could try a subsampling approach.
  
  We added this with the new Supp. Fig. S5C:
  
  "To test whether differences in perplexity could be explained by differences in the number of sampled cells, we recomputed perplexity after subsampling to progressively smaller numbers of cells for several representative cell types. Perplexity values were largely stable across subsample sizes, indicating that the classification of cell types as oscillatory or non-oscillatory is not driven by differences in statistical power (Supp. Fig. S5C)."
  
  The identification of pharyngeal muscle and epithelial oscillatory genes is a nice resource aspect of this paper given past work by EM showing these cells changing across the life cycle; it appears these cells have distinct enrichments (Fig 5G) and I think talking about these differences more explicitly could add to the closing paragraph in this section about aECM heterogeneity
  
  We have added the following:
  
  "In addition, the nematode astacin (NAS) metalloproteases appeared enriched in pulsatile genes in glia and pharynx, but not hypodermis (Fig. 5G, Supp. Table S6), consistent with ultrastructural observations that the pharyngeal muscle becomes secretory during molts and that the protease NAS-6 is required to digest the old pharyngeal cuticle (Sparacio et al., 2020)."
  
  Figure 6
  
  This section is great and a very useful resource for future work. A detailed analysis may be beyond the scope of this work, but for Fig 6B I wondered whether the TF oscillation phase matched/preceded the timing of its predicted targets (for the subset of TFs that were themselves oscillatory in that cell type)? Even a qualitative analysis of this would be informative.
  
  We added a new supplementary Figure S7 and commented on it in the text:
  
  "For the subset of TFs that are themselves pulsatile, we asked whether their peak expression coincides with or precedes that of their predicted targets. We found that pulsatile targets are modestly enriched in a temporal window around the TF's own peak (Supp. Fig. S7), consistent with near-simultaneous expression of TFs and their targets, as previously observed for nhr-23 (Johnson et al., 2023). This temporal enrichment was most consistent for nhr-23 and nhr-25, which showed significant enrichment across most cell types, while other TFs showed more variable patterns (Supp. Fig. S7)."
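  A window-based comparison like the one described above might be sketched as follows. The ±45° half-width, the uniform background, the binomial tail test, and all names and phase values are hypothetical choices for illustration; the response does not state the window size or statistical test behind Supp. Fig. S7, and a real analysis would more likely compare against the observed phase distribution of all pulsatile genes.

```python
import math

def circular_diff_deg(a, b):
    """Signed smallest difference a - b on the circle, in degrees, within (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def window_enrichment(tf_peak, target_peaks, half_width=45.0):
    """Fraction of target peak phases within +/- half_width of the TF peak,
    the fraction expected if target peaks were uniform on the circle,
    and an upper-tail binomial p-value under that uniform assumption."""
    n = len(target_peaks)
    k = sum(1 for p in target_peaks if abs(circular_diff_deg(p, tf_peak)) <= half_width)
    p0 = 2.0 * half_width / 360.0
    p_value = sum(math.comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(k, n + 1))
    return k / n, p0, p_value

# Hypothetical TF peaking at 350 degrees; the window wraps through 0 degrees.
tf_peak = 350.0
targets = [340.0, 355.0, 10.0, 20.0, 200.0, 100.0]
fraction, expected, p = window_enrichment(tf_peak, targets)
print(fraction, expected, p)
```

  Using the signed circular difference matters here: a target at 10° is only 20° after a TF peaking at 350°, even though the raw difference is 340°.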
  
  Open-ended/discretionary
  
  A general challenge in single cell data analysis is that standard methods like clustering can give misleading or hard-to-interpret results when multiple processes occur simultaneously. For example, cells can have signatures of both cell fate and cell cycle, and depending on the genes used for clustering and the strengths of those signals, naïve clustering may cause them to group by either fate or cell cycle phase. This is a long-winded way to say that an application of the approach reported here would be to identify cycling genes shared between cell types that could be removed from the "variably expressed genes" lists prior to clustering to improve cell type separation, or used exclusively to allow clustering by phase rather than cell type. (Definitely discretionary to consider this, but it could be mentioned in the Discussion as a possible application.)
  
  Referees cross-commenting
  
  I agree with all of this, including Reviewer #1 that asynchrony may be hard to address with current data, and with both reviewers that the dataset stands on its own.
  
  Reviewer #3 (Significance (Required)):
  
  This paper addresses the problem of how to identify cycling genes in single cell data, using the C. elegans larval/molt cycle as a model system. The system has emerged as a powerful model for understanding regulation of periodic gene expression, and past bulk RNA-seq time courses have identified 1000s of cycling genes. However, how cyclic gene expression varies across cell types was not known. This study uses single cell RNA-seq and develops new analysis approaches to identify cycling genes across dozens of C. elegans cell types. Strengths are the generation of a new single cell dataset enriched for larval glia, identification of cyclic gene expression across many C. elegans cell types, an improved analytical framework for identifying cycling genes that could be applied in other datasets, and substantial analysis of pathways and regulators involved. Weaknesses are limited, and include minor overinterpretations of the data and missed opportunities for additional analyses. The work should be of interest to a broad audience including not just C. elegans researchers but also the single cell and chronobiology communities.
  
  PeerReviewed
2. EMBOpress 08 Jul 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Referee #2
  
  Evidence, reproducibility and clarity
  
  Summary:
  
  Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).
  
  The authors use single cell sequencing (scRNA-Seq) of cells obtained from larval stages of C. elegans, primarily the L4 stage but also the L2. Worms are disrupted and individual cells are sorted by expression of fluorescent markers specific to glial cells, a cell type that is relatively rare in the population and of particular interest to the focus of the study. In this fashion, the samples were enriched for glial cells but, because the sorting is not perfect, also contain representations of other cell populations, including hypodermal (skin) and other epithelial cells, muscles, and neurons. 2D representation of the scRNA-Seq data in principal component (PC) space reveals sets of cells of the same cell type (for example glia or skin cells) arranged in roughly circular patterns, indicative of rhythmic gene expression in those cell types. Much of this circular PC behavior is shown to be driven by genes that had previously been shown, by bulk RNA-seq of staged larvae, to undergo rhythmic gene expression in conjunction with the larval stages and the molts that punctuate them. Based on the previously published relative timing of expression of these cycling genes, and the pattern of peak expression of each gene in the scRNA-Seq 2D PC space, the authors could calculate a phase angle of expression of each gene relative to its peak in each cell, thereby calculate an average phase angle of all cycling genes for each cell, and place that metric in register with the roughly circular pattern of cell types in the 2D PC plot.
  
  The authors show that many of these rhythmic genes encode extracellular matrix (ECM) proteins or other proteins related to cuticle synthesis and assembly, or molting. Cell types exhibiting cyclic gene expression included skin, pharyngeal epithelial, as well as several types of glia, notably socket glia, which synthesize a specialized ECM that surrounds and protects sensory neurons.
  
  Finally, the authors analyze the patterns of cyclic gene expression in several cell types with respect to the expression of transcription factors (TFs) that are expressed in the cell type, including TFs that appear to likewise cycle, and whose predicted targets are enriched for cycling genes. From this computational analysis, the authors derive sets of hypothetical transcriptional regulatory circuits underlying phased expression of cycling genes.
  
  Major comments:
  
  Are the key conclusions convincing?
  
  1) Yes, the data support the conclusion that the authors' approach and methodology can take a list of genes known to cycle in expression level at larval stages and identify the cycling gene expression profiles of those genes in single cell sequencing datasets. It is also convincing that the authors' data analysis methods can identify cycling genes from the scRNA-Seq data that had not been previously identified as cycling from bulk RNAseq. Furthermore, the enrichment of genes encoding collagens and other ECM components is clear from the data.
  
  2) The above being said, it is noteworthy that the conclusions of the manuscript - including the sets of predicted novel cycling genes, and the predicted transcription factor-target circuits -- were not confirmed experimentally using independent samples or orthogonal methodology. I think it is OK for the authors to leave these predictions for later experimental confirmation, but it would be appropriate for the authors to discuss this caveat about the need for strategic experimental tests to confirm the more novel findings presented here, while at the same time pointing out predictions from their analysis that fit with previous experimental findings (for example cases such as NHR-85 and NHR-23 where previous studies support that the relevant TF is involved in regulating molting-associated transcriptional activity.)
  
  3) There is an issue of concern that is perhaps about terminology, and not necessarily conceptual: throughout the manuscript the authors variously use the terms "oscillatory", "transient", and "pulsatile" to refer to cyclic gene expression. Each of these terms could have distinct meanings, based on their English usage: "oscillatory" gene expression would seem to be a general term for gene expression that varies in a regular, rhythmic fashion. "Transient" gene expression seems like a general term for ON/OFF dynamics, albeit not necessarily oscillatory. "Pulsatile" gene expression implies oscillatory dynamics where the rise and fall of gene expression is relatively abrupt, and might also imply ON/OFF dynamics (between zero and some positive value). These terms are used seemingly interchangeably in the early parts of the manuscript, and then later "pulsatile" is used increasingly, so the reader starts to wonder why. The authors should define these terms precisely and use the terminology deliberately and consistently.
  
  4) Related to the above, the authors should address how lowly-expressed genes behave in scRNA-Seq data, where the transcriptome is not fully sampled in each cell, and how that phenomenon could affect the apparent variation of gene expression within a population. My understanding is that if the expression level of a gene goes below some threshold percentage of the total transcriptome, it may not show up at all in the reads from that cell, even though the gene may still be expressed. Therefore, a gene can display apparent on/off behavior within a population of cells whilst the underlying variation in mRNA levels for that gene may be far less abrupt. How might this phenomenon affect the interpretation of a gene's dynamics as "pulsatile"? - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?
  
  5) Page 12: "Taken together, our results suggest that cuticle formation is the main commonality among pulsatile genes, and that distinct cell types use very different gene expression programs during this process. Thus, while cuticle aECM is typically perceived as a single homogeneous meshwork, our results suggest that the cuticle is actually a patchwork matrix with different patterning and composition contributed by distinct cell types."
  
  It is not necessarily surprising that the cuticle made by skin cells could have a composition non-identical to that of the cuticle made by glial or pharyngeal cells. But describing the cuticle as a 'patchwork' elicits in the reader's mind an image of the skin of the animal (seam + Hyp) being mosaic for distinct cuticle compositions. Is that what the authors intend to say? It would be interesting if there were differences in cuticle composition between skin cell types, so it would be helpful if the authors could comment on how the transcript profiles compare for hypodermal seam cells vs. multinucleate Hyp cells.

  Would additional experiments be essential to support the claims of the paper?
  
  6) The data here are mostly from L4 stage larvae, with a possible (but unknown) contribution from L2 larvae. It would be helpful, in terms of broader understanding of their roles in larval progression, if some of the oscillatory genes identified here (especially the novel ones) were tested by orthogonal methodology (such as fluorescent protein tagging) for oscillatory expression at other stages. However, these experiments are arguably beyond the scope of this paper, and as long as the authors note the importance of such confirmatory experiments in their Discussion, I don't think that further experimentation is critical for this paper.

  Are the data and the methods presented in such a way that they can be reproduced?
  
  7) In general, yes.

  Are the experiments adequately replicated and statistical analysis adequate?
  
  8) Yes.
  
  Minor comments:
  
  Specific experimental issues that are easily addressable.
  
  9) Page 4: Regarding the single cell sequencing approach, the authors should comment on the extent to which mRNAs are efficiently recovered from hypodermal syncytial cells (Hyp), which are multinucleate. Could the data from Hyp be chiefly from nuclear transcripts? If so, how might that affect the interpretation of the data?
  
  10) It is confusing that Table S1 lists male-enriched samples that were apparently sequenced, but only hermaphrodite data were analyzed for the paper. To prevent confusion, the male samples should not be listed.
  
  11) Page 5, bottom: The following analysis requires clarification (at least for this reader): "To test if such oscillatory gene expression is present in ILso glia, we computed the average phase of each cell (Fig. 2B). Specifically, for each cell, we computed a weighted circular average of the peak phases of oscillating genes (derived from the previous bulk RNA-Seq data), using the gene expression levels in that cell as weights."
  
  In reading this part of the main text, this reader struggled to understand how one can compute the phase angle for a given gene in a cell by comparing its level of expression in that cell to the level of that gene measured in previous bulk RNA-Seq data. Of course, there is far more to the analysis than that, which the Methods and Materials section on page 18 describes in more detail: there one learns that the level of each gene in each cell is scaled to its maximum expression across all the cells analyzed, and that the previous bulk sequencing analysis is used simply to provide a phase angle for the gene's peak expression relative to an arbitrary framework (which, one assumes, corresponds to a larval stage). The presentation of this analysis on page 5 of the main text should be revised to include a full description of what was done, so the reader can follow along and understand it without having to read the Methods section. Moreover, the Methods section treatment of this analysis is still not entirely clear; for example, certain variables (W, s, and c) are not defined. The presentation of the mathematics should be clarified so that the reader can understand the analysis without having to look up scTransform normalization.

  Are prior studies referenced appropriately?
  
  12) Yes.

  Are the text and figures clear and accurate?

  Do you have suggestions that would help the authors improve the presentation of their data and conclusions?
  
  13) Figure S3, panel A: What does the green line mean? Figure S3, panel C: The use of the term "predictors" is confusing because on page 8, in the part of the narrative referring to Figure S3, the term used is "descriptors". Is that a meaningful switch in terminology?
  
  14) The legend to Figure S3 requires more details to enable the reader to understand the figure. The same critique applies to most of the Supplemental Figure legends, where more details are required to allow the reader to understand each figure without having to refer back to the main text.
  
  15) Page 19: "The cells were grouped by cell type independent of the stage of collection (L2 or L4), and each cell type was processed individually."
  
  Why were L2s and L4s pooled? How does this affect the analysis and/or the outcomes? Could there be confounding effects from pooling the samples that could affect the analysis or the conclusions?
  
  Referees cross-commenting
  
  There is substantial agreement among all three reviewers regarding the significance of the findings and that the conclusions are well enough supported by the data that no additional experiments are required. We all recommend revisions to clarify or expand the description of the experiments and/or analysis. Many comments are reiterated by more than one reviewer. I agree with all the other reviewers' critiques.
  
  Significance
  
  Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.
  
  The finding of oscillatory and molting-related gene expression patterns in glial cells emphasizes the importance of molting-related ECM/cuticle production by these sensory-accessory cells and will serve as a platform for further studies and further understanding the structural and molecular basis of glial cell support functions, especially in the context of changing roles for sensory neurons during developmental progression.
  
  The methodology and data analysis of C. elegans scRNA-Seq data presented here offers several significant advances, especially since it had been known that thousands of genes cycle in rhythm with the C. elegans molting cycle, yet that was based on previous bulk sequencing, so it was not possible to resolve cell-type specific expression. This paper presents methods for analysis of cycling gene expression in specific cell types.
  
  The manuscript derives hypothetical TF-target regulatory interactions that are proposed to underlie cyclic gene expression in specific cell types. This is a significant resource for future work to explore and delineate upstream oscillator mechanisms, and to answer questions such as: Is there a central oscillator for all larval-stage rhythmic gene expression? How are different genes expressed with different phases of the larval stages? etc.

  Place the work in the context of the existing literature (provide references, where appropriate).
  
  The authors cite important previous studies that used scRNA-Seq to profile gene expression in specific C. elegans cell types in various developmental and physiological settings, and previous studies that used bulk RNA-seq to identify genes whose transcripts cycle along with the larval stages. This manuscript reports the first study to examine cyclic gene expression in C. elegans at the single-cell level.

  State what audience might be interested in and influenced by the reported findings.
  
  Moderately broad audience of biologists interested in biological oscillators; developmental biologists interested in gene regulatory control of developmental cell fate timing and reiterative developmental processes; neurobiologists interested in glial cell function in developmental contexts.

  Define your field of expertise with a few keywords to help the authors contextualize your point of view.
  
  C. elegans larval development; temporal control of cell fate progression.
  
  Are there any parts of the paper that you do not have sufficient expertise to evaluate?
  
  Honestly, some of the mathematical analysis is beyond my ability to judge whether the chosen approach is the best choice for the particular setting.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.09.02.673125
www.biorxiv.org

The cell cycle variant in multiciliated cells incorporates 2 centriole biogenesis cycles

1
1. Public_Reviews 08 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We have carefully addressed the insightful comments provided by the reviewers, which greatly increased our understanding of the dynamics of centriole amplification. The manuscript has been revised accordingly and put in the context of the two papers we have published since our last submission, showing that MCC differentiation is a genuine cell cycle variant. A point-by-point answer to all reviewer comments is provided below.
  
  Briefly:
  
  We have streamlined terminology and nomenclature in the text and figures, and better defined the experimental conditions with nocodazole
  
  We have tested the role of dyneins in the dynamics of centriole amplification
  
  We have done correlative light and electron microscopy on the early stages of centriole amplification
  
  We have analyzed a new single-cell RNA-seq dataset comparing canonical and MCC cell cycle variants in mouse brain progenitors
  
  Collectively, these data allowed us to draw a clearer parallel with what occurs during centriole duplication and to demonstrate that centriole biogenesis in the MCC cell cycle is marked by the superimposition of two canonical centriole cycles.
  
  We believe the manuscript will interest a broader readership since it now provides more fundamental insights into the mechanism of centriole biogenesis.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  The manuscript by Boudjema et al. describes the cellular events underlying centriole amplification and apical migration to allow the assembly of hundreds of motile cilia in multi-ciliated cells. For this, they use cell culture models in combination with fixed and live cell imaging using antibody staining and fluorescence from endogenously tagged centriole and deuterosome markers, respectively. The work is largely descriptive and functional analyses are restricted to treatment with the microtubule depolymerizing drug nocodazole. The imaging is state-of-the-art including confocal microscopy, live imaging with optical sectioning and high optical and temporal resolution, as well as super-resolution imaging by ultrastructure expansion microscopy.
  
  The study does a good job of providing a very detailed description of the dynamics of centrioles and deuterosomes that lead to centriole amplification and apical migration in multiciliated cells. This detailed view was missing in previous work. It also reveals the involvement of microtubules at multiple steps: the formation of a cloud of deuterosome precursors, the nuclear envelope tethering of newly formed centrioles, their separation, and their migration to the apical surface.
  
  It would have been useful to expand the analysis of the role of microtubules by including analyses of the requirement for specific microtubule motors, for a better understanding and additional evidence that microtubule-based transport is involved. A weak point is that there is no visualization of microtubules together with deuterosomes and centrioles at the different steps of centriole amplification and migration, to directly address how these structures may interact with and move along microtubules.
  
  Overall, apart from experimental aspects and since this is largely a descriptive study, the manuscript would benefit from more precise language and a better description of the complex events underlying centriole amplification and movements.
  
  We have streamlined terminology and nomenclature, clarified the description of the complex events, and tested the role of dyneins in centriole amplification. The density of microtubules in MCCs does not allow us to extract this information from imaging. In addition, we have performed correlative light and electron microscopy on the early stages of centriole amplification and analyzed a new single-cell RNA-seq dataset comparing canonical and MCC cell cycle variants in mouse brain progenitors. We have also replied point by point to the reviewer's specific comments.
  
  Altogether, our new data allowed us to demonstrate that centriole biogenesis in the MCC cell cycle is marked by the superimposition of 2 canonical centriole cycles. We believe the manuscript will interest a broader readership since it now provides more fundamental insights into the mechanism of centriole biogenesis.
  
  Reviewer #2 (Public Review):
  
  This important work will be of interest to centriole and cilia cell biologists. It describes in detail how microtubules control multiple aspects of centriole amplification in brain multiciliated cells. This study provides a greater time-resolved and molecular proteomic mapping of the different steps involved, with or without microtubule disruption. Boudjema et al. show that microtubules are important throughout the centriole amplification process, from the early stages, where the procentrioles emerge from a pericentriolar "nest", through the growth stage where microtubules maintain the perinuclear localisation, to the detachment stage, where microtubules assist in perinuclear disengagement and apical migration. The results are generally well supported by the evidence, but the manuscript would benefit significantly from heavy editing to better introduce niche terms and to standardize abbreviations in the text and labels on figures, helping readers, especially non-specialists, to follow along and increasing the accessibility of the work.
  
  We thank the reviewer for his/her enthusiasm. We have streamlined terminology and nomenclature and clarified the description of the complex events to increase the accessibility of our work. We have also replied point by point to his/her specific comments.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  In this manuscript, Boudjema and Balagé et al. aim to elucidate the spatial origin of centriole amplification and the mechanisms behind the formation of an apical basal body patch in multiciliated cells (MCCs). To this end, they focused on the role of microtubules and developed new tools for spatiotemporal and high-resolution analysis of different stages of centriole amplification, including the centrosome stages, A-stage, G-stage, and MCC-stage. Among these tools, the MEF-MCC cells grown on micropatterns stand out for their versatility, as they are not tissue-specific and do not require epithelial cell-to-cell contact for differentiation. Additionally, the CEN2-GFP; mRuby-DEUP1 knock-in mouse model was used to study different stages of centriole amplification in physiological brain MCCs. This model offers an advantage over the previously described CEN2-GFP model by enabling the resolution of early events in centriole amplification through the visualization of DEUP1-positive structures and their dynamics. Finally, the authors leveraged powerful imaging techniques, including super-resolution microscopy, ultrastructure expansion microscopy (U-ExM), and high-resolution live cell imaging, in order to detect and track centriole amplification, elongation, disengagement, and migration.
  
  By combining the MEF-MCC and knock-in mouse model with spatiotemporal imaging in control and nocodazole-treated cells (treated acutely or chronically), the authors define the sequence of events during centriole amplification, revealing the critical roles of microtubules for the first time. Initially, the centrosome-mediated microtubule network forms, organizing a pericentrosomal nest from which procentrioles and deuterosomes emerge. Their findings indicate the importance of microtubules in recruiting and maintaining pericentriolar material clouds that contain DEUP1, PCNT, SAS6, PLK1, PLK4, and tubulins. Following the amplification stage, the procentrioles mature, leading to cells displaying numerous MTOCs, as demonstrated by regrowth experiments. Mature centrioles then disengage from deuterosomes, attach to the nuclear envelope, and migrate to the apical surface facilitated by microtubules.
  
  Strengths:
  
  The manuscript provides new insights into the regulatory function of microtubules in centriole amplification. Addressing the role of microtubules during different stages of centriole amplification required the development of new tools to study brain MCCs, which will be useful in future studies of MCCs. A notable strength of this manuscript is the authors' thorough and quantitative analysis of highly dynamic processes in MCCs. The precision and detail in describing these dynamic events are impressive. This comprehensive analysis advances our understanding of MCC biology.
  
  Weaknesses:
  
  The role of microtubules and other molecular players during different stages of centriole amplification in brain MCCs can be further studied and strengthened using the tools developed in the manuscript. A more quantitative description of some of the analysis performed in the manuscript is required to strengthen the conclusions.
  
  We thank the reviewer for his/her enthusiasm. We have tested the role of dyneins in the dynamics of centriole amplification, performed correlative light and electron microscopy on the early stages of centriole amplification, and analyzed a new single-cell RNA-seq dataset comparing canonical and MCC cell cycle variants in mouse brain progenitors. We have also replied point by point to the reviewer's specific comments.
  
  Recommendations for the authors:
  
  As you will see, all reviewers felt that the analyses of the involvement of microtubules should be strengthened by including controls and additional experiments. Also, they agree that significant text editing would help to improve the manuscript's accessibility and readability.
  
  Specifically, they suggest (1) streamlining terminology and nomenclature in the text and figures; (2) better defining the experimental conditions with nocodazole (concentrations used, effect on microtubules, effect on canonical centriole duplication); and (3), in the absence of other complementary genetic perturbation experiments, adding a limitations paragraph to the discussion about conclusions drawn from nocodazole treatment alone.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Main issues:
  
  (1) The authors use variable terminology to describe the same or similar events/structures. For example, in Figure 1 they refer to "centrosome stage" where they observe a pericentrin "cloud", which they later refer to as a "nest". In all other figures the first stage is not referred to as the "centrosome stage" but as the "cloud stage". Again, they also describe the "cloud" as a "nest" occasionally, but not always. In the cartoon, the nest is termed "centrosome cradle". The variable and inconsistent use of terms is confusing and the authors do not provide any explanation for the use of one vs. another.
  
  The text is now corrected. The centrosome stage corresponds to the stage preceding the beginning of centriole amplification in MCC progenitors. The pericentrosomal cloud of centriole and deuterosome elements forms later, during the amplification stage (A-stage). The formation of this cloud marks the beginning of A-stage, and the cloud persists until G-stage, when it dissolves. Where we show that the cloud hosts the first stages of centriole biogenesis, we refer to it as a “nest”. We no longer use the term “cradle”.
  
  (2) What prompted the authors to use the term "nest"? It gives the impression that they describe a specific physical entity/structure (also depicted in this way in Figure 3P, with microtubules outside of this structure), but what is the evidence for this?
  
  The cloud is the spatial entity, and the term “nest” is used to define a function of this transient compartment. We decided to keep the term “nest” because we have now identified it with correlative light and electron microscopy, in addition to U-ExM, and show that the accumulation of centriole and deuterosome elements is accompanied by the formation of immature procentrioles lacking MT walls, as well as immature and empty deuterosomes. The scheme with MT outside the cloud/nest was misleading, since we see MT organized by the mother centriole. We have now changed this.
  
  (3) The "nest" may simply be a dynamic accumulation of precursor particles around the centrosome, similar to what has been described for centriolar satellites. Rather than proposing a new entity, I suggest testing whether the "nest" particles may colocalize with PCM1 and thus may be related to centriolar satellites. Based on the data, the nest would simply be the centrosomal MTOC that organizes a radial microtubule array on which particles move around its center. In the absence of other evidence, I am not convinced that a new term is needed.
  
  We totally agree with the reviewer: the centrosome, as MTOC, concentrates centriolar and deuterosome components. This cloud is consistently dissolved when MT are depolymerized or dyneins inhibited. So, the physical entity is a “cloud”. We used the term “nest” to propose one function for this cloud, which is to form deuterosomes and centrioles before they move away for maturation. In fact, deuterosome and centriole formation are hindered when the cloud is dissolved. We have edited the text throughout the manuscript to make this clearer.
  
  (4) Role of MTs: are microtubules required or do they just facilitate some of the investigated events?
  
  The role of MT during centriole amplification has probably not yet been tested because MT not only constitute the cell cytoskeleton on which molecular motors ride to transport cargos or distribute forces, but are also the core component of the structures we are studying. This is why we tested a range of nocodazole concentrations and used concentrations at which MT are perturbed but not entirely depolymerized, allowing centrioles to be produced (Fig. 4 Supplementary 1A-B). This may lead to an underestimation of the role of MT, but we cannot study the role of MT in centriole amplification if centrioles cannot be formed.
  
  Does multi-ciliation in these models eventually occur normally under the concentrations and treatment conditions used here? This should be tested and discussed in the context of whether microtubules are indeed required and at what step of the entire process (amplification, migration, ciliogenesis) they may be critical.
  
  We did both chronic and acute treatments.
  
  Chronic treatments were used to assess the role of MT (or dyneins) in the overall efficiency of centriole and deuterosome formation (number of cells able to amplify, number/size/loading of deuterosomes, and final number of centrioles; Fig. 4H-I, Fig. 4 Supplementary 2B-D). In these chronic treatments, we focused on centriole amplification rather than ciliation, since amplification was the scope of this study. We also did not use ciliation as a readout of amplification because ciliation itself relies on MT polymerization.
  
  We also performed acute treatments to test the role of MT (or dyneins) at each stage of amplification (A-amplification, G-growth, D-disengagement, M-migration; Fig. 4, 5, 7, 8 and associated supplementary figures). Since each stage depends on the preceding one, this enabled us to decipher the direct role of MT (or dyneins) in each individual stage. We have now edited the text, methods, legends and pictograms to make clear whether acute or chronic treatment was used.
  
  (5) Can the authors include control (non-amplifying) progenitors in their analyses? It would be useful to know what the signal and distribution of each specific marker are before differentiation begins (before the cloud stage).
  
  Non-amplifying progenitors are analyzed and constitute the so-called “centrosome stage”. We have now specified this and renamed it the “progenitor stage”.
  
  (6) Figure 2: Again, the terminology is confusing, since the authors describe that DEUP1 forms a "cloud" with centrin during the A stage.
  
  Corrections have been done as explained in point 1.
  
  (7) Description Figure 3: the authors introduce yet another term: "halo" A-stage. Is this the early A stage? Again, this is not explained and confusing. More systematic and consistent description is needed.
  
  Corrections have been made as explained in point 1. The term “halo” is used in the lab because it was the first term we used in our 2014 Nature paper, in reference to the halo described by Erich Nigg's group upon Plk4 overexpression. It was an error to use it in the manuscript.
  
  (8) Nocodazole treatments: the used concentrations are quite high.
  
  MCCs develop a very dense and stable MT network that is not comparable to that of cycling cells. MT are very difficult to depolymerize entirely (Fig. 4 Supplementary 1A-B).
  
  (a) To avoid non-specific effects the authors should test what the minimal concentration is that completely depolymerizes microtubules in their cell model and perform analyses at this concentration.
  
  We did test a range of nocodazole concentrations at the beginning of the study (Fig. 4 supplementary 1A-B), and used concentrations at which MT are perturbed but not entirely depolymerized, allowing centrioles to be produced (see answer to point 4). We now refer to this more clearly, and at several points, in the text and methods.
  
  (b) They should demonstrate depolymerization of microtubules by microtubule staining in the acute and chronic noc treatments and at the different noc concentrations used.
  
  This is, and was, in supplementary material (same, Fig. 4 supplementary 1A).
  
  (c) The authors should demonstrate that the used nocodazole concentrations do not impair normal centriole biogenesis during the cell cycle in these cells; if so, impaired assembly of centriole wall MTs may contribute to the observed effects in Figure 4.
  
  As mentioned in point 8b, we tested a range of nocodazole concentrations at the beginning of the study (Fig. 4 supplementary 1A) and used concentrations at which MT are perturbed but not entirely depolymerized, allowing centrioles to be produced (see answer to point 4). The ability of the cells to form centrioles during chronic treatments was always assessed using immunostaining of SAS6 and/or CEN2-GFP signals (now exemplified in Fig. 4 Supplementary 1B). We also performed EM analysis on cells treated with the highest dose of nocodazole (10 µM for 24 h), which showed that centrioles can form with what appear to be MT walls in cells totally deprived of cytoplasmic MT fibers (Fig. 4 Supplementary 3-4). However, this does not show that all cells can do so, because the number of cells that can be analyzed by EM is not sufficient to conclude. Also, one cannot assess whether MT walls are properly polymerized. Nevertheless, the absence of MT walls should not change the results of Figure 4, which are based on DEUP1, SAS6 or CEN2-GFP signals for deuterosomes and centrioles. Moreover, MT depolymerization affects the formation of deuterosomes, which should not be altered by MT wall defects, since deuterosome formation is not affected even when centriole formation is blocked (LoMastro et al., 2024). Last but not least, we now show that blocking dyneins has a comparable, and even greater, effect on the formation of the cloud, deuterosomes and centrioles (Fig. 4C-I and Supplementary Fig. 4), which confirms that MTOC function, rather than MT wall formation, explains the alteration of centriole biogenesis shown in Figure 4.
  
  (9) The authors repeatedly refer to the centriole-to-centrosome conversion of amplified centrioles and how this resembles centriole-to-centrosome conversion during the cell cycle. However, they incorrectly claim that this occurs at the G2/M transition. PLK1-dependent modification occurs at this stage, but conversion and PCM recruitment only occur after mitosis (see original work by the Tsou lab, which needs to be cited here).
  
  We agree with the reviewer. We have now added additional data to show clearly that centriole biogenesis, which requires two cell cycles to proceed in cycling cells, is accelerated during the MCC cell cycle variant where the elongation and maturation cycles are superimposed. This is now clearly shown in Fig. 3, 5, 9 and discussed.
  
  (10) Figure 6H-J: the authors claim that at low noc concentration, more D-stage cells showed incomplete disengagement than in controls, but the effect is shown only for the highest 10 µM concentration. Do any of the phenotypes in Figure 6 also occur at the lowest noc concentration (assuming it depolymerizes MTs)? Again, it is crucial to demonstrate this, to exclude nonspecific effects not linked to MT depolymerization.
  
  An error was made on the figure (but not in the legend). In Figure 6, chronic treatments are at 1 or 5 µM. Only acute treatments were done using 10 µM. In both cases, MT are not entirely depolymerized in these experiments (Fig. 4 supplementary 1A).
  
  (11) Disengagement, Figure 7: The authors describe that DEUP1 signal spreads all over the cytoplasm and becomes diffuse during this process, but one cannot see a diffusive signal throughout cells in the figures.
  
  We increased the contrast to make it clearer, but the deuterosomes are still bright at this stage and it is difficult to render both signals clearly (now in Fig. 6B). We have also changed the example video (now video 19) to show this more clearly with the DEUP1 channel alone.
  
  (12) Figure 7: localization of disengaged centrioles at microtubule "nodes" is not clear from the images. There are many centrioles and random colocalization may be expected simply based on the high number. Higher resolution and/or magnification and quantification would be needed.
  
  We have edited the text and now say that centrioles “colocalize” with MT, which seems expected since centrioles nucleate MT. We agree that this could be random, but given the density of MT and the number of centrioles, quantification does not seem meaningful to us. We can only say that we never see centrioles in regions that are devoid of MT.
  
  (13) The term "diffusive" to describe slow centriole movements in Figure 8 suggests that it is not motor or force-dependent, but there is no evidence for that. Movement based on opposing forces could produce a similar result, but would not be considered diffusive.
  
  We agree. We have replaced “diffusive” with “diffusive-like”.
  
  (14) The manuscript would greatly benefit from the analysis of some candidate motor activities that may drive the movement and migrations of centrioles in this system. This would support the importance of the microtubule network for the specific steps in these processes, and better define its role beyond "being required". Dynein may be a candidate or minus end-directed kinesins. Since chemical inhibitors are available, these types of experiments would be straightforward.
  
  We previously tested ciliobrevin but had difficulties because of the low stability of the drug. Since our submission to eLife, we have tested dynapyrazole and dynarrestin and found dynapyrazole very efficient at dissolving the Golgi, a good readout of dynein inhibition. We therefore used dynapyrazole to test the role of dyneins in (i) the formation of the pericentrosomal cloud in A-stage, (ii) the oscillation of DEUP1+ structures during A-stage, (iii) the number, size and loading of deuterosomes, (iv) the final number of centrioles, (v) the migration to the nuclear membrane and (vi) the final apical migration of centrioles. The results are now included in main Fig. 4, 5, 7, 8, 9 and associated supplementary figures.
  
  (15) Discussion:
  
  "the role microtubules" lacks "of"
  
  This is now edited.
  
  "This lack is..." Lack of what?
  
  This is now edited.
  
  "reflexive link" - meaning of "reflexive" is not clear in this context
  
  We have removed it.
  
  In my opinion, the study does not identify a nest composed of DEUP1, PCNT, and Centrin2; it only shows that these components accumulate as particles around the centrosome, which functions as MTOC. Consequently, it seems that the "nest" does not exist when MTs are depolymerized. One could consider the center of the centrosomal MT array as a nest in this context, but there is no evidence of a specific new structure as suggested by the way the term is used in the manuscript.
  
  This is what we want to say: the center of the MT array becomes a nest in this context. We do not state that there is a specific new structure. We only say that an MT- and dynein-dependent concentration of centriole and deuterosome components exists and that this region nests the birth of centrioles and deuterosomes. This compartment is also restricted in time and space, which justifies using a specific term. The MTOC exists in the progenitor cell, whereas this compartment, marked by DEUP1, Centrin and PCNT accumulation, appears at the beginning of amplification, grows during A-stage and dissolves at G-stage, when all the deuterosomes and centrioles have moved away.
  
  What is the evidence that "DEUP1 is a centrosomal protein before building deuterosome structures"? It would be good to refer to the specific experiment. Does DEUP1 localize at centrioles also in the absence of microtubules? If not, I would not consider it a centrosomal protein.
  
  We have removed this statement to avoid misinterpretation.
  
  "This reminds the centriole-to-centrosome conversion..." the sentence is missing an "of"; also, again the authors confuse the order of events during the cell cycle, where centrosome conversion occurs after completion of mitosis, not at G2/M transition.
  
  We have removed this statement to avoid misinterpretation. Also, see Point 9.
  
  "microtubule dependent nuclear migration" should be rephrased; it sounds as if the nucleus migrates.
  
  This has been changed.
  
  The following discussion of disengagement being linked to association with the nuclear envelope and resembling the process in cycling cells is misleading. In cycling cells movement of centrioles along the nuclear envelope occurs at G2/M and drives centrosome separation (separation of centriole pairs) in preparation for mitosis, not centriole disengagement.
  
  We are now clearer. We compare centriole-loaded deuterosome organization around the nuclear membrane to the migration of new centrosomes during early prophase (Fig. 5F-H, Fig. 5 Supplementary 2G-K).
  
  Regarding the possibility that forces by microtubules generated by the daughter centriole drive disengagement also in cycling cells, I would argue that this is unlikely since the daughter centriole can only nucleate microtubules after disengagement has occurred (and conversion to centrosome/PCM recruitment). Once this happens, it may physically separate the disengaged centrioles, which is a different type of activity. Indeed, originally the term "disengagement" was coined to specifically describe the loss of the perpendicular engagement of daughter centrioles with their mothers (Tsou and Stearns, Nature, 2006).
  
  We have removed this statement to avoid misinterpretation. The perpendicular engagement is difficult to assess on deuterosomes, but we do see by live imaging that the attachment changes during D-stage, before centrioles clearly detach from deuterosomes.
  
  "high resolutive" should be "high resolution"
  
  Edit done.
  
  "splitted" should be "split"
  
  Edit done.
  
  "Consistently, when the mitotic oscillator is dis-inhibited and cells enter pseudo-mitotic events, centrioles show clear and rapid cell-cycle like clustering" This sentence is not understandable without further explanation; what does mitotic oscillator refer to? What are pseudo-mitotic events? What is cell cycle-like clustering?
  
  We have removed this statement.
  
  Minor:
  
  (1) Abstract: "Centriole number must be restricted to two..." Since cells are born with two centrioles and have 4 centrioles (2 pairs) when they enter mitosis, this sentence is inaccurate.
  
  The sentence has changed.
  
  (2) Abstract: "reflexive link"; I am not sure what the term "reflexive" refers to?
  
  We have removed this statement to avoid misinterpretation.
  
  (3) Figure 1C, D: it should be described better that the larger magnification panels represent overlays of many cells and what marker they show. This is not obvious since the smaller single-cell panels always show two different markers. Also, it would be more useful to show also single cells in the magnified view. The overlay does not allow us to see if a marker forms a cloud or a single dot, which is as important as the cell-to-cell variation in distribution.
  
  We have clarified this in the text and the legend. The cell-to-cell variation cannot be estimated from the overlay, but the projection from several cells (number specified) shows whether or not the signal is confined to a restricted region, which is what we wanted to analyze.
  
  Related to the above, the authors say that pericentrin forms a cloud at the top left in panel D, but there is only one confined centrosomal dot in the single-cell panel.
  
  The sentence has changed.
  
  (4) Results, Figure 2F; video 4: The authors claim connection and disconnection of DEUP1 aggregates with centrosomal centrioles; can the authors comment on the spatial resolution including in z in this movie to support this claim? Can they exclude that the structures are in proximity of each other rather than "connected"?
  
  This is a single z-section of 500 nm. The pixel size in xy is 128 nm. Given the sizes of deuterosomes and of a mature centriole, and given that we observed these dynamics live in several cells, we can state that the structures are connected. This is consistent with deuterosomes frequently observed by EM “kissing” the daughter centriole in the present manuscript (Fig. 2D, Fig. 2 supplementary 3 and 4, and Fig. 4 Supplementary 3-4). One has to look carefully at the daughter centriole (marked “dc”) and follow it through the serial sections to see the connected deuterosome (marked by a star): at this very early stage the deuterosome is small. We have not zoomed in, since previous manuscripts have already described this at later stages with bigger deuterosomes. The reader can refer to main or supplementary figures in previous manuscripts (Al Jord 2014, Khoury Damaa 2024), where serial sections span entire deuterosomes and daughter centrioles and show, with nanometric resolution, that both structures are frequently stuck to each other over tens of nanometers.
  
  (5) The term "dynamics" as used in the manuscript should be plural.
  
  It has been used in the plural, except in “dynamic microtubules” and “dynamic attachment to the nucleus”, where we think the adjective form is correct. We have not found any other singular uses in our manuscript.
  
  (6) Figure 5: what does "YL1/2 procentriole intensity" refer to in panel F? This should be the intensity of microtubule asters.
  
  This has been modified.
  
  (7) Figure 6 - supplement 1B: contrary to the claim in the text, one cannot see tight colocalization with the nuclear pore marker. This seems to be a very small subset of particles and even in those cases colocalization is not tight. Also, what is the relevance of nuclear pore colocalization?
  
  We have edited the phrasing, as ‘colocalization with NPC’ is not the right term. What we want to say is that there is a tight connection with the nuclear envelope, as shown by the localization of NPC in the same z-section as the centrioles. This is why we present a single z-plane: to show that centrioles and NPC lie within the same 500 nm z-section. NPC are stained to outline the nuclear membrane. This is also clearly visible for G-stage centrioles in the XY plane. We have now added an entire z-stack in video 18.
  
  Reviewer #2 (Recommendations For The Authors):
  
  To improve accessibility of their manuscript, we would suggest making the following edits:
  
  (1) Define 'specialist' or 'niche' terms each time you introduce them, such as 'pericentrosomal nest', or 'flower-like structures'.
  
  This has been clarified.
  
  (2) Think about abbreviations, again choosing ones that work for people outside the project: this paper uses 'PC' for 'procentriole', but for many 'PC' means 'parental centriole', and Figure 6J refers to 'D total' and 'D partial', which leaves readers confused.
  
  This has been clarified.
  
  (3) Standardize your abbreviations throughout, particularly for your treatments (sometimes Noco, sometimes NOCO) and your imaging experiments (sometimes Cen-GFP, sometimes CEN2-GFP in Figure 7A, D vs. Figure 6; DEUP1-mRuby, DEUP1-mRuby3 or mRuby3-DEUP1?).
  
  We now use “Nocodazole” in the text and “Noco” in the figures, as well as CEN2-GFP and mRubyDEUP1 throughout.
  
  (4) About 10% of the population, including several key figures in this field, are red-green color blind. Although 4-colour fluorescence is difficult to get right for everyone, choosing colour-blind-friendly palettes (especially for two-colour panels) is inclusive. Moreover, greyscale or inverted monochrome images make it easier for everyone to visualize changes in localization, size and intensity. Small red foci on black are particularly difficult to discern. For example, in Figure 3, more individual channels in grayscale, with arrows pointing to mc, dc and cilia, would be helpful, as the stainings are difficult to distinguish.
  
  We thank the reviewer for this comment and for this recommendation to be more inclusive. We have made the changes.
  
  To improve the conclusions drawn, we suggest some revisions below:
  
  (1) Since the paper really hangs on it, a clearer description of the rationale for when, for how long and at what concentration nocodazole treatment was applied is needed. The logic is currently difficult to follow: seemingly random 10x jumps in concentration are used. Microtubules control many aspects of cell biology that could be impacted. For example, I found Figures 6D and H particularly difficult to follow, i.e. the timing for 6H seems off.
  
  MCCs develop a very dense and stable MT network that is not comparable to that of cycling cells. MTs are very difficult to depolymerize entirely. We tested a range of nocodazole concentrations at the beginning of the study and showed the extent of MT depolymerization under each treatment. We used concentrations at which MTs are perturbed but not entirely depolymerized, allowing centrioles to be produced (see our answer to point 4 of Reviewer 1). The level of MT perturbation and its consequences on centriole formation at the different timings and doses were assessed for each experiment and are exemplified in Fig. 4 Supplementary 1A-B. This figure was already present in the first version of the manuscript, but we have now edited the text, methods and pictograms to clarify this.
  
  (2) Perhaps as an extension of this point: in general, how interdependent are the processes? If there is a defect at the nest stage, how much are the later defects secondary to it? Do MTs genuinely play direct roles at all stages, or are these knock-on effects? How do the authors rule this out? Defects in the nest lead to smaller and more numerous DEUP1+ foci, with defects in concentrating procentriole factors and centrin, which lead to... For example, in Figure 4B centrin looks reduced upon noco treatment? Does noco treatment affect Cetn2GFP levels globally? Individual channels in grayscale would help visualise this better.
  
  See also our answer to Reviewer 1, point 8c.
  
  The stages are indeed interdependent. This is why we performed both chronic and acute treatments. Chronic treatments were done to test the overall efficiency of centriole amplification when MTs are perturbed. We typically used a low dose of 1 µM because nocodazole remains in the culture medium for 48 h. Acute treatments were done to test the role of MTs at each stage of amplification (A-amplification, G-growth, D-disengagement, M-migration). Most of the acute treatments were done live, and nocodazole was applied after the first time point of live monitoring. We used 10 µM to obtain a rapid effect, and because nocodazole remains in the culture medium for only several hours. This allowed us to monitor stage “n” in cells where stage “n-1” had been completed without any drug, and thus to analyze each stage without having perturbed the preceding one.
  
  We have now also tested the consequences of dynein inhibition using both acute and chronic dynapyrazole treatments. We show that, except for centriole migration, dynein inhibition phenocopies MT depolymerization (centriole number, perinuclear organization and disengagement, as well as deuterosome number/loading/size).
  
  Chronic nocodazole treatments do affect the intensity of CEN2-GFP at G-stage centrioles, suggesting an altered A-to-G transition. At D-stage, the CEN2-GFP signal seems normal. We now mention this in the text and in Fig. 4 Supplementary 1B.
  
  (3) The authors nicely show the importance of MTs in the structure of the nest from which procentrioles and DEUP1-positive structures emerge. They suggest this nest may be what supports procentriole generation in the absence of DEUP1 and parental centrioles. First, what does this nest look like in the absence of DEUP1 and/or parental centrioles (centrinone treatment)? This may be what they are trying to show in Figure 5 Supplement 1, but it is currently very difficult to digest what it shows relative to controls, and whether this is significant in the way it is plotted.
  
  The nest is conserved in DEUP1 KO cells with or without centrosomal centrioles, as shown by the accumulation of Centrin and PCNT at the center of the self-organised MT network (Mercey et al., 2019). This is in fact what motivated our study on the role of MTs in centriole amplification. We have edited the legend to specify the quantification performed, which is not related to this question. In this quantification, we show that the increased propensity of centriole-loaded deuterosomes to accumulate PCNT between A- and G-stage is maintained in the absence of deuterosomes, indicating that centrioles themselves accumulate/recruit PCNT.
  
  (4) Can you do CLEM on DEUP1-Ruby and these early foci at the cloud stage to see if they are visible at the ultrastructural level, relative to procentrioles, microtubules, and other electron-dense structures?
  
  We thank the reviewer for this question. We performed CLEM on the pericentrosomal cloud during the very early steps of centriole amplification. This showed that the early accumulation of DEUP1 at the centrosome corresponds to a region rich in fibro-granular aggregates, suggesting that DEUP1 may be translated there, through locally concentrated centriolar satellites, which are known to be involved in local translation. Small deuterosomes and immature centrioles then form within this cloud of satellites, confirming that the pericentrosomal cloud is a nest for centriole biogenesis (Fig. 2C-D and Fig. 2 Supplementary 2-6 for control cells; Fig. 4 Supplementary 3-4 for nocodazole-treated cells). This also shows that immature deuterosomes are not necessarily round and can be devoid of loaded centrioles.
  
  (5) Check the scale bars- see Fig 4E. Check throughout.
  
  Done.
  
  (6) Figure 3 Supplement 1 and 2 don't match the legend and are likely reversed - which one is right?
  
  Done.
  
  (7) Technical issue - I couldn't play videos 6 or 16? Check these work.
  
  Done.
  
  (8) Nomenclature: mammalian proteins (mouse or human) should be in all caps: DEUP1, PLK4, SAS6, etc. Watch your units: leave a space between number and unit.
  
  This has been done.
  
  (9) Many of the graphs involve three biological replicates but why not plot the mean of each of the three experiments and do stats? The number of events measured may conflate the significance. Try using Superplots.
  
  Here is how we proceed: we count the number of occurrences of the monitored phenotype and the total number of cells. We apply a χ² test to determine whether there is a significant difference between our replicates in each condition. If not, we pool, for each condition, the number of occurrences of the phenotype and the total number of cells across the 3 replicates. Finally, we apply a χ² test between the different conditions. We usually proceed in this way to avoid comparing means of percentages. This is now explained in the methods.
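  The replicate-pooling procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: each replicate is summarized as a hypothetical `(cells with phenotype, cells scored)` pair, the test is an uncorrected Pearson χ² on the 2 × k table (phenotype present/absent × groups), and the 0.05 threshold for the replicate-homogeneity check is our assumption, since the response does not state one. The p-value uses the closed-form chi-square tail for integer degrees of freedom, so only the standard library is needed.

```python
import math
from typing import Dict, List, Tuple

# (cells showing the phenotype, total cells scored) for one replicate or one pooled condition
Counts = Tuple[int, int]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus 2*phi(sqrt(x)) times
    # sum_{k=1}^{(df-1)/2} x^((2k-1)/2) / (1 * 3 * ... * (2k-1)).
    tail = math.erfc(math.sqrt(half))
    term, total = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return tail + 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi) * total


def chi2_test(groups: List[Counts]) -> Tuple[float, int, float]:
    """Pearson chi-square test (no continuity correction) on a 2 x k table."""
    grand = sum(total for _, total in groups)
    with_pheno = sum(hits for hits, _ in groups)
    column_totals = (with_pheno, grand - with_pheno)
    stat = 0.0
    for hits, total in groups:
        for observed, col in zip((hits, total - hits), column_totals):
            expected = total * col / grand
            if expected > 0:  # an empty row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    df = len(groups) - 1
    return stat, df, chi2_sf(stat, df)


def pooled_comparison(conditions: Dict[str, List[Counts]],
                      alpha: float = 0.05) -> Tuple[float, int, float]:
    """Check replicate homogeneity per condition, pool the counts, then compare conditions."""
    pooled = []
    for name, replicates in conditions.items():
        _, _, p_rep = chi2_test(replicates)
        if p_rep < alpha:
            raise ValueError(f"replicates of {name!r} differ (p = {p_rep:.3g}); not pooling")
        pooled.append((sum(h for h, _ in replicates), sum(t for _, t in replicates)))
    return chi2_test(pooled)


# Hypothetical counts for 3 replicates per condition (not data from the manuscript).
conditions = {
    "DMSO": [(30, 100), (28, 100), (33, 100)],
    "Nocodazole": [(60, 100), (55, 100), (62, 100)],
}
stat, df, p = pooled_comparison(conditions)
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.2g}")
```

  If a condition's replicates differ at the chosen threshold, the sketch refuses to pool rather than silently combining them.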
  
  Minor points:
  
  (1) "DEUP1 is a centrosomal protein and assembles deuterosomes in the pericentrosomal region in brain MCC". I am not sure you have evidence that DEUP1 is a centrosomal protein. You don't seem to study the relationship between centrosomes and DEUP1? Rewrite this title and tone down this claim.
  
  This has been modified.
  
  (2) Why the crossbow micropattern (versus some other shape) - seems very specific but not discussed?
  
  We wanted a shape in which the centrosome is not located at the center of mass of the nucleus. Among the corresponding patterns, the crossbow was the one on which differentiating cells were least prone to detach.
  
  (3) Figure 2 - are the foci of DEUP1 at the cloud stage smaller than at A stage? How do they grow? Measure the diameter at cloud stage, just after they leave the cloud and then once they move away from centrosomal cloud and each other. If so, and they do indeed grow in size from the cloud stage to the growth stage which I think your images suggest - do you envision this happening with the gradual addition of DEUP1 rather than fusion?
  
  Early deuterosomes are not easy to detect by light microscopy because of the accumulation of DEUP1 in the cloud. We performed CLEM on the cloud of early A-stage cells to resolve the earliest deuterosomes, which are often very small (see Fig. 2D and Fig. 2 Supplementary 2-6). This suggests that they grow, either by fusion, which we never observe in our movies at later A-stage, or by accretion of DEUP1. However, by light microscopy, we can detect very early but large deuterosomes, which we later see splitting into smaller ones. Therefore, we cannot conclude on the mechanism that regulates deuterosome size. This is now addressed in the Discussion of the manuscript.
  
  You say in the discussion:
  
  "Consistently, we never observed fusion events of DEUP1 condensates in our time-lapse experiments. More importantly, we did FRAP experiments on endogenously tagged mRuby-DEUP1 in cells at the different stages of centriole amplification, and did not find significant recovery, supporting that centrosomal DEUP1+ foci and deuterosomes are not liquid-like structures (Figure 8 Supplementary 2)." How do you prove there is no fusion of deuterosomes?
  
  We agree that it is always difficult to prove the absence of something! However, we acquired tens of movies at high temporal resolution and never observed fusion events. As mentioned in our previous answer, the earliest deuterosomes can be very small and cannot be distinguished from the DEUP1+ cloud by live imaging, so we cannot conclude for this stage. Later on, during A- or G-stage, when deuterosomes are outside the cloud and easily observed, we very often see deuterosomes bumping into each other and staying in close contact for minutes before moving apart. To us, this supports the absence of fusion properties, but the question remains open. We now explain this in the manuscript and have added an example as video 28.
  
  If they are getting bigger as I think your imaging suggests from cloud to growth stage, then how is this happening?
  
  MT depolymerisation and dynein inhibition lead to the formation of very small deuterosomes. Dynein inhibition can even block the formation of new deuterosomes, suggesting that DEUP1 concentration is a crucial parameter for its condensation into deuterosomes. Deuterosome growth may occur through oligomerization of DEUP1 molecules, enabled by their dynein-dependent concentration. Sorokin proposed in 1968 that supersaturation of deuterosome components may lead to their solid crystallization into deuterosomes. Deuterosome size could also be regulated by a more complex molecular cascade involving post-translational modifications of DEUP1 or PCM, such as phosphorylation driven by the cell cycle machinery. This would be consistent with the fact that deuterosomes are very large in the absence of CCNO, a cyclin required for entry into the MCC cell cycle variant. This will require further investigation.
  
  I'm not sure FRAP actually proves fusion doesn't happen.
  
  Agreed; this is not what we meant, and we have clarified it. The FRAP experiment only suggests that deuterosomes are not liquid-like.
  
  It is technically difficult to laser ablate individual or only subsets of deuterosomes...
  
  This is what was done, but in any case FRAP does not firmly show that deuterosome compartments are not liquid-like, as we now specify.
  
  (4) How do you fix your cells for expansion as you have no preservation of cytoplasmic microtubules? You are saying that there is a "nest" of MTs but beta tubulin ONLY stains the cilia and centriole - why is this? Tyrosinated tubulin on regular confocal shows strong cytoplasmic staining. See Figure 3.
  
  Cytoplasmic microtubules are not well preserved through the expansion process. We tried a few different fixation and pre-extraction methods, but they came at the cost of centriole preservation, i.e. we could preserve either cytoplasmic microtubules or centrioles, but not both with the same processing method.
  
  (5) "PCNT puncta partially overlap with centrin (Figure 3 Supplementary 2C). At this stage, PLK4, the master regulatory kinase, and SAS6, one of the first centriolar components are either absent or present as small foci within the cloud, often on the wall of the parent centrioles (Figure 3B-C)." some arrows to highlight this would be useful - difficult to see?
  
  We tried to add arrows to what is now Fig. 3 Supplementary 1G, but too many CENTRIN foci colocalize with PCNT. We have instead enhanced the contrast of the merge to make this more visible.
  
  (6) Figure 3I legend - what are the arrows pointing at? Yellow and white on inserts? ". Around the same time as tubulin, centrin is also recruited to procentrioles (Figure 3I). This stage is probably the stage that we previously documented as A"
  
  However, you see centrin at DEUP1 foci in D, and you don't show, e.g., SAS6- or PLK4-positive DEUP1+ structures specifically lacking centrin; centrin seems to be present on all the procentrioles in Figure 3I. Did I miss where you show centrin-negative procentrioles in the cloud?
  
  In Fig. 3I (now Fig. 3 Supplementary 1J), yellow arrows point at centrioles with non-acetylated MTs, while white arrows point at acetylated MTs. This is now indicated in the legend.
  
  Regarding CENTRIN, it is present as a diffuse staining around the centrosome from the very beginning of amplification (now in Fig. 3 Supplementary 1A with different contrasts), in addition to being a component of the parental centrioles. This staining can therefore overlap with DEUP1 staining when DEUP1 appears (Fig. 3 Supplementary 1B, E), but not necessarily. Live, we observe that CENTRIN and DEUP1 foci can move independently at early stages (Fig. 2 Supplementary 1B, video 2). It is only later, as now shown in Fig. 3 Supplementary 1J (previously Fig. 3I), that all procentrioles are strongly positive for CENTRIN.
  
  A recent paper (Laporte et al., Cell 2024) showed that CENTRIN is first recruited to duplicating procentrioles at the distal end, visible as a small dot, and then appears gradually at the level of the inner scaffold when procentrioles reach 160 nm, the stage at which POC5 appears, which corresponds to the A-to-G transition in our MCC progenitors (Al Jord et al., 2014). One can therefore consider that the same happens in our cells, and that the CENTRIN cloud makes the distal CENTRIN dot difficult to detect. We have changed the text to add this reference and to discuss the appearance of CENTRIN in MCC procentrioles.
  
  (7) " The DEUP1 asymmetry previously described at the centrosomal daughter centriole (Al Jord et al., 2014) becomes visible in some cells during the cloud stage (Figure 3B, N; Figure 3 Supplementary 2B) and in a majority of cells" difficult to see - maybe enlarge and single channel from Figure 3F-H in the supplemental Figure 3 to emphasise this?
  
  We have changed either the pictures or the contrast to better match the quantifications. This is visible in Fig. 3A, D, E, G; Fig. 3 Supplementary 1E, and now by correlative light and electron microscopy in Fig. 2 Supplementary 2, 3, 4 and Fig. 4 Supplementary 3-4. One has to look carefully at the daughter centriole (marked “dc”). We have not zoomed in, since previous studies have already described this at later stages, with bigger deuterosomes. The reviewer can refer to main or supplementary figures in previous manuscripts (Al Jord 2014, Khoury Damaa 2024), where serial sections span entire deuterosomes and daughter centrioles and show, with nanometric resolution, that both structures are frequently in contact over tens of nanometers.
  
  (8) Do you have videos of DEUP1 oscillations with nocodazole to show a lack of oscillations?
  
  We have now added videos of DEUP1 oscillations under nocodazole and dynapyrazole treatments.
  
  (9) "In addition, co-staining of centrioles and nuclear pore proteins show a tight colocalization (Figure 6 Supplementary 1B)." I see the colocalisation in panel 1, but it is less obvious in panel 2; maybe add some more zoomed-in panels and some quantification of the colocalization? Is it more striking at the G stage than at the D stage?
  
  We have edited the phrasing, as ‘colocalization with NPC’ is not the right term. There are so many centrioles and NPCs that they cannot help but colocalize… What we mean is that there is a tight connection with the nuclear envelope. This is why we present a single z-section, to show that centrioles and NPCs are in the same z-plane. This is also clearly visible in the XY plane for centrioles loaded on deuterosomes located around the nuclear membrane. We have also added a video showing an entire z-stack of this kind of staining.
  
  (10) "Indeed, SAS6 normally disappears from procentrioles when centrioles are docked, just before ciliation (Al Jord et al., 2014). This suggests that centrioles were able to degrade SAS6, a process also dependent on APC/C (Strnad et al., 2007), but failed to disengage from deuterosomes." Figure 6 Supplement 1E-F: are you sure that Sas6 was not simply loaded incorrectly at the earlier stage, so that this reflects reduced recruitment rather than premature disengagement of Sas6? If it is indeed premature disengagement of Sas-6, what about CP110: does CP110 get loaded, and is it still present in noco-treated cells arrested in the D phase?
  
  We do not observe SAS6-negative procentrioles on deuterosomes at G-stage but only on deuterosomes in D-stage cells (cells with partly disengaged procentrioles). This is why we hypothesize that, because of the long duration of D-stage and knowing that SAS6 is finally degraded at the end of amplification (Al Jord et al., 2014), we are in the presence of cells where SAS6 has been degraded but where centrioles did not manage to disengage. This is now clarified in the text.
  
  (11) Can you track deuterosome splitting live? Maybe there is not enough spatial or temporal resolution?
  
  One has to monitor in 3D (multiple z because deuterosomes move a lot), in 2 colors, at high temporal resolution (dt=2-5’, to be able to track a single deuterosome), and over a long duration (deuterosomes sometimes touch each other and then move away, giving the impression that they split). This eventually leads to bleaching of the mRuby fusion protein… We have included an example of what we think is a deuterosome splitting in Fig. 6E (former Fig. 7D). However, we finally decided to image at low temporal resolution (dt=40’) to avoid photobleaching, and to analyze numerous deuterosomes and cells in order to quantify the number and size of deuterosomes over time in single cells.
  
  (12) The MT nodes - can you segment the tyrosinated MTs and define nodes and then quantify the DEUP1 presence on them?
  
  Please see answer to reviewer 1 regarding this point.
  
  (13) Figure 8 supp 1 (E): "Representative XY distribution of CEN2-GFP+ centrioles at the end of migration (Sas6 negative) in brain MCCs treated with DMSO, Nocodazole 1µM and 5µM (48h). Scale bar, 5µm." A bit more detail is needed on how you define fully migrated vs. still migrating centrioles in z. You say in the legend that you are using Sas-6 negativity to define fully migrated cells, yet you say noco treatment leads to premature Sas-6 negativity, and yet apical migration takes longer upon noco treatment?
  
  Nocodazole does not lead to premature SAS6 negativity but to partial disengagement, which leads to SAS6-negative “mature” centrioles still connected to deuterosomes. We define migration as complete when all the centrioles are on the apical side of the nucleus. We now clearly define what “apical” migration stands for in the main text, and have changed the pictograms in Fig. 8G to clarify this.
  
  (14) Figure 8H and video 18: it isn't obvious to me that the noco-treated cells are "more erratic", or how you decide what counts as successful apical migration. How do you control for drift in z? Can you track individual centrioles as you did in untreated cells and define what is "erratic" about their movement?
  
  Erratic means that the centrioles move away from each other, and back, in an unpredictable way, instead of migrating up and gathering. The drift in z of the whole cell is visible because there are always some centrioles, apically located at the beginning, that remain on the apical membrane, probably because they are already docked.
  
  We have indeed followed the centrioles individually in the nocodazole condition. However, in the control, the XYZ coordinates of one of the centrosomal centrioles, which normally do not move, are subtracted from the coordinates of all the other centrioles, as explained in the methods section. This provides a subcellular reference and circumvents the movements of the cell, which are far from negligible at this timescale. In nocodazole-treated cells, the centrosomal centrioles share the erratic movements of the other centrioles and can migrate up and down, which excludes them as a reference. Since the nucleus also moves a lot, we were left with no reference point.
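  The reference-frame correction used for control cells amounts to subtracting, at every time point, the position of the reference centrosomal centriole from every other track. A minimal sketch, assuming each track is a list of `(x, y, z)` positions sampled at the same time points and keyed by a hypothetical centriole identifier (the IDs and coordinates below are made up for illustration):

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def to_reference_frame(tracks: Dict[str, List[Point]],
                       reference_id: str) -> Dict[str, List[Point]]:
    """Express every track relative to the reference centriole at each time point."""
    ref = tracks[reference_id]
    relative: Dict[str, List[Point]] = {}
    for cid, points in tracks.items():
        if cid == reference_id:
            continue  # the reference would be (0, 0, 0) at every time point
        if len(points) != len(ref):
            raise ValueError(f"track {cid!r} and the reference have different lengths")
        relative[cid] = [
            (x - rx, y - ry, z - rz)
            for (x, y, z), (rx, ry, rz) in zip(points, ref)
        ]
    return relative


# Hypothetical tracks: the whole cell drifts by +1 in x and y per frame,
# while centriole "c7" additionally moves up in z.
tracks = {
    "centrosome_1": [(0, 0, 0), (1, 1, 0), (2, 2, 0)],
    "c7": [(5, 5, 1), (6, 6, 2), (7, 7, 3)],
}
print(to_reference_frame(tracks, "centrosome_1"))
```

  This only removes whole-cell translation; it cannot help when the reference centriole itself moves erratically, which is why it was not applicable to the nocodazole-treated cells.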
  
  (15) Figure 8 supplement 1E can you quantify the final area of centriole patch in XY upon noco treatment?
  
  It was in main Fig. 8J and is now in Fig. 8 Supplementary 1F.
  
  (16) Figure 8J legend- MBB is never defined as an acronym.
  
  Thank you for pointing this out.
  
  (17) Define what is the frequency and how is it calculated - Figure 8J.
  
  This is the MBB patch area in µm².
  
  Text edits:
  
  (1) "Altogether, these results suggest that, in this non-tissue-specific proxy of MCC progenitors, microtubules organize the onset of centriole amplification in the pericentrosomal region."
  
  Sentences have changed.
  
  (2) "Increasing the temporal resolution to 5-15s reveals that DEUP1+ foci observe an exhibit oscillatory dynamics to at the centrosome (Figure 2E, colored arrows, Video 3, 5/10 cells observed for 1-4min)."
  
  Sentences have changed.
  
  (3) "stage procentrioles were involved in this perinuclear migration and distribution. In fact, this dynamic is reminiscent of the centrosome migration that occurs during the G2-to-M progression in cycling cells in preparation for mitotic spindle organization. In cycling cells, this" Grammar - maybe change to "stage procentrioles were involved in this perinuclear migration and distribution. This is reminiscent of the centrosome migration that occurs during the G2-to-M".
  
  Sentences have changed.
  
  (4) "We then wondered whether these microtubule-dependent dynamics was were required for an efficient subsequent centriole disengagement during the following D-stage."
  
  Sentences have changed.
  
  (5) "Then, monitoring tens of disengagement movies, we identified a transient stage during which disengaging procentrioles redistribute isotropically in the 3 dimensions, along the nuclear membrane (Figure 6A, 4:30, Video 7) before losing its contact to migrate to the apical surface (Figure 6A, 6:30 to 14:00)."
  
  Sentences have changed.
  
  (6) Discussion: "Since pioneer electron microscopy studies on basal body production in quail oviduct MCC 35 years ago (Boisvieux-Ulrich et al., 1987, 1990; Boisvieux-Ulrich et al., 1989), this work is the first to assess the role of microtubules in the now finely described centriole amplification process. This"
  
  Sentences have changed.
  
  (7) "Using live imaging on brain MCC, we highlight the existence of a nest composed of DEUP1, PCNT and Centrin2, pre-assembled before the onset of centriole amplification onset."
  
  Sentences have changed.
  
  (8) "Recently, formation of DEUP1 pure condensates in solution as well as FRAP experiments after overexpression of DEUP1 in MCC progenitors suggested that deuterosomes where are not liquidlike structures (Yamamoto & Kitagawa, 2019). Consistently, we never observed fusion events of DEUP1."
  
  Sentences have changed.
  
  (9) "This reminds is reminiscent of the centriole-to-centrosome conversion occurring at the G2-M transition followed by the associated microtubule dependent nuclear migration of new centrosomes at mitosis onset (Agircan et al., 2014)."
  
  Sentences have changed.
  
  (10) "Following individual trajectories requires high resolutive resolution spatio-temporal live imaging while avoiding excessive light exposure which disturbs centriole migration (Boudjema et al., 2024)."
  
  Sentences have changed.
  
  (11) "Using high temporal resolution microscopy, we further identify that individual dynamics is are complex and can be splitted between divided into the baso-apical migration, where centrioles move in a processive and more..."
  
  Sentences have changed.
  
  Reviewer #3 (Recommendations For The Authors):
  
  (1) Growing MEF-MCCs on micropatterns has successfully mimicked the dynamics of centriole amplification in brain MCCs, allowing the authors to study the spatial origin of procentrioles. Since this is a powerful system, a more quantitative description of it will be informative and beneficial for future studies. For example: What is the efficiency of this system? Are the cilia that form in MEF-MCCs motile?
  
  The MEF-MCC system has been described in a previous paper from the Kintner lab. Growing MEF-MCCs on micropatterns does not seem to improve ciliation, which remains partial, probably owing to the absence of apico-basal polarity.
  
  (2) Figure 2: The analogy drawn by the authors between DEUP1 oscillatory dynamics and centriolar satellites is intriguing. In early amplifying cells within the cloud, do these DEUP1 structures co-localize with the satellite marker PCM1?
  
  We have added immunostainings of PCM1 in mRuby-DEUP1 / CEN2-GFP cells in Fig. Supplementary 2E. Within the centrosomal cloud, DEUP1 colocalizes with PCM1. Interestingly, this concentration of PCM1 at the centrosome depends, at least in part, on dyneins. PCM1 can then localize around the deuterosomes, but it never colocalizes with them (not shown). This was also shown by immuno-EM in Zhao et al., 2019. Although Firat-Karalar et al. (2014) showed that PCM1 is a proximity interactor of DEUP1 (called ccdc67 at that time), the absence of PCM1 staining on deuterosomes does not favor the hypothesis that PCM1 and DEUP1 are part of the same entities. One could hypothesize that DEUP1 is translated locally within the satellites, explaining the colocalization of the 2 proteins and the BioID results, and then forms PCM1-negative deuterosomes.
  
  (3) The authors propose a physical link between deuterosomes and centrosomes based on their oscillatory behavior. How are the oscillatory dynamics of DEUP1 affected by nocodazole treatment or inhibition of microtubule motors (i.e. ciliobrevin treatment)?
  
  These oscillations are inhibited by nocodazole (Fig. 4D). They are also inhibited by dynapyrazole (Fig. 4D). We never managed to obtain a clear disruption of the Golgi apparatus with ciliobrevin, and therefore we did not use it.
  
  (4) In addition to nocodazole treatment, it would be important to determine the consequences of microtubule stabilization by taxol and inhibition of microtubule motors during critical stages of centriole amplification where microtubules are reported to play a role for the first time in this manuscript. Another interesting area of investigation will be to study the extent to which microtubule PTMs contribute to these processes.
  
  We now block dyneins during the different stages of amplification. The results are in main Fig. 4, 5, 7 and 8 and their associated supplementary figures. The role of microtubule PTMs is beyond the scope of this manuscript.
  
  (5) Describing microtubule dynamics along with Centrin/DEUP1 dynamics will be informative in assessing whether these structures associate and/or move along microtubules? Have the authors performed their imaging experiments with SIR tubulin?
  
  Yes, we have tried hard! But we have encountered different obstacles:
  
  3-color video microscopy is phototoxic;
  
  SiR-tubulin bleaches very rapidly;
  
  the density of microtubules in MCCs makes the observations poorly informative.
  
  (6) Figure 5: The role of PLK1 in centriole-centrosome conversion and generation of multiple MTOCs can be tested with a PLK1 inhibitor for further confirmation.
  
  We also tried this, but inhibiting PLK1 blocks the A-to-G and G-to-D transitions, so it was not possible to uncouple the role of PLK1 in stage transitions from its role in centriole maturation.
  
  (7) Figure 6: The tight co-localization of nuclear pore proteins with centrioles poses questions about the role of nuclear pore proteins or other nuclear proteins that are associated with centrioles during centriole disengagement and migration. Considering the existing literature on centrosome-nucleus attachments, can there be a way to test this question within the scope of this manuscript?
  
  We tried to deplete Nup133, but this kills the cells. Our additional experiments now show that the nuclear migration of centrioles during G-stage is dynein-dependent, reinforcing the parallel with centrosome migration in prophase. We have also added results from our scRNA sequencing (Fig. 5 Supplementary 1) showing that some key players of centriole migration to the nuclear membrane are conserved in the MCC cell cycle variant and are expressed with dynamics comparable to those of the canonical cell cycle.
  
  (8) Figure 8: Manually tracking a subset of migrating centrioles to define their dynamics during centriole migration and docking provides valuable analysis for determining the molecular mechanism of these processes. In addition to microtubules, does actin contribute to this process? Since centrioles eventually migrate to the apical side in nocodazole-treated cells, there should be other molecular players involved in this process.
  
  We did block actin polymerization, but we found that the different stages were affected and that it would be better to dedicate a whole manuscript to the role of actin during each stage of amplification. We discuss the migration mechanism, and the putative role of actin, in the Discussion.
  
  (9) The legends for Supplementary Figures 1 and 2 in Figure 3 are mixed and need correction.
  
  Figures have been remodelled.
  
  (10) In Figure 3P, the term "PLK4+" is labeled in bright green, which is not clearly visible. It may be beneficial to change the color of this label for better visibility.
  
  We have tried to correct this.
  
  (11) Figure 6F quantifies "% tethered flowers" on the nuclear membrane. When quantifying, is the 3D localization of DEUP1 flowers in both DMSO- and Noc-treated cells considered? A flower may appear to be on the nucleus in 2D, but it could be detached from the membrane in a 3D view.
  
  The quantifications are done in 3D. However, flowers that are below or above the nucleus are not quantified, since the space is confined and the z-resolution is too low to see whether they are connected or not. This is now specified in the legend.
  
  Before the editors proceed with an updated assessment, they've requested that we pass on some of the comments that have arisen as part of the evaluation of your revised manuscript. They feel that these concerns should be addressed before we proceed with issuing a formal assessment and publishing the revised Reviewed Preprint:
  
  We thank the reviewers and the editors for their corrections and insightful comments. We apologize for our delayed answer and hope that our corrections to the main text and to some of the figures will satisfy them.
  
  The revised manuscript is greatly improved with nice new data regarding the role of microtubules. It also has changed quite a bit including the title. The new focus is on the cell and centriole cycle variants in MCC. While this helped to focus the study, there remains an important issue related to the interpretation of the data and the proposed 2-in-1 cycle model. Before providing the final updated assessment, we ask you to address the following points (which were raised already in the first round of review): The manuscript still contains statements that are not aligned with published work and the current view in the field regarding the timing of events during canonical centriole biogenesis. These timings are in conflict with your model that 2 centriole cycles are "superposed" in the MCC cell cycle variant, as currently presented. An alternative straightforward interpretation would be that multiciliogenesis uses an accelerated centriole duplication cycle where key steps occur concomitantly or in short succession instead of being separated by mitotic divisions as in the canonical cycle.
  
  We do agree that all steps are accelerated into a single cycle; this is in fact what we think we have proposed. Once our confusion regarding centriole-to-centrosome conversion is corrected (as explained below) and the events are placed in a scheme, the events of the two canonical cycles superpose nicely, both in terms of molecular composition and dynamics (corrected Fig. 9). We therefore maintain that the null hypothesis is that this acceleration is achieved through a superposition of events that, although driven by the same molecular machinery, normally occur in two consecutive cell cycles. We briefly explain our reasoning in two paragraphs, before answering the reviewers' questions point by point.
  
  Regarding centriole-to-centrosome conversion:
  
  We thank the reviewer for pointing out that we used “MTOC conversion” for what is normally called “centrosome maturation”. We removed the term “centriole-to-centrosome conversion” during the first round of revision, but we now realize that “MTOC conversion” leads to the same misinterpretation with regard to the literature on centriole duplication.
  
  The reviewer asks us to refer to the work of the Tsou lab (Wang 2011, reference now added in the manuscript) showing that daughter centrioles are “modified” (e.g. recruit PCM, become competent for MT nucleation and duplication) during late M/early G1. This “centriole-to-centrosome conversion” can’t occur for our procentrioles at this stage since they are not even born during the mitosis that precedes MCC differentiation. Also, in our cells, such modification does not include the capacity to become competent for duplication since we know that procentrioles become basal bodies without making any round of duplication (Al Jord et al., 2014).
  
  Also, we have not performed the experiments needed to determine when our centrioles become “modified-like”. What we can say is that during A-stage, they become progressively positive for PCM (Fig. 5 Supplementary 2) and a weak signal shows that some MTs emerge from them (Fig. 5 and Fig. 5 Supplementary 2; see the point-by-point answer).
  
  What we do see is that, at the A-to-G transition, they increase their PCM recruitment, show clear and strong MTOC ability (sometimes as strong as the centrosomal centrioles), and that this is associated with migration and separation of centrosome/deuterosomes around the nuclear membrane (Fig. 5). We therefore relate this to what occurs at the G2/M transition, namely increased recruitment of PCM proteins and an increased ability to nucleate MTs, associated with centrosome migration and separation at the nuclear membrane. Since this process is called “centrosome maturation” in the canonical cell cycle, we should refer to this term in our study. However, centrioles in the MCC variants are not organized in centrosomes, so we now compare what we see to the “centrosome maturation” of the canonical cell cycle, with an associated reference (Joukov et al., 2018), but name it “centriole maturation”.
  
  We have modified the text (tracked changes visible) and the schemes (Fig. 5, Fig. 5 Supplementary 1 and 2, Fig. 9, Fig. 9 Supplementary S1; new versions uploaded) accordingly.
  
  Regarding 1.5 or 2 cell cycles:
  
  Except for the “MTOC conversion” that we have now changed, as explained above, we think our work does suggest (depicted on Fig. 9) what the reviewer states for centriole duplication: “In the current view, centriole biogenesis starts in early S, elongation proceeds through G2/M and by early G1 it is complete. During M/early G1 centrioles disengage and newly formed daughters recruit PCM (centrosome conversion). Then these centrioles go through another complete cell cycle and when they reach early G1 again they have acquired DAs and SDAs. Key here is that biogenesis and disengagement/centrosome conversion are separated by the first mitosis (ensuring duplication occurs only once), and acquisition of DAs and SDAs is separated by another mitosis (ensuring that cells only form a single cilium)”.
  
  We feel that going from early S to a G1 phase, after two mitoses, is what one can call “2 cell cycles”. One of the papers that inspired us most when studying how the cell cycle machinery can drive centriole amplification in MCC is from Jadranka Loncarek's team (Kong et al., 2014), where they also state that “nascent centrioles gradually mature through 2 cell cycles”. Very interestingly, this study shows that when Plk1 activation is enhanced, centriole age can be erased and new procentrioles are able to recruit PCM and appendages within only one cell cycle, without mitotic progression, as we see in MCC. We have added this reference to our discussion.
  
  Point by point answer
  
  (1) Original work on canonical centriole disengagement and centriole-to-centrosome conversion should be cited (e.g. PMID: 16862117, PMID: 21576395)
  
  As explained earlier, we had used the wrong term from the beginning. We do not discuss the centriole-to-centrosome (or MTOC) conversion, since we did not test when centriole modification (Wang et al., 2011) occurs in the MCC cell cycle variant. We know that PCNT is present on the procentrioles during A-stage (as shown in Fig. 5 Supplementary 2B), but we do not know when it is recruited (UExM did not work properly with this antibody). We quantify a weak MT staining in regrowth experiments during A-stage and see that procentrioles can be connected to MTs in both brain MCC and MEFs (as shown in Fig. 5D, E for brain MCC and Fig. 5 Supplementary 2F for MEFs), but we do not know when during A-stage they become competent for nucleation. We therefore do not discuss this process, which we did not document. What we clearly document and quantify is the enhanced MT nucleation capacity at the A-to-G transition, concomitant with nuclear migration (easily identified with Cen2-GFP or GT335 staining), which we compare to the centrosome maturation occurring at the canonical G2/M transition.
  
  (2) The authors state in several places that canonical centriole formation and maturation takes two iterations of the canonical cell cycle. This is imprecise. Based on the above work and work by others, the broadly accepted view is that it takes 1.5 cell cycles. This difference matters for the final proposed model (see below). Reviewed e.g. here: PMID: 20869612; PMID: 30601682
  
  Our answer is in the preamble.
  
  (3) "Centriole maturation cycle superposes with centriole elongation cycle in the MCC cell cycle variant": Your description of the canonical cycle differs from the current view in the field. In the current view, centriole biogenesis starts in early S, elongation proceeds through G2/M and by early G1 it is complete. During M/early G1 centrioles disengage and newly formed daughters recruit PCM (centrosome conversion). All this occurs in 0.5 cycles. Then these centrioles go through another complete cell cycle and when they reach early G1 again they have acquired DAs and SDAs (total of 1.5 cell cycles). Key here is that biogenesis and disengagement/centrosome conversion are separated by the first mitosis (ensuring duplication occurs only once), and acquisition of DAs and SDAs is separated by another mitosis (ensuring that cells only form a single cilium).
  
  (4) Fig 5A, B and Fig. 9
  
  (a) Are 2 separate figures needed for the model? They seem redundant.
  
  We find it easier for the reader to see the first part of the model depicted before reaching Fig. 9.
  
  (b) The model shows loss of SAS6 throughout G1, but this already occurs during M/early G1
  
  Thanks. This was already correct in Fig. 9; we have corrected Fig. 5 accordingly.
  
  The model shows "MTOC capacity/conversion" during S phase, but this occurs during early G1
  
  Thanks a lot. As explained earlier, we used the term “MTOC conversion”, occurring in G1, for what is normally called centrosome maturation, occurring in G2/M. We no longer speak of MTOC conversion since we have not addressed this question (see above). We have therefore removed “MTOC conversion” from the text and the schemes and replaced it with “centrosome maturation” for the duplication cycle, and with “enhanced MT nucleation capacity” for the MCC cycle. To be clearer, and to show schematically that procentrioles are competent for MT nucleation before the G2/M or A/G transitions, we have added some MTs nucleated from G1 procentrioles during the canonical cycle, and from late A-stage procentrioles during the MCC cycle.
  
  The model shows disengagement only in the second M phase, but this occurs already at the first M phase, directly following centriole biogenesis, right before centrosome conversion.
  
  This was a major editing error in both Fig. 5 and Fig. 9. Of course, the daughter centrioles disengage during the first M-phase. This has been corrected; thanks a lot for spotting it. This, however, does not contradict the superposition hypothesis.
  
  We also added to Fig. 9 the acquisition of distal appendages during the second M-phase of the duplication cycle, which was already shown in Fig. 5 but missing from Fig. 9.
  
  When the correct timings are incorporated in the figure, the proposed superposition of two cycles is not an accurate description of the events. Instead, your data seem consistent with a model where MCC incorporates all steps in one cell cycle variant that lacks mitoses, so that disengagement and MTOC conversion occur together with centriole elongation, followed immediately by acquisition of DAs and SDAs.
  
  We do agree that all steps are accelerated into a single cycle; this is in fact what we tried to propose. When the events are placed in a scheme, the events of the two canonical cycles superpose nicely, both in terms of molecular composition and dynamics (Fig. 9). We therefore maintain that the null hypothesis is that this acceleration is achieved through a superposition of events that, although driven by the same molecular machinery, normally occur in two consecutive cell cycles. This is notably consistent with the findings of Kong et al., 2014 cited previously.
  
  (5) While all reviewers felt that there was no need to introduce the new term "nest", they leave it to the authors to keep it. However, the authors may want to consider that the term is still not introduced and explained properly, which may confuse readers. For example, while this section reads like an introduction to the term: "Correlative DEUP1 live-imaging and EM highlights the existence of a pericentrosomal "nest" in brain MCC", the term is already used two times before without explanation. The first mentioning is at the beginning of the results section and is followed by citations, which gives the impression that these studies describe the nest, which is not the case.
  
  The first mention of “nest” is at the end of the introduction, which summarizes the findings of the paper, where the term appears in the following context: “we found that centriole amplification emerges in a pericentrosomal “nest” concentrating core centriole/deuterosome elements”. We looked up the definition of nest in the Collins Dictionary, “a structure or other place where creatures, esp. birds, give birth or leave their eggs to develop”, and felt this was clear. We added quotation marks around the term nest.
  
  Then, the Results section opens with this sentence: “The origin of amplified centrioles in MCC remains controversial. Some live imaging experiments and electron microscopy suggest that the centrosome could constitute a nest for centriole and deuterosome biogenesis (Al Jord et al., 2014; Kalnins et al., 1972; Mori et al., 2017), but others have proposed that procentriole-loaded deuterosomes emerge independently from the centrosome location, all over the cytoplasm (Nanjundappa et al., 2019; Sorokin, 1968; Zhao et al., 2013, 2019).”. Here, the term nest is again used as a place of birth for centrioles and deuterosomes, which is what is actually proposed in these papers. First, Kalnins et al., in 1969 (we made an error in the reference date; this has been corrected), summarize in their abstract: “This observation suggests that all of the clusters may form initially in close association with the diplosomal centrioles”. Then, leaving aside Al Jord 2014, which comes from our lab, the title of Mori et al. is “Cytoplasmic E2f4 forms organizing centres for initiation of centriole amplification during multiciliogenesis”, and in that paper, they show that E2F4 accumulates at the centrosome. This has now also been proposed by collaborators for MCIDAS (Lu et al., 2025). We feel that these references, which are often omitted, are appropriate at this location.
  
  Then we continue with: “To test whether microtubules drive the organization of a centrosomal nest from which procentrioles emerge”, which keeps the notion of the place of birth.
  
  Then the title "Correlative DEUP1 live-imaging and EM highlights the existence of a pericentrosomal "nest" in brain MCC" arrives. In this section we first describe a pericentrosomal cloud, on which we zoom in using CLEM, and conclude at the end of the section: “Altogether live imaging mRuby-DEUP1/CEN2-GFP during early A-stage suggests that core deuterosome and centriole components are concentrated in a primordial cloud around the centrosome, which constitutes a nest where centrioles and deuterosomes concomitantly form before they move away from the centrosomal region (Fig. 2F)”.
  
  Finally, we begin the discussion section regarding the nest by: “We named this transitory compartment a “nest” since deuterosomes and procentrioles emerge specifically in this region and grow while moving away from it.”
  
  During the first revision, we tried to make this clearer. If it is still not clear enough and the reviewer has an alternative definition or phrasing to propose, we will gladly consider it.
  
  As replied to the other reviewer, the term “nest” does not need to be retained as new terminology. It is simply a way for us to identify this transitory region and to define one of its functions or characteristics, which is to host the birth of new deuterosomes and centrioles.
  
  The following comments from Reviewer #3 may also provide further context regarding the editors' remaining concerns:
  
  The authors have done an excellent job addressing the points I raised overall, and the revision is substantially improved in focus and clarity. That said, some concerns raised by other reviewers, particularly regarding terminology and statistical analysis, could have been addressed more fully. One issue remains insufficiently resolved. Several quantitative analyses (for example Fig. 5C and 5E) still appear to rely on pooled single-event measurements collected across three independent experiments. This approach can overstate statistical significance. The authors indicate in their rebuttal that they use chi-square tests to compare proportions and to justify pooling across replicates. However, I am not convinced this addresses the issue for the intensity-based and single event distributions shown in the panels specified above. I recommend that these key analyses be represented with biological replicates shown explicitly (superplot-style, with replicates distinguished).
  
  Our previous reply concerned the comparison of proportions, not the intensity-based and single-event distributions shown in Fig. 5C and Fig. 5E. We have now changed these plots to represent biological replicates explicitly (superplot-style, with replicates distinguished). For the statistical analysis, we evaluated differences in marker intensity between A-stage and G-stage samples using a linear regression model, with stage as the main effect and replicate as a fixed covariate to account for batch variation. Statistical significance was assessed using Type II ANOVA.
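  As an illustration of the model described above, the sketch below fits intensity ~ stage + replicate by ordinary least squares and computes the Type II sum of squares for stage as the drop in residual sum of squares when stage is added to a model that already contains replicate. It is a minimal, dependency-free sketch under stated assumptions: the record layout (stage, replicate, intensity) and all function names are hypothetical, no stage-by-replicate interaction is included, and the p-value (which would come from an F distribution with 1 and n - p degrees of freedom) is not computed. In practice a statistics package, for example statsmodels' anova_lm with typ=2, would normally be used.

```python
"""Sketch: Type II F-test for a stage effect with replicate as a fixed covariate.

Hypothetical data layout: a list of (stage, replicate, intensity) tuples,
where stage is "A" or "G". Model: intensity ~ stage + replicate (no interaction).
"""


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations; returns (coefficients, RSS)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss


def stage_effect_type2(records):
    """Return (stage coefficient, Type II SS for stage, F statistic with 1 numerator df)."""
    reps = sorted({rep for _, rep, _ in records})
    y = [value for _, _, value in records]
    # Reduced model: intercept + replicate dummies (first replicate is the reference).
    reduced = [[1.0] + [1.0 if rep == level else 0.0 for level in reps[1:]]
               for _, rep, _ in records]
    # Full model: the same columns plus a 0/1 indicator for G-stage.
    full = [row + [1.0 if stage == "G" else 0.0]
            for row, (stage, _, _) in zip(reduced, records)]
    _, rss_reduced = ols(reduced, y)
    beta, rss_full = ols(full, y)
    df_resid = len(y) - len(full[0])
    ss_stage = rss_reduced - rss_full  # stage adjusted for replicate (Type II)
    f_stat = ss_stage / (rss_full / df_resid)
    return beta[-1], ss_stage, f_stat
```

  Because replicate enters as a fixed covariate, a constant offset between replicates (batch effect) is absorbed by the replicate dummies and does not inflate the stage effect or the residual variance.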
  
  Separately, I continue to feel that some newly introduced terminology (for example, the "nest") may not be necessary at this stage. It may be sufficient to describe these structures and focus on their spatiotemporal behavior, composition, and measurable features, rather than assigning new names. Having read the authors' response, I understand that they would like to retain this terminology, which is acceptable; however, it may not be readily adopted by the field.
  
  The term “nest” does not need to be retained as new terminology. It is simply a way for us to identify the region and to define one of its functions or characteristics, which is to host the birth of new deuterosomes and centrioles.
  
  Minor correction (remove "in MCCs" part from the following sentence):
  
  In MCC, PCM1 depletion alters deuterosome formation and centriole production in brain and airway MCC (Hall et al., 2023; Zhao et al., 2021).
  
  Done
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.02.09.579615v3
www.biorxiv.org

Digital Polycomb regulation is predictive of functional iPSC heterogeneity

1. EMBOpress 08 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Reviewer #1
  
  The inverse relationship between PGCLC and DE efficiency is intriguing but under-explored. The observation that lines efficient for PGCLCs (Podx1, Kolf2) are poor at DE differentiation, and vice versa, is one of the key findings. Yet this is presented almost in passing. It would strengthen the paper considerably if the authors discussed whether their Polycomb-regulated gene set predicts DE efficiency with an inverse sign, and whether the logistic regression model can be tested on the DE data directly.
  
  As suggested by the reviewer, we have enhanced our analysis of definitive endoderm (DE) differentiation efficiency and discussed it more prominently in the manuscript in the section “A subset of Polycomb targets is predictive of differentiation properties”. In particular, we now examine the correlation between gene expression from RNA-seq and DE differentiation. For this purpose, we took the genes used to predict PGCLC efficiency (all of which are regulated by H3K27me3) and examined the correlation between their expression and the efficiencies of PGCLC and DE differentiation. We found that differentiation is not a binary outcome for DE, with many intermediate cases observed. Thus, instead of using a logistic regression, we employed a sigmoidal regression scheme for DE. To quantify prediction accuracy, we used the absolute difference between observed and predicted efficiency, obtaining a mean absolute error of 12% [95% CI: 8-22%].
  
  We have now included an extra panel (Figure 7C) showing the correlations between expression of the H3K27me3 genes and the differentiation efficiencies in both PGCLC and DE fates. As anticipated by the reviewer, this plot reveals an inverse correlation, which we highlight in the main manuscript. Further, we now mention that these genes can also be used to predict DE differentiation efficiency, with satisfactory accuracy (although the confidence interval is wide due to the small sample size, as in the case of PGCLCs).
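  The kind of sigmoidal fit and error estimate described above can be sketched as follows. This is a hypothetical, simplified illustration, not the pipeline used in the manuscript: it assumes a single summary predictor score per line (for example a weighted sum of the expression of the predictive genes), an upper asymptote fixed at 100%, a brute-force grid-search least-squares fit, leave-one-out predictions, and a percentile bootstrap for the confidence interval. The response does not state how the fit or the interval were obtained.

```python
import math
import random


def sigmoid(x, x0, k, top=100.0):
    """Predicted efficiency (%) as a logistic function of a predictor score x."""
    z = max(-500.0, min(500.0, k * (x - x0)))  # clamp to avoid math.exp overflow
    return top / (1.0 + math.exp(-z))


def fit_sigmoid(xs, ys):
    """Least-squares fit of midpoint x0 and slope k by brute-force grid search."""
    lo, hi = min(xs), max(xs)
    best = None
    for i in range(41):
        x0 = lo + (hi - lo) * i / 40
        for j in range(1, 81):
            k = 0.25 * j  # candidate slopes 0.25 ... 20 (hypothetical grid)
            sse = sum((y - sigmoid(x, x0, k)) ** 2 for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, x0, k)
    return best[1], best[2]


def loo_abs_errors(xs, ys):
    """Absolute error for each line, predicted from a fit to all other lines."""
    errors = []
    for i in range(len(xs)):
        x0, k = fit_sigmoid(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(ys[i] - sigmoid(xs[i], x0, k)))
    return errors


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]
```

  The mean absolute error is then sum(errors) / len(errors). With only about ten lines, the bootstrap interval is necessarily wide, which matches the caveat about sample size given in the response.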
  
  The mathematical model is elegant but the choice to vary parameter E across cell lines needs stronger justification. The model assumes that inter-line differences are driven by variation in the overall rate of H3K27 methylation (parameter E). This is a reasonable starting assumption, but the authors should discuss alternative scenarios more explicitly. Could variation in demethylase activity, PRC2 recruitment strength, or replication timing equally well explain the data? The fact that EPOP is differentially expressed is mentioned as a potential mechanistic candidate for modulating E, which is compelling, but the link remains correlative. The authors should be more cautious in their language here, stating that EPOP "may be sufficient to completely switch the transcriptional regulation" goes beyond what the data show.
  
  We thank the Reviewer for these suggestions and have tightened our discussion of these points. It is correct that variation in demethylase activity can also explain our data. We now explicitly point this out in the section “The behaviour of H3K27me3 can be explained using a simple mathematical model”. However, this possibility does not fit as well with the RNA expression data. While we found differential expression of the PcG gene EPOP, we did not detect any differential expression of the KDM6 histone demethylases (KDM6A-KDM6C). Therefore, we still favour our original suggestion of variation in the methylation rates over this possibility. In addition, we have moderated our wording on EPOP, stating in the Discussion that changes in EPOP expression “may be sufficient to alter the transcriptional regulation of specific target genes.”
  
  The predictive model for differentiation efficiency is promising but the PGCLC training set is too small for confident generalisation claims. The authors acknowledge this (109 features, ~21 data points), and the L2 regularisation is appropriate. However, the claim of 91% accuracy with a 95% CI of 78-100% on the PGCLC data should be presented more cautiously. With such a small dataset, the confidence interval is very wide. The more convincing validation comes from the DN data (143 lines from Jerber et al.), where the model trained on PGCLC data performs comparably to the full-transcriptome model. This cross-fate generalisation is quite strong and should be emphasised more prominently as the primary evidence for the validity of the model.
  
  We have followed the reviewer’s guidance and revised our language when discussing the PGCLC case in the section “A subset of Polycomb targets is predictive of differentiation properties”. We have also more clearly emphasised the successful validation on the DN data.
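  For context on the setting the reviewer describes (far more features than samples), the following minimal sketch shows L2-regularised logistic regression fitted by full-batch gradient descent. It is illustrative only: the penalty strength lam, the learning rate, the number of epochs and the function names are hypothetical and are not the settings used in the manuscript. A library implementation, such as scikit-learn's LogisticRegression (whose default penalty is L2), would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_l2_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by full-batch gradient descent.

    The intercept b is not penalised. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = (sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi) / n
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted class (1 = efficient, 0 = not efficient) for one feature vector."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0
```

  The penalty keeps the weights finite even when the training classes are perfectly separable, which is almost guaranteed when features outnumber samples; it does not, however, narrow the uncertainty of an accuracy estimated on about 21 lines.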
  
  The claim of "epigenetic memory" during differentiation (iPSC to pre-ME) is suggestive but would benefit from additional analysis. The authors show that 60% of pre-ME DEGs overlap with iPSC DEGs, and that H3K27me3-cluster genes maintain their expression patterns. However, 60% overlap could partly reflect genes that are simply not regulated during the short 12-hour pre-ME induction. To strengthen this claim, the authors should compare the overlap rate for H3K27me3-cluster genes specifically versus other clusters. If Polycomb targets show significantly higher overlap than, for example, K4&ATAC genes, this would more convincingly support a Polycomb-specific memory mechanism.
  
  We have now performed this analysis, examining the persistence of DEGs into the pre-ME state (i.e., whether a gene that was differentially expressed in hiPSCs remains differentially expressed in pre-ME). Excluding the H3K9me3 cluster (the smallest cluster containing fewer than 25 genes), the K27 cluster is the most persistent cluster in terms of fraction of genes per cluster. When the clusters were pooled into the different variables involved (ignoring H3K9me3), H3K27me3 again emerged as the most persistent chromatin feature. Unfortunately, however, these results were not statistically significant, so we are unable to include them in the manuscript.
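  The persistence comparison described above could be computed along the following lines: the fraction of hiPSC DEGs in each cluster that remain DEGs in pre-ME, plus a one-sided Fisher's exact test of whether one cluster is more persistent than all other clusters combined. This is a hedged sketch: the response does not state which test was used, and the gene identifiers, cluster labels and function names are hypothetical.

```python
from math import comb


def persistence_by_cluster(ipsc_degs, preme_degs, cluster_of):
    """Fraction of hiPSC DEGs in each cluster that are still DEGs in pre-ME."""
    totals, kept = {}, {}
    for gene in ipsc_degs:
        c = cluster_of[gene]
        totals[c] = totals.get(c, 0) + 1
        kept[c] = kept.get(c, 0) + (gene in preme_degs)  # bool counts as 0/1
    return {c: kept[c] / totals[c] for c in totals}


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: cluster of interest / all other clusters; columns: persistent / not.
    Tests whether the cluster of interest is enriched for persistent genes.
    """
    n_total, n_persist, n_cluster = a + b + c + d, a + c, a + b
    denom = comb(n_total, n_cluster)
    upper = min(n_persist, n_cluster)
    # Hypergeometric upper tail: P(X >= a) with X persistent genes in the cluster.
    return sum(comb(n_persist, x) * comb(n_total - n_persist, n_cluster - x)
               for x in range(a, upper + 1)) / denom
```

  With only a few dozen genes in some clusters, such a test has limited power, which is consistent with the trend not reaching significance.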
  
  Lack of genetic background analysis. Ten lines from nine donors will harbour substantial genetic variation. The authors note that genetic variation has been linked to iPSC heterogeneity but do not analyse whether the three "outlier" lines (Kucg2, Sojd3, Yoch6) share genetic features. For instance, common variants at PRC2 component loci, EPOP regulatory variants, or structural variants that might alter H3K27me3 domain boundaries. The HipSci consortium provides genotyping data for these lines. A targeted analysis of variants at Polycomb-related loci would be feasible and could either strengthen the epigenetic interpretation or reveal a genetic confounder.
  
  We thank the Reviewer for raising this important point. To investigate potential confounding effects due to genetic variation between the hiPSC lines in our panel, we performed a targeted analysis of genetic variation across Polycomb-related loci (H3K27me3 occupied loci and Polycomb group genes) in all ten cell lines (using whole genome sequencing data from the HipSci consortium). This analysis specifically tested whether the three “compromised” lines (Yoch6, Sojd3 and Kucg2) share consistent genetic variants relative to the seven “normal” lines. We identified 15 indels (out of 4115) that satisfied this criterion. However, all are located in non-coding regions and none overlap with ATAC-seq peaks. Hence, they are unlikely to function as gene regulatory elements (e.g., enhancers), but we cannot exclude the possibility that they affect gene expression in other ways. We have added a new Results section “Genetic variants shared between differentiation-compromised hiPSC lines” to discuss these points, as well as adding new text to the Discussion and Methods.
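  One way to formalise the filtering criterion above is sketched below. It assumes that “consistent” means a variant is called in all three compromised lines and in none of the seven normal lines; that reading, the variant ID format (chrom:pos:change) and the function names are hypothetical. The peak check also assumes that variant positions and peak intervals use the same coordinate convention (VCF positions are 1-based while BED intervals are 0-based, so a conversion would be needed on real files).

```python
def shared_variants(genotypes, compromised, normal):
    """Variants present in every compromised line and absent from every normal line.

    genotypes: dict mapping line name -> set of variant IDs called in that line.
    """
    in_all_compromised = set.intersection(*(genotypes[line] for line in compromised))
    in_any_normal = set().union(*(genotypes[line] for line in normal))
    return in_all_compromised - in_any_normal


def in_any_peak(variant_id, peaks):
    """True if the variant position lies in any half-open (chrom, start, end) peak."""
    chrom, pos, _ = variant_id.split(":")
    pos = int(pos)
    return any(c == chrom and start <= pos < end for c, start, end in peaks)
```

  For example, shared_variants(genotypes, ["Yoch6", "Sojd3", "Kucg2"], normal_lines) would return the candidate set, and in_any_peak could then flag candidates overlapping ATAC-seq peaks.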
  
  Minor Comments
  
  The promoter definition (±1 kb from gene start) is non-standard; most studies use a window upstream of the TSS rather than gene start. The authors mention they confirmed robustness to an alternative definition (-1 kb to gene start) but do not show this data. It should be included in the supplement.
  
  We now show the data for the alternative promoter definition in Supplementary Fig. 5B and Supplementary Fig. 7C. These results demonstrate that our conclusions are robust to different promoter definitions.
  
  For CUT&Tag, no spike-in normalisation is mentioned. Given that the key conclusions are based on quantitative comparisons of H3K27me3 levels across cell lines, the absence of spike-in controls is a potential concern. The authors should discuss whether technical variation between CUT&Tag libraries could contribute to the observed bimodality. At minimum, the correlation between replicates for H3K27me3 should be shown (presumably it is high, but this should be documented).
  
  We thank the Reviewer for this suggestion. As now shown in Supplementary Fig. 4C, the correlation between our H3K27me3 replicates is indeed high (R between 0.93 and 0.96). Hence, technical variation between CUT&Tag libraries is unlikely to contribute to the observed bimodality.
  
  The statistical test for the PGCLC/H3K27me3 overlap (p
  
  We thank the reviewer for noticing this. Indeed, this is the case. The test assumes independence of lines, which is in general a reasonable assumption, but may not always hold. Specifically, the Kolf2 and Kolf3 lines are derived from the same donor, which implies they are not completely independent. However, for all other lines, we still think independence is a reasonable assumption and, thus, the overall result of the test should be a good approximation. We have added this caveat to the manuscript.
  
  Figure 6A: the heatmaps for H3K4, ATAC and H3K27 are shown side by side but at apparently different scales; this should be clarified or made consistent.
  
  In fact, the scales in all heatmaps are the same. We have clarified this in the figure captions.
  
  Reviewer #2
  
  1.) Figure 2B. Are all GO terms shown in the figure or are these just the top terms? If this is a subset then all terms should be provided as a supplemental table. If this is all significant terms, this is relatively modest considering the number of DEGs (712) and is probably due to the fact that DEGs are derived from all comparisons and so could be diluted by the presence of multiple opposing effects. If this is the case, you could identify DEGs that define the PCA groupings and then re-run the GO analysis to potentially provide a better definition of the functional differences between groups of cell lines.
  
  The GO terms previously displayed were the top hits. We have now included all the significant terms in Supplementary Files 4 and 5 (for the Molecular Function and the Biological Process ontologies, respectively).
  
  Chromatin accessibility at gene promoters is a poor predictor of transcription, but it is likely that accessibility at distal regions (e.g. putative enhancers) might be a better predictor. Did the authors look at this? This possibility should at least be mentioned when discussing the ATAC-seq data and the lack of correlation with transcription.
  
  
  We thank the reviewer for this suggestion. To locate additional regulatory regions, we downloaded tracks for the enhancer-associated marks H3K4me1 and H3K27ac for the ten cell lines from Todd and colleagues (Todd et al., Genome Biology, 2025; https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03658-8). We then intersected the ATAC-seq peaks with the H3K4me1 peaks in each cell line to identify putative enhancers. For each protein-coding gene, we then identified the closest ATAC- and H3K4me1-positive peak (among all cell lines), which we assumed was the most likely enhancer for that gene. We then evaluated the ATAC, H3K27ac and H3K27me3 signal within these enhancers for each cell line. With this information, we tried using a version of our SVM-based pipeline to improve our understanding of transcriptional regulation in genes within the ‘origin’ cluster (for which we failed to get significant insights from our standard SVM approach). Thus, we used seven variables as input for the SVM: the four variables of the standard approach plus three additional variables, namely the ATAC, H3K27ac and H3K27me3 signal at the nearest enhancer. However, for genes with an enhancer within 100 kb, the performance of the SVM with enhancer variables was similar to (or slightly worse than) the standard SVM. If we focused on genes with enhancers within 10 kb of the TSS (75 genes), then the SVM with the enhancer signal did modestly improve the prediction. However, on closer analysis of the results, the enhancer data were beneficial for only a handful of genes (around 10), and, even then, the improvement came mostly from the H3K27me3 signal rather than from the more standard enhancer marks, such as H3K27ac or chromatin accessibility. This lack of improvement in accuracy is probably due to our inability to identify the correct enhancers, as distance on the linear genome is often a poor predictor of enhancer-promoter interactions.
  
  Ultimately, because the improvement is for such a small number of genes, we have not included this analysis in the manuscript. However, we do now mention in the manuscript in section “Chromatin accessibility does not always correlate with transcription” that we tried to include distal enhancers but that this approach was not successful.
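  The enhancer-assignment step described above (ATAC-seq peaks overlapping H3K4me1 peaks, then the closest such peak to each gene within a distance cutoff) can be sketched as follows. This is a simplified, hypothetical implementation: peaks are (chrom, start, end) half-open intervals, distance is measured from the TSS to the nearest peak edge, and the cutoff is a parameter (the response mentions 100 kb and 10 kb). On real data, bedtools intersect -u and bedtools closest -d perform the equivalent operations efficiently.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) on the same chromosome overlap."""
    return a_start < b_end and b_start < a_end


def putative_enhancers(atac_peaks, k4me1_peaks):
    """ATAC-seq peaks that overlap at least one H3K4me1 peak."""
    by_chrom = {}
    for chrom, start, end in k4me1_peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [(chrom, start, end) for chrom, start, end in atac_peaks
            if any(overlaps(start, end, s, e) for s, e in by_chrom.get(chrom, []))]


def distance_to_peak(tss, start, end):
    """0 if the TSS lies inside the peak, otherwise distance to the nearest peak edge."""
    if start <= tss < end:
        return 0
    return start - tss if tss < start else tss - (end - 1)


def nearest_enhancer(chrom, tss, enhancers, max_dist):
    """Closest putative enhancer to a TSS within max_dist, as (distance, peak), or None."""
    best = None
    for peak in enhancers:
        if peak[0] != chrom:
            continue
        d = distance_to_peak(tss, peak[1], peak[2])
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, peak)
    return best
```

  The linear scan is quadratic in the number of genes and peaks and is only meant to make the logic explicit; as the response notes, the nearest peak on the linear genome need not be the enhancer that actually contacts the promoter.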
  
  2.) Fig 1C. Statistic overview at end of legend should be moved under section describing panel C in the legend.
  
  We have now made this change.
  
  3.) 'Furthermore, the transition value of 30% enables repression to be stably maintained even after DNA replication, when, on average, histone modification levels will be transiently halved'. Whilst this is potentially true and a plausible interpretation, you cannot exclude that the signal derives from different cell populations in the culture, owing to cellular heterogeneity such as differences in cell cycle stage or spontaneous differentiation. This possibility should be noted in the text.
  
  We thank the Reviewer for this suggestion. Due to the possible alternative explanations pointed out by the reviewer, and to minimise any possible misunderstandings, we decided to drop this sentence from the manuscript, which is not required for any of our main conclusions.
  
  4.) 'Higher values indicate stronger correlation or anticorrelation and, thus, stronger differences between cell lines.' I don't believe this makes sense as written. Do the authors mean stronger partitioning of different iPSC lines into clusters?
  
  Indeed, this sentence was not very clear; we have now rewritten it to improve clarity: “Because absolute correlation values were used, high values indicate that expression profiles between two cell lines are either highly correlated or highly anticorrelated. Across all pairwise comparisons, high values suggest strong partitioning of cell lines with highly similar or markedly different transcriptional profiles.”
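  The point about absolute correlations can be made concrete with a short sketch. It assumes, which the response does not state, that each gene's expression is first centred on its mean across the cell lines. This centring is what allows two lines to be anticorrelated. The helper names and toy profiles are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def center_per_gene(profiles: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Subtract each gene's mean across cell lines (assumed preprocessing)."""
    lines = list(profiles)
    n_genes = len(profiles[lines[0]])
    means = [sum(profiles[l][g] for l in lines) / len(lines) for g in range(n_genes)]
    return {l: [v - m for v, m in zip(profiles[l], means)] for l in lines}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def abs_pairwise_correlation(
        profiles: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """|r| for every pair of cell lines: high for strong similarity OR strong opposition."""
    centered = center_per_gene(profiles)
    return {(a, b): abs(pearson(centered[a], centered[b]))
            for a, b in combinations(sorted(centered), 2)}
```

  In a toy panel where line C moves opposite to lines A and B, the signed correlation of A with C is close to -1, but its absolute value is close to 1. This is exactly the sense in which a high value indicates a marked difference between lines.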
  
  5.) 'We found that 60% of the DEGs in pre-ME were also DEGs in hiPSCs'. This needs to be made clearer. Do the authors mean DEGs between iPSC lines following differentiation, or DEGs between undifferentiated iPSCs and their differentiated derivatives? The former suggests that the iPSCs are already partially differentiated and that differentiation is promoted or constrained by this starting state, whilst the latter would suggest that some lines are skewed towards the mesendoderm.
  
  We mean that of the genes that are differentially expressed between the 10 lines in pre-ME, 60% were also differentially expressed between the 10 lines in iPSCs (prior to differentiation). We have reworded this sentence to make it clearer.
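  As reworded, the 60% figure is a fraction whose denominator is the set of pre-ME DEGs, not the set of hiPSC DEGs. A minimal sketch of that calculation follows. The gene names are hypothetical placeholders.

```python
from typing import Set


def fraction_shared(pre_me_degs: Set[str], ipsc_degs: Set[str]) -> float:
    """Fraction of genes DE among the lines at pre-ME that were already DE among the same lines as hiPSCs."""
    if not pre_me_degs:
        raise ValueError("no pre-ME DEGs supplied")
    return len(pre_me_degs & ipsc_degs) / len(pre_me_degs)
```

  For example, if five genes are DE at pre-ME and three of them were also DE in hiPSCs, the function returns 0.6. Extra hiPSC-only DEGs do not change the result.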
  
  6.) 'Finally, histone marks in the iPSC state were also predictive of expression in the pre-ME state, albeit with slightly lower accuracy than for the iPSC state (Supplementary Fig. 8C, D), which may indicate the existence of an epigenetic memory system that is maintained during differentiation.' Or the retention of an epigenetic signature that failed to be erased during the initial generation of the iPSCs.
  
  We agree with the reviewer that this is entirely possible: our point is that memory states may persist from iPSCs to pre-ME. The memory state may of course predate the initial generation of the iPSCs. We have amended the section “Pre-ME transcriptomes suggest inheritance along the developmental trajectory” to include this possibility.
  
  7.) 'To minimise the risk of overfitting, only reliable targets were retained'. Whilst this is outlined in the methods as stated, a summary of what this means should be included in the body text.
  
  We thank the Reviewer for this suggestion. We have included the required extra text in the section “A subset of Polycomb targets is predictive of differentiation properties”. We have also revised the performance metrics so that they are strictly comparable with the results of Jerber and colleagues (which implies, in some cases, removing error bars, as in the results of Jerber et al., 2021). The reviewer may notice differences in the values reported but all our claims remain valid.
  
  Reviewer #3
  
  The major claim that among histone modifications that have been profiled in this manuscript, H3K27me3 is the most predictive for expression is supported by the analysis. However the analysis may be skewed because the RNAseq and the H3K27me3 difference are driven by the extreme skewing of the 3 cell lines Yoch6, Sojd3 and Kucg (Fig 2A, 2C and 6A). Two of these lines cannot form EBs at all, a major failure in their pluripotent characteristics.
  
  We thank the reviewer for raising this fundamental point. Our aim for this study was to use iPSC lines that have passed existing standards and could easily be chosen from a panel of lines by an unsuspecting user. Indeed, the differentiation-compromised lines in our study are indistinguishable from other PSCs from a validated source that extensively characterises the distributed material (HipSci resource, https://www.hipsci.org). This source categorises these cell lines as correctly reprogrammed and fully pluripotent. In addition, we now present PluriTest data (doi: 10.1038/nmeth.1580) from all normal lines available from the HipSci resource (835 lines) and highlight the ten cell lines used in this study (see Supplementary Fig. 1A). All cell lines in our panel have pluripotency scores over 20, and all but one (Bima1 – which notably differentiates efficiently into PGCLCs and DNs) have novelty scores below 1.67; these values have been empirically determined as pluripotency signature thresholds (Müller et al., 2011). This analysis clearly demonstrates that the cell lines in our study are not outliers, an important fact which we have now added to section “Marked differences in the developmental efficiency of hiPSC lines”.
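  The PluriTest criteria quoted above reduce to two thresholds. A minimal sketch follows, assuming the scores have already been computed by PluriTest. The line names and scores are hypothetical placeholders, not values from this study.

```python
from typing import Dict, List, Tuple

PLURIPOTENCY_MIN = 20.0  # pluripotency score must exceed this threshold
NOVELTY_MAX = 1.67       # novelty score must stay below this threshold


def passes_pluritest(pluripotency: float, novelty: float) -> bool:
    """True if a line meets both empirical PluriTest thresholds."""
    return pluripotency > PLURIPOTENCY_MIN and novelty < NOVELTY_MAX


def flag_lines(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of lines failing either threshold, given (pluripotency, novelty) pairs."""
    return sorted(name for name, (p, n) in scores.items()
                  if not passes_pluritest(p, n))
```

  Under this rule, a line such as Bima1, which the response reports as having a pluripotency score above 20 but a novelty score not below 1.67, would be flagged for closer inspection rather than passing outright.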
  
  Furthermore, one of the key advances of our study is that we identify a chromatin and transcription signature that will enable researchers in the stem cell community to identify iPSC lines with compromised differentiation potential early on. We also note that compromised differentiation potential is widespread among human PSCs. For example, Jerber et al. report that 48 out of 183 hiPSC lines could not be differentiated successfully into dopaminergic neurons (doi:10.1038/s41588-021-00801-6). Thus, our study addresses an important and widespread issue in the stem cell field, a point we now emphasise in the introduction of the manuscript.
  
  Further, Letw5, one of the lines that can form EBs, fails to make PGCLCs but can differentiate into DE, and it has neither the RNA profile nor the H3K27me3 profile of the skewed iPSC lines. Therefore, the analysis in the manuscript does not support the claim that H3K27me3 truly influences phenotype, at least in terms of PGCLC and DE differentiation of iPSCs.
  
  We agree that the behaviour of Letw5 is interesting, and we discuss its properties extensively in section “Marked differences in the developmental efficiency of hiPSC lines” and Fig. 1E. As we state, comparing Letw5 with Kucg2, “These findings suggest that Kucg2 hiPSCs have limited developmental competence to generate PGCLCs, while Letw5 hiPSCs are capable of PGCLC specification but fail to sustain the germ cell fate, pointing to a defect in fate maintenance rather than in initial developmental capacity.” Hence, the evidence points towards Letw5 having a separate defect which is unrelated to the impaired Polycomb regulation identified in the other three problematic lines. We also emphasise this point in section "A major role for H3K27me3 in hiPSC transcriptional heterogeneity", where we state that "[...] in this case [Letw5], a distinct mechanism, independent of H3K27me3 dysregulation, may result in impaired germ cell development."
  
  What are the predictions from applying the SVM to data from only the 6 cell lines Podx, Kolf2, Kolf3, Bima1, Qolg1 and Wibj2? The DE differentiation potential will also have to be measured for each of these cell lines.
  
  Following the reviewer’s suggestion, we applied the SVM only to data from those six cell lines (which do not include any of the defective cell lines), see section “Linking variation in chromatin features with transcriptional output using SVMs”. Given that the SVM only takes as input data from differentially expressed genes, the set of genes used decreased markedly as there are fewer genes differentially expressed among these cell lines (125 DEGs). Nevertheless, for this subset of genes, the SVM still retains satisfactory accuracy (both AUROC and overall accuracy in the 70% to 75% range; now shown in Supplementary Fig. 6H). This result is particularly remarkable given that the SVM is operating with very little data (five datapoints for training and one for testing, per gene) and that the cell lines are very similar to each other. As the reviewer points out, we hope these results might encourage other researchers to pursue similar analysis approaches.
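  With six lines per gene, training on five and testing on the remaining one is leave-one-out cross-validation. The sketch below shows that loop. It is illustrative only. The plugged-in classifier is a nearest-centroid rule, swapped in for the SVM so that the example needs only the standard library. The single feature and toy labels are hypothetical. In practice `fit_predict` would wrap an SVM, for example scikit-learn's `SVC`.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical interface: train on (X_train, y_train), return a label for x.
FitPredict = Callable[[List[Vector], List[int], Vector], int]


def nearest_centroid(X_train: List[Vector], y_train: List[int], x: Vector) -> int:
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    centroids = {}
    for label in set(y_train):
        members = [v for v, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def sq_dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def leave_one_out_accuracy(X: List[Vector], y: List[int],
                           fit_predict: FitPredict = nearest_centroid) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        X_train = [v for j, v in enumerate(X) if j != i]
        y_train = [lab for j, lab in enumerate(y) if j != i]
        correct += int(fit_predict(X_train, y_train, X[i]) == y[i])
    return correct / len(X)
```

  With six cell lines, each gene contributes only six folds, so per-gene accuracy can take only the values 0, 1/6, ..., 1. This is one reason the response stresses how little data the classifier works with.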
  
  For DE differentiation, we previously included data (Supplementary Fig. 3B, C) for the following lines: Podx1, Kolf2, Kucg2, Letw5, Sojd3, and Yoch6. Only Kolf3, Bima1, Qolg1 and Wibj2 were missing. We have now also measured DE differentiation in three of these four lines (Kolf3, Qolg1, and Wibj2).
  
  The above analysis may also shed light on how extreme the input parameters must be for the SVM to be a good classifier. Such an analysis may also assist future users of the method to assess whether an SVM would be useful for their datasets.
  
  Please see our previous answer. We argue that the results presented above for six similar cell lines imply that this type of computational approach can have general applicability and does not require extreme inputs. We have followed the Reviewer’s suggestion and now incorporate this finding in section “Linking variation in chromatin features with transcriptional output using SVMs”: “Furthermore, the SVM does not require extreme values or outliers, and hence the overall approach could be of rather general applicability. As a performance verification, we applied the SVM to a dataset containing only the cell lines that could generate PGCLCs with high or intermediate efficiency, and while the performance is slightly reduced, it remains satisfactory (accuracy 75%; Supplementary Fig. 6H).”
  
  If the SVM on the 6 lines does not find a binary switch in H3K27me3 to be predictive, could the authors incorporate DNA methylation and H3K4me1 data from the same publication as the chromatin accessibility data? Such an analysis may also assist future users of the SVM method to assess the number of parameters required to separate closely related phenotypes.
  
  See previous answer. We note that DNA methylation data for our hiPSC panel is not available; it is not part of the study that the reviewer mentions (https://link.springer.com/article/10.1186/s13059-025-03658-8). Although H3K4me1 data is available in Todd et al., we did not find that this data improved the ability of our model to make successful predictions (see reply to Reviewer #2, point 1).
  
  Most gene regulation occurs at the level of the enhancer, restricting analysis to promoter associated histone modifications is limiting.
  
  We thank the Reviewer for raising this very valid point. Please see response to Reviewer #2, point 1.
  
  One puzzling piece of data is the very high 60% of PGCLCs on day 1 of differentiation (Fig 1E) in the competent cell lines. BLIMP1 is expressed in hiPSCs, calling into question whether the initial differentiation into pre-ME was successful.
  
  We think there is a misunderstanding regarding the experimental timeline. Day 1 of differentiation in Fig. 1E refers to one day after PGCLC induction from the pre-ME stage following the addition of BMP4, SCF, LIF, and EGF (see schematic in Fig. 1A). We have revised the text to make this clearer. Furthermore, BLIMP1 (PRDM1) is not expressed in hiPSCs. To demonstrate this, we now show the expression levels of BLIMP1 (PRDM1), B2M (low to mid-level expression in most human cell types), SOX2 (highly expressed pluripotency marker), and HOXC10 (differentiation marker that is not expressed in PSCs) across our cell line panel. At this scale, BLIMP1/PRDM1 expression is not detectable. When SOX2 is omitted from this bar plot, the very low expression levels of BLIMP1/PRDM1 become apparent, as it is close to the levels for the differentiation marker HOXC10. We conclude that BLIMP1/PRDM1 is expressed at extremely low levels across our ten hiPSC lines.
  
  The H3K27me3 and H3K9me3 signals are integrated over the entire gene as inputs into the SVM; however, the PCA analysis to separate the cell lines is only shown for the promoter.
  
  This is not quite correct. For the PCA analysis for the histone marks and ATAC-seq, we used both the promoter region (Fig. 2C, Supplementary Fig. 5B) and the gene body (Supplementary Fig. 5A), with similar results. For the SVM, for H3K27me3 and H3K9me3, we primarily used the entire gene region, but we also tested other regions (Supplementary Fig. 6A), with similar or slightly inferior results.
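  The distinction between promoter and gene-body inputs can be sketched as two ways of integrating the same signal track. This sketch assumes a bedGraph-like list of (start, end, value-per-bp) bins and a promoter window of 1 kb on either side of the TSS. The window size and the helper names are hypothetical and are not the parameters used in the manuscript.

```python
from typing import List, Tuple

Bin = Tuple[int, int, float]  # (start, end, signal per bp), half-open


def region_signal(bins: List[Bin], start: int, end: int) -> float:
    """Integrate a binned signal over [start, end), weighting by overlap length."""
    total = 0.0
    for b_start, b_end, value in bins:
        overlap = min(end, b_end) - max(start, b_start)
        if overlap > 0:
            total += overlap * value
    return total


def promoter_region(tss: int, strand: str, upstream: int = 1000,
                    downstream: int = 1000) -> Tuple[int, int]:
    """Strand-aware promoter window; 'upstream' lies at higher coordinates on '-'."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def gene_body_region(tss: int, tes: int) -> Tuple[int, int]:
    """Whole gene from TSS to TES, whichever strand it lies on."""
    return min(tss, tes), max(tss, tes)
```

  For a plus-strand gene with its TSS at 1,500, a 500 bp window gives the promoter (1000, 2000), while the gene body runs to the TES. A broad mark such as H3K27me3 can therefore contribute very different totals under the two definitions.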
  
  SVMs have been used to predict enhancers from epigenomic data (PMID: 22328731) and to classify cancers (PMID: 11120680). Applying an SVM as a classifier for gene expression prediction is not very novel.
  
  We thank the Reviewer for raising this point. We did not claim that the use of SVMs was itself novel. SVMs have certainly been used in other contexts, as the reviewer points out: to predict enhancers, for cancer classification, and to predict expression patterns. In fact, SVMs had already been used to predict gene expression from chromatin features (Cheng et al., 2011; already cited in our manuscript). What is novel in our work is the reverse-engineering of the method to extract mechanistic information about each gene (i.e., to assign the chromatin feature set relevant to the changes in its expression). This computational methodology, in conjunction with the rich experimental dataset produced, allows us to classify differentially expressed genes in terms of the chromatin features that enable prediction of transcription. This highlights the differences between cell lines and enables further downstream analyses, such as mechanistic models of histone modification dynamics and the prediction of iPSC differentiation efficiency. We have rewritten the Introduction to the manuscript to better emphasise these points.
  
  The biological insights are limited, for example the observation of "a variety of forms of transcriptional regulation" (Fig 4B). It is well known that H3K27me3 decorates lineage-specifying genes and is part of the bivalent domain with H3K4me3. The anti-ATAC category could represent locations where a repressor is bound to DNA, which would also result in increased accessibility; this is not a surprising result.
  
  We believe our work does offer significant biological insights. While we agree that it is well known that H3K27me3 decorates lineage-specifying genes, it was not previously known that digital Polycomb dysregulation at specific loci was a key feature controlling the ability of pluripotent cell lines to differentiate properly. In addition, we have been able to identify a core set of genes whose H3K27me3 profiles are highly informative of differentiation efficiency. Moreover, we are able to explain the variation in H3K27me3 levels with a simple quantitative mathematical model.
  
  Finally, the anti-ATAC category is a minor finding and not one of the central conclusions of this paper. Nevertheless, we appreciate the Reviewer’s suggestion and have incorporated this possible interpretation into section “Chromatin accessibility does not always correlate with transcription”.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.07.25.666753
Jul 2026
www.biorxiv.org

Stable excitatory-inhibitory synapse balance despite dynamic turnover

1. Public_Reviews 07 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  In this valuable study, the authors developed long-term imaging tools to simultaneously monitor the temporal and spatial dynamics of excitatory and inhibitory synapses and reported that excitatory and inhibitory synapses need to develop synergistically during synaptogenesis to maintain balance. While the analysis and quantification of the imaging data are incomplete, there is convincing evidence that the developed tools are feasible. If these tools can function stably in vivo, their applications will be much broader.
  
  We have completely overhauled our analysis and quantification methods and generated custom-made drift correction and tracking pipelines. Also, we have tested these tools ex vivo.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  By imaging the dynamics of synaptic proteins in cultured neurons, this study presents significant findings regarding the dynamics of excitatory and inhibitory synaptic proteins during development. The evidence shows that the ratios of excitatory and inhibitory synaptic proteins are stable during synapse development. This discovery advances our understanding of the complex mechanisms governing synapse formation. The evidence is robust, as it is supported by a combination of biological assays and endogenous labeling.
  
  Strengths:
  
  This research sheds light on the dynamics of excitatory and inhibitory synapses during development. It is crucial to understand that while excitatory and inhibitory synapses develop independently, the ratio of their numbers is relatively stable during development, maintaining a stable excitatory/inhibitory balance.
  
  Important findings and implications in the research include:
  
  (1) Persistent Synapse Dynamics: Excitatory and inhibitory synapses remain highly dynamic even in mature neurons (DIV12-14), challenging the dogma that synaptic structures are stable after the synaptogenesis stage.
  
  (2) Maintained E/I Balance: Despite ongoing synapse turnover (formation/elimination) and presynaptic terminal reduction, the overall density and ratio of excitatory-to-inhibitory synapses remain relatively stable during circuit maturation (Figure 7).
  
  (3) Developmental Shifts: While presynaptic compartments decrease over time, postsynaptic sites increase, suggesting independent regulation of pre- and postsynaptic elements within a stable E/I framework.
  
  We thank the Reviewer for their positive feedback and careful review of our study.
  
  Weaknesses:
  
  This study focuses on specific synaptic proteins within synapses, which may not fully represent the dynamics of other synaptic machinery; also, whether similar observations exist in vivo is still unknown. Further research is needed to explore the implications of these findings in more complex neuronal environments.
  
  We also thank the Reviewer for their insights and suggestions. We have added discussion of this important point to the Discussion section. Furthermore, we have tested the applicability of our tools ex vivo (new Figures 1, 4, and 6). While using these tools in vivo for live imaging is the eventual goal, we started in a reduced culture system given the relative simplicity. Our current study now provides a framework for future experiments applying these approaches in more complex in vivo systems.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Garbett et al. identified a critical need to begin to understand the interplay between the assembly, maturation, and elimination of excitatory and inhibitory synapses. They also detail the lack of reliable tools to address this gap in knowledge. Here, the authors developed synaptic reporters expressed by lentiviruses (mClover3-Homer1c, HaloTag-Syb2, and tdTomato-Gephyrin). They combined these reporters with resonance scanning confocal imaging to measure synapses over a 15-hour period during neuron development and in mature neurons in primary hippocampal cultures. Using these reporters in the same neuron, the authors compared the ratios of postsynaptic excitatory and inhibitory specializations that co-localize with presynaptic terminals during development and in mature neurons and found that they are stable across time points. Finally, the authors developed CRISPR/Cas9 tools (TKIT) to knock-in endogenous fluorescent tags (GFP/tdTomato-Gephyrin) or epitope tags (HA-Bassoon and HA-Homer1) to begin to study synapse dynamics using endogenous proteins. I believe this paper highlights an important gap in knowledge and begins to offer methodologies to determine the dynamic coordination between excitatory and inhibitory synapses.
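  The ratio described in this summary can be illustrated with a simple colocalization count. This is a hypothetical sketch. Puncta are reduced to 2D centroid coordinates in µm, and a postsynaptic punctum counts as synaptic if it lies within an assumed distance `max_dist` of any Syb2 punctum. The 0.5 µm default is a placeholder, not the authors' criterion.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # punctum centroid (x, y) in µm


def colocalized(post: List[Point], pre: List[Point], max_dist: float) -> List[Point]:
    """Postsynaptic puncta lying within max_dist of at least one presynaptic punctum."""
    return [p for p in post if any(math.dist(p, q) <= max_dist for q in pre)]


def ei_ratio(homer: List[Point], gephyrin: List[Point], syb2: List[Point],
             max_dist: float = 0.5) -> float:
    """Excitatory/inhibitory ratio of Syb2-apposed Homer1 vs gephyrin puncta."""
    exc = len(colocalized(homer, syb2, max_dist))
    inh = len(colocalized(gephyrin, syb2, max_dist))
    if inh == 0:
        raise ValueError("no Syb2-apposed gephyrin puncta")
    return exc / inh
```

  Computing this ratio per frame or per time point, rather than once, is what would let it be compared across development as the summary describes. The all-pairs search is quadratic and is meant only for clarity.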
  
  Strengths:
  
  (1) The experiments are well-designed and carefully controlled.
  
  (2) The authors carefully validated the reporter and TKIT constructs.
  
  (3) The authors provide strong proof-of-principle for the use of the reporter constructs to track synapse formation, maintenance, and elimination over a 15-hour period.
  
  (4) Ingenious use of technologies (reporters, TKIT, and resonance scanning confocal microscopy) to develop a platform for future studies of synapse dynamics.
  
  (5) Strong evidence supporting that the ratio of excitatory and inhibitory synapses (those that oppose syb2) stays constant through development.
  
  We thank the Reviewer for their positive assessment of our study.
  
  Weaknesses:
  
  Overall, this is a well-executed study that develops tools to simultaneously image excitatory and inhibitory synapse dynamics and represents an important first step to address the fundamental question regarding the coordination between these two types of synapses.
  
  Minor weaknesses of the manuscript include:
  
  (1) The lack of a characterization of endogenous Homer1-positive excitatory synapses using TKIT.
  
  We attempted to perform live imaging of endogenous Homer1-positive synapses using the TKIT approach by tagging endogenous Homer1 with mClover3 but encountered low signal/noise while live imaging. This prompted us to focus our current study on live imaging endogenous Gephyrin. Future studies using more robust tags (e.g. StayGold, HaloTag) for TKIT tagging of endogenous Homer1 will likely help circumvent this issue.
  
  (2) Discussion about other approaches to study excitatory and inhibitory synapses using endogenous proteins (e.g., intrabodies - FingR or nanobodies) should be included.
  
  This important point was also raised by other Reviewers. We have now significantly expanded the Discussion section, including discussion of this point.
  
  (3) The activity state of a neuron and/or a synapse might alter the dynamic properties (formation, maintenance, and/or elimination). A discussion on whether the overexpression of Homer1 and/or gephyrin might alter synapse/neuron activity would provide greater interpretability of the results. A discussion of the potential limitations and benefits of the reporter and TKIT approaches would be beneficial.
  
  We agree and have added discussion of these points to the Discussion section.
  
  (4) A description and interpretation of the computational approach used for particle tracking would be helpful. I found that the particle tracking figures, while elegant, are difficult to interpret.
  
  As discussed in more detail below, we have generated drift correction and particle tracking approaches for the revised manuscript. We now elaborate on these new approaches in the paper.
  
  We thank the Reviewer again for their very helpful input and suggestions.
  
  Reviewer #3 (Public review):
  
  In the present study, the authors describe the development of new tools and imaging strategies to assess the concomitant development of excitatory and inhibitory synapses in dissociated neuron cultures. To this end, they generate fluorescently tagged constructs of excitatory and inhibitory synapse marker proteins using either conventional overexpression or CRISPR-based strategies. They then image these marker proteins over a timespan of 15 hours to assess synaptic dynamics at different developmental timepoints. Based on their data, they conclude that excitatory and inhibitory synapse development occur in concert to maintain a functional balance despite individual synapse turnover.
  
  Overall, this study addresses an interesting question, i.e., the interplay between the development of excitatory and inhibitory synapses, which has important implications, particularly for neurodevelopmental disorders in which the balance of excitation and inhibition is disrupted. The experiments are technically solid and well-executed, and the individual images are highly compelling.
  
  We thank the Reviewer for their positive assessment of our study.
  
  However, a number of aspects remain to be addressed in order for the study to support the claims made by the authors. First, the novelty aspect of the development of the fluorescently tagged synaptic proteins is unclear, since reporters of this nature are in routine use in many labs. Second, the analysis of the acquired images often seems incomplete, with only example images but no quantification shown, or the distinction between spatial and temporal dynamics appearing unclear. Third, given this incomplete analysis, the interpretations of the authors are not always convincingly supported by the data presented. In conclusion, substantial improvements are required to render the main messages of the study clear and compelling.
  
  We agree and have incorporated all of the Reviewer’s suggestions in the revised manuscript (please see below).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  This is an interesting study. This reviewer has the following questions/comments for the authors:
  
  (1) Please provide evidence that the gRNAs targeting each synaptic protein gene have no off-target effects.
  
  We now include analysis of off-target effects for the TKIT tools (new Figure S6).
  
  (2) While structural E/I balance is shown, functional electrophysiological validation (e.g., mEPSC/mIPSC ratios) is absent. It would be interesting to know whether the balanced structural changes translate into balanced function.
  
  We thank the Reviewer for this insightful suggestion and now include these recordings in the revised paper (new Figure 8).
  
  (3) In lines 217-218, please define thresholds for "stable" vs. "dynamic" puncta (e.g., temporal and spatial criteria).
  
  We more clearly define our categorization parameters (e.g. new Figure 2).
  
  (4) In Figure 5B, the low co-localization between endogenously tagged Bassoon and antibody-stained Bassoon is likely due to the low TKIT efficiency. Quite a few HA-tagged Bassoon signals are not labeled by the Bassoon antibody. The authors should explain these.
  
  We thank the Reviewer for identifying this and add discussion to the Results section.
  
  (5) For the data analysis: if each n represents an independent neuronal culture, the authors should provide the number of neurons/dendrites analyzed for each independent culture.
  
  We have added these important details to the manuscript.
  
  (6) Regarding the title, the authors used the term "coordinated dynamics". This reviewer finds this a slight overclaim, because the stable ratio between the numbers of excitatory and inhibitory synapses is likely an association rather than active "coordination". I suggest that the authors rephrase this.
  
  We agree that we cannot argue that excitatory and inhibitory synapses are causally coordinated in our current study. Their levels may be linked either through indirect association or through direct coupling, which we now discuss further in the first paragraph of the Discussion. We have rephrased the title accordingly.
  
  Reviewer #2 (Recommendations for the authors):
  
  I have only minor suggestions that I think will improve the manuscript:
  
  (1) Please define Syn1/2 on line 129.
  
  We have defined this in the revised paper.
  
  (2) For Figures 2B, C, and 4B, C: are the puncta in panel C from the dendrites in panels B? If so, it would be helpful to identify the ROIs selected in panels C.
  
  We now include this in new Figure 2.
  
  (3) For the particle tracking figures, while the ability to track all synaptic puncta is very impressive, it is sometimes difficult to clearly track the lifespan of a synaptic punctum from the current figures. I believe that it would be helpful if the authors selected specific examples of synapses formed, maintained, and eliminated.
  
  We agree and now include more examples.
  
  (4) I believe that more detail about the computational approach and analysis for the particle tracking (Figs 2E and 4E) would help the interpretability of the figure.
  
  This important point was also raised by the other Reviewers. We generated custom tools during the revision that significantly expand the capabilities of our tracking approaches and more clearly describe them in the revised manuscript.
  
  (5) Similar to the rigorous gephyrin TKIT analysis (Fig. 6), did the authors perform a similar analysis for Homer1c TKIT? This might be valuable to confirm that overexpression of the Homer1 reporter does not indirectly alter synapse dynamics.
  
  We attempted to perform live imaging of mClover3 TKIT-tagged endogenous Homer1 but encountered low signal/noise with live imaging. We now add discussion that optimization of more robust tags (e.g. StayGold, HaloTag) will likely be necessary for live imaging of different target proteins.
  
  (6) The tools developed by Garbett et al. have the potential to be broadly utilized in the field to provide new insight into the coordination of excitatory and inhibitory synapses. It would thus be helpful for the authors to include a discussion about the strengths and limitations of the reporter and TKIT methods relative to other approaches used to live image synapses (e.g., intrabodies (FingR and nanobodies)).
  
  We have now significantly expanded the Discussion to include these important points.
  
  (7) In the discussion, can the authors elaborate on whether it is experimentally feasible to apply their TKIT labeling of gephyrin and Homer1c in the same neuron to assess the endogenous excitatory and inhibitory synapse dynamics from the same neuron?
  
  We have added discussion of this point and also proof-of-concept data supporting tagging of two postsynaptic targets within the same neuron (new Figure S5D).
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) While the new tools described in the current manuscript can undoubtedly be used for the described purposes, the novelty of these tools is unclear to me. Viral vectors expressing fluorescently tagged versions of Homer1, synaptobrevin, and gephyrin are commercially available, e.g., via Addgene, and they are in routine use in many labs. CRISPR-mediated strategies for this purpose have also been previously reported (e.g., Willems et al. 2020, PLOS Biology; Fang et al. 2021, eLife). It is not clear to me how the tools reported here present a significant improvement over existing resources, other than that they use different fluorescent tags. If this aspect is a central part of the current manuscript, it should be expanded on in the discussion, including a direct comparison with available tools to highlight the novel aspects.
  
  We agree and have significantly expanded the Discussion to include these important points. Also, rather than argue that our tools are superior to pre-existing approaches, we adjust the text to argue that our tools and analytical approaches have been designed and optimized for the purposes we apply them to.
  
  (2) In addition to generating new tagged constructs, the authors also state that they have developed new imaging and analysis strategies to facilitate long-term assessment of synaptic dynamics. However, in many figures, they present only sample images, with little quantification to allow assessment of the wider relevance of the imaged synapses. For example, in Figures 2C and 4C, they present one example each of, e.g., a stable, nascent, transient, or eliminated synapse. However, they do not provide any quantification on how frequently any of these events occur, or whether they can be reliably quantified at all. These quantifications (i.e., percentage of each event type across a large population of synapses) would be necessary and should be added to demonstrate that this tool can be used for more than single example images.
  
  We have generated custom-made drift correction and particle tracking approaches for the revised manuscript. Based on the reviewer’s suggestion, we have quantified the relative frequencies of stable, nascent, transient, and eliminated synapses (Fig 2B-G, Fig 3A-F, Fig 5A-F, Fig 7B-C). These metrics greatly enhance the biological interpretation of our results. We have also added a supplemental movie showing an example image with the corresponding categorized tracks for each punctum type (Movie S3).
  
  (3) The authors do present an automated visual representation of spatial track length across the neuron, e.g., in Figure 2E and 4E, although this is also not quantified. Moreover, the track lengths appear surprisingly short, despite the authors' claims that their analyses 'highlight the dynamic nature of excitatory synapses over these timescales'. It is not clear to me whether these short tracks are more than just jitter, either in the synapses themselves or in the images due to technical limitations. E.g., in panel 2E, I see very few examples in which the track is not simply centered around one point, but actually expands over a distance. Quantification of the distance between start and end points of the tracks would be important to support the claim that these synapses are dynamic in terms of spatial translocation (if that is what the authors meant). Or if the 'dynamic nature' of the synapses referred to temporal dynamics, it is unclear to me how this information can be gained from the represented tracks.
  
  We thank the reviewer for these excellent points. To accurately assess spatial motion, we drift-corrected our images with a custom correction algorithm to eliminate stage or microscope drift as a source of contaminating motion (See Methods, Movie S2), in addition to acquiring time-lapse images with Nikon Perfect Focus. We noticed heterogeneity in our cultures such that some areas contained very mobile neurites, while others remained stationary (Fig. S1). We binned movies into either moving or still neurites and assessed spatial metrics as suggested (Fig. S1A). Consistent with our binning, puncta on moving neurites showed larger net displacement (distance between start and end points), but puncta on still neurites also showed ~1 µm net displacement (Fig. S1D). We also quantified puncta speed and found that puncta on moving neurites generally moved faster (Fig. S1C). We appreciate the reviewer’s insight that track lengths were surprisingly short, and after employing our drift correction and revised tracking methods, we now see substantially longer track lengths (Fig 2E, Fig 3C & F, Fig S2B & C). We additionally see a large fraction of tracks that persist throughout the imaging session (Fig 2E, Fig S2B & C).
  
  (4) In Figure 3, the authors now quantify track length, but in this case in the unit 'minutes', from which I would interpret that this is now meant to assess the temporal dynamics rather than the spatial dynamics. The lack of a clear distinction between spatial dynamics and temporal dynamics is very confusing to me, since these are entirely independent measures. 'Track length' to me indicates spatial dynamics, and I would expect the units to be a measure of distance. 'Track duration', which the authors also use in some places, but inconsistently as far as I can tell, makes sense to me for the assessment of temporal dynamics, with the units being a measure of time. I would strongly recommend being very clear about this distinction, since the current representation of the data is very difficult to follow and interpret.
  
  In addition to new spatial metrics, we have clarified in the text when we are referring to spatial dynamics (distance) versus temporal dynamics (time). As suggested, we use duration when referring to time, and speed or distance when referring to spatial metrics.
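  To make the distinction between temporal and spatial track metrics concrete, here is a minimal sketch, not the authors' tracking pipeline. It assumes a hypothetical track format of drift-corrected (time in minutes, x in µm, y in µm) samples for one punctum, and a hypothetical function name `track_metrics`. Duration is a time, while path length, net displacement (start-to-end distance), and mean speed are spatial measures.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical sample format: (time_min, x_um, y_um), assumed already drift-corrected.
Point = Tuple[float, float, float]


def track_metrics(track: Sequence[Point]) -> Dict[str, float]:
    """Return temporal (duration) and spatial (path, net displacement, speed) metrics."""
    if len(track) < 2:
        raise ValueError("a track needs at least two samples")
    t0, x0, y0 = track[0]
    t1, x1, y1 = track[-1]

    # Temporal metric: how long the punctum was tracked (minutes).
    duration = t1 - t0

    # Spatial metric: total distance travelled, summed frame to frame (µm).
    path_length = sum(
        math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in zip(track, track[1:])
    )

    # Spatial metric: straight-line distance between first and last positions (µm).
    net_displacement = math.hypot(x1 - x0, y1 - y0)

    # Spatial rate: path length per unit time (µm/min).
    mean_speed = path_length / duration if duration > 0 else 0.0

    return {
        "duration_min": duration,
        "path_length_um": path_length,
        "net_displacement_um": net_displacement,
        "mean_speed_um_per_min": mean_speed,
    }
```

  For a punctum that jitters back and forth, path length can be much larger than net displacement, which is one way to separate local jitter from genuine translocation.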
  
  (5) The images from the newly generated CRISPR-based tags in Figures 5-7 are striking and very compelling - these will be very useful tools. However, here too, it seems that the interpretation of the data does not really match the results. All quantification indicates that there is very little change in synapse density or other assessed parameters over the time course of the imaging, and yet the authors emphasize the dynamic nature of visualized synapses. More compelling quantification would be needed to support this claim.
  
  We have quantified spatial and temporal metrics for live neuron culture imaging for all tools developed including CRISPR-based tags (Figure 7).
  
  (6) The discussion is extremely short and provides almost no integration of the results of the study into the framework of existing knowledge. Instead, it focuses almost exclusively on unanswered questions and future perspectives, which are also important, but not helpful in interpreting the findings from the current study. The latter aspects should be added to provide essential context for the current findings.
  
  We agree and have added additional discussion of our current findings to help contextualize their significance.
  
  We thank the Reviewers again for their positive feedback and insightful input, which has undoubtedly strengthened our study.
  
URL

biorxiv.org/content/10.1101/2025.06.02.657384v2
www.biorxiv.org

Sensitivity of the human temporal voice areas to nonhuman primate vocalizations

1. Public_Reviews 07 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study investigates how human temporal voice areas (TVA) respond to vocalizations from nonhuman primates. Using functional MRI during a species-categorization task, the authors compare neural responses to calls from humans, chimpanzees, bonobos, and macaques while modeling both acoustic and phylogenetic factors. They find that bilateral anterior TVA regions respond more strongly to chimpanzee than to other nonhuman primate vocalizations, suggesting that these regions are sensitive not only to human voices but also to acoustically and evolutionarily related sounds.
  
  The work provides important comparative evidence for continuity in primate vocal communication and offers a strong empirical foundation for modeling how specific acoustic features drive TVA activity.
  
  Strengths:
  
  (1) Comparative scope: The inclusion of four primate species, including both great apes and monkeys, provides a rare and valuable cross-species perspective on voice processing.
  
  (2) Methodological rigor: Acoustic and phylogenetic distances are carefully quantified and incorporated into the analyses.
  
  (3) Neuroscientific significance: The finding of TVA sensitivity to chimpanzee calls supports the view that human voice-selective regions are evolutionarily tuned to certain acoustic features shared across primates.
  
  (4) Clear presentation: The study is well organized, the stimuli well controlled, and the imaging analyses transparent and replicable.
  
  (5) Theoretical contribution: The results advance understanding of the neural bases of voice perception and the evolutionary roots of voice sensitivity in the human brain.
  
  Weaknesses:
  
  (1) Acoustic-phylogenetic confound: The design does not fully disentangle acoustic similarity from phylogenetic proximity, as species co-vary along both dimensions. A promising way to address this would be to include an additional model focusing on the acoustic features that specifically differentiate bonobo from chimpanzee calls, which share equal phylogenetic distance to humans.
  
  (2) Selectivity vs. sensitivity: Without non-vocal control sounds, the study cannot determine whether TVA responses reflect true selectivity for primate vocalizations or general auditory sensitivity.
  
  (3) Task demands: The use of an active categorization task may engage additional cognitive processes beyond auditory perception; a passive listening condition would help clarify the contribution of attention and task performance.
  
  (4) Figures and presentation: Some results are partially redundant; keeping only the most representative model figure in the main text and moving others to the Supplementary Material would improve clarity.
  
  We thank the reviewer for contributing to the improvement of the present study and for the extremely constructive criticism. Concerning the identified weaknesses of our work, we provide some general answers here, while the point-by-point responses below address each comment in detail.
  
  (1) We totally agree that acoustics and phylogeny cannot be disentangled in our study, which is a limitation. We now provide the suggested analysis on the acoustic specificities of chimpanzee and bonobo calls.
  
  (2) This point on selectivity vs. sensitivity is indeed crucial, and we now provide a more careful viewpoint and phrasing on this aspect, since our study can only provide partial arguments for this important distinction.
  
  (3) The demands of the species-categorization task may indeed engage brain networks distinct from those engaged by merely listening to the stimuli. We discuss this aspect and argue that, while we cannot control for it, our attentional control study, performed on an independent sample (N=28), provides clear evidence that no species triggered an attentional bias. In other words, task demands might play a role, but we know that in this study attentional resources were not biased toward any particular species, since no species effect was observed.
  
  (4) We agree that the results were not articulated clearly and that the figures were redundant. We have addressed this by regrouping figures where appropriate and moving the rest to the Supplementary Material.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study investigated how the human brain responds to vocalizations from multiple primate species, including humans, chimpanzees, bonobos, and rhesus macaques. The central finding - that subregions of the temporal voice areas (TVA), particularly in the bilateral anterior superior temporal gyrus, show enhanced responses to chimpanzee vocalizations - suggests a potential neural sensitivity to calls from phylogenetically close nonhuman primates.
  
  Strengths:
  
  The authors employed three analytical models to consistently demonstrate activation in the anterior superior temporal gyrus that is specific to chimpanzee calls. The methodology was logical and robust, and the results supporting these findings appear solid.
  
  Weaknesses:
  
  The interpretation of the findings in this paper regarding the evolutionary continuity of voice processing lacks sufficient evidence. A simple explanation is that the observed effects can be attributed to the similarity in low-level acoustic features, rather than effects specific to phylogenetically close species. The authors only tested vocalizations from three non-human primate species, other than humans. In this case, the species specificity of the effect does not fully represent the specificity of evolutionary relatedness.
  
  We want to thank the reviewer for the constructive criticism and for evaluating the manuscript.
  
  Concerning the principal weakness highlighted, we provide new behavioral, acoustic, and model-based fMRI analyses that improve our understanding of the influence of both phylogeny and bioacoustics in our data. We argue that the explanation proposed by the reviewer cannot explain our results, as also observed in several other studies by us and others. We discuss this aspect and emphasize that including stimuli from more species would greatly improve the understanding of phylogeny and bioacoustics in this context.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Ceravolo et al. employed functional magnetic resonance imaging (fMRI) to examine how the temporal voice areas (TVA) in the human brain respond to vocalizations from different nonhuman primate species. Their findings reveal that the human TVA is not only responsible for human vocalizations but also exhibits sensitivity to the vocalizations of other primates, particularly chimpanzee vocalizations sharing acoustic similarities with human voices, which offers compelling evidence for cross-species vocal processing in the human auditory system. Overall, the study presents intellectually stimulating hypotheses and demonstrates methodological originality. However, the current findings are not yet solid enough to fully support the proposed claims, and the presentation could be enhanced for clarity and impact.
  
  Strengths:
  
  The study presents intellectually stimulating hypotheses and demonstrates methodological originality.
  
  Weaknesses:
  
  (1) The analysis of the fMRI data does not account for the participants' behavioral performance, specifically their reaction times (RTs) during the species categorization task.
  
  (2) The figure organization/presentation requires significant revision to avoid confusion and redundancy.
  
  We thank the reviewer for evaluating our manuscript and for the constructive criticism as well as the many suggestions. Concerning the weaknesses of the study, we provide here some quick answers while more detailed responses can be found below.
  
  (1) We now include behavioral data analysis (accuracy data controlled for reaction times and acoustics of existing Model 3, using mixed-effects logistic regression) in addition to a new, 4th model for fMRI data. This 4th model was computed in a model-based fashion by modeling the probability of correct categorization within the TVA (fitted regression coefficients, per Participant, Species, Trial) and revealing the neural correlates of this modulator.
  
  (2) We fully agree that figure redundancy was a problem, and we have now reduced confusion by combining congruent aspects and moving other results to the Supplementary Material.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  With additional analyses and discussions, the work has the potential to offer important insight into the evolutionary continuity of voice processing.
  
  We thank the Reviewing Editor for this additional motivation and for offering us the possibility to revise our manuscript. We now provide our point-by-point responses, referring to manuscript modifications by section and/or line number(s). All modifications are also highlighted in light grey in the text.
  
  Reviewer #1 (Recommendations for the authors):
  
  The manuscript is clearly written and addresses an important comparative question about the specificity of human TVA responses. The acoustic analyses are well designed, and the imaging work is careful and thorough. However, several conceptual and methodological issues need clarification or tempering of claims, particularly regarding (i) the distinction between sensitivity and selectivity, (ii) the confounding of acoustic and phylogenetic factors, and (iii) the interpretation of "chimpanzee-specific" TVA activity.
  
  (1) Introduction
  
  Line 48: cite more recent infant EEG evidence for early voice sensitivity (Calce, Curr Biol).
  
  The reference and explanation were added, lines 46-48.
  
  Line 53: mention recent data on voice processing in marmosets (Jafari, Cell Rep; Dureux, Curr Biol).
  
  We added the references and the mention of these interesting studies on common marmosets, lines 53-54.
  
  Line 59: Fecteau et al. (2004) already explored cross-species selectivity; please integrate and discuss.
  
  We now mention here the work from Fecteau and colleagues and its relevance, see lines 57-59.
  
  Line 70: clarify that in [27] (Bodin et al., 2021) human TVA responded similarly to human nonverbal vocalizations and macaque coos, likely due to acoustic similarity.
  
  We added this important aspect, thank you for this precision. See lines 71-72.
  
  Clarify why an active species-categorization task was chosen instead of passive listening, which is standard in TVA research. Were participants familiarized with stimuli beforehand?
  
  We added a sentence on this aspect; to summarize it here: we wanted to be able to test human recognition of nonhuman primate species’ calls. From the start, we also wanted to test the frontal mechanisms related to humans’ decision-based processes when categorizing nonhuman primate calls, which we reported in our 2023 article. See lines 75-77; we also added information on familiarization with the stimuli in the Methods, lines 679-682.
  
  The 16 acoustic features mentioned should be briefly defined earlier, as they are central.
  
  We feel that describing 16 acoustic parameters in the Introduction would burden the reader, so we instead added a reference to the supplementary table (Table S1) in which these are named and described. See line 80.
  
  Explain why only chimpanzees and bonobos were selected among the great apes, and discuss the value of including both, given their equal phylogenetic proximity but largely dissimilar acoustics.
  
  The stimuli were obtained by Thibaud Gruber and his team and through collaborations with Katie Slocombe and Zanna Clay. Unfortunately, at the time we could only use chimpanzee and bonobo calls for the great apes. Therefore, it was mainly a material constraint rather than a deliberate choice to exclude other great apes. We now discuss this aspect and present the absence of other great apes as a limitation (lines 587-591).
  
  Rephrase references to "recruitment" of TVA - this term implies general activation, while the key question concerns selectivity (stronger responses to voices vs. non-vocal controls).
  
  We rephrased throughout the manuscript, thank you for this suggestion.
  
  The hypothesis section should more clearly separate the acoustic and phylogenetic predictions, and clarify which earlier data motivate each.
  
  We now explicitly categorize the hypotheses according to either Bioacoustics or Phylogeny to clarify. We also added references motivating each hypothesis. See lines 114-120.
  
  (2) Methods
  
  Clarify whether stimuli were RMS-normalized or otherwise balanced for energy (line 128).
  
  Sound pressure level was kept constant but the stimuli were not normalized, specifically to avoid a negative impact on their naturality. We added a sentence (lines 131-132) including a reference on this aspect.
  
  The task design could benefit from reporting accuracy in addition to reaction times for the 4AFC species classification task.
  
  We agree this aspect was missing. We now report accuracy data (controlled for reaction times and acoustics of Model 3) for the species categorization task (lines 147-165; Fig.1B), and in the Methods (lines 769-786). The fitted regression values of this analysis are also used for a new fMRI model (Model 4), to uncover within-TVA correlates of the probability of correct species categorization (lines 309-325; Fig.4).
  
  Please note that previously, the behavioral data of the species categorization task were completely absent (N=23), and the reaction times data previously part of Fig.1 were for the species attentional bias task (independent sample of N=28). Since this aspect was not clear at all (same remark by all reviewers—apologies for that), we now include a clear separation in Fig.1, with newly added panels D & E part of a distinct figure area named: “Control task: Testing for Species attentional bias (N=28)”. Panel D illustrates the control task paradigm (each species as exogenous cue; “dot-probe” paradigm) while panel E shows the results (target sine wave tone or “bip” detection reaction times), showing that no species triggered more attentional capture than the others (Species effect non-significant).
  
  The acoustic parameters used in Models 2 and 3 should be explicitly listed in the Methods (even if already published elsewhere).
  
  In addition to their description in Table S1, we now include the 16 acoustic parameters used to calculate acoustic distance between the species in the Methods, see lines 828-844.
  
  Consider simplifying the presentation of the three models: a figure summarizing their relationships would help.
  
  We now include only one figure (Fig.2) for Model 3, and we moved Models 1 and 2 to the Supplementary Material. We also simplified Fig.3 for a clearer view of the overlaps between the three models within the TVA.
  
  The description of “systematic and thorough control of phylogeny” (line 119) is overstated, given that only three nonhuman species were included.
  
  We agree with the reviewer and we suppressed both “systematic” and “thorough” from the sentence.
  
  Provide rationale for not including a nonvocal control category (e.g., scrambled vocalizations or environmental sounds) to assess TVA selectivity.
  
  The main objective of the study was to uncover whether human participants could recognize the vocalizations from nonhuman primates—from both great apes and monkeys—as compared to the human voice. We therefore did not include nonvocal or noise stimuli. We added this point as a limitation in the Discussion (lines 593-596 and 609-611).
  
  Even though we did not include such stimuli for the reason mentioned above, the delineation of subtypes of nonvocal material within the TVA of our participants (Fig.2) is, in our opinion, clarifying the message: chimpanzee-selective activations fall fully within the ‘voice vs. animal’ and ‘voice vs. nature’ TVA subareas, which is not the case for the ‘voice vs. music’ and ‘voice vs. noise’ TVA subareas.
  
  Clarify if participants were trained or had a practice session to recognize the four species before scanning.
  
  The participants were indeed trained on 3 stimuli per species before entering the MRI scanner. These stimuli were discarded from the species categorization task. We added a sentence about this aspect, see lines 131-132.
  
  Specify what is meant by "no good or bad response" in the attentional control task (line 724).
  
  We suppressed this wording as it was highly confusing.
  
  (3) Results
  
  Behavioral accuracy should be reported to complement reaction times.
  
  We have now added behavioral data for the species categorization task as well as the neural correlates of accurate species categorization. See our previous response above.
  
  Figures 2-4 largely overlap; consider merging or simplifying to reduce redundancy.
  
  We agree and this point was raised by the other reviewers as well. Task-based results are now presented only for Model 3 as Fig.2, while Fig.3 (previously Fig.5) summarizes the overlap between the three models. Figures for Models 1 & 2, previously labelled Fig.3 and Fig.4, were moved to the supplementary material.
  
  Figure 2: Please indicate more clearly where "chimp-selective" areas are located (perhaps with zooms).
  
  We agree, we now modified Fig.2 with zoomed-in panels and a clearer outline of chimp-selective areas (solid blue outline). This outline is also referenced in the text (lines 236-237).
  
  Correction for multiple contrasts: With many pairwise tests, adjustments (Bonferroni or FDR) should be mentioned explicitly.
  
  We now specify ‘FDR correction at the voxel level’ at the beginning of the Results section (lines 195-198) as well as in each figure.
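  For readers weighing Bonferroni against FDR, the sketch below shows the standard Benjamini–Hochberg step-up procedure on a list of p-values. It is illustrative only: the response does not state which software or FDR variant was used for the voxel-level correction, and the function name `benjamini_hochberg` is hypothetical. Unlike Bonferroni (threshold q/m for every test), BH compares the k-th smallest p-value with (k/m)·q and rejects all tests up to the largest rank that passes.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return a reject/keep flag per test, controlling the false discovery rate at q."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank

    # Reject every hypothesis whose rank is at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

  In the second test case below, the third and fourth p-values individually miss their thresholds, yet they are still rejected because the fifth passes; this step-up property is what makes BH less conservative than Bonferroni when many tests are run.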
  
  Replace "specific to chimpanzee" with "selective for chimpanzee" to avoid implying exclusivity.
  
  We made the suggested replacement throughout the manuscript.
  
  Discuss whether the small macaque-related clusters might simply reflect acoustic overlap rather than true category selectivity.
  
  We added a section on this important aspect, including results that support the role of mid-STG/STS regions for more noise-like stimuli, including the use of macaque coos. See lines 450-461.
  
  (4) Discussion
  
  The discussion overstates claims of "chimpanzee-selectivity" in TVA. The evidence shows relative preference, not absolute selectivity.
  
  We now specify from the start of the Discussion that we are not interpreting the results as absolute selectivity but rather as a relative preference, see lines 371-373.
  
  The authors repeatedly conflate acoustic and phylogenetic factors; this should be explicitly acknowledged as a limitation.
  
  We agree, and we expanded the existing limitations section on this aspect with a more explicit account of the confound, see lines 609-611.
  
  Clarify what is meant by "recruitment" and "selectivity" (lines 411-419, 577). TVA activity often reflects enhanced responses to voices compared to non-vocal sounds, not exclusive activation.
  
  We clarified this wording in the Discussion (lines 377-378) and replaced another instance by “activated the […]” to make it clearer what we imply, namely enhanced activity triggered by chimpanzee calls within human TVA.
  
  The lack of non-vocal control conditions should be discussed as a major interpretive limitation.
  
  We added this point as a limitation in the Discussion (lines 593-596).
  
  The statement that "chimpanzee-selective activity" arose in humans who have never been exposed to chimp calls (line 450) invites evolutionary speculation but should be more cautiously phrased.
  
  We agree, and we rephrased this as: “[…] with chimpanzee calls triggering responses in the anterior STG/TVA of our human participants […]”. See lines 432-433.
  
  The comparison to recent macaque data (Giamundo et al., 2024 PNAS) is crucial: these findings of human-voice-selective neurons in macaques directly parallel the present human-chimp result.
  
  We agree with the reviewer, and we are hopeful to read similar results for other apes/great apes in the future.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The primate vocalizations used in this study were recorded in diverse social and emotional contexts, which may have contributed to the observed differences in TVA activation. Since the temporal voice areas are known to be sensitive to affective and socially relevant cues, these contextual differences could confound the interpretation of species-specific neural responses. Therefore, I suggest that the authors conduct a post-hoc analysis to quantify and compare the affective valence, arousal levels, and social contexts associated with each stimulus set.
  
  We agree that the TVA are sensitive to social, or socially relevant, cues; this motivated the very thorough work of the expert reserve personnel on-site to accurately categorize the calls according to the specific context in which they were produced. If the reviewer meant presenting these stimuli to non-expert participants and asking them to categorize the context or valence, we think this would make no sense, since the ratings would be completely below chance level and therefore uninformative. The newly added behavioral and model-based fMRI analyses include this crucial point as a factor that we named ‘Context’. In fact, for each species’ 18 stimuli, we control for agonistic and affiliative production contexts, split evenly within each species. Also, computing an additional post hoc analysis by splitting the stimuli according to Context would result in too few trials to obtain sensible and reliable fMRI results.
  
  That being said, our study targets this specific aspect by extracting the acoustic features that best characterize our stimulus set across context, species, valence, and arousal, which is exactly what we want. Across the three types of models we used, from the simplest to the most elaborate, the results converge on only one species: chimpanzee calls.
  
  We think the addition of behavioral data, model-based fMRI data, and the specific analysis of acoustic differences between chimpanzee and bonobo calls strengthen the message and the validity of our findings.
  
  (2) Although the author mentioned that the behavioral effects triggered by these vocalizations have been reported previously, the behavioral responses of the participants in the current study are also crucial for our understanding of the results. If the MRI data can be combined with the participants' behavioral responses for comprehensive analysis, the conclusions of this study will be more compelling.
  
  We agree with the reviewer, and we added the behavioral data (controlling for reaction times, production context, and acoustics of interest). We also included a model-based fMRI analysis of the probability of correct species categorization as Model 4 (Fig.4). See, respectively: lines 147-165, Fig.1B; Methods, lines 769-786; Neuroimaging results, lines 309-325.
  
  (3) I am still not convinced that phylogenetic proximity drives the observed neural selectivity. While chimpanzee vocalizations do elicit stronger responses in anterior STG, the claim that this reflects evolutionary relatedness lacks evidence. If the acoustic features of a certain call from a particular species are similar to those of human voices, it may also lead to similar effects.
  
  We agree with the reviewer that generalizing our results in terms of phylogenetic proximity alone is not a viable option. Including many more primate species including other great apes would be necessary, and we mention this crucial aspect in the limitations section. We also insist in the Discussion on the interdependence between phylogeny and acoustics in our data, since: 1) we cannot fully disentangle these factors here, 2) we cannot attribute our results to either one or the other. See lines 387-390, 410-411, 473-477, 587-591.
  
  If the acoustic features of a certain call from a particular species are similar to those of human voices, it may also lead to similar effects.
  
  We agree, and nobody could disagree: if an auditory object is extremely similar to the human voice in terms of acoustics, it would therefore potentially activate the TVA. This is exactly our message: in the natural ‘auditory world’, the calls from chimpanzees seem to be among the very few animal auditory signals that are sufficiently close, acoustically, to the human voice and therefore trigger TVA activity. They also happen to be the calls from a species which is phylogenetically the closest to humans with minimal differences with other great apes. Our results are in that sense very aligned with work from the laboratory of Pascal Belin, namely on ‘voice patches’ in the primate brain located in the (anterior) TVA, cited in our manuscript.
  
  We therefore think our interpretation does not exclude that similar results within the TVA could be observed in the near future for other auditory objects, including, for animal sounds, calls from species that are phylogenetically much more distant or vocal signals of other great apes.
  
  We added a key limitation point in the Discussion on the absence of auditory control stimuli in our design, such as per-species scrambled or spectrum-shifted stimuli, which would have made the interpretation clearer (identical acoustics but an altered or destroyed species auditory object). See lines 593-596 and 609-611.
  
  Reviewer #3 (Recommendations for the authors):
  
  While the manuscript presents intriguing results, several concerns are raised for further consideration, detailed below.
  
  We thank the reviewer for evaluating the manuscript and for the constructive criticism and suggestions.
  
  Major concerns:
  
  (1) This study claims that bilateral anterior superior temporal gyrus (aSTG) in humans can be specifically activated by chimpanzee vocalizations rather than all other primate species after regressing out relevant acoustic parameters using three distinct analyses. I am wondering if a control stimulus (e.g., scrambled chimpanzee vocalizations) were presented, would the activation patterns in these same temporal voice areas (TVA) exhibit significant differences compared to the natural chimpanzee vocalizations?
  
  We completely agree with the reviewer; this point was also raised by the other reviewers. We therefore added a key limitation point in the Discussion on the absence of auditory control stimuli in our design, such as per-species scrambled or spectrum-shifted stimuli, which would have made the interpretation clearer (identical acoustics, but alteration or destruction of the species auditory object). See lines 609-611.
  
  (2) The figure organization/presentation requires significant revision to avoid confusion and redundancy. E.g:
  
  Figure 1C is the same as Figure S1. In addition, Figure 1C lacks a figure legend and descriptive label.
  
  The scatter plots in Figures 2D, 2H, 3D, 3H, 4D, and 4H are the same as those in Figures S2, S3, and S4. However, some of these duplicate plots even have inconsistent axis labels.
  
  In several panels, the main figures appear to be summaries derived from the supplementary figures. The authors should reorganize these figures to eliminate redundancy.
  
  Please double-check all the figures to make sure of accuracy.
  
  We agree that the figures were poorly organized, overcrowded, and redundant. We have now removed the redundancy between Fig.1 and Fig.S1, and we have reduced the fMRI results to one figure for statistical Model 3, with the other models presented in the supplementary data; we justify this decision in the text by highlighting that Model 3 is the most elaborate and sensitive one. Fig.3 (previously ‘Fig.5’), which shows the overlaps between models, was also simplified and clarified.
  
  (3) The analysis of the fMRI data does not account for the participants' behavioral performance, specifically their reaction times (RTs) during the species categorization task. It is possible that processing vocalizations from certain species requires more cognitive effort or induces higher decision uncertainty. Could the observed neural effects be confounded by the decision-making process itself?
  
  We now include an analysis of the behavioral data (accuracy controlled for reaction times and for the acoustic parameters of the existing Model 3, using mixed-effects logistic regression), as well as a new, fourth model for the fMRI data. This fourth model was computed in a model-based fashion: the probability of correct categorization (fitted regression coefficients per participant, species, and trial) was modeled within the TVA, revealing the neural correlates of this modulator. We now display these results in Fig.4, and we explain our motivation for including a categorization task rather than the more traditional passive listening (lines 75-77), as well as its limitations (lines 595-596).
  
  (4) One interesting attempt of this study is to dissociate biologically salient information in animal vocalizations from their low-level acoustic properties. This presents a fundamental conceptual challenge: how to rigorously disentangle a vocalization's species-specific attributes from its inherent acoustic correlates. More precisely, what essential biological information persists in a species' vocal signal after statistically accounting for all quantifiable acoustic features? I recommend that the authors address it in the discussion.
  
  We thank the reviewer for this very important comment, and for suggesting we discuss it in the manuscript. We completely agree: we cannot fully orthogonalize species and acoustics, and this aspect relates also more broadly to cognitive and affective neuroscience studies involving vocal material. Namely: “What is an auditory object without acoustics?”
  
  We included a full paragraph on this aspect, see Discussion, lines 570-584.
  
  (5) If a brain region such as the TVA responds both to the acoustic parameters and to the biological meaning of animal vocalizations, the method used in this study, which sets the covariates to zero, might be inadequate. It is possible that species information is embedded within a specific acoustic pattern. The current modeling approach may not capture such complex information and could potentially bias the estimate of the species effect. I recommend that the authors address this issue in the discussion.
  
  We again thank the reviewer for this point. We have addressed it in the Discussion (lines 581-584) and in the section dedicated to study limitations (lines 609-613).
  
  (6) In the discussion, non-human primate vocalizations are "unreadable" to humans. If this is the case, what is the fundamental perceptual difference between these vocalizations and those from the other animal species? An alternative and highly plausible explanation for the findings is the differential familiarity of the participants with the various species, driven by media exposure (e.g., documentaries) or zoo visits and interactions. The authors need to provide a stronger justification for their control stimuli and directly address, either through discussion or additional analysis, how the factor of familiarity might explain their results better than the proposed "evolutionary distance" hypothesis.
  
  We now discuss this important aspect, see lines 560-569.
  
  We considered additional analyses of this aspect, but concluded that we had no reliable indicators of familiarity for our participants; moreover, all participants were recruited on the basis of being ‘unfamiliar’ with the vocal communication of great apes and Old World monkeys.
  
  In addition, frequent mismatches in the media between images of apes and the associated vocal signals (for instance, a chimpanzee shown with background audio of macaque coos) do not help viewers become accurately familiar with these species' vocalizations.
  
  Minor:
  
  (1) No figure legend and result description for Figure 1.
  
  Figure 1 does have a legend; it may have been cut off during the upload process, but we have verified that it is now present.
  
  (2) In the main text, three statistical models were referenced. Was the data used in each subsequent statistical model derived from the processed data of the preceding model? Please clearly explain this in the main text.
  
  We now specify this aspect in the Methods and the Results section to clarify that each model is independent from the others (lines 964-966 and 189-191, respectively).
  
  (3) In Figure 5, the two dashed lines representing Model 1 and Model 2 are confusing for readers.
  
  We modified the figure (now Fig.3) and simplified it by removing some outlines and clarifying the colors, therefore improving readability.
  
  (4) Lack of reaction times in the species categorization task.
  
  We have clarified the behavioral data, including the results for the species categorization task and for the control exogenous cueing task; see the modified Fig.1 and the behavioral section of the Results.
  
  (5) Figures 2, 3, 4, and 5: please keep the font size of the figure titles consistent.
  
  Figure title font sizes have been made consistent.
  
  (6) Line 201, Line 224, and so on, (EFG) → (E, F, G).
  
  We made this change in every figure legend, including those in the supplementary material.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.19.677258v4
bookshelf.vitalsource.com

Differentiating Instruction and Assessment for ELLs

1
1. KristenOwens 06 Jul 2026
  
  in Public
  
  To understand the educational benefit of home language literacy development, imagine an ELL whose home language is Arabic. This student arrives in his 3rd-grade classroom already reading at a 3rd-grade level in Arabic. He will need to learn that books in English are read from left to right and that what he considers to be the front of a book is the back of a book written in English. Through literacy instruction, he learns that both English and Arabic print depend on word order and progression for meaning and that letters and words in both languages represent sounds (although the student may not have heard or articulated some of the sounds in English).
  
  This section made me stop and think because I had never considered that a student could already be a strong reader in another language but still struggle in English. I've only worked with one ELL student, and this helped me realize that just because a student is still learning English doesn't mean they don't already have strong academic skills. As teachers, I think we need to recognize those strengths and use them to help students continue learning.

Annotators

KristenOwens

URL

bookshelf.vitalsource.com/reader/books/9781681256658
www.biorxiv.org

An RNA ligase shapes transcriptional profiles, neural function, and behaviour in the developing larval zebrafish

2
1. EMBOpress 06 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  Response to Reviewer's Comments
  
  We thank the reviewers for their careful, constructive, and encouraging assessment of our manuscript. As described in detail in the point-by-point response below, we have extensively revised the manuscript and Supplementary Information. Together, these changes provide further support for the role of Rlig1 in neural function and visually guided behaviour during zebrafish development.
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  Summary: Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).
  
  This study characterizes the function of RNA ligase 1 (Rlig1) in the vertebrate model zebrafish. Rlig1 is one of only two known RNA ligases in vertebrates, and its biological roles remain poorly understood. The authors combine gene expression analysis, loss-of-function approaches, transcriptomic profiling, calcium imaging, and behavioral assays to investigate its function during development. They show that loss of rlig1 (including maternal-zygotic loss) has no major effects on development or morphology, but that it leads to impairments in visually guided behavior and altered neuronal activity in response to visual stimuli. Transcriptomic analyses reveal widespread dysregulation across multiple developmental stages, nominating genes that may underlie the observed neural phenotypes. Together, the findings support a role for Rlig1 in neural development and function in vertebrates.
  
  We thank the reviewer for this accurate and positive summary of our study and for recognising the complementary, multi-level approaches used to examine the in vivo role of Rlig1.
  
  Major comments:
  
  Are the key conclusions convincing?
  
  The key conclusion of this study is that Rlig1 plays an important role in the development and function of vertebrate neural circuits. Overall, this overarching conclusion, as well as the individual conclusions from each set of experiments, are well supported by the data presented. The combination of tissue-specific expression of rlig1, robust behavioral phenotypes in mutants, transcriptomic changes across multiple developmental stages, and circuit differences observed through calcium imaging provides a coherent, multi-faceted argument for the importance of this enzyme in brain development and function. While the precise RNA substrates of Rlig1 and the mechanistic link between transcriptomic changes and neural phenotypes remain to be defined, the authors clearly acknowledge these next steps and limitations. This study is a critical foundation for those future experiments.
  
  We appreciate the reviewer’s positive assessment of the strength and coherence of the evidence.
  
  Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?
  
  The claims in the manuscript are generally well-supported. The authors clearly acknowledge limitations and future experiments to further dissect mechanism in the Discussion section.
  
  Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.
  
  No major additional experiments appear essential for supporting the current claims.
  
  Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.
  
  No experiments are required for the current claims of the manuscript.
  
  We thank the reviewer for this assessment.
  
  Are the data and the methods presented in such a way that they can be reproduced?
  
  The methods are generally well described. I would suggest that the "raw images, data, and source code for custom scripts used in this work" be made accessible without having to request them from the authors. Zenodo provides up to 50 GB of storage, which is likely sufficient for the data presented in this manuscript. In particular, I think it is important to share the behavior analysis, calcium-imaging pipeline, and transcriptomics analysis. Even if the full dataset is too large, a sample dataset and analysis scripts should be made publicly available.
  
  We agree and thank the reviewer for this important suggestion. To ensure that the study can be reproduced without the need to contact the authors, we have made the underlying data and custom analysis code publicly accessible. The RNA-seq data have been deposited in the GEO repository under accession number GSE308510 and are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE308510.
  
  In addition, the raw imaging data, behavioural and calcium-imaging datasets, processed data, and custom scripts used for the behavioural, calcium-imaging, as well as the tRNA and rRNA sequencing data have been deposited on KonDATA (DOI: 10.48606/vpwgm69277srrgaj) – together more than 190 GB – and can be accessed using this link: https://kondata.uni-konstanz.de/radar/en/dataset/vpwgm69277srrgaj?token=gLEaYEENHjmHBhjhHUHK.
  
  We have revised the Data and code availability statement in the manuscript accordingly.
  
  Are the experiments adequately replicated and statistical analysis adequate?
  
  The experiments appear adequately replicated, and statistical analyses are appropriate for the types of data presented.
  
  We thank the reviewer for this positive assessment. To further improve transparency, we have revised the figure legends and Methods to define sample-size notation consistently throughout the manuscript. As suggested by Reviewer 3, we now distinguish biological replicates or independent experiments (N) from individual embryos, larvae, cells, imaging planes, or trials (n), as appropriate.
  
  Minor comments:
  
  - Specific experimental issues that are easily addressable.
  - Are prior studies referenced appropriately?
  - Are the text and figures clear and accurate?
  - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?
  
  Throughout the manuscript: use the prime symbol (′) rather than an apostrophe for the 5′/3′ ends of DNA/RNA. The prime symbol is used in a small number of sentences, but the apostrophe is used in most.
  
  We thank the reviewer for noting this. We have replaced apostrophes with prime symbols throughout the manuscript to ensure consistent notation of 5′ and 3′ RNA/DNA termini.
  
  Line 227: "Next, we compared the total number of neurons". The elavl3 driver labels brain cells in addition to neurons. The authors compared the total number of brain cells, but can they comment on the size of the brain across the various areas? I imagine these data are also accessible by analyzing the imaging already collected.
  
  The elavl3 promoter is widely used as a pan-neuronal driver in zebrafish. Our calcium-imaging experiments used the Tg(elavl3:H2B-GCaMP8s) line, in which nuclear-localised GCaMP8s is expressed under the control of the elavl3 regulatory region. This established configuration enables brain-wide functional imaging of neuronal activity in larval zebrafish.
  
  To assess whether differences in regional brain size might contribute to the observed phenotype, we quantified brain dimensions in 5 dpf larvae using the existing imaging data. Measurements were performed manually in Fiji in a blinded manner, with genotypes assigned only after completion of the analysis. We quantified tectum width, hindbrain width, and tectum length, as illustrated in the new Supplementary Figure 6.
  
  MZrlig1 larvae showed a modest reduction in tectum width (MZrlig1: 299 ± 14 µm; WT: 312 ± 10 µm; one-sided t-test, p = 0.00125) and tectum length (MZrlig1: 122 ± 5 µm; WT: 134 ± 9 µm; one-sided t-test, p = 1.03 × 10⁻⁵). In contrast, hindbrain width did not differ between genotypes (MZrlig1: 164 ± 10 µm; WT: 164 ± 10 µm; one-sided t-test, p = 0.52). Following assessment of data distribution, statistical significance was evaluated using one-sided t-tests with Bonferroni correction for three comparisons (n = 18 MZrlig1 and n = 20 WT larvae).
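  To illustrate the Bonferroni step for three comparisons, here is a minimal Python sketch, assuming the correction multiplies each one-sided p value by the number of comparisons (three) and caps the result at 1; this is equivalent to comparing raw p values with α/3. The function name `bonferroni` is hypothetical, and the p values below are invented to show the arithmetic; they are not the study's data.

```python
def bonferroni(p_values, n_comparisons=None):
    """Bonferroni-adjust p values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw one-sided p values for the three brain measurements
# (invented to show the arithmetic; not the study's data).
raw_p = {"tectum_width": 0.0004, "tectum_length": 0.000003, "hindbrain_width": 0.52}

adjusted_p = dict(zip(raw_p, bonferroni(list(raw_p.values()))))
alpha = 0.05
significant = {name: p <= alpha for name, p in adjusted_p.items()}

for name, p in adjusted_p.items():
    print(f"{name}: adjusted p = {p:.3g}, significant = {significant[name]}")
```

  Capping at 1 keeps adjusted values interpretable as probabilities; a clearly non-significant measurement such as the hindbrain width simply saturates at 1.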
  
  Importantly, the unchanged hindbrain width indicates that the reduced number of motion-responsive hindbrain neurons in MZrlig1 larvae is unlikely to be explained by a gross difference in hindbrain size. These findings therefore support our interpretation that Rlig1 loss is associated with reduced neuronal responsiveness in the hindbrain.
  
  Given that there is already a mouse mutant for this gene and transcriptomics, can the authors do a more thorough job comparing the transcriptomics from that study with their own?
  
  We thank the reviewer for this helpful suggestion. When we applied the differential-expression thresholds used in our zebrafish analysis (absolute log₂ fold change ≥ 1.5 and adjusted p value ≤ 0.05) to the genes reported in the mouse study, only flg2 met these criteria. Thus, the available mouse dataset provides limited scope for a direct gene-by-gene comparison with our data.
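  The threshold filter described above can be sketched as follows, assuming per-gene results that carry a log₂ fold change and a Benjamini–Hochberg adjusted p value (as produced by tools such as DESeq2). The `GeneResult` class, the `passes_thresholds` function, and the gene names are hypothetical placeholders, not the code used for either study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a differential-expression table (hypothetical structure)."""
    gene: str
    log2_fold_change: float
    padj: float  # Benjamini-Hochberg adjusted p value


def passes_thresholds(result, min_abs_lfc=1.5, max_padj=0.05):
    """True if |log2 fold change| >= 1.5 and padj <= 0.05.

    A NaN padj (e.g. a gene removed by independent filtering) compares False,
    so such genes are excluded automatically.
    """
    return abs(result.log2_fold_change) >= min_abs_lfc and result.padj <= max_padj


# Made-up example rows, not data from either study.
results = [
    GeneResult("gene_a", 2.1, 0.001),
    GeneResult("gene_b", -1.6, 0.04),
    GeneResult("gene_c", 1.2, 0.0001),        # fold change too small
    GeneResult("gene_d", -3.0, 0.2),          # not significant
    GeneResult("gene_e", 2.5, float("nan")),  # adjusted p value unavailable
]

kept = [r.gene for r in results if passes_thresholds(r)]
print(kept)
```

  Using the absolute fold change keeps both up- and downregulated genes, which matters when comparing gene lists across species and studies.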
  
  To extend our analysis beyond poly(A)-enriched mRNA sequencing, we additionally performed tRNA and rRNA sequencing using total RNA from 5 dpf WT and MZrlig1 larvae. The tRNA analysis identified 17 significantly altered tRNAs in MZrlig1 larvae, including seven upregulated and ten downregulated species (Figure 5i; Supplementary Tables 8–9). Notably, the affected tRNAs include tRNA-Lys-CTT, which was previously identified among RNAs enriched in human Rlig1 immunoprecipitates, and tRNA-Thr-CGT, which was reported to be increased in female rlig1 knockout mouse brains. Although the direction of change is not fully conserved across these studies, these overlaps further support the possibility that Rlig1 influences tRNA homeostasis.
  
  In parallel, rRNA sequencing revealed differential abundance of 122 5S rRNA transcripts, with 86 upregulated and 36 downregulated in MZrlig1 larvae (Figure 5h; Supplementary Tables 10–11). Together, these new analyses show that loss of Rlig1 is associated with altered abundance of both tRNA and rRNA species, consistent with previous evidence linking Rlig1 to RNA homeostasis. At the same time, we explicitly state that these data do not identify direct enzymatic substrates of Rlig1, but provide a resource and rationale for future mechanistic studies.
  
  A clearer statement on the similarities and differences of Rlig1 and RtcB would be helpful. Is it possible RtcB is compensating at all?
  
  We thank the reviewer for this comment. We have clarified the similarities and differences between Rlig1 and RtcB in the Introduction and Discussion. Although both enzymes catalyse RNA ligation, they act on distinct end chemistries. RtcB mediates 3′–5′ ligation of RNA ends generated during canonical tRNA splicing, joining a 5′-hydroxyl end to a 2′,3′-cyclic phosphate or 3′-phosphate end. In contrast, Rlig1 catalyses 5′–3′ ligation of RNA fragments bearing a 5′-phosphate and a 3′-hydroxyl group.
  
  These distinct substrate requirements make direct functional compensation by RtcB unlikely. RNA ends generated for ligation by Rlig1 would first require end processing to generate termini compatible with RtcB-mediated ligation. Nevertheless, indirect compensation or partial functional overlap after such processing cannot be excluded.
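  To make the end-chemistry argument concrete, the sketch below encodes the end requirements stated above as a lookup table and asks which ligase could seal a nick with a given pair of ends. It is a toy illustration restricted to those stated requirements, not a biochemical model; the enum classes and the `compatible_ligases` function are hypothetical names.

```python
from enum import Enum


class FivePrimeEnd(Enum):
    PHOSPHATE = "5′-phosphate"
    HYDROXYL = "5′-hydroxyl"


class ThreePrimeEnd(Enum):
    HYDROXYL = "3′-hydroxyl"
    PHOSPHATE = "3′-phosphate"
    CYCLIC_PHOSPHATE = "2′,3′-cyclic phosphate"


# End requirements as stated in the response: (allowed 5′ ends, allowed 3′ ends).
LIGASE_REQUIREMENTS = {
    "Rlig1": ({FivePrimeEnd.PHOSPHATE}, {ThreePrimeEnd.HYDROXYL}),
    "RtcB": ({FivePrimeEnd.HYDROXYL},
             {ThreePrimeEnd.PHOSPHATE, ThreePrimeEnd.CYCLIC_PHOSPHATE}),
}


def compatible_ligases(five_prime_end, three_prime_end):
    """Ligases whose stated requirements match a nick with these two ends.

    `five_prime_end` is the 5′ terminus of the downstream fragment and
    `three_prime_end` is the 3′ terminus of the upstream fragment.
    """
    return sorted(
        name
        for name, (five_ok, three_ok) in LIGASE_REQUIREMENTS.items()
        if five_prime_end in five_ok and three_prime_end in three_ok
    )


print(compatible_ligases(FivePrimeEnd.PHOSPHATE, ThreePrimeEnd.HYDROXYL))
print(compatible_ligases(FivePrimeEnd.HYDROXYL, ThreePrimeEnd.CYCLIC_PHOSPHATE))
```

  In this toy model no pair of ends satisfies both ligases, mirroring the point that ends suited to Rlig1 would first need processing before RtcB could act on them.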
  
  We sought to address this question experimentally by obtaining rtcb mutants from the European Zebrafish Resource Center. However, subsequent genotyping showed that the supplied sperm did not contain the intended rtcb mutant alleles, precluding analysis in the present study. We have therefore explicitly acknowledged that the extent to which RtcB may compensate for loss of Rlig1 remains unresolved and will require analysis of validated rtcb mutant lines in future work.
  
  I examined the DEG tables, and I did not notice an obvious substantial enrichment of genes on chromosome 25 (White et al., 2022, https://doi.org/10.7554/eLife.72825). Were the different samples from different clutches or the same clutch? I may have missed it. Regardless, I would carefully check the DEGs that are important for conclusions and check that they are not on the same chromosome as rlig1. It is likely worth rerunning all of the GO/GSEA with genes on chromosome 25 excluded.
  
  We thank the reviewer for raising this potential confound. The RNA-seq samples were derived from independent clutches. To determine whether the observed transcriptional changes could be influenced by local effects associated with the rlig1 locus on chromosome 25, we performed two complementary analyses.
  
  First, we examined the chromosomal distribution of differentially expressed genes (DEGs) at each developmental stage. The chromosomal distribution was assessed using the original DEG analysis presented in the manuscript (no pre-filtering before DESeq2; DEGs defined as padj 1). Chromosome 25 contains 806 of 25,254 annotated protein-coding genes in the zebrafish genome, corresponding to 3.2% of all coding genes. Across developmental stages, the proportion of DEGs located on chromosome 25 ranged from 1.4% to 4.1% (cleavage: 12/419; sphere: 17/; shield: 37/892; bud: 26/781; 1 dpf: 3/216; 5 dpf: 8/587). Relative to the genomic expectation, this corresponds to enrichment values between 0.43- and 1.30-fold. Only the shield stage showed a modest increase in the proportion of chromosome 25 DEGs (1.30-fold), whereas all other stages were at or below the genomic expectation. Thus, genes on chromosome 25 are not globally overrepresented among the DEGs in the rlig1 mutant dataset.
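  The fold-enrichment values above follow from dividing the observed chromosome 25 fraction among DEGs by the genomic fraction (806/25,254). A minimal Python sketch of that arithmetic, using the counts quoted in the paragraph, is shown below; the sphere stage is omitted because its DEG total is missing from the text, and the function name `fold_enrichment` is hypothetical.

```python
GENOME_CHR25_GENES = 806      # protein-coding genes on chromosome 25
GENOME_TOTAL_GENES = 25_254   # annotated protein-coding genes in the genome

# (DEGs on chr25, total DEGs) per stage, as quoted in the text.
# Sphere is omitted because its DEG total is missing.
stage_counts = {
    "cleavage": (12, 419),
    "shield": (37, 892),
    "bud": (26, 781),
    "1 dpf": (3, 216),
    "5 dpf": (8, 587),
}


def fold_enrichment(chr25_degs, total_degs,
                    genome_chr25=GENOME_CHR25_GENES, genome_total=GENOME_TOTAL_GENES):
    """Observed chr25 fraction among DEGs divided by the genomic chr25 fraction."""
    return (chr25_degs / total_degs) / (genome_chr25 / genome_total)


enrichment = {stage: fold_enrichment(k, n) for stage, (k, n) in stage_counts.items()}

for stage, (k, n) in stage_counts.items():
    print(f"{stage}: {100 * k / n:.1f}% of DEGs on chr25, {enrichment[stage]:.2f}-fold")
```

  A value of 1.0 means chromosome 25 genes are represented among DEGs exactly as often as expected from the genome; only values clearly above 1 would point to a locus-linked effect.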
  
  Second, we repeated the complete differential-expression analysis for each developmental stage after excluding all chromosome 25 genes before DESeq2 normalisation, size-factor estimation, and dispersion modelling. This re-analysis was performed using an updated workflow, including removal of genes with zero total counts prior to DESeq2, which changes the number of genes entering Benjamini–Hochberg correction and consequently the total number of detected DEGs; all other analysis parameters were identical to the original analysis. This approach ensured that chromosome 25 genes could not influence either normalisation or statistical inference for genes on other chromosomes. Using the same DEG thresholds as in the original analysis (padj 1), exclusion of chromosome 25 had only minimal effects on the remaining DEG sets.
  
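  | Stage | Full DEGs | Non-Chr25 DEGs | Lost (Chr25) | Lost (non-Chr25) | Gained |
  |---|---|---|---|---|---|
  | 1 (4-cell) | 419 | 415 | 5 | 0 | 1 |
  | 2 (Sphere) | 913 | 879 | 34 | 5 | 5 |
  | 3 (Shield) | 592 | 553 | 37 | 5 | 3 |
  | 4 (Bud) | 349 | 329 | 20 | 0 | 0 |
  | 5 (1 dpf) | 7 | 6 | 1 | 0 | 0 |
  | 6 (5 dpf) | 168 | 164 | 4 | 0 | 0 |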
  
  Across all six developmental stages, only ten non-chromosome-25 genes lost significance and nine genes gained significance. These minor changes were confined largely to the sphere and shield stages, which also showed the highest relative representation of chromosome 25 DEGs. At the 4-cell, bud, 1 dpf, and 5 dpf stages, no non-chromosome-25 genes lost significance after chromosome 25 was excluded.
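  The totals above can be cross-checked against the stage-by-stage table with a short sketch. It assumes that the columns are related by Full − Lost (Chr25) − Lost (non-Chr25) + Gained = Non-Chr25; this relation is inferred from the table rather than stated in the text, and the variable names are hypothetical.

```python
# Columns: (full, non_chr25, lost_chr25, lost_non_chr25, gained), copied from the table.
table = {
    "4-cell": (419, 415, 5, 0, 1),
    "sphere": (913, 879, 34, 5, 5),
    "shield": (592, 553, 37, 5, 3),
    "bud": (349, 329, 20, 0, 0),
    "1 dpf": (7, 6, 1, 0, 0),
    "5 dpf": (168, 164, 4, 0, 0),
}

# Assumed relation between the columns (inferred, not stated in the text).
for stage, (full, non_chr25, lost_chr25, lost_other, gained) in table.items():
    assert full - lost_chr25 - lost_other + gained == non_chr25, stage

total_lost_non_chr25 = sum(row[3] for row in table.values())
total_gained = sum(row[4] for row in table.values())
print(total_lost_non_chr25, total_gained)
```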
  
  We also repeated the GO and GSEA analyses after excluding chromosome 25 genes. As expected, a small number of individual terms changed; however, the principal enrichment patterns and overall biological interpretation remained unchanged. Together, these analyses indicate that the transcriptomic phenotype is not substantially driven by chromosome 25-linked DEGs or by local effects associated with the edited rlig1 locus. While this analysis cannot exclude effects on individual linked genes, it shows that such effects do not substantially affect the main transcriptional or pathway-level conclusions of the study.
  
  **Referees cross-commenting**
  
  I missed the point about the RNA-seq samples being cousin-matched. While I am optimistic that the results won't change, I agree with Reviewer #3 that some confirmation is necessary. It was unclear to me whether the samples were from the same or different clutches - if they are from different clutches and share overlapping genes, that would also add support to the results. I think that detail was missing from the methods, and I had pointed it out. Either additional RNA-seq or even qPCR of some top genes from a heterozygous incross is a reasonable request.
  
  We thank the reviewer for raising this point and apologise that the breeding design for the transcriptomic experiments was not described sufficiently clearly. The developmental RNA-seq samples were not cousin-matched. Rather, WT and MZrlig1 embryos were collected from separate group matings and therefore originated from different clutches. Independent pooled samples were analysed at each developmental stage, as now described explicitly in the revised Methods.
  
  We agree that independent validation in a sibling-controlled genetic setting is important. We therefore performed RT-qPCR for eight genes selected from the 5 dpf mRNA-seq dataset using sibling-matched zygotic rlig1 mutants and WT larvae generated by heterozygous incrosses. For each genotype, three independent biological replicates were analysed, with four larvae per sample. Six of the eight selected genes showed changes in the same direction as in the original MZrlig1 RNA-seq dataset: cyp2p9, itln3, sult3st4, fabp7b, hamp, and rlig1 itself. In particular, itln3 remained strongly upregulated, whereas rlig1 expression was markedly reduced in the sibling-matched zygotic mutants. In contrast, gdf3 and gstp1.1 did not show the same directional change in this validation experiment.
  
  These results provide independent support that several of the transcriptional changes identified in the MZrlig1 RNA-seq dataset are also observed in sibling-matched zygotic mutants. At the same time, the incomplete concordance of individual genes is consistent with the fact that maternal-zygotic and zygotic mutants represent biologically distinct conditions and may differ in both effect size and molecular consequences. We have added these validation data as Supplementary Figure 7 and revised the Results and Methods accordingly.
  
  Reviewer #1 (Significance (Required)):
  
  Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.
  
  This study provides a conceptual and biological advance by identifying a role for a vertebrate RNA ligase in brain development, behavior, and transcriptional regulation.
  
  Place the work in the context of the existing literature (provide references, where appropriate).
  
  Although RNA ligases from single-cell organisms and phage are well characterized, the roles of RNA ligases in vertebrates are relatively understudied. There are only two known vertebrate RNA ligases, including the one that is the focus of this manuscript. This study demonstrates an in vivo function for Rlig1, linking molecular changes to neural development and function. The Rlig1 enzyme was only very recently discovered (2023), making this work timely and an important addition to an area with relatively few studies.
  
  A major strength of the study is its multi-level approach, integrating diverse techniques to coherently link this gene to organism-level phenotypes. This work provides a strong conceptual and functional advance by demonstrating a role for Rlig1 in vertebrate neural circuit function and behavior. A remaining mechanistic gap is that the direct RNA substrates of Rlig1 are not identified, and the observed transcriptomic changes in mRNA are likely downstream consequences of its loss. However, these points are clearly acknowledged in the discussion, making the study a well-balanced contribution. Given the existence of a mouse knockout model, further discussion comparing the zebrafish transcriptomic results and phenotypes to those observed in mouse would help place this work in the context of prior studies. Overall, the main conclusions are well supported, and the limitations do not undermine them. This study represents an important contribution that establishes a foundation for future mechanistic work linking Rlig1 substrates to the observed phenotypes.
  
  We thank the reviewer for this thoughtful and encouraging assessment.
  
  State what audience might be interested in and influenced by the reported findings.
  
  Zebrafish basic science researchers, particularly those studying how genes lead to altered neural circuits and behavior, are the most direct target audience. However, the work is of broader interest to those in the fields of neurodevelopment, gene regulation, and RNA biology/processing.
  
  Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.
  
  I am comfortable evaluating zebrafish mutants, transcriptomics, and behavioral assay design. I have more limited experience in neural circuit analysis and the interpretation of calcium imaging data, though this part of the manuscript was also clearly presented and understandable.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Summary: Klusman et al. have investigated the function of the RNA ligase rlig1 in zebrafish. They first document expression of the gene by quantitative RT-PCR and HCR fluorescent in situ hybridization. They then test ligase activity of the Rlig1 protein in vitro. They next generate a null mutant and test function of the visual system using behaviour as well as calcium imaging. The data indicate that rlig1 is broadly expressed and capable of ligating RNA; loss of rlig1 has mild effects on overall development and pronounced effects on behavioural and neuronal responses to visual stimuli. Finally, the authors use bulk transcriptome analysis to identify changes in gene expression in the mutants.
  
  We thank the reviewer for this accurate summary of our study and for recognising that the behavioural and calcium-imaging results together support a role for rlig1 in visual processing and visually guided behaviour.
  
  **Referees cross-commenting**
  
  I agree that more details about the crosses would be useful.
  
  We also agree that further detail on the breeding schemes is important. We have therefore expanded the Methods and figure legends to describe the crosses used for each experiment, including the relationship between mutant and control animals and whether samples were sibling- or cousin-matched.
  
  Reviewer #2 (Significance (Required)):
  
  Overall, the conclusions that rlig1 is required for normal development of the embryo, especially of a fully functioning visual system, are well supported. The optomotor response experiments have high power and, together with functional imaging, show a clear difference between mutant and wildtype.
  
  One limitation of this manuscript is in the characterization of gene expression. The gene expression database in Zfin contains one image of rlig1 (https://zfin.org/ZDB-IMAGE-060710-1925#image), which shows broad expression in cells of the embryo and larvae and no expression in the yolk. The images here, with the exception of the mutant in Figure 3C, show expression in the yolk. This would suggest that the yolk signal is not autofluorescence, which is inconsistent with the Thisses' data. Additionally, Figure S1 indicates a variable level of non-specific signal, especially in panel g. Thus, the distribution of rlig1 mRNA is unclear.
  
  We agree that the yolk-associated signal should not be interpreted as specific rlig1 expression.
  
  rlig1 transcripts are completely absent from the RNA-seq datasets of MZrlig1 mutants at all developmental stages analysed. Thus, the variable fluorescence observed in the yolk and in the no-probe controls (Supplementary Figure 1) cannot represent residual rlig1 expression, but must reflect non-specific background signal and/or autofluorescence. We have clarified this point in the revised manuscript.
  
  The transcriptome analysis identified changes in gene expression in the mutant. This establishes a role for rlig1 in development, and identifies several processes that are disrupted by loss of rlig1. However, the molecular analysis sheds little light on direct targets of the ligase. Given the established effects on tRNA, for example, it is unclear why RNA was analysed only by short reads on poly(A) RNA. The reader is left wondering whether zebrafish tRNA contains introns that require Rlig1 for processing. In this context, it would be useful for the authors to provide more background on tRNA splicing in vertebrates, including a mention of tricRNA, and potentially the role of TSEN complex in brain development.
  
  We have expanded the Introduction as suggested to provide additional context on tRNA splicing in vertebrates. We now explain that canonical tRNA splicing is initiated by the TSEN complex and completed by RtcB, which ligates RNA ends with chemistries distinct from those used by Rlig1. We also discuss that excised tRNA introns can form stable tRNA intronic circular RNAs (tricRNAs), and that defects in TSEN complex components are associated with neurodevelopmental disorders, underscoring the importance of RNA processing for nervous-system development.
  
  We agree that our poly(A)-enriched RNA-seq data do not identify direct RNA substrates of Rlig1. We have clarified throughout the manuscript that these experiments were designed to characterise downstream transcriptional consequences of rlig1 loss.
  
  We have additionally analysed tRNA and rRNA abundance in total RNA from 5 dpf WT and MZrlig1 larvae. These analyses identified altered levels of specific tRNA and 5S rRNA species in MZrlig1 larvae (Figure 5h,i; Supplementary Tables 8–11), supporting an association between Rlig1 loss and altered RNA homeostasis.
  
  To summarize, this manuscript extends work in the mouse and in cell lines demonstrating a requirement for rlig1. It does not shed light on direct targets of Rlig1, but provides a strong foundation for future work on the role of RNA ligation in vertebrate development and brain function.
  
  This paper is expected to be of interest to a specialised audience.
  
  Minor points: The images showing gene expression in Figure 2 are not easy to see, due to the LUT used and low intensity of the signal. To aid the reader, the HCR channel should be shown in grayscale, possibly with the contrast enhanced (to the same extent in all images).
  
  To improve the visibility and interpretation of the HCR signal, we have added a new Supplementary Figure 2 showing the rlig1 channel in greyscale. Within comparable developmental-stage panels, identical contrast settings were applied to all images.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  Summary: This paper provides good evidence that a newly described enzyme that catalyzes 5'-3' RNA ligation, rlig1, plays some role in early vertebrate neurodevelopment. Using embryonic and larval zebrafish as a model, the authors found that, while rlig1 mRNA is highly maternally deposited and ubiquitously expressed early on, expression later in development localizes to the brain and eyes. They generated a stable CRISPR/Cas9 large-deletion mutant spanning from upstream of the 5'UTR to past the start codon. By comparing wild-type and maternal-zygotic (MZ) rlig1 mutants, the authors found that animals developed overtly normally but did show reduced behavioral responsiveness in a visual-stimulus experimental paradigm. By combining calcium imaging and poly(A)-enriched RNA-sequencing transcriptomic analyses, they found decreased neuronal activity in regions needed for visual processing, as well as dysregulation of neural-related gene networks and of metabolic and translational pathways.
  
  We thank the reviewer for this detailed and accurate summary of our study and for recognising the convergent evidence linking Rlig1 loss to altered neural activity and visually guided behaviour in developing zebrafish.
  
  Major comments: 1) My main major comment is that, because there is so much inherent variability in behavior and even development across different clutches, this study relies on comparing (cousin-matched) WT and maternal-zygotic rlig1 mutant animals. In most reliable peer-reviewed papers, this is not a fair comparison. While I appreciate that the authors stated that they used parents that were siblings (so that offspring would be cousin-matched), I do not consider this scientifically rigorous enough for the claims presented.
  
  a. I do not consider it reasonable to ask the authors to set aside the substantial work that went into this paper using WT and MZrlig1 comparisons. However, at minimum, the authors should consider performing the essential behavior and RNA-seq (see point b) experiments with single-pair heterozygous incrosses, genotyping the animals post hoc. Including these critical data in a main figure, as the basis for using MZ animals for the rest of the paper, would provide some confidence that the phenotypes and claims presented are not a result of inherent variability. If the authors already have adult heterozygous animals of mating age, I estimate that these experiments could reasonably be completed within 3-4 weeks; if new animals need to be generated, this request would take ~4 months. Typically, these kinds of experiments would not be considered a financial burden to perform.
  
  Our central genetic condition was maternal-zygotic loss of rlig1, motivated by the strong maternal deposition of rlig1 mRNA during cleavage stages. A heterozygous incross would produce zygotic mutants that still receive maternal rlig1 transcript and protein, and would therefore test a related but biologically distinct condition. For the maternal–zygotic experiments, we used cousin-matched WT controls derived from the same parental family to minimise genetic-background differences, and we performed the behavioural assays with substantial numbers of larvae across independent experiments.
  
  We nevertheless repeated the behavioural analysis as suggested using zygotic rlig1 mutants and WT sibling controls obtained from heterozygous incrosses. This analysis revealed a qualitatively similar, although less pronounced, reduction in visually guided behaviour in zygotic mutants (new Supplementary Figure 4). We speculate that the reduced effect size is consistent with partial compensation by maternally supplied rlig1 transcript or protein in zygotic mutants.
  
  b. For the transcriptomic analyses, I have two main points: i) Again, it is difficult to compare the transcriptomes of non-sibling-matched animals in a statistically rigorous way with such low numbers of single 5 dpf brains. In line with point a, it would be essential to pool at least a few WT and rlig1 mutant siblings for at least 3 biological replicates per sample and compare those analyses with the results from MZ animals. ii) Typically this would not be a major concern. However, the nature of the gene of interest and the published in vitro findings matter here: the rlig1 enzyme catalyzes 5'-3' RNA ligation, has been implicated in rRNA integrity and tRNA targeting, and is broadly essential for the repair, splicing, and editing of RNAs. Thus, while poly(A)-enriched RNA sequencing can provide context about gene networks that are affected (either primarily or secondarily), sequencing that enriches for tRNAs, polysome or ribosome profiling, or some more targeted sequencing approach would be more appropriate to support the claims in the paper more rigorously. Depending on the availability of mating-age animals, this experiment and its analyses may reasonably take up to 3 months; this approach may be considered a financial burden. Alternatively, with the current mRNA sequencing data, the authors could investigate whether they can identify altered splicing or RNA editing dynamics in different RNA modules. I estimate that this alternative analysis may take up to one month to develop and interpret.
  
  We would like to clarify that the poly(A)-enriched RNA-seq was not performed on single 5 dpf brains, but on independent pools of 8–10 age- and genotype-matched whole embryos or larvae collected across six developmental stages. We have also validated eight selected 5 dpf RNA-seq candidates by RT-qPCR using sibling-matched zygotic rlig1 mutants and WT larvae generated by heterozygous incrosses. For each genotype, we analysed three independent biological replicates, each comprising a pool of four larvae. Six of the eight tested genes showed changes in the same direction as in the original MZrlig1 RNA-seq dataset, namely cyp2p9, itln3, fabp7b, hamp, sult3st4, and rlig1 (new Supplementary Figure 7). Although zygotic mutants are not equivalent to maternal–zygotic mutants because they retain maternally supplied rlig1 transcript and protein, these results provide independent support for a substantial subset of the transcriptional changes identified in the MZrlig1 dataset. We have revised the Methods, Results, and Discussion to describe the breeding schemes and this limitation more explicitly.
  
  We also agree that poly(A)-enriched RNA-seq alone cannot identify direct Rlig1 substrates or adequately assess non-polyadenylated RNA classes. We therefore added targeted analyses of tRNA and rRNA abundance from total RNA isolated from 5 dpf WT and MZrlig1 larvae. The tRNA analysis identified seven tRNAs with increased and ten with decreased abundance in MZrlig1 larvae, including tRNA-Lys-CTT, previously found among RNAs enriched in human Rlig1 immunoprecipitates, and tRNA-Thr-CGT, which was reported to be increased in female rlig1 knockout mouse brains (Figure 5i; Supplementary Tables 8–9). In parallel, the rRNA analysis identified altered abundance of 122 5S rRNA species, with 86 increased and 36 decreased in MZrlig1 larvae (Figure 5h; Supplementary Tables 10–11).
  
  These new data provide additional evidence that loss of Rlig1 is associated with altered tRNA and rRNA homeostasis. At the same time, we explicitly state that neither the mRNA-, tRNA-, nor rRNA-seq datasets establish direct enzymatic substrates of Rlig1 or demonstrate altered tRNA splicing, RNA editing, or translation. Direct substrate mapping and analyses such as ribosome profiling will be important directions for future work. The revised manuscript frames the transcriptomic analyses accordingly.
  
  The experiments as documented are adequately replicated, and the statistical analyses are adequate (apart from the non-sibling-matched issue in point 1). Labels should more clearly state or denote individual (n) or experimental (N) numbers; I note some of these in the Minor comments below.
  
  We agree and have revised the figure legends accordingly. We now distinguish N for independent experiments or biological replicates from n for individual embryos, larvae, imaging planes, segmented cells or trials. Where pooled samples were used, the legends and Methods now state the number of embryos or larvae per pool and the number of independent pools or experiments.
  
  Minor comments
  
  Comments on figures or figure legends:
  
  1) Figure 1e: align the "#" labels better; they currently look diagonal.
  
  Thank you. We corrected the alignment of the labels in Figure 1e.
  
  2) For Figure 1f, consider labeling the independent replicates directly on the graph rather than only in the label; otherwise, this is not very clear to the reader.
  
  We have revised Figure 1f to make the independent replicates more transparent. Each replicate is now shown in a different colour, and the number of independent replicates used for quantification (N = 3) is indicated in the figure.
  
  3) Figure 2a, consider adding the reference gene (eef1a) in the legend.
  
  We have added eef1a to the Figure 2a legend and clarified that relative rlig1 mRNA levels were calculated using eef1a as the reference gene.
  
  4) Figure 2a: if I understand the experiment correctly, the current label n=3 (which would mean 3 individual embryos/larvae) should read N=3 (three independent experiments with x embryos/larvae per run).
  
  Thank you very much for this suggestion. We have corrected the sample-size notation in Figure 2a. The label now uses N for independent experiments and specifies the number of embryos or larvae used per experiment where appropriate.
  
  5) Supplementary Figure 1 was very unconvincing in comparing WT to MZ mutants; I'm sorry to say that I really could not tell much difference. Compared with Figure 3c, the images look quite different. The DRAQ7 labeling also appeared uneven in Supplementary Figure 1. Consider optimizing the imaging strategy and providing more interpretable images. A separate, aesthetic comment: magenta was very difficult for me to see against a black background, so consider switching the rlig1 channel to grayscale or flipping the colors so that rlig1 mRNA is, for example, cyan.
  
  We thank the reviewer for this comment and apologise that the purpose of Supplementary Figure 1 was not sufficiently clear. This figure shows no-probe control samples imaged in the rlig1 detection channel to document stage-dependent background and autofluorescence. Because no rlig1 probe was applied, no genotype-dependent difference between WT and MZrlig1 samples is expected in these images. The variable signal, including the yolk-associated fluorescence, therefore represents background rather than specific rlig1 mRNA detection.
  
  In contrast, Figure 3c shows samples processed with the rlig1 HCR probe set. The marked reduction of punctate signal in MZrlig1 larvae in this experiment is therefore attributable to the absence of rlig1 transcripts, consistent with the RNA-seq and RT-qPCR data. We have clarified this distinction in the revised text and figure legends.
  
  The apparently uneven DRAQ7 signal in some no-probe control images reflects differences in embryo orientation and imaging planes rather than genotype-specific staining differences. To improve the visibility and interpretability of the HCR data, we have additionally included a new Supplementary Figure 2 showing the rlig1 channel in greyscale, with matched contrast settings within comparable developmental-stage panels.
  
  6) Calcium imaging - related to Major comments above, consider performing this experiment in sibling-matched animals, especially with only one copy of the transgene. If WT vs. sibling mutant results look similar to the WT vs MZ mutant results, this would be more convincing.
  
  We agree that calcium imaging in sibling-matched zygotic mutants would provide a valuable complementary dataset. However, zygotic mutants retain maternally supplied rlig1 transcript and protein and therefore represent a biologically distinct condition from the maternal–zygotic mutants examined in our principal imaging experiments. Consistent with this distinction, the behavioural phenotype in sibling-matched zygotic mutants was qualitatively similar but less pronounced than in maternal–zygotic mutants.
  
  A sufficiently powered brain-wide calcium-imaging analysis in sibling-matched animals would require generation, imaging, and analysis of a substantial additional cohort, while the expected smaller effect size would limit its ability to directly test the maternal–zygotic phenotype reported here. We therefore believe that this experiment extends beyond the scope of the present study.
  
  **Referees cross-commenting**
  
  I agree with Reviewer #1 that, at minimum, the raw code should be uploaded to GitHub or Zenodo and the raw data to Zenodo.
  
  We agree and thank the reviewer for this important suggestion. To ensure that the study can be reproduced without the need to contact the authors, we have made the underlying data and custom analysis code publicly accessible. The RNA-seq data have been deposited in the GEO repository under accession number GSE308510 and are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE308510.
  
  In addition, the raw imaging data, behavioural and calcium-imaging datasets, processed data, and the custom scripts used to analyse the behavioural, calcium-imaging, tRNA-sequencing and rRNA-sequencing data (together more than 190 GB) have been deposited on KonDATA (DOI: 10.48606/vpwgm69277srrgaj) and can be accessed using this link: https://kondata.uni-konstanz.de/radar/en/dataset/vpwgm69277srrgaj?token=gLEaYEENHjmHBhjhHUHK.
  
  We have revised the Data and code availability statement in the manuscript accordingly (also see the response to Reviewer #1).
  
  I agree with Reviewer #1 that brain size can and should also be assessed, presumably using the images already collected. For example, in Figure 5b, the number of neural cells (even when normalized) could be lower if the brain is smaller. This is a reasonable control analysis.
  
  As suggested, we have quantified tectum width, tectum length, and hindbrain width from the existing calcium-imaging datasets in a blinded manner. Although MZrlig1 larvae showed modest reductions in tectum width and length, hindbrain width did not differ between genotypes. Thus, the reduced number of motion-responsive hindbrain cells is unlikely to be explained by a gross difference in hindbrain size. These control analyses are presented in the new Supplementary Figure 6 (also see the response to Reviewer #1).
  
  I agree with Reviewer #2 that addressing, either by writing or experimentally, a bit more about direct targets of the ligase (including tRNAs and rRNAs) will strengthen the manuscript significantly.
  
  We thank the reviewer for this helpful suggestion. To address this point, we have added new analyses of rRNA and tRNA abundance in 5 dpf WT and MZrlig1 larvae, together with an expanded discussion of their interpretation. These data provide additional evidence that loss of Rlig1 is associated with altered RNA homeostasis, while we distinguish such effects from the direct RNA substrates of the ligase, which remain to be identified (also see the response to Reviewer #2).
  
  I agree with Reviewer #1's first comment (last sentence) that, if RNA-seq (or other appropriate sequencing) of sibling-matched samples is financially prohibitive, then at least qPCR of some top genes would be acceptable.
  
  We have performed RT-qPCR validation of selected top differentially expressed genes using sibling-matched WT and zygotic rlig1 mutant larvae generated by heterozygous incrosses. These data provide independent support for the altered expression of several genes identified in the maternal–zygotic rlig1 RNA-seq dataset and are presented in new Supplementary Figure 9 (also see the response to Reviewer #1).
  
  I agree with the additional comment from Reviewer #1: the manuscript details cousin-matched samples in lines 666-667, but I would add the suggestion that the authors include details about "single-pair" versus "group" mating. For behavior and all other analyses in these kinds of zebrafish experiments, it is very important that multiple replicates of single-pair (one female crossed to one male), sibling-matched groups are used.
  
  We appreciate the reviewer’s helpful suggestion. We agree that further detail on the breeding schemes is important. We have therefore expanded the Methods to specify, for each experiment, whether embryos or larvae were obtained from single-pair or group matings, the number of independent crosses or clutches, and whether mutant and control animals were sibling- or cousin-matched.
  
  Reviewer #3 (Significance (Required)):
  
  This study provides a good increase in our knowledge about a newly described RNA ligase enzyme - rlig1 - in vivo. The authors integrate their results across organismal behavior, brain cell activity, and transcriptomes using a newly generated stable genetic mutant to uncover a new link between neuronal RNA processing, development, and sensory-motor computation. Given that the human orthologue of this gene has been associated with neurological and cognitive conditions, including neurodevelopmental and neuroinflammatory disorders and Alzheimer's disease, the generation and characterization of this stable mutant line proves valuable. There are important technical limitations, specifically related to the comparison of wild type and maternal-zygotic mutant animals, that may not faithfully represent statistical differences compared to sibling-matched animals. Basic biological audiences, including in neurodevelopment, genetics, and RNA biology, would be interested in this research.
  
  We thank the reviewer for recognising the value of the stable rlig1 mutant line and for highlighting the importance of the breeding design. We agree that comparisons between cousin-matched WT and maternal–zygotic (MZ) mutant larvae require careful interpretation. However, a fully sibling-matched WT versus MZrlig1 comparison is not genetically possible. Maternal–zygotic mutants must be produced by homozygous mutant mothers, whereas WT siblings can only be obtained from a different maternal genotype. Thus, the maternal genotype, and critically the presence or absence of maternally deposited rlig1 RNA and protein, necessarily differ between these conditions. This is not merely a technical limitation of the experimental design, but an intrinsic feature of testing maternal–zygotic gene function. A heterozygous incross instead produces sibling-matched zygotic mutants, which retain maternal rlig1 products and therefore represent a biologically distinct genetic condition rather than a direct replacement for the MZ comparison.
  
  For the MZ experiments, we minimised genetic-background differences by using cousin-matched controls derived from the same parental family and by analysing independent experimental replicates. Importantly, the principal behavioural finding was independently supported in sibling-matched zygotic mutants generated by heterozygous incrosses. These larvae showed a qualitatively similar reduction in visually guided behaviour, although with a smaller effect size (new Supplementary Figure 4). We also validated selected transcriptional changes in sibling-matched zygotic mutants by RT-qPCR (new Supplementary Figure 9). The weaker phenotype in zygotic mutants is consistent with partial buffering by maternal rlig1 transcript or protein. Future studies will be valuable to further distinguish how maternal and zygotic Rlig1 affect gene expression and visually guided behaviour.
  
  Insufficient expertise to evaluate: While I understand the first part of Figure 1, I do not have expertise in these sorts of assays. The rest of the experiments I do have sufficient expertise to evaluate. And thank you to the authors for providing direct DOI links to references.
  
  We are grateful for the reviewers’ detailed comments, which substantially improved the manuscript. We hope that the revised text and additional analyses address the central concerns and make the study more transparent and useful to the field.
  
  PeerReviewed
2. EMBOpress 06 Jul 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Referee #1
  
  Evidence, reproducibility and clarity
  
  Summary:
  
  Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).
  
  This study characterizes the function of RNA ligase 1 (Rlig1) in the vertebrate model zebrafish. Rlig1 is one of only two known RNA ligases in vertebrates, and its biological roles remain poorly understood. The authors combine gene expression analysis, loss-of-function approaches, transcriptomic profiling, calcium imaging, and behavioral assays to investigate its function during development. They show that loss of rlig1 (including maternal-zygotic loss) has no major effects on development or morphology, but that it leads to impairments in visually guided behavior and altered neuronal activity in response to visual stimuli. Transcriptomic analyses reveal widespread dysregulation across multiple developmental stages, nominating genes that may underlie the observed neural phenotypes. Together, the findings support a role for Rlig1 in neural development and function in vertebrates.
  
  Major comments:
  
  Are the key conclusions convincing?
  
  The key conclusion of this study is that Rlig1 plays an important role in the development and function of vertebrate neural circuits. Overall, this overarching conclusion, as well as the individual conclusions from each set of experiments, are well supported by the data presented. The combination of tissue-specific expression of rlig1, robust behavioral phenotypes in mutants, transcriptomic changes across multiple developmental stages, and circuit differences observed through calcium imaging provides a coherent, multi-faceted argument for the importance of this enzyme in brain development and function. While the precise RNA substrates of Rlig1 and the mechanistic link between transcriptomic changes and neural phenotypes remain to be defined, the authors clearly acknowledge these next steps and limitations. This study is a critical foundation for those future experiments.
  
  Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?
  
  The claims in the manuscript are generally well-supported. The authors clearly acknowledge limitations and future experiments to further dissect mechanism in the Discussion section.
  
  Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.
  
  No major additional experiments appear essential for supporting the current claims.
  
  Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.
  
  No experiments are required for the current claims of the manuscript.
  
  Are the data and the methods presented in such a way that they can be reproduced?
  
  The methods are generally well described. I would suggest that the "raw images, data, and source code for custom scripts used in this work" be made accessible without the need to request them from the authors. Zenodo provides up to 50 GB of storage, which is likely sufficient for the data presented in this manuscript. In particular, I think it is important to share the behavior analysis, calcium imaging pipeline, and transcriptomics analysis. Even if the full dataset is too large, a sample dataset and the analysis scripts should be publicly available.
  
  Are the experiments adequately replicated and statistical analysis adequate?
  
  The experiments appear adequately replicated, and statistical analyses are appropriate for the types of data presented.
  
  Minor comments:
  
  Specific experimental issues that are easily addressable.
  
  Are prior studies referenced appropriately?
  
  Are the text and figures clear and accurate?
  
  Do you have suggestions that would help the authors improve the presentation of their data and conclusions?
  
  Throughout the manuscript: use the prime symbol (′) rather than an apostrophe for 5′/3′ in DNA/RNA nomenclature. The prime symbol is used in a small number of sentences, but the apostrophe is used in most.
  
  Line 227: "Next, we compared the total number of neurons". The elavl3 driver labels other brain cells in addition to neurons.
  
  The authors compared the total number of brain cells, but can they comment on the size of the brain across the various areas? I imagine these data are also accessible by analyzing the images already collected.
  
  Given that there is already a mouse mutant for this gene and transcriptomics, can the authors do a more thorough job comparing the transcriptomics from that study with their own?
  
  A clearer statement on the similarities and differences of Rlig1 and RtcB would be helpful. Is it possible RtcB is compensating at all?
  
  I examined the DEG tables, and I did not notice an obvious substantial enrichment of genes on chromosome 25 (White et al., 2022, https://doi.org/10.7554/eLife.72825). Were the different samples from different clutches or the same clutch? I may have missed it. Regardless, I would carefully check the DEGs that are important for conclusions and check that they are not on the same chromosome as rlig1. It is likely worth rerunning all of the GO/GSEA with genes on chromosome 25 excluded.
  
  Referees cross-commenting
  
  I missed the point about the RNA-seq samples being cousin-matched. While I am optimistic that the results won't change, I agree with Reviewer #3 that some confirmation is necessary. It was unclear to me whether the samples were from the same or different clutches; if they are from different clutches and share overlapping genes, that would also add support to the results. I think that detail was missing from the methods, and I had pointed it out. Either additional RNA-seq or even qPCR of some top genes from a heterozygous incross is a reasonable request.
  
  Significance
  
  Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.
  
  This study provides a conceptual and biological advance by identifying a role for a vertebrate RNA ligase in brain development, behavior, and transcriptional regulation.
  
  Place the work in the context of the existing literature (provide references, where appropriate).
  
  Although RNA ligases from single-cell organisms and phage are well characterized, the roles of RNA ligases in vertebrates are relatively understudied. There are only two, including the one that is the focus of this manuscript. This study demonstrates an in vivo function for Rlig1, linking molecular changes to neural development and function. The Rlig1 enzyme was only very recently discovered (2023), making this work timely and an important addition to an area with relatively few studies.
  
  A major strength of the study is its multi-level approach, integrating diverse techniques to coherently link this gene to organism-level phenotypes. This work provides a strong conceptual and functional advance by demonstrating a role for Rlig1 in vertebrate neural circuit function and behavior. A remaining mechanistic gap is that the direct RNA substrates of Rlig1 are not identified, and the observed transcriptomic changes in mRNA are likely downstream consequences of its loss. However, these points are clearly acknowledged in the discussion, making the study a well-balanced contribution. Given the existence of a mouse knockout model, further discussion comparing the zebrafish transcriptomic results and phenotypes to those observed in mouse would help place this work in the context of prior studies. Overall, the main conclusions are well supported, and the limitations do not undermine them. This study represents an important contribution that establishes a foundation for future mechanistic work linking Rlig1 substrates to the observed phenotypes.
  
  State what audience might be interested in and influenced by the reported findings.
  
  Zebrafish basic science researchers, particularly those studying how genes lead to altered neural circuits and behavior, are the most direct target audience. However, the work is of broader interest to those in the fields of neurodevelopment, gene regulation, and RNA biology / processing.
  
  Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.
  
  I am comfortable evaluating zebrafish mutants, transcriptomics, and behavioral assay design. I have more limited experience in neural circuit analysis and the interpretation of calcium imaging data, though this part of the manuscript was also clearly presented and understandable.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2025.12.01.691575
www.biorxiv.org

Cytosolic Carboxypeptidase 5 maintains mammalian ependymal multicilia to ensure proper homeostasis and functions of the brain

1
1. Public_Reviews 06 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank the Editors for their positive assessment of our manuscript. We also thank the Reviewers for their positive remarks and constructive comments. Based on the Reviewers’ feedback, we have conducted additional experiments and provided supporting data to address their comments. In particular, we provided quantitative measurements of the rotational polarity of ependymal cells in Agbl5<sup>M1/M1</sup> mutants and assessed microtubule polarization. We quantified the intensity of the apical actin network in ependymal cells to strengthen the evidence for the role of CCP5 in organizing the actin network. Using scanning electron microscopy, we demonstrated that the polarity of tracheal multicilia is affected in Agbl5<sup>M1/M1</sup> mice. We co-immunostained ependymal cilia for GT335 and acetylated tubulin to assess the effects on cilia length in the mutant. We assessed the presence and length of primary cilia in ependymal cell progenitors to determine whether they contribute to the defective polarity of Agbl5<sup>M1/M1</sup> ependymal cells. We feel that these revisions have substantially strengthened this MS.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Dad et al. explored the roles of cytosolic carboxypeptidase 5 (CCP5) in the development of ependymal multicilia in the brain. CCP family members are erasers of polyglutamylation on ciliary-axoneme microtubules. The authors generated a new mutant mouse of the Agbl5 gene, which encodes CCP5, with deletion of its N-terminus and part of the carboxypeptidase (CP) domain (named AGBL5<sup>M1/M1</sup>).
  
  Strengths:
  
  The mutant mice developed lethal hydrocephalus due to degeneration of ependymal multicilia. Interestingly, this contrasts with the phenotype of Agbl5 mutants with disruption solely in the CP domain of CCP5 (named AGBL5<sup>M2/M2</sup>), which did not develop hydrocephalus despite increased glutamylation levels in ependymal cilia similar to those observed in AGBL5<sup>M1/M1</sup> mutants. The study has been well performed, and the findings suggest a unique function of the N-domain of CCP5 in ependymal multicilia stability.
  
  Weaknesses:
  
  The content of this article is relatively descriptive and lacks molecular insights.
  
  We thank the Reviewer for the positive comments. To provide molecular insight into the dysregulated planar cell polarity (PCP) in Agbl5<sup>M1/M1</sup> ependyma, we have conducted additional experiments to assess microtubule polarization in ependymal cells (Figure 7O-P). We quantified the intensity of actin networks around BB patches to better understand how they are affected in the ependyma of the mutants and how this contributes to the dispersion of BBs (Figure 4M-N; please see Recommendations for the authors).
  
  We also assessed tracheal multicilia in Agbl5<sup>M1/M1</sup> mutants using SEM and found that their polarity was affected as well (Figure S2).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study analyzed the consequences of Agbl5 mutation on ependymal cell development and function. The authors first characterize their mutant mouse line, reporting a reduced lifespan and severe hydrocephalus. Next, they report defects in ependymal cell cilia number and motility. They provide evidence for impaired basal body organisation and cilia glutamylation.
  
  Strengths:
  
  Description of a mutant mouse which implicates Cytosolic Carboxypeptidase 5 (the product of the Agbl5 gene) in proper ependymal cell function.
  
  Weaknesses:
  
  Description of phenotype is incomplete:
  
  We thank the Reviewer for the constructive comments. We have performed additional quantitative analyses of the phenotypes in Agbl5<sup>M1/M1</sup> mice, which we feel strengthen this study.
  
  Figure 3G - the sequence from the movie is not really informative. Providing beating frequencies as quantification of the data would be more informative.
  
  We have provided the beating frequency as well as the mean vector length of cilia beating directions (that reflects the coordination of cilia) in Figure 3H and 3I respectively in the revised manuscript.
  
  Figure 3 - the quantification of actin network would strengthen the message.
  
  We agree with the Reviewers. We have quantified the total intensity of actin around BBs and the actin intensity normalized to signals of the BB marker (CEP164). The data have been provided in Figure 4M and 4N respectively. The quantitative analysis showed that both the total intensity of apical actin network and the intensity of F-actin per BB are reduced in Agbl5<sup>M1/M1</sup> ependymal cells compared to that in wild-type mice, suggesting that CCP5 is involved in organizing actin network around BB. This analysis certainly improves the clarity of this message.
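
A minimal sketch of the two actin measures described above, assuming background-subtracted phalloidin and CEP164 images of the same field as NumPy arrays and a binary mask outlining the BB patch of one cell. The function name `actin_metrics`, the mask, and the choice to normalize by the summed CEP164 intensity inside the mask are illustrative assumptions; the response does not specify the exact normalization.

```python
import numpy as np


def actin_metrics(actin_img, bb_img, patch_mask):
    """Return (total F-actin intensity, F-actin intensity per unit BB-marker signal).

    Hypothetical helper. actin_img and bb_img are 2-D arrays (phalloidin and
    CEP164 channels of the same field); patch_mask is a boolean 2-D array
    marking the region around one BB patch.
    """
    actin = np.asarray(actin_img, dtype=float)
    bb = np.asarray(bb_img, dtype=float)
    mask = np.asarray(patch_mask, dtype=bool)
    if not (actin.shape == bb.shape == mask.shape):
        raise ValueError("images and mask must have the same shape")

    # Integrated phalloidin signal inside the patch region.
    total_actin = float(actin[mask].sum())
    # Integrated BB-marker signal in the same region, used as the normalizer.
    total_bb = float(bb[mask].sum())
    if total_bb <= 0:
        raise ValueError("no BB-marker signal inside the mask")
    return total_actin, total_actin / total_bb
```

The per-BB ratio is the measure that speaks to the concern that fewer or more dispersed BBs alone could lower the total phalloidin signal.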
  
  Lines 219-220 - the authors conclude «Taken together, in Agbl5<sup>M1/M1</sup> ependymal cells, the expression of genes promoting multiciliogenesis were not impaired but certain proteins associated with differentiated ependymal cells are not properly expressed». However, they assess protein expression (IF), not gene expression. In addition, their quantification shows differences in the number of FoxJ1-positive cells, which is indeed an impaired expression.
  
  We will clarify this statement and emphasize the number of FoxJ1-positive cells.
  
  Microtubules are involved in the local organization of ciliary basal bodies (see Werner et al., Vladar et al., 2011; Boutin et al., 2014). It would be interesting for the authors to check whether the subapical network of microtubules is glutamylated during ependymal cell differentiation and how this network is affected in their mutants.
  
  We thank the Reviewer for the constructive comments. We immunostained whole-mount lateral walls of the lateral ventricles for GT335 and Centrin1, using the position of the latter to localize the subapical layer. While the GT335 signal in multicilia is increased in Agbl5<sup>M1/M1</sup> ependyma (Figure S8E), the signal underneath BBs does not differ much between the mutant and wild-type (please see Figure S8C, D, G, H).
  
  Showing the data mentioned in the discussion on Cep110 would be a nice addition to the paper.
  
  These data have been provided in Supplementary Figure S9.
  
  Line 354: "The latter serves as a component of tissue polarity that is required for asymmetric PCP protein localization in each cell (Boutin et al., 2014; Vladar et al., 2012)." The cited reference did not demonstrate that this microtubule network is required for asymmetric PCP localization.
  
  We thank the Reviewer for the critical reading. The cited reference (Boutin et al., 2014) has been removed.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors developed a new Agbl5 KO allele, extending the deletion to the N-terminus of CCP5 to explore its function in mouse ependymal cells.
  
  Strengths:
  
  They show that the KO mice exhibit severe hydrocephalus due to disorganized and mislocated basal bodies. Additionally, they present evidence of both impaired beating coordination and a reduction in ciliary beating.
  
  Weaknesses:
  
  The manuscript is well-written but lacks specific interpretations of the results presented. Further experiments are needed to be fully convincing.
  
  We thank the Reviewer for the comments. We have performed further analyses and conducted additional experiments to strengthen this study.
  
  (1) We have quantified the intensity of actin staining around BB patches and its intensity relative to the number of BBs to assess the extent to which the actin networks in Agbl5<sup>M1/M1</sup> ependymal cells are affected (please refer to the above response to the comments of Reviewer #2). The results are shown in Figure 4M-N.
  
  (2) We co-stained tdTomato with an ependymal cell-specific marker to confirm the expression of Agbl5 in ependymal cells (please see Figure 6C-E).
  
  (3) We have conducted co-immunostaining of GT335 and Ac-Tub and compared the length of their signals in ependymal multicilia between WT and Agbl5<sup>M1/M1</sup> mice (please see Figure 6O, P, R, S).
  
  (4) We quantified the area of ependymal cells in wild-type and Agbl5<sup>M1/M1</sup> mice. Indeed, the area of ependymal cells is increased in the mutants. However, primary cilia are present in the ependymal cell progenitors of Agbl5<sup>M1/M1</sup> mice and are of similar length to those in wild-type mice (please see Figure 7M, N and our response to this point below).
  
  (5) We performed additional analysis to address the affected rotational polarity in the Agbl5<sup>M1/M1</sup> mutant mice (please see Figure 3I, Figure 7E).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The authors showed that the actin networks were severely affected, leading to impaired stability of basal bodies, and that the intensity and length of the acetylated tubulin signal in the multicilia were dramatically reduced in AGBL5<sup>M1/M1</sup> mutant mice (Figures 3 and 5). The data also suggested dysregulation of planar cell polarity. Are the expression and localization of other planar cell polarity proteins, such as tyrosinated tubulin and Fzd6, affected in mutant mice?
  
  We thank the Reviewer for the recommendations. We have assessed the expression of tyrosinated tubulin and found that it is similarly polarized in ependymal cells from wild-type and Agbl5<sup>M1/M1</sup> mice. The results are presented in Figure 7O, P in the revised MS. We also tried to assess the expression of Fzd6. However, with the antibody we tested, Fzd6 signals were not convincing. Therefore, we prefer not to show these results or draw a conclusion from them.
  
  (2) The phenotype of multiciliated cells in tracheas should also be examined in mutant mice. It is important to elucidate whether AGBL5 commonly functions in multiciliated cells of other organs.
  
  We thank the Reviewer for the suggestion. We have assessed the multicilia in the tracheas of P30 mice using scanning electron microscopy. Indeed, unlike the multicilia in wild-type mice, which are oriented in the same direction, those in the tracheas of Agbl5<sup>M1/M1</sup> mice often radiate in different directions within individual cells (Figure S2). Therefore, Agbl5 appears to be commonly involved in the alignment of multicilia.
  
  (3) According to Figure 1B, AGBL5 is highly expressed in the brain. Which cells in the brain express it besides ependymal cells?
  
  Based on the localization of the tdTomato tracer engineered into the Agbl5 mutant alleles, Agbl5 is broadly expressed in the brain, including in most if not all neurons, but its expression is much weaker in the subventricular zone (please see Figure 5B). We have clarified this in the revised MS.
  
  (4) From a mechanistic point of view, it is necessary to identify binding proteins with the N-domain of AGBL5 and perform functional analyses.
  
  We agree with the Reviewer. We feel that identifying the binding partners of the CCP5 N-domain and analyzing their function would be better pursued together with other mechanistic analyses of CCP5 function in ependymal cell polarity in our future study.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Movie 3: The authors could comment on beating direction that seems impaired at the cell scale here, analysis of rotational polarity would be a plus.
  
  We thank the Reviewer for the recommendation. We have analyzed the beating directions of cilia in individual cells and quantified their consistency within each cell using the mean vector length. These results indeed demonstrated defective rotational polarity at the cell level in Agbl5<sup>M1/M1</sup> mice (please refer to Figure 3I). We also analyzed the beating directions of ependymal multicilia at an earlier stage at the tissue level (Figure 7E). The mean vector length of cilia beating directions in Agbl5<sup>M1/M1</sup> mice is significantly reduced compared with that in wild-type mice, suggesting aberrant rotational polarity at the tissue level in the mutant (Figure 7E).
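
Mean vector length is the standard circular-statistics summary of directional consistency: average the unit vectors pointing along each measured direction and take the length of the result. A minimal sketch, assuming each cilium's beating direction has already been measured as an angle in radians; the function name is hypothetical and this is not the authors' analysis code.

```python
import math
from typing import Sequence


def mean_vector_length(angles_rad: Sequence[float]) -> float:
    """Length of the mean resultant vector of unit vectors at the given angles.

    Returns a value in [0, 1]: 1 when all directions coincide, and values
    near 0 when directions are widely dispersed or cancel each other out.
    """
    n = len(angles_rad)
    if n == 0:
        raise ValueError("need at least one angle")
    # Mean of the cosine and sine components of the unit vectors.
    c = sum(math.cos(a) for a in angles_rad) / n
    s = sum(math.sin(a) for a in angles_rad) / n
    return math.hypot(c, s)
```

Directions recorded in degrees can be converted with `math.radians` before calling the function; a lower value for mutant cells than for wild-type cells corresponds to less coordinated beating.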
  
  (2) Line 166: ref to Werner et al., 2011 is not correct (no ependymal cells in that paper).
  
  We thank the Reviewer for the critical reading. This reference has been removed.
  
  (3) Figure S4: B and D look like the same picture to me; the same goes for C and F.
  
  We apologize for using the wrong images in this Figure. It has been corrected (Revised Figure S5).
  
  (4) Line 328: "Therefore, CCP5 apparently contributes to the establishment of both translational and tissue polarities in ependymal cells." Should be rephrased since translational polarity is also a tissue-level parameter which is the coordinated positioning of the ciliary patch. Cf Mirzadeh et al., 2010; Boutin et al., 2014.
  
  We thank the Reviewer for the comments. The sentence has been rephrased, and this concept has been clarified wherever else needed in the revised manuscript.
  
  (5) Line 348: "Planar cell polarity (PCP) pathway is essential for the establishment of rotational and tissue polarities in ependymal cells." Rotational polarity also has a tissue-level component (i.e., coordination of beating direction across the tissue, which is reflected by coordination of basal body polarities across the tissue).
  
  We thank the Reviewer for the comments. We have clarified this point in the revised MS.
  
  (6) Incomplete bibliography citation (e.g., Walentek et al. without a date).
  
  We thank the Reviewer for the critical reading. This bibliography citation has been fixed.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Figure 3: The authors assert that the mutant's apical actin networks are significantly disrupted. However, the cell shown in Figure 3Q-R exhibits less compact centrioles than the controls, which could account for the reduction in phalloidin staining. Because centriole dispersion is variable in the mutant, quantifying actin staining in representative cells would be necessary to support such a statement.
  
  We thank the Reviewer for the comments. To address this concern, we have quantified the total intensity of the actin network around BBs as well as the intensity of F-actin signals normalized to the level of BB immunosignals (revised Figure 4M, N; please also refer to our response to Reviewer #1). The results indicate that the intensity of actin signal per BB is reduced in the mutant compared with that in wild-type mice. We feel that this analysis strengthens our statement.
  
  (2) Figures S3 and 4A-B show that the authors examine tdT expression to show that Agbl5 is expressed in ependymal cells but not in the SVZ. However, the tdT signal intensity is very low, and cells are very dense in this brain region. Double staining with specific markers of ependymal and/or SVZ cells would help convince readers that tdT is not expressed in SVZ cells.
  
  We agree with the Reviewer that the intensity of the tdT signal is low, although it is broadly detectable in the brain. Compared with its expression in ependymal cells, its expression in the SVZ is much lower, if present at all (Figure 4B’). To further confirm the identity of tdT-positive cells along the surface of the ventricles, we have co-stained brain sections of Agbl5<sup>WT/M1</sup> mice for tdT and S100b, a marker of mature ependymal cells (Figure 5C-E). The tdT signal colocalizes with that of S100b and is much lower in the cell layers next to S100b-positive cells.
  
  (3) Figure 4C-D and S4: The authors demonstrate that the number of FoxJ1+ cells per section increases at P7 (4C-E), while the number of S100β+ cells per mm decreases. Quantifications should be carried out in a similar manner to ensure comparability (number of positive cells per mm). Additionally, it remains unclear how to interpret these results, as S100β and FoxJ1 are two markers of differentiated cells, yet they exhibit opposite trends compared to controls. Is this a direct or indirect effect of Agbl5 mutation? The increase in the number of FoxJ1+ cells is particularly surprising given that the number of GT335 multicilia per mm remains unchanged (Figure 5).
  
  We agree with the Reviewer that quantifications should be carried out in a similar manner. In the revised MS, the quantification of Foxj1-positive cells is presented as number per mm (Figure 5I). Note that the expression of Foxj1 was assessed at P7, when ependymal cells are differentiating, while the expression of S100β was assessed at P17, when ependymal cells are expected to be fully mature. Although S100b is used as a marker of mature ependymal cells, given its unclear function, we have removed the S100b-positive cell counts from the revised manuscript to avoid confusion.
  
  (4) Figure 5: In this figure, the authors analyze the labeling obtained with GT335, Acetylated Tubulin, and Arl13b antibodies. They show that the area of the cilium labeled by GT335 has increased, while the area labeled by the Acetylated Tubulin antibody has decreased in the knockout (KO) compared to the control. However, the length of the cilia observed through labeling with the Arl13b antibody remains unchanged. These observations are intriguing, but the low-magnification images in Figure 4 do not allow for the differences in ciliary axoneme labeling to be seen. Double GT335/AcTub labeling and higher magnifications are necessary for improved visualization of the differences in labeling along the axonemes.
  
  We thank the Reviewer for the comments. We have co-stained the cilia with GT335 and Ac-Tub antibodies, re-quantified cilia length labeled with the respective antibodies, and provided high-magnification images. Please see the revised Figure 6O, P, R, S.
  
  (5) Figure 6: An analysis of ciliary beats using a high-speed camera shows no difference in ciliary beat frequency between the control and KO groups. At least three animals should be analyzed. According to Figure 5, these findings indicate that the decrease in ciliary acetylation and the increase in ciliary glutamylation do not affect the beat frequency; instead, they disrupt the orientation of the beats. While these results are intriguing, they require further confirmation. Analyzing ciliary beats with a high-speed camera is informative, but at least three animals per genotype should be examined to ensure rigor. Furthermore, if the coordination of ciliary beats is impaired within the cells, this should be validated by double-labeling centrioles and basal feet to demonstrate that the orientation of cilia within the cells is abnormal.
  
  We thank the Reviewer for the comments. Sections shown in Figure 5 (currently Figure 6) are from P7 mice, while the ciliary beating analysis shown in Figure 6 (currently Figure 7) is from P15 mice. As the PTM changes in cilia were also observed in Agbl5<sup>M2/M2</sup> mice, we do not think these changes are what disrupts the orientation of the beats. The rotational polarity of Agbl5<sup>M1/M1</sup> ependymal cells is affected; please refer to the analysis in Figure 3I and Figure 7E in the revised manuscript.
  
  (6) Figure 6F-G: β-Catenin labeling reveals cells of varying sizes in the KO. This phenotype is typical of ciliary mutants that lack primary cilia (Mirzadeh et al., 2010). Hence, it is essential to examine the mutation's impact on the presence, length, and positioning of the primary cilium in ependymal cell progenitors.
  
  We thank the Reviewer for the constructive comments. We assessed the area of ependymal cells labeled with β-Catenin. Indeed, ependymal cells in the mutant had larger areas than those in wild-type mice, and the ratio of BB patch area to cell surface area is reduced (please see Figure 7O, P in the revised manuscript). However, primary cilia are present in ependymal cell progenitors in the mutant and are of comparable length to those in wild-type mice (Figure S8). Due to technical problems, we were unable to obtain convincing results on primary cilium positioning from whole-mount ventricle walls at this time. We speculate that the localization of certain sensory proteins in primary cilia or the positioning of primary cilia might be affected in Agbl5<sup>M1/M1</sup> mice. We have discussed this possibility and will certainly assess this intriguing aspect systematically in our future investigation.
  
  (7) Given the regular beating frequency in the KO at P15, how do the authors explain the complete absence of ciliary beating in the adult? How many animals were analyzed? One would expect ciliary beating to remain unaffected as it was at P15 unless the cilia structure was specifically altered at the adult stage. Is that the case?
  
  We thank the Reviewer for the critical questions. We do think that the ciliary structure of Agbl5<sup>M1/M1</sup> ependymal cells is likely altered during aging. Given that only Agbl5<sup>M1/M1</sup> mice, but not Agbl5<sup>M2/M2</sup> mice, develop hydrocephalus, we speculate that the N-domain of CCP5 may contribute to the integrity of ependymal multicilia. We have added this to the Discussion section. Two mice were analyzed per genotype.
  
  (8) Line 264 of the manuscript: replace intercellular with intracellular.
  
  It has been revised.
  
  (9) Indicate the number of animals analyzed in each experiment.
  
  It has been included in figure legends.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.30.630763v3
www.biorxiv.org

Heritability of movie-evoked brain activity and connectivity

1
1. Public_Reviews 06 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Gruskin and colleagues use twin data from a movie-watching fMRI paradigm to show how genetic control of cortical function intersects with the processing of naturalistic audiovisual stimuli. They use hyperalignment to dissect heritability into the components that can be explained by local differences in cortical-functional topography and those that cannot. They show that heritability is strongest at slower-evolving neural time scales and is more evident in functional connectivity estimates than in response time series.
  
  Strengths:
  
  This is a very thorough paper that tackles this question from several different angles. I very much appreciate the use of hyperalignment to factor out topographic differences, and I found the relationship between heritability and neural time scales very interesting. The writing is clear, and the results are compelling.
  
  We thank Reviewer 1 for their kind words and enthusiastic support of our manuscript.
  
  Weaknesses:
  
  The only "weaknesses" I identified were some points where I think the methods, interpretation, or visualization could be clarified.
  
  (1) On page 16, the authors compare heritability in functional connectivity (FC) and response time series, and find that the heritability effect is larger in FC. In general, I agree with your diagnosis that this is in large part due to the fact that FC captures the covariance structure across parcels, whereas response time series only diverge in terms of univariate time-point-by-time-point differences. Another important factor here is that (within-subject) FC can be driven by intrinsic fluctuations that occur with idiosyncratic timing across subjects and are unrelated to the stimulus (whereas time-locked metrics like ISC and timeseries differences cannot, by definition). This makes me wonder how this connectivity result would change if the authors used inter-subject functional connectivity (ISFC) analysis to specifically isolate the stimulus-driven components of functional connectivity (Simony et al., 2016). This, to me, would provide a closer comparison to the ISC and response time series results, and could allow the authors to quantify how much of the heritability in FC is intrinsic versus stimulus-driven. I'm not asking that the authors actually perform this analysis, as I don't think it's critical for the message of the manuscript, but it could be an interesting future direction. As the authors discuss on page 17, I also suspect there's something fundamentally shared between response time series and connectivity as they relate to functional topography (Busch et al., 2021) that drives part of the heritability effect.
  
  We agree that investigating the heritability of ISFC (or stimulus-driven functional connectivity) would make for a very interesting future direction. Ultimately, we chose to analyze FC (vs. ISFC) profiles to allow for direct comparison with the sizable existing literature on the heritability of FC (such as in our Movie vs. Rest FC analysis) and decided to refrain from analyzing ISFC data in order to keep the present manuscript focused. ISFC analysis of this dataset will be a focus of future work.
  
  (2) The observation that regions with intermediate ISC have the largest differences between MZ, DZ, and UR is very interesting, but it's kind of hard to see in Figure 1B. Is there any other way to plot this that might make the effect more obvious? For example, I could imagine three scatter plots where the x- and y-axes are, e.g., MZ ISC and UR ISC, and each data point is a parcel. In this kind of plot, I would expect to see the middle values lifted visibly off the diagonal/unity line toward MZ. The authors could even color the data points according to networks, like in Figure 3C. (They also might not need to scale the ISC axis all the way to r = 1, which would make the differences more visible.)
  
  We thank R1 for this helpful suggestion. We originally set the y-axis limits to r = 1 in order to facilitate comparison between ISC (Fig. 1B) and FC profile (Fig. 6B) similarity, but we agree that this renders the group differences harder to discern and have updated the plot accordingly (along with thicker lines to enhance readability). We prefer to keep the line plots in the main body as they allow for direct comparison of all three groups on the same plot, but we have included the scatter plot version in Fig. S2 for those who are interested.
  
  (3) On page 9, if I understand correctly, the authors regress the vector of ISC values across parcels out of the vector of heritability values across parcels, and then plot the residual heritability values. Do they center the heritability values (or include some kind of intercept) in the process? I'm trying to understand why the heritability values go from all positive (Figure 2A) to roughly balanced between positive and negative (Figure 2B). Important question for me: How should we interpret negative values in this plot? Can the authors explain this explicitly in the text? (I also wonder if there's a more intuitive way to control for ISC. For example, instead of regressing out ISC at the parcel/map level, could they go into a single parcel and then regress the subject-level pairwise ISC values out when computing the heritability score?).
  
  We indeed included an intercept in this model using MATLAB’s fitlm function. This means that the model estimates the best-fitting line of the following form: heritability<sub>i</sub> = β<sub>0</sub> + β<sub>1</sub>ISC<sub>i</sub> + ε<sub>i</sub>. We agree that the interpretation of these ε<sub>i</sub> values and alternative approaches to controlling for ISC should be clarified. As such, we have added the following passages to the text:
  
  Methods: “Because the heritability of ISC is constrained by the degree of synchronization in a given area, we also sought to identify areas in which BOLD time courses were more/less heritable than would be expected based on ISC alone by fitting a linear model of the form heritability<sub>i</sub> = β<sub>0</sub> + β<sub>1</sub>ISC<sub>i</sub> + ε<sub>i</sub> and plotting the residuals. Regarding alternative approaches to controlling for ISC, although the heritability model introduced by Ge et al. allows for the inclusion of covariates defined at the subject level (e.g., age), it does not allow for covariates that are defined at the dyad level (e.g., pairwise ISC).”
  
  Results: “Here, negative values in the residual map indicate parcels where heritability is lower than expected based on ISC, while positive values indicate higher-than-expected heritability.”
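
The authors fit this model with MATLAB's fitlm; the NumPy sketch below fits the same ordinary-least-squares model (intercept plus ISC slope, one observation per parcel) and returns the residuals. It is an illustrative analogue, not the authors' code, and the function name is hypothetical. Because the design includes an intercept, OLS residuals always sum to zero, which is why the residual map is centered on zero even when every raw heritability estimate is positive.

```python
import numpy as np


def residual_heritability(heritability, isc):
    """Fit heritability_i = b0 + b1 * ISC_i + e_i by OLS across parcels.

    Returns the coefficients (b0, b1) and the residuals e_i, one per parcel.
    """
    h = np.asarray(heritability, dtype=float)
    x = np.asarray(isc, dtype=float)
    if h.ndim != 1 or h.shape != x.shape:
        raise ValueError("expected two 1-D arrays of equal length (one value per parcel)")

    # Design matrix: a column of ones (intercept b0) and the ISC values (slope b1).
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, h, rcond=None)
    # Positive residual: more heritable than ISC alone predicts; negative: less.
    residuals = h - design @ coef
    return coef, residuals
```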
  
  (4) On page 4 (line 155), the authors say "we shuffled dyad labels"- is this equivalent to shuffling rows and columns of the pairwise subject-by-subject matrix combined across groups? I'm trying to make sure their approach here is consistent with recommendations by Chen et al., 2016. Is this the same kind of shuffling used for the kinship matrix mentioned in line 189?
  
  Briefly, shuffling the kinship matrix involved permuting the rows and columns of the matrix in the same manner (also known as the quadratic assignment procedure), whereas shuffling the dyad labels involved random permutations of the three group labels (MZ, DZ, unrelated), which could not be done through matrix operations because the age and gender matching precluded the use of a complete similarity matrix. However, given concerns raised by Reviewer 2, we have removed our significance claims from this (and similar) sections, which we discuss in more detail in response to Reviewer 2’s weakness A.
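
The two shuffling schemes described above can be sketched in NumPy. `qap_permute` applies one random permutation to both the rows and the columns of a square subject-by-subject matrix (the quadratic assignment procedure), so each subject's row and column move together and the dependence among pairwise values that share a subject is preserved; `permute_dyad_labels` shuffles group labels over a fixed list of dyads. Both function names and the use of NumPy's `Generator` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def qap_permute(matrix, rng):
    """Quadratic assignment procedure: permute rows and columns with the same permutation."""
    m = np.asarray(matrix)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError("expected a square matrix")
    perm = rng.permutation(m.shape[0])
    # np.ix_ builds an open mesh so that out[i, j] == m[perm[i], perm[j]].
    return m[np.ix_(perm, perm)], perm


def permute_dyad_labels(labels, rng):
    """Randomly reassign group labels (e.g. 'MZ', 'DZ', 'UR') across a fixed set of dyads."""
    return rng.permutation(np.asarray(labels))
```

The QAP version requires a complete square matrix, which is why label shuffling was needed for the age- and gender-matched dyads.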
  
  (5) I found panel A in Figure 4 to be a little bit misleading because their parcel-wise approach to hyperalignment won't actually resolve topographic idiosyncrasies across a large cortical distance like what's depicted in the illustration (at the scale of the parcels they are performing hyperalignment within). Maybe just move the green and purple brain areas a bit closer to each other so they could feasibly be "aligned" within a large parcel. Worth keeping in mind when writing that hyperalignment is also not actually going to yield a one-to-one mapping of functionally homologous voxels across individuals: it's effectively going to model any given voxel time series as a linear combination of time series across other voxels in the parcel.
  
  We agree that our efforts to present a simplified depiction of hyperalignment may mislead less familiar readers and have amended Fig. 4A according to this suggestion. We have also added text to the methods section (below) to clarify that the outputs of hyperalignment are time series that reflect linear combinations of other voxels’ time series from that parcel.
  
  “This approach independently transforms each subject's data within discrete anatomical parcels into the common space, yielding functionally aligned vertex time series that are calculated as weighted linear combinations of the original time series from all other vertices within that same parcel for that subject.”
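
The linear-combination point can be made concrete with orthogonal Procrustes, the transformation used in classic Procrustes hyperalignment. The sketch below assumes one subject's parcel data and a precomputed common-space template, both shaped (timepoints × vertices), and solves for a single orthogonal matrix R in one step. It is not the authors' pipeline: template construction and any iterative refinement are omitted, and the function name is hypothetical. Each column of `subject_ts @ R` is a weighted sum of all of that subject's vertex time series within the parcel.

```python
import numpy as np


def procrustes_to_template(subject_ts, template_ts):
    """Orthogonal Procrustes alignment of one parcel to a common-space template.

    subject_ts, template_ts: arrays of shape (n_timepoints, n_vertices).
    Returns (aligned, R), where R is orthogonal and minimizes the Frobenius
    norm ||subject_ts @ R - template_ts||.
    """
    X = np.asarray(subject_ts, dtype=float)
    Y = np.asarray(template_ts, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("subject and template must have the same shape")
    # SVD of the cross-product X^T Y gives the optimal orthogonal map R = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    R = U @ Vt
    # Column j of the output is sum_k R[k, j] * X[:, k]: a linear combination
    # of the subject's own vertex time series within the parcel.
    return X @ R, R
```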
  
  (6) I believe the subjects watched all different movies across the two days, however, for a moment I was wondering "are Day 1 and Day 2 repetitions of the same movies?" Given that Day 1 and Day 2 are an organizational feature of several figures, it might be worth making this very explicit in the Methods and reminding the reader in the Results section.
  
  We agree that this would be helpful and have added the following text to the relevant sections:
  
  “All clips were only viewed once by each subject, with the exception of the brief montage which was included at the end of each of the four runs for test-retest purposes.”
  
  “To characterize the heritability of brain responses to complex stimuli, we used 7T fMRI data from 178 HCP Young Adult subjects acquired across two days (using two largely non-overlapping sets of movie stimuli, see Methods)…”
  
  References:
  
  Busch, E. L., Slipski, L., Feilong, M., Guntupalli, J. S., di Oleggio Castello, M. V., Huckins, J. F., Nastase, S. A., Gobbini, M. I., Wager, T. D., & Haxby, J. V. (2021). Hybrid hyperalignment: a single high-dimensional model of shared information embedded in cortical patterns of response and functional connectivity. NeuroImage, 233, 117975. https://doi.org/10.1016/j.neuroimage.2021.117975
  
  Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259. https://doi.org/10.1016/j.neuroimage.2016.05.023
  
  Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Communications, 7, 12141. https://doi.org/10.1038/ncomms12141
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors attempt to estimate the heritability of brain activity evoked from a naturalistic fMRI paradigm. No new data were collected; the authors analyzed the publicly available and well-known data from the Human Connectome Project. The paper has 3 main pieces, as described in the Abstract:
  
  (1) Heritability of movie-evoked brain activity and connectivity patterns across the cortex.
  
  (2) Decomposition of this heritability into genetic similarity in "where" vs. "how" sensory information is processed.
  
  (3) Heritability of brain activity patterns, as partially explained by the heritability of neural timescales.
  
  Strengths:
  
  The authors investigate a very relevant topic that concerns how heritable patterns of brain activity among individuals subjected to the same kind of naturalistic stimulation are. Notably, the authors complement their analysis of movie-watching data with resting-state data.
  
  Weaknesses:
  
  The paper has numerous problems, most of which stem from the statistical analyses. I also note the lack of mapping between the subsections within the Methods section and the subsections within the Results section. We can only assess results after understanding and confirming the methods are valid; here, however, Methods and Results, as written, are not aligned, so we can't always be sure which results are coming from which analysis.
  
  (A) Intersubject correlation (ISC) (section that starts from line 143): "We used nonparametric permutation testing to quantify average differences in ISC for each parcel in the Schaefer 400 atlas for each day of data collection across three groups: MZ dyads, DZ dyads, and unrelated (UR) dyads, where all UR dyads were matched for gender and age in years." ... "some participants contributed to ISC values for multiple dyads (thus violating independence assumptions)"
  
  This is an indirect attempt to demonstrate heritability. And it's also incorrect since, as the authors themselves point out, some subjects contribute to more than one dyad.
  
  Permutation tests don't quantify "average differences", they provide a measure of evidence about whether differences observed are sufficient to reject a hypothesis of no difference.
  
  Matching subjects is also incorrect as it artificially alters the sample; covarying for age and sex, as done in standard analyses of heritability, would have been appropriate.
  
  It isn't clear why the authors went through the trouble of implementing their own nonparametric test if HCP recommends using PALM, which already contains the validated and documented methods for permutation tests developed precisely for HCP data.
  
  The results from this analysis, in their current form, are likely incorrect.
  
  We appreciate that permutation tests do not quantify average differences and intended to write “We used non-parametric permutation testing to quantify [the significance of] average differences…”. Our intention with this analysis was not to demonstrate heritability, but rather to quantify group differences in ISC in a manner that is interpretable for readers who are unfamiliar with h² (e.g., “identical twins’ BOLD time courses were 59% more similar than those from pairs of unrelated individuals”) and to motivate the formal heritability analysis used later in the paper. Indeed, all of the heritability analyses in this paper leveraged a validated multidimensional heritability method first introduced by Ge et al. (2016) and used by many other investigators since then. Furthermore, we covaried for age and sex at the subject level in all our heritability analyses, and always tested the significance of these heritability values using a validated permutation procedure (the quadratic assignment procedure; Hubert & Schultz, 1976) that respects the non-independence of dyadic data.
  
  Regarding the shuffling procedure used for Figure 1, while PALM is the standard for univariate, subject-level GLMs in the HCP pipeline and can accommodate nested designs (i.e., subjects within families), it is not designed to handle the unique relational dependencies of dyadic ISC analysis (i.e., the same subject contributing to multiple dyads). Although the element-wise resampling approach was the most appropriate option available, it is known to inflate the false positive rate (Chen et al., 2016; doi:10.1016/j.neuroimage.2016.05.023). Given that this analysis was simply meant to motivate our later hypothesis-testing heritability analyses, we have removed significance claims from this section of the manuscript. Still, we emphasize that this has no bearing on the validity of our conclusions, which were supported by our formal heritability analyses; throughout our paper we have used the appropriate methods to back the stated claims.
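  The quadratic assignment procedure cited above handles dyadic non-independence by permuting subject labels, so the rows and columns of a subject-by-subject matrix move together and every dyad a subject belongs to is reshuffled as a unit. The sketch below is minimal and assumes a plain correlation between the upper triangles of a phenotypic-similarity matrix and a kinship matrix. The multidimensional heritability estimator of Ge et al. (2016) also models covariates, which this sketch omits. `qap_test` is a hypothetical helper name.

```python
import numpy as np

def upper_tri(m):
    """Unique off-diagonal entries (one per dyad) of a square matrix."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]

def qap_test(phenotype_sim, kinship, n_perm=1000, seed=0):
    """Correlate two subject-by-subject matrices; test by permuting subject labels."""
    rng = np.random.default_rng(seed)
    target = upper_tri(kinship)
    observed = np.corrcoef(upper_tri(phenotype_sim), target)[0, 1]
    n = kinship.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        # Permute rows and columns together: each subject's dyads move as one unit.
        null[k] = np.corrcoef(upper_tri(phenotype_sim[np.ix_(p, p)]), target)[0, 1]
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

  An element-wise shuffle of the dyad vector would treat all dyads as exchangeable. Permuting whole subjects, as above, keeps the within-subject dependence intact under the null.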
  
  (B) Functional connectivity (FC) (section that starts from line 159): Here the authors compute two 400×400 FC matrices for each subject, one for rest and one for movie-watching, then correlate the correlations within each dyad, then compare the average correlation of correlations for MZ, DZ, and UR. In addition to the same problems as the previous analysis, here it is not clear what is meant by "averaging correlations [...] within a network combination". What is a "network combination"? Further, to average correlations, they need to be r-to-z transformed first. As with the above, the results from this analysis in its current form are likely incorrect.
  
  We regret that R2 had difficulty understanding our analysis and have added the following text to the relevant Methods section to clarify our approach:
  
  “For example, there are 16 parcels in the Kong et al. Auditory network and 17 parcels in the Language network, so the FC profile for a given subject’s Auditory-Language network combination consists of the (16 * 17 =) 272 correlation coefficients between all unique pairs of one parcel from each network.”
  
  As we stated in the previous Methods paragraph, “All Pearson r values in this and all other analyses were Fisher z-transformed before averaging (and converted back to Pearson r for visualization)”. Thus, contrary to the reviewer’s assertion, these analyses were performed correctly. Once again, we emphasize that this analysis was not intended to demonstrate heritability, but rather to describe group differences in FC in familiar units.
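  Two of the ingredients discussed here can be sketched together. The first is the extraction of one network combination's FC profile from a parcel-by-parcel FC matrix. The second is averaging correlations in Fisher z space. The helper names and the toy labels are hypothetical. One point is plain arithmetic: the 153 combinations mentioned in (C) equal 17 × 18 / 2, i.e. all unordered pairs of the 17 Kong networks including each network paired with itself.

```python
import numpy as np
from itertools import combinations_with_replacement

def fc_profile(fc, labels, net_a, net_b):
    """FC coefficients for one network combination of a parcel-by-parcel FC matrix."""
    a = np.flatnonzero(labels == net_a)
    b = np.flatnonzero(labels == net_b)
    if net_a == net_b:
        # Within-network: each unique parcel pair once, excluding the diagonal.
        block = fc[np.ix_(a, a)]
        return block[np.triu_indices(a.size, k=1)]
    # Between-network: every pairing of one parcel from each network.
    return fc[np.ix_(a, b)].ravel()

def fisher_mean(r):
    """Average correlations in Fisher z space (arctanh), then convert back to r."""
    return np.tanh(np.mean(np.arctanh(r)))

# 17 networks -> 17 * 18 / 2 = 153 unordered combinations, within-network ones included.
n_combinations = len(list(combinations_with_replacement(range(17), 2)))
```

  For example, with 16 Auditory and 17 Language parcels, `fc_profile(fc, labels, "Auditory", "Language")` returns the 272 between-network coefficients described in the added Methods text.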
  
  (C) ISC and FC profile heritability analyses (section that starts from line 175): Here, the authors use first a valid method remarkably similar to the old Haseman-Elston approach to compute heritability, complemented by a permutation test. That is fine. But then they proceed with two novel, ill-described, and likely invalid methods to (1) "compare the heritability of movie and rest FC profiles" and (2) to "determine the sample size necessary for stable multidimensional heritability results". For (1), they permute, seemingly under the alternative, rest and movie-watching timeseries, and (2), by dropping subjects and estimating changes in the distribution.
  
  The (1) might be correct, but there are items that are not clearly described, so the reader cannot be sure of what was done. What are the "153 unique network combinations"? Why do the authors separate by day here, whereas the previous analyses concatenated both days? Were the correlations r-to-z transformed before averaging?
  
  The (2) is also not well described, and in any case, power can be computed analytically; it isn't clear why the authors needed to resort to this ad hoc approach, the validity of which is unknown. If the issue is the possibility that the multidimensional phenotypic correlation matrix is rank-deficient, it suffices that there are more independent measurements per subject than the number of subjects.
  
  Regarding (1), we have clarified in section 2.6 that the 153 unique network combinations reflect each unique pair of 17 Kong networks. All of our analyses, including this one, were performed separately for each day of data collection, as we state throughout the paper and visualize in our figures (although we acknowledge that, on some occasions, we [conservatively] performed FDR-correction on a combined set of p-values, as discussed in our response to K). Given that the null hypothesis for this analysis is that rest FC and movie FC are equally heritable, we are not sure why permuting rest and movie FC matrices would be invalid. All Pearson r values were z-transformed before averaging, as we stated in our paper.
  
  Regarding (2), we included this analysis in response to editorial concerns that our heritability analyses were not sufficiently powered. We chose this approach because it is a simple, self-evidently valid way to demonstrate the stability of our results at various sample sizes. Furthermore, this sort of subsampling approach has been used many times before, both in our field (e.g., Marek et al., 2022) and in others (e.g., Manyara et al., 2024), to demonstrate the sample-size dependence and stability of statistical effects. We have added text explaining this to the relevant Methods section (2.6).
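  A minimal sketch of a subsampling stability analysis follows. It assumes that whole families are drawn without replacement, so that related subjects stay together, and that any scalar statistic can stand in for the heritability estimate. `subsample_stability` and `stat_fn` are hypothetical names, not the authors' implementation.

```python
import numpy as np

def subsample_stability(values, family_ids, stat_fn, sizes, n_draws=200, seed=0):
    """Recompute stat_fn on random subsets of families and summarise its spread.

    Returns {n_families: (mean, sd)} of the statistic across draws.
    """
    rng = np.random.default_rng(seed)
    families = np.unique(family_ids)
    out = {}
    for size in sizes:
        stats = []
        for _ in range(n_draws):
            keep = rng.choice(families, size=size, replace=False)
            mask = np.isin(family_ids, keep)  # all members of each sampled family
            stats.append(stat_fn(values[mask]))
        out[size] = (float(np.mean(stats)), float(np.std(stats)))
    return out
```

  A stable effect shows a mean that plateaus and a spread that shrinks as the number of sampled families grows.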
  
  (D) Frequency-dependent ISC heritability analysis (from line 216): Here, the authors decompose the time series into frequency bands, then repeat earlier analyses, thus bringing here the same earlier problems and questions of non-exchangeability in the permutations given the dyadic structure, r-z transforms, and sex/age covariates.
  
  We did not use dyadic permutation testing for any of the frequency-dependent ISC analyses; rather, we used the jackknife SEMs to compare heritability across frequency bands and have added an explicit description of this to section 2.7. We have addressed the r-z transform and covariate concerns in previous comments.
  
  (E) FC strength heritability analysis (from line 236): Here, the authors use the univariate FC to compute heritability using valid and well-established methods as implemented in SOLAR. There is no "linkage" being done here (thus, the statement in line 238 is incorrect in this application). SOLAR already produces SEs, so it's unclear why the authors went out of their way to obtain jackknife estimates. If the issue is non-normality, I note that the assumption of normality is present already at the stage in which parameters themselves are estimated, not just the standard errors; for non-normal data, a rank-based inverse-normal transformation could have been used. Moreover, r-to-z transformed values typically tend to be fairly normally distributed. So, while the heritabilities might be correct, the standard errors may not be (the authors don't demonstrate that their jackknife SE estimator is valid). The comparison of h2 between dyads raises the same questions about permutations, age/sex covariates, and r-z transforms as above.
  
  We used jackknife SEs for these analyses to maintain consistency with the multidimensional heritability package used here, which only outputs jackknife SEs. We note that this jackknife approach (and the corresponding multidimensional heritability analysis) was detailed in prior work (Anderson et al., 2021), and that the leave-one-family-out jackknife has a long history of being used to estimate SEs in heritability studies, especially when working with smaller samples (Knapp et al., 1989). We are also not sure what “the comparison of h2 between dyads” means, as heritability cannot be compared “between” dyads; rather, it is defined across dyads.
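  The leave-one-family-out jackknife referred to here can be sketched as below. The sketch assumes the standard unweighted delete-one-group formula, SE = sqrt((G - 1) / G · Σ(θ₍g₎ - θ̄)²) over G families. Weighted variants exist for markedly unequal family sizes. `jackknife_se` is a hypothetical name, and `stat_fn` stands in for whatever heritability estimator is used.

```python
import numpy as np

def jackknife_se(values, family_ids, stat_fn):
    """Leave-one-family-out jackknife standard error of stat_fn(values).

    Each replicate drops every subject belonging to one family; `values` may be
    1-D or have subjects along axis 0.
    """
    families = np.unique(family_ids)
    g = families.size
    reps = np.array([stat_fn(values[family_ids != f]) for f in families])
    return float(np.sqrt((g - 1) / g * np.sum((reps - reps.mean()) ** 2)))
```

  To compare two estimates computed on the same subjects (e.g., two frequency bands), one can pass a `stat_fn` that returns their difference. The jackknife SE of that difference then reflects the covariance between the two estimates rather than treating them as independent. As a sanity check, with one subject per "family" and `stat_fn=np.mean`, the result equals the familiar s/√n.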
  
  (F) Hyperalignment (from line 245): It isn't clear at this point in the manuscript in what way hyperalignment would help to decompose heritability in "where vs. how" (from the Abstract). That information and references are only described much later, from around line 459. The description itself provides no references, and one cannot even try to reproduce what is described here in the Methods section. Regardless, it isn't entirely clear why this analysis was done: by matching functional areas, all heritabilities are going to be reduced because there will be less variance between subjects. Perhaps studying the parameters that drive the alignment (akin to what is done in tensor-based and deformation-based morphometry) could have been more informative. Plus, the alignment process itself may introduce errors, which could also reduce heritability. This could be an alternative explanation for the reduced heritability after hyperalignment and should be discussed. An investigation of hyperalignment parameters, their heritability, and their co-heritability with the BOLD phenotypes could inform this.
  
  To help set up our hyperalignment analyses, we have added text to the introduction explaining how hyperalignment would help to decompose heritability. The description in the Methods section included a reference to Bazeille et al., 2021, in which the hyperalignment method used here is discussed in detail. Still, we have added citations to additional papers (also cited in the Bazeille et al. paper, and elsewhere in our paper) in case that might be helpful. We note that it is not the case that all heritabilities were reduced by hyperalignment; as can be seen in Figs. 4D, 8A, and S15, hyperalignment did increase heritability in some voxels and network combinations. This would be expected under the alternative (albeit unlikely) hypothesis that functional topographies are not heritable, such that topographic variation between related individuals would obscure similarities in their (heritable) topography-independent brain responses. Recognizing that this alternative is unlikely, we believe the main novelty of this analysis comes from the magnitude of the hyperalignment effect (up to 40% of brain-wide heritability) and its spatial pattern (e.g., larger heritability decreases in visual vs. auditory cortex, the opposite of our NT result).
  
  We agree that we would see lower post-hyperalignment heritability if the alignment process itself introduced errors/noise, but this would be deeply surprising as hyperalignment increases ISC by design (and errors/noise could only decrease ISC). To demonstrate this, we have added Figure S7 which shows that (as expected) ISC across all voxels and subject pairs increases after hyperalignment (and that this increase is larger when hyperalignment is performed in larger parcels). Given that hyperalignment increased ISC, and that it is blind to twin status, we are unsure how it could have introduced errors that would have confounded this result.
  
  (G) Relationships between parcel area and heritability (from line 270): As under F), how much the results are distorted likely depends on the accuracy of the alignment, and the error variance (vs heritable variance) introduced by this.
  
  We agree that alignment accuracy could potentially impact parcel-level differences in how much heritability changes following hyperalignment. For this reason, we included the frequency-dependent h²_residuals (controlling for differences in ISC) in Fig. 3, as more accurate hyperalignment should result in greater increases in ISC, raising the heritability ceiling. We observe similar relationships between parcel rank and frequency-dependent changes in these residualized maps, suggesting that our parcel-level differences are not simply the result of better alignment in more sensory parcels.
  
  (H) Neural timescale analyses (from line 280): Here, a valid phenotype (NT) is assessed with statistical methods that have the same limitations as those noted previously (exchangeability of dyads, age/sex covariates, and r-z transforms). NT values are combined across space and used as covariates in "some multivariate analyses". As a reader, I really wanted to see the results related to NT, something as simple as its heritability, but these aren't clearly shown, only differences between types of dyads.
  
  We have addressed the exchangeability, covariates, and r-z transform comments above (in A). As we explained for our FC strength analyses, we are underpowered to evaluate the heritability of unidimensional traits (like the heritability of NT magnitude), and the heritability of a closely related measure (BOLD turnover magnitude) has already been established in a larger sample of HCP subjects (https://doi.org/10.1152/jn.00402.2022). Still, we agree that more results related to the heritability of NTs would be of interest to our readers. As such, we have added an analysis in section 3.4 quantifying the heritability of multivariate NT topographies and used SOLAR to quantify the heritability of NT magnitudes, with the disclaimer that this and similar analyses are underpowered (hence the large difference between Day 1 and Day 2 heritability effect sizes). We also removed significance claims for the dyadic NT similarity analysis.
  
  (I) Significance testing for autocorrelated brain maps and FC matrices (from line 310): Here, the authors suddenly bring up something entirely different: reliability of heritability maps, and then never return to the topic of reliability again. As a reader, I find this confusing. In any case, analyses with BrainSMASH with well-behaved, normally distributed data are ok. Whether their data is well behaved or whether they ensured that the data would be well behaved so that BrainSMASH is valid is not described. As to why Spearman correlations are needed here, Mantel tests, or whether the 1000 "surrogate" maps are valid realizations of the data under the null, remains undemonstrated.
  
  We brought up reliability in this section because we show the reliability of our results across the two days of data collection several times in the paper. R2 is correct to point out that BrainSMASH was validated using normally distributed brain maps. Although some of our brain maps contain normally distributed values, others are right-skewed (due largely to the fact that many voxels/parcels exhibit low ISC while visual/auditory areas have very high ISC). In preparing our original manuscript, we visualized BrainSMASH’s variogram outputs for one of the most skewed inputs (vertex-wise BOLD time course heritability) and found that the autocorrelation structures of the empirical and null maps were well-matched. We did not include this in the original manuscript because reporting variograms is not commonplace in the field (see Author response image 1). Furthermore, our use of Spearman (vs. Pearson) correlations renders these distributional differences less relevant, as the Spearman correlation operates on ranks, which are uniformly distributed regardless of the distribution of the inputs. To empirically check that these distributional differences do not bias our results, we retested the significance of all brain map associations using the spin test (10.1016/j.neuroimage.2018.05.070), an alternative method that does not assume normally distributed inputs, and obtained identical p-values for all analyses (P<.001 in all cases).
  
  Author response image 1.
  
  (J) Global signal was removed, and the authors do not acknowledge that this could be a limitation in their analyses, nor offer a side analysis in which the global signal is preserved.
  
  Although we agree that GSR is a contentious preprocessing step for certain analyses, it has explicitly been shown to increase ISC signal-to-noise without compromising FC fingerprints (Graff et al., 10.1016/j.dcn.2022.101087), and it is uncommon to perform ISC analyses with and without GSR. Still, we have added additional text to our Methods section explaining our rationale for using GSR and that this could affect our results. We also re-ran our main analysis (BOLD time course heritability) with and without GSR and found that GSR had little impact on our results; we have included this in our manuscript as Fig. S4.
  
  Specifically, we see that GSR resulted in a slight increase in heritability (average Day 1 h² with/without GSR = .064/.060; Day 2: .068/.061) and almost no effect on the spatial pattern of our results (With GSR/without GSR Spearman ρ = .99, P_BrainSMASH < .001 on both Day 1 and Day 2).
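  For readers less familiar with global signal regression, one common formulation can be sketched as follows. It regresses the mean time course across all vertices (plus an intercept) out of every vertex by least squares. Whether this matches the exact GSR implementation used in the paper's preprocessing is an assumption, and `regress_global_signal` is a hypothetical name.

```python
import numpy as np

def regress_global_signal(ts):
    """Remove the global mean signal from a (time x vertices) array by least squares.

    The global signal is the mean across vertices at each time point; an intercept
    is included so each vertex's residual time series has zero mean.
    """
    g = ts.mean(axis=1)
    design = np.column_stack([np.ones_like(g), g])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta
```

  The residuals are orthogonal to the global signal by construction. This is why GSR can remove shared, spatially diffuse fluctuations, and also why it is debated when such fluctuations carry signal of interest.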
  
  (K) FDR is used to control the error rate, but in many cases, as it's applied to multiple sets of p-values, the amount of false discoveries is only controlled across all tests, but not within each set. The number of errors within any set remains unknown.
  
  We agree that the FDR usage in our original manuscript was inconsistent, in that for two analyses we FDR-corrected p-values from the two days of data collection together (instead of correcting p-values from each day separately and reporting voxels/parcels/etc. that were significant at q<.05 on both days, as in the rest of our analyses). We note that both approaches are more conservative than reporting significant results at q<.05 separately; regardless, to maintain consistency we have updated all analyses such that FDR correction is always performed separately for each day of data collection.
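  The updated procedure (correct each day's p-values separately, then report units significant at q<.05 on both days) can be sketched with a small Benjamini-Hochberg implementation. The function names are hypothetical. statsmodels' `multipletests(..., method="fdr_bh")` provides the same correction if a library routine is preferred.

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg: boolean mask of p-values rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True     # reject it and every smaller p-value
    return reject

def significant_both_days(p_day1, p_day2, q=0.05):
    """Correct each day separately, then keep only units significant on both days."""
    return fdr_bh(p_day1, q) & fdr_bh(p_day2, q)
```

  For example, `fdr_bh([0.001, 0.2, 0.03, 0.04])` rejects only the first p-value, because 0.03 exceeds its rank-2 threshold of 0.025 and 0.04 exceeds its rank-3 threshold of 0.0375.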
  
  (L) Generally, when studying the heritability of a trait, the trait must be defined first. Here, multiple traits are investigated, but are never rigorously defined. Worse, the trait being analyzed changes at every turn.
  
  Here, we analyze the heritability of movie-evoked BOLD time courses (Figures 1-5) as well as FC profiles (Figures 6-8). We defined FC profiles in our Introduction as an individual’s pattern of pairwise FC strengths (and further detailed how we quantified FC profiles in the relevant Methods section), and believe that “BOLD time course” is a well understood phrase in the field and does not need to be further defined. We also used hyperalignment to decompose the heritability of these traits into topography-dependent and independent portions, and (new to this version) also explicitly quantify the heritability of neural timescales, which we defined as the AUC of the ACF until the first negative ACF value in both the relevant Results and Methods sections.
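  The neural timescale definition above (the AUC of the ACF until the first negative ACF value) can be sketched as follows. This is a minimal sketch under stated assumptions: the area is a simple Riemann sum of ACF values scaled by the sampling interval `tr`, and the first negative lag is excluded. Both are assumptions, and `neural_timescale` is a hypothetical name.

```python
import numpy as np

def neural_timescale(ts, tr=1.0):
    """Area under the autocorrelation function up to (not including) its first negative lag.

    The area is a Riemann sum of ACF values times the sampling interval `tr`;
    the integration rule and units are assumptions of this sketch.
    """
    x = np.asarray(ts, dtype=float) - np.mean(ts)
    n = x.size
    acf = np.correlate(x, x, mode="full")[n - 1:] / np.dot(x, x)  # lags 0..n-1, acf[0] == 1
    negative = np.flatnonzero(acf < 0)
    stop = negative[0] if negative.size else n
    return float(tr * acf[:stop].sum())
```

  A strongly autocorrelated signal (e.g., an AR(1) process with coefficient 0.9) yields a much longer timescale than white noise, whose ACF drops to about zero after lag 0.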
  
  To make this clearer, we have modified the last paragraph of our Introduction to begin with:
  
  In the present work, we address these questions by analyzing 7T fMRI recordings of a twin sample acquired by the Human Connectome Project (Van Essen et al., 2013) to quantify the heritability of two distinct high-dimensional traits—stimulus-evoked BOLD time courses and functional connectivity profiles—across the cortex.
  
  Reviewer #3 (Public review):
  
  Strengths:
  
  It's sort of novel to study the heritability of movie-watching fMRI data. The methodology the authors used in the paper is also supportive of their findings. Figures are nicely organized and plotted. They finally found that sensory processing in the human brain is under genetic control over stable aspects of brain function (here referring to neural timescale and resting state connectivity).
  
  Weaknesses:
  
  What I am worried about most is the sample size and interpretation of heritability.
  
  (1) Figure 1. I assumed that the authors just calculated the ISC within each group (MZ, DZ, and UR). Of course, you can get different variations between each group. Therefore, there is heritability. Why not calculate ISC across the whole sample, then separate MZ, DZ, and UR?
  
  We believe that this question is getting at the difference between pairwise ISC (i.e., correlating one BOLD time course from one subject with that from another subject) and leave-one-subject-out ISC (i.e., correlating one BOLD time course from one subject with the corresponding average time course across all other subjects). We chose to use the pairwise ISC method because it allows us to capitalize on the information contained in the n × n pairwise ISC matrix (whereas the other approach averages out meaningful information to yield an n × 1 ISC vector) and to leverage a more sophisticated multidimensional heritability approach. Also, the leave-one-subject-out approach introduces additional issues regarding the handling of family-level data (e.g., should we include a subject’s twin in the leave-one-subject-out average? If so, how should we handle subjects who don’t have a twin in the dataset, as averaging data from different numbers of subjects will lead to different ISC magnitudes? etc.).
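  The difference between the two ISC variants can be sketched as below. The sketch assumes each subject's time course for one parcel is a row of a (subjects × time) array. The pairwise version returns a full subject-by-subject matrix, whereas the leave-one-subject-out version returns one value per subject. `pairwise_isc` and `leave_one_out_isc` are hypothetical helper names, and the twin-handling questions raised above are deliberately not addressed.

```python
import numpy as np

def pairwise_isc(ts):
    """Pairwise ISC: Pearson r between every pair of subjects' time courses.

    ts has shape (subjects, time); the result is a symmetric subjects x subjects matrix.
    """
    return np.corrcoef(ts)

def leave_one_out_isc(ts):
    """Leave-one-subject-out ISC: each subject vs. the mean of all other subjects."""
    n = ts.shape[0]
    total = ts.sum(axis=0)
    return np.array([np.corrcoef(ts[i], (total - ts[i]) / (n - 1))[0, 1] for i in range(n)])
```

  Only the pairwise matrix keeps a separate value for each dyad, which is what allows dyad-level similarity to be related to genetic relatedness.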
  
  (2) Heritability scores in the paper are sort of small. If the sample size is small, please consider p-values, which will tell more about the trustworthiness of your heritability.
  
  We report p-values for heritability throughout our paper (e.g., stating that BOLD time courses are significantly heritable in 99% of parcels in Figure 2), and we believe that the reliability of our spatial maps across days of data collection (also quantified with p-values) further demonstrates the trustworthiness of our results. Finally, as we demonstrate in Figure S5, our sample size is more than sufficient to reliably detect small effects.
  
  (3) I don't understand the high-frequency signals in fMRI data. It's always regarded as noise, the band 1 here in particular.
  
  In addition to driving shared neuronal responses (which are captured in BOLD signal oscillations <.1 Hz or so), movies also elicit shared cardiac, respiratory, and motion responses across participants at higher frequencies. Although we used a relatively conservative denoising approach here, we believe some of these non-neuronal signals are still present in our data; alternatively, it is also possible that these signals reflect “fast” BOLD responses at >.15 Hz (as discussed in 10.1016/j.neuroimage.2021.118658). In any case, the fact that information in this frequency band is considerably less heritable than information in slower frequency bands supports the idea that this band is noisier and suggests that our heritability results are driven by canonical neuronal activity-related BOLD signals.
  
  (4) The statement "we show that the heritability of brain activity patterns can be partially explained by the heritability of the neural timescale" should come from Figure 5. However, after controlling for NT, the heritability decreased max. 0.025 in temporal areas. I am not sure this change supports the statement. If the visual cortex is outlined, and combining ISC changes in the visual cortex, I think this would somehow be answered. Instead of delta h2, adding a new model h2 would be obvious to the readers.
  
  Although the decrease of 0.025 is small, we note that this constitutes around 50% of BOLD time course heritability in some voxels (as seen by comparison with Fig. 4C), and the spatial pattern of this result is quite consistent across days of data collection, indicating its reliability. Furthermore, the whole-brain distributions of results shown in Fig. 5B are clearly skewed towards negative values, indicating that controlling for NT partially reduces (or “explains”) BOLD time course heritability. Still, we agree that showing raw h² values in addition to the difference maps would be helpful for some readers and have added a corresponding supplementary figure (S12) which shows these.
  
  (5) Figures 7 and 8, when getting the difference of heritability, please also consider the standard errors of the heritability estimates. Then you can compare across networks/regions.
  
  We did consider adding standard errors for these heritability estimates, but found that visualizing standard errors for each of the 153 unique network combinations in our heatmaps rendered the visualizations difficult to parse, and given that our hypotheses concerned global (e.g., hyperaligned vs. MSM-aligned) or network-level (e.g., sensory vs. associative) patterns, we focused on calculating standard errors/p-values for these analyses (although we note that dyad-level standard errors can be found in Fig. 6B, where they are clearly marginal compared to the group effects).
  
  (6) I think movie VS resting state is a really important result in this paper. However, there is almost no discussion. Discussing this part would be more beneficial for understanding the genetic control over the neuron arousal and excitation circuits.
  
  We agree that this result was relatively under-explored in our Discussion section and have added additional text (lines 851-855) to connect this result to recent work on arousal-dependent uniqueness of FC.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Do the authors have any ideas why we see this hotspot of heritability in pMTG/LOTC? It really jumps out in Figure 1A and Figure 2. The more posterior sensory MT+ area seems to drop when regressing out ISC in Figure 2B, but this pMTG area stays hot. Is there anything special about this kind of multimodal biological motion/action observation / social perception area (Pitcher & Ungerleider, 2021)? I don't think this is necessary to discuss in the manuscript, but I'm curious if the authors have any speculation.
  
  We are not certain why BOLD time courses in this parcel are particularly heritable. Although this area is associated with biological motion, that particular function tends to be more right-lateralized, whereas here we see nominally higher heritability in the left hemisphere. Based on a Neurosynth review (and consistent with the left lateralization), we believe this may have more to do with speech processing, but a more definitive answer will require further investigation.
  
  (2) Page 3, line 127: "More information on these clips": it might be worth saying a little bit more here just to make sure people understand that these are audiovisual clips, they include language, they're long enough to convey meaningful social and narrative information, etc.
  
  We agree and have added additional details on the clip composition to the relevant methods paragraph.
  
  (3) Figure 1 caption: can you add a sentence reminding readers what's going on with Day 1 and Day 2?
  
  We thank R1 for this suggestion and have added a sentence to this effect at this location.
  
  (4) Page 9, line 379: "although these more associative parcels do not encode a substantial amount of stimulus-specific information": is this really true? I suspect these association areas still have decent ISCs, even if there are many processing stages downstream of the raw stimulus.
  
  Although these parcels are not the most synchronized by the stimulus, we agree that it is unfair (and vague) to say that they do not encode a substantial amount of stimulus-specific information. We have edited this sentence to make a more specific claim and highlight the relatively lower ISC in these parcels vs. more unimodal sensory areas.
  
  (5) Page 9, line 417: Can you unpack a bit more what you mean by "supra-BOLD frequency band"?
  
  Here, we refer to the fact that BOLD signals resulting from neuronal firing events have frequencies below ~.15 Hz (Josephs and Henson, 1999). We have added additional text and the Josephs and Henson citation to this line to further unpack this point.
  
  (6) Page 18, line 695: This discussion of how attention and gaze might partly shape response time series reminded me of recent work by Borovska & de Haas (2024)-might be worth citing.
  
  We are grateful to R1 for alerting us to this very relevant work and have included a reference to it in our discussion.
  
  (7) Page 19, line 755: I'm not sure I'd describe the hyperalignment results here as having "deleterious effects [on] heritability"; my reading was that hyperalignment allows you to say something more specific about the heritability of function by effectively factoring out heritability effects that reduce to individual differences in cortical topography. This seems like a good thing!
  
  We agree that “deleterious” was a poor word choice given its negative connotation, and have edited this sentence to read:
  
  “With this in mind, future studies investigating genetic correlations between brain function and behavioral variables may benefit from hyperalignment, as it can factor out individual-specific cortical topography and thus yield more precise estimates of functional heritability.”
  
  (8) I would love to see a ventral view in some of these plots! Not asking you to recreate the figures, but the ventral temporal cortex is an area of interest for many folks in the movie fMRI space (e.g., Haxby et al., 2011).
  
  We agree that ventral views would be of interest to some readers and have added the corresponding maps for our main results in supplementary figures S3 and S9.
  
  References:
  
  Borovska, P., & de Haas, B. (2024). Individual gaze shapes diverging neural representations. Proceedings of the National Academy of Sciences, 121(36), e2405602121. https://doi.org/10.1073/pnas.2405602121
  
  Haxby, J. V., Guntupalli, J. S., Connolly, A. C., Halchenko, Y. O., Conroy, B. R., Gobbini, M. I., Hanke, M., & Ramadge, P. J. (2011). A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron, 72(2), 404-416. https://doi.org/10.1016/j.neuron.2011.08.026
  
  Pitcher, D., & Ungerleider, L. G. (2021). Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2), 100-110. https://doi.org/10.1016/j.tics.2020.11.006
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) To address the common core analytical problems listed under A), B), C), D), E), and basically throughout the methods:
  
  (a) Conduct permutations with exchangeability restrictions to account for the pattern of dyad relationships, e.g. as implemented in PALM.
  
  (b) Control for age and sex as covariates (e.g. as in SOLAR), rather than by matching.
  
  (c) Perform r-to-z transforms when conducting further analyses on correlations that assume normality.
  
  (d) For all analyses that assume normal distributions, e.g. in SOLAR and BrainSMASH, check that this is the case.
  
  We have explained how PALM is not suited for the study of effects that are defined at the dyad level (A), that we controlled for age and sex covariates in all our formal heritability analyses in our original submission (B), that we always performed r-to-z transforms when indicated in our original submission (C), and that our spatial permutation results don’t hinge on distributional differences (D).
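  The r-to-z transform referred to in (c) is Fisher's transformation, z = arctanh(r) = 0.5 * ln((1 + r) / (1 - r)), which makes correlation coefficients approximately normally distributed with stable variance before they are averaged or compared. The sketch below is purely illustrative: the function names and example values are hypothetical, and it is not the authors' analysis code.

  ```python
  import math
  from typing import Sequence


  def fisher_r_to_z(r: float) -> float:
      """Fisher r-to-z: z = 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
      if not -1.0 < r < 1.0:
          raise ValueError("r must lie strictly between -1 and 1")
      return math.atanh(r)


  def fisher_z_to_r(z: float) -> float:
      """Inverse transform back to the correlation scale."""
      return math.tanh(z)


  def mean_correlation(rs: Sequence[float]) -> float:
      """Average correlations in z-space, then back-transform to r."""
      zs = [fisher_r_to_z(r) for r in rs]
      return fisher_z_to_r(sum(zs) / len(zs))


  if __name__ == "__main__":
      example_iscs = [0.2, 0.5, 0.8]  # hypothetical values, for illustration only
      print(mean_correlation(example_iscs))
  ```

  Averaging in z-space gives a slightly different summary than averaging the raw r values, which is why the transform matters whenever correlations are pooled or entered into tests that assume normality.
  
  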
  
  (2) Replace SEs derived from the jackknife approach with those from SOLAR, or provide a comparison and motivation and/or demonstrate that the SEs are correct.
  
  A more thorough explanation of the block jackknife procedure can be found in prior work introducing the multidimensional heritability method used here (Anderson et al., 2021).
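  For readers unfamiliar with the procedure, a generic delete-one-block jackknife works as follows: split the sampling units into B blocks, recompute the estimate B times with one block left out each time, and take SE = sqrt((B - 1) / B * sum of squared deviations of the leave-one-out estimates from their mean). The Python sketch below is only an illustration under stated assumptions, not the implementation of Anderson et al. (2021): it assumes near-equal block sizes, uses contiguous blocks as a placeholder grouping (in a twin design one would presumably keep related individuals in the same block), and uses a simple mean as a hypothetical stand-in for the heritability estimator.

  ```python
  import math
  from typing import Callable, List, Sequence


  def block_jackknife_se(data: Sequence[float], n_blocks: int,
                         estimator: Callable[[Sequence[float]], float]) -> float:
      """Delete-one-block jackknife standard error.

      The data are cut into n_blocks contiguous blocks of near-equal size,
      and the estimator is recomputed with each block left out in turn.
      """
      n = len(data)
      if not 2 <= n_blocks <= n:
          raise ValueError("need 2 <= n_blocks <= len(data)")
      # Block boundaries; consecutive bounds differ by at least one element.
      bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
      leave_one_out: List[float] = []
      for b in range(n_blocks):
          kept = list(data[:bounds[b]]) + list(data[bounds[b + 1]:])
          leave_one_out.append(estimator(kept))
      centre = sum(leave_one_out) / n_blocks
      # Jackknife variance: (B - 1) / B times the sum of squared deviations.
      variance = (n_blocks - 1) / n_blocks * sum(
          (theta - centre) ** 2 for theta in leave_one_out)
      return math.sqrt(variance)


  def sample_mean(values: Sequence[float]) -> float:
      """Hypothetical placeholder estimator."""
      return sum(values) / len(values)


  if __name__ == "__main__":
      toy = [0.31, 0.42, 0.28, 0.55, 0.47, 0.39, 0.36, 0.50]  # made-up numbers
      print(block_jackknife_se(toy, n_blocks=4, estimator=sample_mean))
  ```

  With one observation per block and the mean as estimator, this reduces to the familiar s / sqrt(n), which is a convenient sanity check; with strongly unequal block sizes a weighted variant would be needed.
  
  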
  
  (3) Given problem (F & G):
  
  (a) Consider studying the parameters that drive the hyperalignment. They can be included as covariates in heritability analyses, and/or their heritability is of interest to understand the reasons for the heritability reduction post-hyperalignment.
  
  We agree that this would be interesting, but the specific parameters that drive hyperalignment are beyond the scope of this study.
  
  (b) Include the alternative explanation of hyperalignment-induced noise in the discussion.
  
  We have added a figure showing that hyperalignment does not increase noise in ISC and explained here why “hyperalignment-induced noise” does not constitute a reasonable alternative explanation for our results.
  
  (4) Add heritability results for NT phenotypes.
  
  We have added heritability analyses for NT topography and (global) NT magnitude, as detailed above.
  
  (5) Motivate global signal removal, and acknowledge this process typically alters results substantially.
  
  We have added an explanation of our rationale for using GSR and shown in this response that it does not in fact substantially alter the results.
  
  (6) Rephrase and/or clarify the following:
  
  (a) "permutations quantify average differences" (under A).
  
  (b) "network combinations" and related analyses (under B & C).
  
  (c) why some analyses are separated per visit/day and others not (C).
  
  (d) methods and reasons for sample size estimation (C).
  
  We have rephrased or clarified all of the above.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Participant numbers should be clarified. I understand that the HCP 7T dataset has 184 subjects. How can the authors have 176 twins and 690 unrelated subjects?
  
  As we reported in our Methods section, 178 subjects had complete movie-watching datasets, and 176 subjects had complete movie-watching and resting-state datasets. Of the 178 subjects with complete movie-watching data, we identified 690 age- and sex-matched dyads.
  
  (2) Figure 1. I don't find Figure S1A in Figure S1.
  
  We thank R3 for catching this error; we have amended this reference to read Fig. S1.
  
  (3) I would also suggest combining Figure 1 and Figure 2.
  
  We thank R3 for this suggestion; ultimately, we prefer to keep these figures separate to reinforce the difference between our dyadic similarity and formal heritability analyses.
  
  AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.16.612469v3
www.biorxiv.org www.biorxiv.org

The lysosomal glutamine transporter SLC38A7/SNAT7 promotes HIV-1 production in human macrophages

1. EMBOpress 05 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This study from the Niedergang lab establishes SNAT7 as a host-dependency factor in human macrophages that supports HIV-1 replication. They show a modest increase in SNAT7 levels in HIV-1-infected macrophages and suggest that this increase is transient. Using siRNA against SNAT7, they show a reduction in HIV-1 protein levels and viral RNAs and claim that reverse transcription is blocked in SNAT7 KD cells. Focusing on SAMHD1, a known HIV-1 restriction factor in macrophages, they link SNAT7 depletion to a reduction in phosphorylated, i.e. catalytically inactive, SAMHD1, arguing that SNAT7 regulates the phosphorylation and thereby the antiviral activity of SAMHD1. Finally, since SNAT7 is a glutamine transporter that releases this amino acid from lysosomes, they supplement glutamine, which somehow rescues the reduction of HIV-1 production in SNAT7 KD cells.
  
  Major comments:
  
  The strength of this manuscript is its clear focus on HIV-1-infected primary human macrophages and the link between HIV-1 replication and the SNAT7 siRNA KD experiments, combined with SAMHD1 depletion and, lastly, glutamine supplementation. This establishes a stringent and coherent storyline. The effects reported are modest; high variability is not a problem, since it is expected with primary hMDMs and can be addressed by testing several donors and applying stringent statistics.
  
  That said, while they state the statistical test used, i.e. one-way ANOVA, they do not explain which post hoc test was used to assess significance (e.g. Bonferroni, Fisher's LSD). Please add this information.
  
  We thank the reviewer for this comment. The figure legends have been updated to include more details of all the statistical tests used.
  
  Another issue that might lead to underestimating the effects of HIV-1 infection on SNAT7 levels, and vice versa of SNAT7 KD on HIV-1 replication, is the bulk (non-single-cell) approach employed, i.e. western blots. I assume that HIV-1 infection rates in macrophages are not very high, usually not exceeding 20-30%. So the effects the authors observe could in fact be much larger when assessed at the single-cell level. I do not know about the SNAT7 antibody, but all the other reagents should work in flow cytometry and could hence greatly improve the readout.
  
  We agree with the reviewer. Indeed, in previous studies of HIV-1 infection of human macrophages performed in the lab, we observed by immunofluorescence that the proportion of infected cells ranged from 20 to 40 %. At the time of submission, we were unable to label the native SNAT7 protein by immunofluorescence, as the commercial antibody used only works for western blotting.
  
  In the meantime, we have been validating a new antibody (Proteintech) targeting SNAT7 for immunofluorescence. If this is confirmed, we will be able to detect and quantify HIV-1 p24 by immunofluorescence in SNAT7-depleted human macrophages and control cells, thus confirming our results in single-cell analysis.
  
  Flow cytometry analyses are difficult to perform on primary human macrophages because these cells are highly adherent and must first be detached, a process that induces significant cell death and damage. This is why we would prefer to carry out these analyses by immunofluorescence microscopy on adherent cells, an option we will pursue.
  
  Furthermore, the authors never comment on a dose-response effect in terms of HIV-1 infection levels. An MOI dependency is described for Suppl. Fig. 1C-F, but unfortunately the data are missing from the manuscript.
  
  We apologize for this omission. The figures showing the increase in SNAT7 protein expression following HIV-1 infection at MOIs ranging from 0.05 to 0.5 were added to the new version of the manuscript (Supp. Fig. 1 C-F).
  
  Figure 1: please specify what is meant by circulating T lymphocytes. I would expect to see SNAT7 levels in PHA- or CD3/CD28-activated lymphocytes versus resting T cells, and a time course of SNAT7 levels upon activation. Even though SNAT7 levels in T cells might be low, they could also be increased by HIV-1 infection, and it is essential that the authors test for this. If not, the result is a valid negative control. For this, they should employ primary HIV-1 strains with a tropism for T cells, or at least the lab-adapted HIV-1 NL4-3.
  
  We thank the reviewer for this comment. Circulating T lymphocytes isolated from the blood of healthy donors are now referred to as resting lymphocytes in the new version of the manuscript, as opposed to activated T lymphocytes stimulated with IL2 and PHA-P for several days (Fig. 1A-C).
  
  The expression levels of SNAT7, at both the gene and protein levels, are lower in resting or IL2/PHA-P-activated T cells than in macrophages from the same donors. As suggested, we will perform a time-course analysis of SNAT7 expression during T-cell activation and upon HIV-1 infection.
  
  Figure 2: again, single-cell measurements could reveal much more pronounced effects. It is a bit counterintuitive that siRNA #2 is more efficient at knocking down SNAT7 but gives higher levels of HIV-1 replication in terms of Gag levels. I assume that the statistics always compare to the Ctl-treated cells (C-G), but this is not entirely clear. Please unify the labeling with the statistics shown in Fig. 2I (this also applies to all the other figures).
  
  We thank the reviewer for this comment. Fig. 2B shows a single representative donor among those analyzed. However, protein quantification across six different donors shows that SNAT7 is more depleted with siRNA #2 than with siRNA #1 (Fig. 2C), and that Gag Pr55 protein levels are consequently more reduced (Fig. 2D).
  
  We use GraphPad Prism software to perform statistical analysis. Depending on the test used, the software automatically plots the comparison bar and displays the p-value above it. We changed the representation of statistics as suggested.
  
  Figure 3: It is a bit odd that the authors conclude that RT is the essential step reduced in the absence of SNAT7, yet fail to provide statistical significance for this (Fig. 3F and G). One would expect RT to be much more affected, given the huge effects on HIV-1 capsid and particle production shown in Fig. 2F, G and I.
  
  The reviewer rightly points out that we observed a stronger effect during the later stages of the viral cycle, from transcription of viral RNAs (Fig. 2I and Supp. Fig. 2G) to the production of viral particles in the supernatant (Fig. 2D-G), than during the earlier stage of reverse transcription (Fig. 3F, G). It is also possible that we missed the peak of ERT/LRT production, which is transient.
  
  It should be noted that SAMHD1 exhibits both dNTPase (Goldstone et al., 2011) and nuclease (Beloglazova et al., 2013) activities. The ability of SAMHD1 to restrict the virus, which is controlled by dephosphorylation at T592, is mediated by its RNase activity (Ryoo et al., 2014) and not by its dNTPase activity (Welbourn et al., 2013; White et al., 2013). This could explain why SNAT7 depletion has a stronger impact on viral transcription than on reverse transcription.
  
  Figure 4: again, single-cell flow cytometry measurements of SAMHD1, pSAMHD1 and p24/SNAT7 might help to discriminate more clearly between effects that are specifically induced upon infection and those that occur in virally infected cells. Alternatively, perhaps immunofluorescence?
  
  We thank the reviewer for this suggestion. As mentioned under comment #2, flow cytometry analyses are difficult to perform on strongly adherent primary human macrophages.
  
  With regard to immunofluorescence, there is a technical limitation related to the species in which the antibodies are raised. The antibody targeting the native SNAT7 protein, which is currently being validated in our laboratory, is raised in rabbit. An anti-CAp24 antibody raised in goat can be used. It will then be necessary to co-label the cells with anti-SAMHD1 and anti-phospho-SAMHD1 antibodies raised in mouse. We will look for ways to co-label the cells.
  
  The western blot shown in panel D does not really reflect the point the authors want to make with the quantification in panels G-I. The primary data (D) suggest that SNAT7 KD reduces HIV-1 production even in the absence of SAMHD1, whereas the quantification rather indicates that SNAT7 KD does not affect HIV-1 production in the absence of SAMHD1. This needs clarification/corroboration by orthogonal approaches.
  
  We respectfully disagree with the reviewer.
  
  Figure 4D shows a representative blot of the six donors analysed. As mentioned, the depletion of SNAT7 in the absence of SAMHD1 reduces the production of the viral proteins GagPr55 and CAp24 (see Fig. 4D). This is illustrated by the quantifications (Fig. 4G–I). Following treatment with Vpx, GagPr55 protein expression in SNAT7 KD macrophages is reduced by a factor of 2.6 for siRNA #1 (mean = 1.48, light grey bar) and by a factor of 1.83 for siRNA #2 (mean = 2.13, orange bar), compared to the control (mean = 3.9, pink bar) (Fig. 4G). Similarly, CAp24 protein expression was reduced by a factor of 2.2 for siRNA #1 (mean = 2.05, light grey bar) and by a factor of 1.36 for siRNA #2 (mean = 3.34, orange bar), compared to the control (mean = 4.52, pink bar) (Fig. 4H).
  
  These differences are therefore consistent between the western blot and the quantifications. However, they are not significantly different from those observed in Vpx-treated cells transfected with control siRNA, suggesting that the viral restriction observed in SNAT7 KD cells is primarily due to SAMHD1.
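  The fold reductions quoted for Fig. 4G and 4H are ratios of the control mean to the SNAT7 KD mean. As a simple arithmetic check (not a reanalysis), the Python snippet below recomputes them from the rounded means given in the text; the dictionary layout and function name are illustrative only.

  ```python
  from typing import Dict

  # Means quoted in the response for Vpx-treated macrophages (Fig. 4G, H).
  MEANS: Dict[str, Dict[str, float]] = {
      "Gag Pr55": {"control": 3.9, "siRNA #1": 1.48, "siRNA #2": 2.13},
      "CAp24": {"control": 4.52, "siRNA #1": 2.05, "siRNA #2": 3.34},
  }


  def fold_reduction(control_mean: float, knockdown_mean: float) -> float:
      """Fold reduction = control mean / knockdown mean."""
      return control_mean / knockdown_mean


  folds = {
      protein: {sirna: round(fold_reduction(m["control"], m[sirna]), 2)
                for sirna in ("siRNA #1", "siRNA #2")}
      for protein, m in MEANS.items()
  }

  if __name__ == "__main__":
      print(folds)
  ```

  Recomputed from the rounded means, the ratios are about 2.64 and 1.83 for Gag Pr55 and about 2.2 and 1.35 for CAp24; the small difference from the reported 1.36 presumably reflects rounding of the published means.
  
  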
  
  Figure 5: show SAMHD1 and pSAMHD1 levels upon glutamine supplementation.
  
  We thank the reviewer for this comment; we will perform the suggested experiment.
  
  I think the discussion is very thin, mainly summarizing the results; but fails to give broader context or critically discuss the limitations and further directions.
  
  We thank the reviewer for this comment. The discussion will be modified further accordingly.
  
  Looking at the data as a whole, I think the results support a modest functional importance of SNAT7 for HIV-1 production in macrophages. I acknowledge that experiments in primary macrophages are prone to high donor-to-donor variability, and the authors have transparently depicted their data. However, I would clearly advise the authors to tone down the extent to which they claim SNAT7 dependency, given this huge variability and the sometimes borderline statistics.

  We respectfully disagree with the reviewer.
  
  The cells used here entail greater variability than a cell line, but they are also more relevant.
  
  Indeed, the effects observed in the late stages of HIV-1 production are:
  
  ~80 % decrease in viral transcription compared to the control (Fig. 2I),
  
  ~85 % decrease in CAp24 protein expression compared to the control, as quantified by western blot (Fig. 2E), or ~90 % by ELISA measurement (Fig. 2F),
  
  a reduction of more than 90 % in the release of infectious particles (Fig. 2G).
  
  These results were all significant across donors, even though SNAT7 depletion was always partial (Fig. 2C; between 31 and 62 % depletion compared to the control in infected cells).
  
  Therefore, the data were obtained from a mixture of depleted and non-depleted macrophages, which means that the effects may be underestimated.
  
  Together, our results show that SNAT7 is necessary for HIV-1 production.
  
  However, after reading the comments, we realized that our conclusions regarding reverse transcription were too strong: SNAT7 depletion does not affect viral fusion or reverse transcription. The manuscript has been modified accordingly.
  
  In addition, there are many optional experiments, which I am sure the authors are aware of, that should be done at least in the future.
  
  For instance, how does HIV-1 upregulate SNAT7; is a viral accessory protein involved? What is the mechanism of SNAT7-dependent SAMHD1 phosphorylation? Does SNAT7 (or glutamine) regulate the activity of the SAMHD1-associated kinase/phosphatase? If so, does this affect other targets of these enzymes?

  We thank the reviewer for these questions.
  
  To address the role of accessory viral proteins, we have already performed one experiment infecting hMDM with HIV-1 strains deleted for genes such as Nef, Vpr, Vpu and Vif, and have found no clear effect on SNAT7 protein expression compared to WT strains. As an alternative experiment, we could overexpress individual viral genes, such as Nef or Vpr, in HeLa cells and analyze their impact on SNAT7 expression by Western blot.
  
  It is also possible that SNAT7 expression and recycling of lysosomal glutamine are modulated by the macrophage intrinsic immunity in response to HIV-1 infection.
  
  The Thr592 motif of the SAMHD1 protein is phosphorylated by Cyclin A2/CDK1 and type 1 IFN in non-cycling cells, such as MDMs (Cribier et al., 2013). For now, the relationship between SNAT7 and SAMHD1 remains unclear. However, Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through the release of lysosomal glutamine, and Dias et al. (2024) showed that inhibiting mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We could therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn leads to decreased SAMHD1 phosphorylation. This has been added to the discussion to explain the relationship among the three partners.
  
  **Referees cross-commenting**

  I think the comments from the other referees are reasonable and consistent with my assessment.
  
  Reviewer #1 (Significance (Required)):
  
  Strengths and limitations: see above.
  
  Significance: I think this work is of high interest for virologists working in the field of HIV-1 and infection of myeloid cells. If SNAT7 (and hence glutamine) indeed regulates the phosphorylation of SAMHD1, this work could have broad relevance. Unfortunately, however, this aspect remains underdeveloped and is not discussed.
  
  Field of expertise: HIV-1, immunology, cell biology
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  In this report, Herit and colleagues describe the role of an HIV-1 dependency factor that promotes virus replication in macrophages. The authors suggest that the lysosomal membrane-associated glutamine transporter SNAT7 is an HIV dependency factor that promotes virus replication by enhancing reverse transcription and Gag synthesis. The authors use transient knockdown approaches in primary macrophages to show that SNAT7 depletion does not impact viral entry but inhibits early reverse transcription, an effect reversed by exogenous glutamine addition. While the enhancement of reverse transcription was likely due to a selective increase in phospho-SAMHD1 expression, the mechanisms by which SNAT7 enhanced viral gene expression were not clearly defined. These are well-controlled studies that pinpoint the role of SNAT7 in the early steps of the viral life cycle and highlight the intricate interplay between macrophage metabolism and HIV-1 replication. While the question addressed is important and the hypothesis overall sound, the data presented need to be strengthened to support the conclusions. There are also numerous weaknesses in data interpretation.
  
  Figure 1: SNAT7 expression was selectively enhanced upon differentiation of monocytes into macrophages but absent in CD4+ T cells. Although the authors claim that SNAT7 expression increases upon HIV-1 infection of macrophages, RT-qPCR analysis shows the opposite trend (Fig. 1E), and changes in SNAT7 protein expression are modest. The statistical analysis in Fig. 1H needs to be revisited: the number of replicates varies between lysates harvested at different days post-infection, which might affect the statistical test. To determine whether the enhancement of SNAT7 expression depends on the establishment of virus infection, as the authors imply, control lysates from infections in the presence of replication inhibitors should be included.
  
  We thank the reviewer for this comment. Indeed, there is a modest but statistically significant increase in SNAT7 protein expression upon HIV-1 infection over time (Fig. 1G, H), without any modulation of SNAT7 gene expression (Fig. 1E). This indicates that, in this context, SNAT7 expression is regulated only at the protein level (i.e. increased translation or stabilization of the SNAT7 protein).
  
  As mentioned, Fig. 1H aggregates 3 to 7 independent experiments on different donors, depending on the infection time point. SNAT7 protein expression is increased from 1 day post-infection up to 8 days. The statistical test used here, i.e. two-way ANOVA, compared the Mock-infected and HIV-1-infected conditions at each time point with the same number of donors. In this figure, the difference is statistically significant only at day 6 of the time course (7 donors). We agree that increasing the number of donors at the other time points could strengthen the statistical comparison between the control and infected conditions.
  
  We thank the reviewer for the suggestion to use replication inhibitors in this experiment. We plan to use inhibitors of reverse transcription (nevirapine) and integration (dolutegravir).
  
  The authors rely exclusively on western blot analysis of HIV-1 Gag expression in cell lysates as a measure of the effects of SNAT7 on virus replication. Single-cell analysis, such as intracellular p24gag staining analyzed by FACS, should be included; this would provide a better measure of the effects of SNAT7 on the establishment of HIV-1 infection.
  
  We respectfully disagree with the reviewer on this point. To evaluate the effects of SNAT7 on HIV-1 replication, we not only measured Gag Pr55 and CAp24 by western blot (Fig. 2B, D and E), but also assessed the quantity of CAp24 in the supernatants and lysates by ELISA, the quantity of infectious particles using TZM reporter cells, and total viral transcription, or more specifically Gag Pr55 transcription, by qPCR (Fig. 2F, G and I and Supp. Fig. 2G).
  
  Regarding the quantification of CAp24 at the single-cell level, please refer to comment #2 under Reviewer #1.
  
  Knockdown of SNAT7 in MDMs was partial at best, with only a 30-50% decrease in expression (Fig. 2C), yet the effects on viral gene expression (Fig. 2I), p24 release and infectious particle production are dramatic (Fig. 2F and G). This discrepancy is not addressed. Does SNAT7 knockdown negatively impact virus particle release? Please note that the representative WB in Fig. 2B does not correlate with the quantification in Fig. 2D. Why are there no p55gag or p24gag bands in the SNAT7 #1 siRNA condition (Fig. 2B)? The data could also be rearranged to follow the logical sequence of the virus replication cycle (viral RNA expression followed by Gag expression, and then release).
  
  We thank the reviewer for this comment. Our samples are indeed a mixture of SNAT7-depleted and non-depleted macrophages, and RNA interference in these cells often reduces protein expression by about 50 %.
  
  To determine whether SNAT7 is involved in particle release, we quantified CAp24 separately in cell lysates and in the cell culture medium, and normalized the results to the total protein content. The absence of SNAT7 reduced the amount of CAp24 measured by ELISA in both fractions to the same extent, showing that CAp24-positive viral particles do not accumulate inside the infected macrophages. These data were initially pooled in one graph (Fig. 2F), but separate graphs are now provided in new Supp. Fig. 2E, F.
  
  Regarding the western blot shown in Fig. 2B, please refer to comment #5 under Reviewer #1.
  
  In the new version of the manuscript, we rearranged the figures, placing the later stages of the viral cycle in Fig. 2 and the earlier stages, such as fusion, reverse transcription and transcription, in Fig. 3.
  
  Data interpretation would be greatly improved by including infection controls (RT or integrase inhibitors) to confirm that measurements of viral RNA and Gag are indeed modulated by SNAT7 expression.
  
  We thank the reviewer for this suggestion to include inhibitors of viral replication as controls. In our experiments, cells were Mock-infected in parallel as a negative control for viral detection. We provide the results in the new version of the manuscript to show that (i) no viral or Gag RNA is detected in the absence of the virus, and (ii) the expression of viral genes measured in HIV-1-infected SNAT7-depleted cells does not differ from that in Mock-infected cells, indicating almost complete inhibition of viral transcription (Fig. 3H and Supp. Fig. 3B), which is also confirmed at the protein level (Fig. 2B, D-F).
  
  Figure 3: Decrease in SNAT7 expression in macrophages resulted in lower levels of early reverse transcripts. But surprisingly, LRT levels were not as affected by decreases in SNAT7 expression. The authors go on to suggest that decreases in early RT are due to loss of phospho-SAMHD1 and increases in catalytically active form of SAMHD1. Mechanistically this does not make sense: LRT should be similarly affected by increase in catalytically active SAMHD1. dNTP concentrations should be measured to determine if the rescue of RT is dependent on SAMHD1 dNTPase activity.
  
  We thank the reviewer for this comment. LRT concentrations are very low in human macrophages and more challenging to detect than ERT concentrations. This might explain why the differences observed between the SNAT7-depleted and control conditions appear less pronounced for LRT than for ERT.
  
  Furthermore, we cannot rule out the possibility that SNAT7 has a cumulative effect throughout the viral cycle. Although the reductions in ERT and LRT levels in SNAT7-depleted macrophages did not reach statistical significance (Fig. 3F, G), there is a significant impact on the transcription of viral RNAs (Fig. 2I) and Gag (Supp. Fig. 2G). This step may also be altered by the ribonuclease activity of SAMHD1 (Beloglazova et al., 2013; Ryoo et al., 2014).
  
  Finally, with the help of Dr Baek Kim in Atlanta, we attempted to quantify dNTP concentrations in our human macrophages. Unfortunately, it was not possible to draw any conclusions, as the concentrations of dNTPs extracted from our cells were far too low.
  
  Furthermore, it should be noted that SAMHD1 viral restriction through its phosphorylation at T592 is not correlated with its dNTPase activity (Welbourn et al., 2013; White et al., 2013), but with its ribonuclease activity (Beloglazova et al., 2013; Ryoo et al., 2014). This could explain why SNAT7, by modulating the ribonuclease activity of SAMHD1, could have a greater effect on viral transcription than on reverse transcription.
  
  There is lack of consistency in the data: p24 release upon SNAT7 depletion is highly variable. While there is a dramatic >90-95% decrease in p24 release (Fig. 2G), the effects are much more moderate in Fig. 4H (50-60% attenuation), even though siRNA-mediated depletion was similar across the data sets. The authors should comment on the variability in their findings.
  
  We thank the reviewer for this comment, but believe that Figure 2E, rather than Figure 2G, is the panel showing the quantification of CAp24 by western blot, and thus the one to compare with Figure 4H.
  
  In Fig. 2E, we observed an average reduction of 85 % in CAp24 expression normalized to Clathrin HC expression across different donors for both siRNAs targeting SNAT7. For Fig. 4H, there was a 73 % reduction in CAp24 levels for siRNA #1 and a 56 % reduction for siRNA #2. In addition, it should be noted that the reduction in Gag levels is greater in Fig. 4G (between 77 % and 83 %) than in Fig. 2D (between 55 % and 72 %).
  
  Therefore, there is some variation in the results obtained with different donors, which could be explained by donor-to-donor differences in Gag cleavage, but this does not affect the conclusions of either figure.
  
  SNAT7 is postulated to affect two steps in the virus life cycle: reverse transcription and viral transcription. But Vpx-mediated SAMHD1 degradation reversed both. It is not clear to me how SAMHD1 degradation impacts the role of SNAT7 in viral transcription, and no explanation is provided.
  
  We thank the reviewer for this comment. As suggested, we will perform experiments to assess the impact of Vpx-mediated SAMHD1 degradation on viral transcription.
  
  Exogenous addition of glutamine only partially restored Gag synthesis and p24 release, which could be attributed to increased cytoplasmic glutamine levels and viral protein synthesis. What about the effects on reverse transcription and viral gene expression?
  
  We thank the reviewer for this comment. We will perform the suggested experiments to assess the impact of glutamine supplementation on viral transcription.
  
  Reviewer #2 (Significance (Required)):
  
  This is a novel finding, as there are a limited number of studies on amino acid transporters and HIV-1 replication enhancement in macrophages; most previous work has focused on CD4 T cells. These studies on SNAT7 and HIV-1 infection establishment in macrophages might better inform the influences of macrophage metabolism on HIV-1 persistence and inflammatory responses.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  This study investigates the role of the lysosomal glutamine transporter SLC38A7/SNAT7 in HIV-1 replication in primary human macrophages. The authors demonstrate that SNAT7 is highly expressed in macrophages and upregulated upon HIV-1 infection. They show that SNAT7 depletion inhibits HIV-1 production at the reverse transcription step without affecting viral fusion or global cellular translation/transcription. Mechanistically, SNAT7 knockdown reduces the inhibitory phosphorylation of SAMHD1 at T592, and degradation of SAMHD1 by Vpx fully rescues viral replication. Extracellular glutamine supplementation partially restores HIV-1 production in SNAT7-deficient cells. Overall, the authors report interesting observations; however, the mechanistic investigation remains preliminary, raising concerns about whether the data fully support all the conclusions drawn.

  Major concerns:

  1. The mechanistic depth is insufficient. The authors do not elucidate how glutamine regulates SAMHD1 T592 phosphorylation, whether through metabolite-mediated control of kinases/phosphatases or via indirect effects.
  
  We thank the reviewer for this comment. Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane by releasing lysosomal glutamine, and Dias et al. (2024) showed that pharmacological inhibition of mTORC1 decreases SAMHD1 Thr592 phosphorylation in hMDM. We therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn decreases SAMHD1 phosphorylation. This is now discussed further in the Discussion section of the manuscript.
  
  The authors do not measure intracellular dNTP levels upon SNAT7 knockdown, although dNTPs are the key functional substrates of SAMHD1. They also do not directly demonstrate that glutamine supplementation restores dNTP pools.
  
  We thank the reviewer for this comment. Please refer to comment #5 under Reviewer #2.
  
  Extracellular glutamine only partially rescues viral production, implying the existence of transport‑independent functions of SNAT7 or additional pathways. This important observation is not discussed.
  
  We thank the reviewer for this comment. The discussion has been modified accordingly.
  
  It is suggested that the key findings be validated in immortalized THP-1 cells differentiated into macrophage-like cells with PMA.
  
  We thank the reviewer for this suggestion but are not sure how this would strengthen our conclusions. Despite the known variability between donors and the technical limitations in transducing these cells, we chose human blood monocyte-derived macrophages as a relevant, non-transformed model for HIV-1 infection of macrophages. They also capture, to some extent, the diversity of the human population.
  
  The Discussion section should be expanded to include the potential translational implications and limitations of the present study.
  
  We thank the reviewer for this comment. The Discussion addresses some potential translational implications and limitations of the study.
  
  Reviewer #3 (Significance (Required)):
  
  General assessment: This study identifies the lysosomal glutamine transporter SLC38A7/SNAT7 as a novel host dependency factor for HIV‑1 replication in primary human macrophages. The major strengths include the use of physiologically relevant primary macrophage models, a well-organized experimental pipeline from expression profiling to functional validation, and the establishment of a link between SNAT7, glutamine metabolism, and the HIV restriction factor SAMHD1.
  
  Advance: It extends current understanding of HIV‑1 host dependency factors and immunometabolism by revealing a compartment‑specific metabolic pathway that supports viral reverse transcription.
  
  Audience: This work will primarily interest specialized researchers in HIV-1 biology, host-virus interactions, restriction factors, and antiviral innate immunity.
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This study from the Niedergang lab establishes SNAT7 as a host-dependency factor in human macrophages that supports HIV-1 replication. They show a modest increase in SNAT7 levels in HIV-1-infected macrophages and suggest that SNAT7 levels are transiently increased. Using siRNA against SNAT7, they show a reduction in HIV-1 protein levels and viral RNAs and claim that there is a block of reverse transcription in SNAT7 KD cells. Focusing on a known HIV-1 restriction factor in macrophages, SAMHD1, they link SNAT7 depletion to a reduction in phosphorylated, i.e. catalytically inactive, SAMHD1, arguing that SNAT7 regulates the phosphorylation and thereby the antiviral activity of SAMHD1. Since SNAT7 is a glutamine transporter that exports this amino acid from lysosomes, they finally supplement glutamine, and this somehow rescues the reduction of HIV-1 production in SNAT7 KD cells.
  
  Major comments:
  
  The strength of this manuscript is the clear focus on HIV-1-infected primary human macrophages and the link between HIV-1 replication and the SNAT7 siRNA KD experiments, in combination with SAMHD1 depletion and, lastly, glutamine supplementation. This establishes a stringent and coherent storyline. The effects reported are modest; high variability is not a problem, since it is expected with primary hMDM and can be addressed by testing several donors and applying stringent statistics.
  
  That said, I notice that although the authors state the statistical test used (one-way ANOVA), they do not specify the post hoc test used to assess significance (e.g. Bonferroni, Fisher's LSD). Please add this information.
  
  We thank the reviewer for this comment. The figure legends have been updated to include more details of all the statistical tests used.
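  To illustrate the kind of detail the reviewer asked for, the sketch below shows how raw p-values from several comparisons against a single control (for example, siRNA #1 vs control and siRNA #2 vs control) can be adjusted after an ANOVA. It implements the Bonferroni correction named by the reviewer and, as a swapped-in alternative, the Holm step-down method; Fisher's LSD, by contrast, applies no adjustment. This is a minimal Python sketch, not the authors' GraphPad Prism analysis, and the p-values in it are hypothetical.
  
  ```python
  """Two common multiple-comparison adjustments applied after a one-way ANOVA.
  The raw p-values used below are hypothetical, not data from the manuscript."""
  
  
  def bonferroni(p_values: list[float]) -> list[float]:
      """Multiply each raw p-value by the number of comparisons, capped at 1."""
      m = len(p_values)
      return [min(1.0, p * m) for p in p_values]
  
  
  def holm(p_values: list[float]) -> list[float]:
      """Holm step-down adjustment: controls the family-wise error rate like
      Bonferroni but is uniformly less conservative."""
      m = len(p_values)
      order = sorted(range(m), key=lambda i: p_values[i])
      adjusted = [0.0] * m
      running_max = 0.0
      for rank, idx in enumerate(order):
          # Smallest p-value is multiplied by m, the next by m - 1, and so on.
          candidate = min(1.0, (m - rank) * p_values[idx])
          # Keep adjusted values monotone non-decreasing with rank.
          running_max = max(running_max, candidate)
          adjusted[idx] = running_max
      return adjusted
  
  
  if __name__ == "__main__":
      # Hypothetical raw p-values: siRNA #1 vs control, siRNA #2 vs control.
      raw = [0.012, 0.030]
      print("Bonferroni:", bonferroni(raw))
      print("Holm:", holm(raw))
  ```
  
  Holm-adjusted p-values are never larger than Bonferroni-adjusted ones, which is why Holm is often preferred when only a handful of comparisons are made.
  
  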
  
  Another issue that might lead to underestimating the effects of HIV-1 infection on SNAT7 levels, and conversely of SNAT7 KD on HIV-1 replication, is the bulk (non-single-cell) approach employed, i.e. western blots. I assume that HIV-1 infection rates in macrophages are not very high, usually not exceeding 20-30%. The effects the authors observe could therefore be much larger when assessed at the single-cell level. I do not know about the SNAT7 antibody, but all the other reagents should work for flow cytometry and could hence greatly improve the readout.
  
  We agree with the reviewer. Indeed, in previous studies on HIV-1 infection of human macrophages performed in the lab, we observed by immunofluorescence that the proportion of infected cells ranged from 20 to 40 %. At the time of submission, we were unable to label the native SNAT7 protein by immunofluorescence, as the commercial antibody used only works for western blotting.
  
  In the meantime, we have been validating a new antibody (Proteintech) targeting SNAT7 for immunofluorescence. If it is validated, we will be able to detect and quantify HIV-1 p24 by immunofluorescence in SNAT7-depleted and control human macrophages, and thus confirm our results at the single-cell level.
  
  Flow cytometry analyses are difficult to perform on primary human macrophages because these cells are highly adherent and must be detached first, a process that induces significant cell damage and death. This is why we would prefer to carry out these analyses by immunofluorescence microscopy on adherent cells. We will pursue this option.
  
  Furthermore, the authors do not comment on a dose-response effect in terms of HIV-1 infection levels. An MOI dependency is described for Suppl. Fig. 1C-F, but unfortunately the data are missing from the manuscript.
  
  We apologize for this omission. The figures showing the increase in SNAT7 protein expression following HIV-1 infection at MOIs ranging from 0.05 to 0.5 were added to the new version of the manuscript (Supp. Fig. 1 C-F).
  
  Figure 1: Please specify what is meant by circulating T lymphocytes. I would expect to see SNAT7 levels in PHA- or CD3/CD28-activated lymphocytes versus resting T cells, and a time course of SNAT7 levels upon activation. Even though SNAT7 levels in T cells might be low, they could also be increased by HIV-1 infection, and it is essential that the authors test for this. If not, the result is a valid negative control. For this, they should use primary HIV-1 strains with tropism for T cells, or at least the lab-adapted HIV-1 NL4-3.
  
  We thank the reviewer for this comment. Circulating T lymphocytes isolated from the blood of healthy donors are now referred to as resting lymphocytes in the revised manuscript, as opposed to activated T lymphocytes stimulated with IL2 and PHA-P for several days (Fig. 1 A-C).
  
  SNAT7 expression, at both the gene and protein levels, is lower in resting or IL2/PHA-P-activated T cells than in macrophages from the same donors. As suggested, we will perform a time course of T-cell activation and HIV-1 infection to investigate how SNAT7 expression varies under these conditions.
  
  Figure 2: again, single-cell measurements could reveal much more pronounced effects. It is somewhat counterintuitive that siRNA #2 is more efficient at depleting SNAT7 but yields higher levels of HIV-1 replication in terms of Gag levels. I assume that, in the statistics, each condition is always compared with the Ctl-treated cells (C-G), but this is not entirely clear. Please unify the labeling with that used for the statistics in Fig. 2I (this also applies to all the other figures).
  
  We thank the reviewer for this comment. Fig. 2B shows a representative blot from one of the donors analyzed. However, quantification across six different donors shows that SNAT7 is more depleted with siRNA #2 (Fig. 2C), and that Gag Pr55 protein levels are consequently more reduced with siRNA #2 than with siRNA #1 (Fig. 2D).
  
  We use GraphPad Prism software to perform statistical analysis. Depending on the test used, the software automatically plots the comparison bar and displays the p-value above it. We changed the representation of statistics as suggested.
  
  Figure 3: It is somewhat odd that the authors ultimately conclude that RT is the essential step reduced in the absence of SNAT7, yet do not show statistical significance for this (Fig. 3F and G). One would expect RT to be much more affected, given the large effects on HIV-1 capsid and particle production shown in Fig. 2F, G and I.
  
  The reviewer is right to point out that we observed a stronger effect during the later stages of the viral cycle, from transcription of viral RNAs (Fig. 2I and Supp. Fig. 2G) to the production of viral particles in the supernatant (Fig. 2D-G), than during the earlier stage of reverse transcription (Fig. 3F, G). It is also possible that we missed the peak of ERT/LRT production, which is transient.
  
  It should be noted that SAMHD1 exhibits both dNTPase (Goldstone et al., 2011) and nuclease (Beloglazova et al., 2013) activities. The ability of SAMHD1 to restrict the virus, which is controlled by dephosphorylation at T592, is mediated by its RNase activity (Ryoo et al., 2014) and not by its dNTPase activity (Welbourn et al., 2013; White et al., 2013). This could explain why SNAT7 has a stronger impact on viral transcription than on reverse transcription.
  
  Figure 4: again, single-cell flow cytometry measurements of SAMHD1, pSAMHD1 and p24/SNAT7 might help to discriminate more clearly between effects induced upon infection in general and those that occur in virally infected cells. Perhaps immunofluorescence could be used as an alternative?
  
  We thank the reviewer for this suggestion. As mentioned under comment #2, flow cytometry analyses are difficult to perform on strongly adherent primary human macrophages.
  
  With regard to immunofluorescence, there is a technical limitation related to the species in which the antibodies are produced. The antibody that targets the native SNAT7 protein, which is currently being validated in our laboratory, is produced in rabbit. An anti-CAp24 antibody produced in goat can be used. It will then be necessary to co-label the cells with anti-SAMHD1 and anti-phospho-SAMHD1 antibodies produced in mouse. We will explore options for co-labeling the cells.
  
  The western blot shown in panel D does not really reflect the point the authors want to make with the quantification in panels G-I. The primary data (D) suggest that SNAT7 KD reduces HIV-1 production even in the absence of SAMHD1, whereas the quantification rather indicates that SNAT7 KD does not affect HIV-1 production in the absence of SAMHD1. This needs clarification/corroboration by orthogonal approaches.
  
  We respectfully disagree with the reviewer.
  
  Figure 4D shows a representative blot of the six donors analysed. As mentioned, the depletion of SNAT7 in the absence of SAMHD1 reduces the production of the viral proteins GagPr55 and CAp24 (see Fig. 4D). This is illustrated by the quantifications (Fig. 4G–I). Following treatment with Vpx, GagPr55 protein expression in SNAT7 KD macrophages is reduced by a factor of 2.6 for siRNA #1 (mean = 1.48, light grey bar) and by a factor of 1.83 for siRNA #2 (mean = 2.13, orange bar), compared to the control (mean = 3.9, pink bar) (Fig. 4G). Similarly, CAp24 protein expression was reduced by a factor of 2.2 for siRNA #1 (mean = 2.05, light grey bar) and by a factor of 1.36 for siRNA #2 (mean = 3.34, orange bar), compared to the control (mean = 4.52, pink bar) (Fig. 4H).
  
  These differences are therefore consistent between the western blot and the quantifications. However, they are not significantly different from those observed in cells treated with Vpx and transfected with control siRNA, suggesting that the viral restriction observed in SNAT7 KD cells is primarily due to SAMHD1.
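  The fold reductions quoted in this response can be re-derived from the group means given for Vpx-treated cells. The short Python sketch below does this, assuming that each fold reduction is simply the control mean divided by the siRNA mean; the text does not say whether ratios were instead averaged per donor, and per-donor values are not available here.
  
  ```python
  """Recompute fold and percent reductions from the reported group means
  (Vpx-treated macrophages; GagPr55 from Fig. 4G, CAp24 from Fig. 4H)."""
  
  # Means copied from the response text above.
  REPORTED_MEANS = {
      "GagPr55": {"control": 3.9, "siRNA #1": 1.48, "siRNA #2": 2.13},
      "CAp24": {"control": 4.52, "siRNA #1": 2.05, "siRNA #2": 3.34},
  }
  
  
  def fold_reduction(control_mean: float, treated_mean: float) -> float:
      """How many times lower the treated mean is than the control mean."""
      return control_mean / treated_mean
  
  
  def percent_reduction(control_mean: float, treated_mean: float) -> float:
      """The same comparison expressed as a percentage decrease from control."""
      return 100.0 * (1.0 - treated_mean / control_mean)
  
  
  if __name__ == "__main__":
      for protein, groups in REPORTED_MEANS.items():
          control = groups["control"]
          for condition in ("siRNA #1", "siRNA #2"):
              treated = groups[condition]
              print(
                  f"{protein} {condition}: "
                  f"{fold_reduction(control, treated):.2f}-fold, "
                  f"{percent_reduction(control, treated):.0f}% lower than control"
              )
  ```
  
  Computed this way, the CAp24 ratio for siRNA #2 comes out at about 1.35 rather than the 1.36 stated; such small differences may come from rounding of the reported means or from averaging per-donor ratios.
  
  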
  
  Figure 5: show SAMHD1 and pSAMHD1 levels upon glutamine supplementation.
  
  We thank the reviewer for this comment. We will perform the suggested experiment.
  
  I think the discussion is very thin: it mainly summarizes the results but fails to give broader context or to critically discuss the limitations and future directions.
  
  We thank the reviewer for this comment. The discussion will be modified further accordingly.
  
  Looking at the data as a whole, I think the results support a modest functional importance of SNAT7 for HIV-1 production in macrophages. I acknowledge that experiments in primary macrophages are prone to high variability between donors and that the authors have depicted their data transparently. However, I would clearly advise the authors to tone down the extent to which they claim SNAT7 dependency, given this large variability and the sometimes borderline statistics.
  
  We respectfully disagree with the reviewer.
  
  The primary cells used here entail greater variability than a cell line, but they are also more physiologically relevant.
  
  Indeed, the effects observed in the late stages of HIV-1 production are:
  
  ~80 % decrease in viral transcription compared to the control (Fig. 2I),
  
  ~85 % decrease in CAp24 protein expression compared to the control, as quantified by western blot (Fig. 2E), or ~90 % by ELISA measurement (Fig. 2F),
  
  a reduction of more than 90 % in the release of infectious particles (Fig. 2G).
  
  These results were all significant across donors, even though SNAT7 depletion was always partial (Fig. 2C, between 31 and 62 % depletion compared to the control in infected cells).
  
  The data were therefore obtained from a mixture of depleted and non-depleted macrophages, which means that the magnitude of the effects may be underestimated.
  
  Together, our results show that SNAT7 is necessary for HIV-1 production.
  
  However, on reading the comments, we realized that our conclusions regarding reverse transcription were too strong. SNAT7 depletion does not significantly affect viral fusion or reverse transcription. The manuscript has been modified accordingly.
  
  In addition, there are many optional experiments, of which I am sure the authors are aware, that should be done, at least in the future.
  
  For instance, how does HIV-1 upregulate SNAT7? Is a viral accessory protein involved? What is the mechanism of SNAT7-dependent SAMHD1 phosphorylation? Does SNAT7 (or glutamine) regulate the activity of the SAMHD1-associated kinase/phosphatase? If so, does this affect other targets of these enzymes?
  
  We thank the reviewer for these questions.
  
  To address the role of accessory viral proteins, we have already performed one experiment infecting hMDM with HIV-1 strains deleted for genes such as Nef, Vpr, Vpu and Vif, and have found no clear effect on SNAT7 protein expression compared to WT strains. As an alternative experiment, we could overexpress individual viral genes, such as Nef or Vpr, in HeLa cells and analyze their impact on SNAT7 expression by Western blot.
  
  It is also possible that SNAT7 expression and recycling of lysosomal glutamine are modulated by the macrophage intrinsic immunity in response to HIV-1 infection.
  
  In non-cycling cells such as MDMs, the Thr592 residue of SAMHD1 is phosphorylated by cyclin A2/CDK1, and this phosphorylation is regulated by type I IFN (Cribier et al., 2013). For now, the relationship between SNAT7 and SAMHD1 remains unclear. However, Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane by releasing lysosomal glutamine, and Dias et al. (2024) showed that inhibiting mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn decreases SAMHD1 phosphorylation. This has been added to the Discussion to explain the relationship between the three partners.
  
  **Referees cross-commenting**
  
  I think the comments from the other referees are reasonable and consistent with my assessment.
  
  Reviewer #1 (Significance (Required)):
  
  Strengths and limitations: see above.
  
  Significance: I think this work is of high interest for virologists working on HIV-1 and infection of myeloid cells. If SNAT7 (and hence glutamine) does indeed regulate the phosphorylation of SAMHD1, this work could potentially have broad relevance. Unfortunately, however, this aspect remains underdeveloped and is also not discussed.
  
  Field of expertise: HIV-1, immunology, cell biology
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  In this report, Herit and colleagues describe the role of an HIV-1 dependency factor that promotes virus replication in macrophages. The authors suggest that the lysosomal membrane-associated SNAT7 glutamine transporter is an HIV dependency factor that promotes virus replication by enhancing reverse transcription and Gag synthesis. The authors use transient knockdown approaches in primary macrophages to show that SNAT7 depletion does not impact viral entry but inhibits early reverse transcription, which was reversed by exogenous glutamine addition. While the enhancement of reverse transcription was likely due to a selective increase in phospho-SAMHD1 expression, the mechanisms by which SNAT7 enhanced viral gene expression were not clearly defined. These are well-controlled studies that pinpoint the role of SNAT7 in the early steps of the viral life cycle and highlight the intricate interplay between macrophage metabolism and HIV-1 replication. While the question addressed is important and the hypothesis is overall sound, the data presented need to be strengthened to support the conclusions. There are numerous weaknesses in data interpretation as well.
  
  Figure 1: SNAT7 expression was selectively enhanced upon differentiation of monocytes into macrophages but absent in CD4+ T cells. Although the authors claim that SNAT7 expression is enhanced upon HIV-1 infection of macrophages, RT-qPCR analysis shows the opposite trend (Fig. 1E), and changes in SNAT7 protein expression are modest. The statistical analysis in Fig. 1H needs to be revisited. The number of replicates varies between lysates harvested on different days post infection, which might affect the statistical test. To determine whether the enhancement of SNAT7 expression depends on the establishment of virus infection, as the authors imply, control lysates from virus infections in the presence of replication inhibitors should be included.
  
  We thank the reviewer for this comment. Indeed, there is a modest but statistically significant increase in SNAT7 protein expression upon HIV-1 infection over time (Fig. 1G, H), without any modulation of SNAT7 gene expression (Fig. 1E). This indicates that, in this context, SNAT7 expression is regulated only at the protein level (i.e. through increased translation or stabilization of the SNAT7 protein).
  
  As mentioned, Fig. 1H aggregates data from 3 to 7 independent experiments on different donors, depending on the infection time point. SNAT7 protein expression is increased from 1 day post-infection up to 8 days. The statistical test used here, i.e. two-way ANOVA, compared Mock-infected and HIV-1-infected conditions at each time point with the same number of donors. In this figure, the difference is statistically significant only at day 6 of the time course (7 donors). We agree that increasing the number of donors at the other time points could increase the statistical power to detect a difference between the control and infected conditions.
  
  We thank the reviewer for suggesting the use of replication inhibitors in this experiment. We plan to use inhibitors of reverse transcription (nevirapine) and integration (dolutegravir).
  
  The authors rely exclusively on western blot analysis of HIV-1 Gag expression in cell lysates as a measure of the effects of SNAT7 on virus replication. Single-cell analysis, such as intracellular p24gag analysis by FACS, should be included; this will provide a better measure of the effects of SNAT7 on HIV-1 infection establishment.
  
  We respectfully disagree with the reviewer on this point. To evaluate the effects of SNAT7 on HIV-1 replication, we measured Gag Pr55 and CAp24 by western blot (Fig. 2B, D and E), but we also assessed the quantity of CAp24 in supernatants and lysates by ELISA, the quantity of infectious particles using TZM reporter cells, and total viral transcription, or more specifically Gag Pr55 transcription, by qPCR (Fig. 2F, G and I and Supp. Fig. 2G).
  
  Regarding the quantification of CAp24 at the single-cell level, please refer to comment #2 under Reviewer #1.
  
  Knockdown of SNAT7 in MDMs was partial at best, with only a 30-50% decrease in expression (Fig. 2C), but the effects on viral gene expression (Fig. 2I), p24 release and infectious particle production are dramatic (Fig. 2F and G). This discrepancy is not addressed. Does SNAT7 knockdown negatively impact virus particle release? Please note that the representative WB in Fig. 2B does not correlate with the quantification in Fig. 2D. Why are there no p55gag or p24gag bands in the SNAT7 siRNA #1 condition (Fig. 2B)? The data could also be rearranged to follow the logical sequence of the virus replication cycle (viral RNA expression, followed by Gag expression, and then release).
  
  We thank the reviewer for this comment. Our samples are indeed a mixture of SNAT7-depleted and non-depleted macrophages, and RNA interference in these cells typically reduces protein expression by about 50 %.
  
  To determine whether SNAT7 is involved in particle release, we quantified CAp24 separately in cell lysates and in the cell culture medium, and normalized the results to the total protein content. The absence of SNAT7 reduced the amount of CAp24 measured by ELISA in both samples to the same extent, indicating that CAp24-positive viral particles do not accumulate inside the infected macrophages. These data were initially pooled in one graph (Fig. 2F), but separate graphs are now provided in the new Supp. Fig. 2E, F.
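  The reasoning above (a proportional drop of CAp24 in both lysates and supernatants argues against a release defect) can be made explicit with a simple released fraction, supernatant / (supernatant + lysate). The Python sketch below is only an illustration: all values are hypothetical, assumed to be already normalized to total protein, and the helper name `release_fraction` is introduced here for illustration.
  
  ```python
  """Compare the fraction of CAp24 released into the supernatant under
  hypothetical scenarios. Values are invented and assumed to be normalized
  to total protein; they are not data from the manuscript."""
  
  
  def release_fraction(supernatant: float, lysate: float) -> float:
      """Fraction of total CAp24 found in the supernatant (i.e. released)."""
      total = supernatant + lysate
      if total <= 0:
          raise ValueError("total CAp24 must be positive")
      return supernatant / total
  
  
  if __name__ == "__main__":
      # Hypothetical normalized CAp24 values (arbitrary units).
      control = {"supernatant": 800.0, "lysate": 200.0}
      # Knockdown scales both compartments by the same factor: less virus is
      # made overall, but none is retained in the cells.
      kd_same_extent = {k: v * 0.15 for k, v in control.items()}
      # Contrast: a release defect would shift CAp24 from supernatant to lysate.
      kd_release_defect = {"supernatant": 120.0, "lysate": 300.0}
  
      for label, values in (
          ("control", control),
          ("knockdown, same extent", kd_same_extent),
          ("knockdown, release defect", kd_release_defect),
      ):
          print(f"{label}: {release_fraction(**values):.2f} released")
  ```
  
  In the same-extent scenario the released fraction is unchanged relative to control, whereas a genuine release defect would lower it.
  
  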
  
  Regarding the western blot shown in Fig. 2B, please refer to comment #5 under Reviewer #1.
  
  In the revised manuscript, we rearranged the figures, placing the later stages of the viral cycle in Fig. 2 and the earlier stages, such as fusion, reverse transcription and transcription, in Fig. 3.
  
  Data interpretation would be greatly improved by including infection controls (RT or integrase inhibitors) to confirm that measurements of viral RNA and Gag are indeed modulated by SNAT7 expression.
  
  We thank the reviewer for the suggestion to include inhibitors of viral replication as controls. In our experiments, cells were Mock-infected in parallel as a negative control for viral detection. We provide these results in the revised manuscript to show that (i) viral and Gag RNAs are not detected in the absence of virus, and (ii) the expression of viral genes measured in HIV-1-infected SNAT7-depleted cells is not different from that in Mock-infected cells, indicating almost complete inhibition of viral transcription (Fig. 3H and Supp. Fig. 3B), which is also confirmed at the protein level (Fig. 2B, D-F).
  
  Figure 3: Decrease in SNAT7 expression in macrophages resulted in lower levels of early reverse transcripts. But surprisingly, LRT levels were not as affected by decreases in SNAT7 expression. The authors go on to suggest that decreases in early RT are due to loss of phospho-SAMHD1 and increases in catalytically active form of SAMHD1. Mechanistically this does not make sense: LRT should be similarly affected by increase in catalytically active SAMHD1. dNTP concentrations should be measured to determine if the rescue of RT is dependent on SAMHD1 dNTPase activity.
  
  We thank the reviewer for this comment. LRT concentrations are very low in human macrophages and more challenging to detect than ERT concentrations. This might explain why the differences observed between the SNAT7-depleted and control conditions appear less pronounced for LRT than for ERT.
  
  Furthermore, we cannot rule out the possibility that SNAT7 has a cumulative effect throughout the viral cycle. Although the reduction in ERT and LRT levels in SNAT7-depleted macrophages does not reach statistical significance (Fig. 3F, G), there is a significant impact on the transcription of viral RNAs (Fig. 2I) and Gag (Supp. Fig. 2G). This step may also be affected by the ribonuclease activity of SAMHD1 (Beloglazova et al., 2013; Ryoo et al., 2014).
  
  Finally, with the help of Dr Baek Kim in Atlanta, we attempted to quantify dNTP concentrations in our human macrophages. Unfortunately, it was not possible to draw any conclusions, as the concentrations of dNTPs extracted from our cells were far too low.
  
  It should also be noted that SAMHD1 viral restriction, as regulated by phosphorylation at T592, does not correlate with its dNTPase activity (Welbourn et al., 2013; White et al., 2013) but with its ribonuclease activity (Beloglazova et al., 2013; Ryoo et al., 2014). This supports the idea that SNAT7, by modulating the ribonuclease activity of SAMHD1, could have a greater effect on viral transcription than on reverse transcription.
  
  There is a lack of consistency in the data: p24 release upon SNAT7 depletion is highly variable. While there is a dramatic >90-95% decrease in p24 release (Fig. 2G), the effects are much more moderate in Fig. 4H (50-60% attenuation), even though siRNA-mediated depletion was similar across the data sets. The authors should comment on the variability in their findings.
  
  We thank the reviewer for this comment, but believe that Figure 2E, rather than Figure 2G, is the relevant panel here, as it shows the quantification of CAp24 by western blot and can therefore be compared with Figure 4H.
  
  In Fig. 2E, we observed an average reduction of 85 % in CAp24 expression normalized to Clathrin HC expression across different donors for both siRNAs targeting SNAT7. For Fig. 4H, there was a 73 % reduction in CAp24 levels for siRNA #1 and a 56 % reduction for siRNA #2. In addition, it should be noted that the reduction in Gag levels is greater in Fig. 4G (between 77 % and 83 %) than in Fig. 2D (between 55 % and 72 %).
  
  Therefore, there is some variation in the results obtained with the different donors, which could be explained by variations in Gag cleavage among donors, but this does not impact the conclusions for both figures.
  
  SNAT7 is postulated to affect two steps in the virus life cycle: reverse transcription and viral transcription. But Vpx-mediated SAMHD1 degradation reversed both. It is not clear to me how SAMHD1 degradation impacts the role of SNAT7 in viral transcription. No explanation is provided.
  
  We thank the reviewer for this comment. As suggested, we will perform experiments to assess the impact of Vpx-mediated SAMHD1 degradation on viral transcription.
  
  Exogenous addition of glutamine only partially restored Gag synthesis and p24 release, which could be attributed to increased cytoplasmic levels and viral protein synthesis. What about effects on reverse transcription and viral gene expression?
  
  We thank the reviewer for this comment. We will perform the suggested experiments to assess the impact of glutamine supplementation on viral transcription.
  
  Reviewer #2 (Significance (Required)):
  
  This is a novel finding, as there are limited number of studies on amino acid transporters and HIV-1 replication enhancement in macrophages. Most of the previous work has focused on CD4 T cells. These studies on SNAT7 and HIV-1 infection establishment in macrophages might better inform the influences of macrophage metabolism on HIV-1 persistence and inflammatory responses.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  This study investigates the role of the lysosomal glutamine transporter SLC38A7/SNAT7 in HIV‑1 replication in primary human macrophages. The authors demonstrate that SNAT7 is highly expressed in macrophages and upregulated upon HIV‑1 infection. They show that SNAT7 depletion inhibits HIV‑1 production at the reverse transcription step without affecting viral fusion or global cellular translation/transcription. Mechanistically, SNAT7 knockdown reduces the inhibitory phosphorylation of SAMHD1 at T592, and degradation of SAMHD1 by Vpx fully rescues viral replication. Extracellular glutamine supplementation partially restores HIV‑1 production in SNAT7‑deficient cells. Overall, the authors report interesting observations; however, the mechanistic investigation remains preliminary, raising concerns about whether the data fully support all the conclusions drawn. Major Concerns： 1. The mechanistic depth is insufficient. The authors do not elucidate how glutamine regulates SAMHD1 T592 phosphorylation, whether through metabolite‑mediated control of kinases/phosphatases or via indirect effects.
  
  We thank the reviewer for this comment. It is worth noting that (Meng et al., 2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through release of lysosomal glutamine, and (Dias et al., 2024) showed that inhibiting mTORC1 activity using drugs decreases SAMHD1 Thr592 phosphorylation in hMDM. Therefore, we could speculate that the absence of SNAT7 down-regulates mTORC1 activity, which then leads to decreased SAMHD1 phosphorylation. This is now further discussed in the discussion section of the manuscript.
  
  The authors do not measure intracellular dNTP levels upon SNAT7 knockdown, which is the key functional substrate of SAMHD1. They also do not directly demonstrate that glutamine supplementation restores dNTP pools.
  
  We thank the reviewer for this comment. Please, refer to comment #5 under Reviewer #2.
  
  Extracellular glutamine only partially rescues viral production, implying the existence of transport‑independent functions of SNAT7 or additional pathways. This important observation is not discussed.
  
  We thank the reviewer for this comment. The discussion has been modified accordingly.
  
  It is suggested that the key findings be validated in immortalized THP‑1 cells differentiated into macrophage‑like cells by PMA.
  
  We thank the reviewer for this suggestion but don’t really understand why this would strengthen our conclusions. Indeed, despite the known variability between donors and technical limitations to transduce cells, we chose human blood monocyte-derived macrophages as a relevant non-transformed model for HIV-1 infection of macrophages. They also represent to some extent the human diversity.
  
  The Discussion section should be expanded to include the potential translational implications and limitations of the present study.
  
  We thank the reviewer for this comment. The discussion points to some elements of potential translation and limitations of the study.
  
  Reviewer #3 (Significance (Required)):
  
  General assessment: This study identifies the lysosomal glutamine transporter SLC38A7/SNAT7 as a novel host dependency factor for HIV‑1 replication in primary human macrophages. The major strengths include the use of physiologically relevant primary macrophage models, a well-organized experimental pipeline from expression profiling to functional validation, and the establishment of a link between SNAT7, glutamine metabolism, and the HIV restriction factor SAMHD1.
  
  Advance: It extends current understanding of HIV‑1 host dependency factors and immunometabolism by revealing a compartment‑specific metabolic pathway that supports viral reverse transcription.
  
  Audience: This work will primarily interest specialized researchers in HIV‑1 biology, host-virus interactions, restriction factors, and antiviral innate immunity.
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This study from the Niedergang lab establishes SNAT7 as a host-dependency factor in human macrophages that supports HIV-1 replication. The authors show a modest increase in SNAT7 levels in HIV-1-infected macrophages and suggest that this increase is transient. Using siRNA against SNAT7, they show a reduction in HIV-1 protein levels and viral RNAs and claim that reverse transcription is blocked in SNAT7 KD cells. Focusing on SAMHD1, a known HIV-1 restriction factor in macrophages, they link SNAT7 depletion to a reduction in phosphorylated, i.e. catalytically inactive, SAMHD1, arguing that SNAT7 regulates the phosphorylation and thereby the antiviral activity of SAMHD1. Since SNAT7 is a glutamine transporter that exports this amino acid from lysosomes, they lastly supplement glutamine, which somehow rescues the reduction of HIV-1 production in SNAT7 KD cells.
  
  Major comments:
  
  The strength of this manuscript is its clear focus on HIV-1-infected primary human macrophages and the way it links HIV-1 replication to SNAT7 siRNA KD, combined with SAMHD1 depletion and, lastly, glutamine supplementation. This establishes a stringent and coherent storyline. The effects reported are modest; high variability is not a problem, since it is expected with primary hMDM and can be addressed by testing several donors and applying stringent statistics.
  
  That said, while the authors state the statistical test used (one-way ANOVA), they do not specify the post-test used to assess significance (e.g. Bonferroni, Fisher's LSD or other). Please add this information.
  
  We thank the reviewer for this comment. The figure legends have been updated to include more details of all the statistical tests used.
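  
  A one-way ANOVA only tests whether at least one group mean differs; the post-test the reviewer asks about then controls the error rate across the individual comparisons (for example, each siRNA versus Ctl). As an illustration only, the Python sketch below implements one common option, the Bonferroni adjustment. It is not necessarily the post-test used in the manuscript, and the function name and p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values, capped at 1.0.

    Each raw p-value is multiplied by the number of comparisons m,
    which controls the family-wise error rate at the nominal alpha.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for two comparisons against a control
# (e.g. siRNA #1 vs Ctl and siRNA #2 vs Ctl); not data from the study.
raw = [0.01, 0.04]
print(bonferroni_adjust(raw))  # [0.02, 0.08]
```

  With two comparisons, a raw p of 0.04 becomes 0.08 after adjustment and no longer passes a 0.05 threshold, which is why stating the post-test matters when some statistics are borderline.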
  
  Another issue that might lead to underestimating the effects of HIV-1 infection on SNAT7 levels, and conversely of SNAT7 KD on HIV-1 replication, is the use of a bulk rather than single-cell approach, i.e. Western blots. I assume that HIV-1 infection rates in macrophages are not very high, usually not exceeding 20-30%, so the effects could be much larger when measured at the single-cell level. I do not know whether the SNAT7 antibody is suitable, but all the other reagents should work for flow cytometry and could substantially improve the readout.
  
  We agree with the reviewer. In previous studies on HIV-1 infection of human macrophages performed in our lab, we observed by immunofluorescence that 20 to 40 % of cells were infected. At the time of submission, we could not label the native SNAT7 protein by immunofluorescence, as the commercial antibody used only works for Western blotting.
  
  Since then, we have been validating a new antibody (Proteintech) against SNAT7 for immunofluorescence. If it proves suitable, we will be able to detect and quantify HIV-1 p24 by immunofluorescence in SNAT7-depleted and control human macrophages, and thus confirm our results by single-cell analysis.
  
  Flow cytometry analyses are difficult to perform on primary human macrophages because these cells are highly adherent and must first be detached, a process that causes significant cell death and damage. We would therefore prefer to carry out these analyses by immunofluorescence and microscopy on adherent cells. We will pursue this option.
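  
  The reviewer's point that bulk Western blots can dilute per-cell effects can be made concrete with a simple mixing model. Assuming, as a simplification not tested in the manuscript, that infected and uninfected cells contribute equal amounts of protein and that uninfected cells stay at baseline, a bulk fold change F measured when a fraction p of cells is infected corresponds to a fold change k = 1 + (F - 1)/p within infected cells. The Python function and the input values below are hypothetical.

```python
def per_cell_fold_change(bulk_fold_change, infected_fraction):
    """Back-calculate the fold change within infected cells from a bulk readout.

    Assumes a linear mixing model: bulk = p * k + (1 - p) * 1.0, where p is
    the infected fraction, k the fold change in infected cells, and
    uninfected cells remain at baseline (1.0). Solving for k gives
    k = 1 + (bulk - 1) / p.
    """
    if not 0 < infected_fraction <= 1:
        raise ValueError("infected_fraction must be in (0, 1]")
    return 1 + (bulk_fold_change - 1) / infected_fraction


# Hypothetical example: a 1.2-fold bulk increase with 25 % of cells infected.
print(f"{per_cell_fold_change(1.2, 0.25):.2f}")  # 1.80
```

  Under these assumptions, a modest 1.2-fold bulk increase at 25 % infection would correspond to a 1.8-fold increase in infected cells, in line with the reviewer's expectation that single-cell readouts could reveal larger effects.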
  
  Furthermore, the authors do not comment on a dose-response effect in terms of HIV-1 infection levels. An MOI dependency is described for Suppl. Fig. 1 C-F, but unfortunately the data are missing from the manuscript.
  
  We apologize for this omission. The figures showing the increase in SNAT7 protein expression following HIV-1 infection at MOIs ranging from 0.05 to 0.5 have been added to the new version of the manuscript (Supp. Fig. 1 C-F).
  
  Figure 1: please specify what is meant by circulating T lymphocytes. I would expect to see SNAT7 levels in PHA- or CD3/CD28-activated lymphocytes versus resting T cells, and a time course of SNAT7 levels upon activation. Even though SNAT7 levels in T cells might be low, they could also be increased by HIV-1 infection, and it is essential that the authors test this. If not, the result is a valid negative control. For this, they should use primary HIV-1 strains with tropism for T cells, or at least the lab-adapted HIV-1 NL4-3.
  
  We thank the reviewer for this comment. Circulating T lymphocytes isolated from the blood of healthy donors are now referred to as resting lymphocytes in the new version of the manuscript, as opposed to activated T lymphocytes stimulated with IL2 and PHA-P for several days (Fig. 1 A-C).
  
  SNAT7 expression, at both the gene and protein levels, is lower in resting or IL2/PHA-P-activated T cells than in macrophages from the same donors. As suggested, we will perform a kinetic analysis of T-cell activation upon HIV-1 infection to investigate how SNAT7 expression varies under these conditions.
  
  Figure 2: again, single-cell measurements could reveal much more pronounced effects. It is somewhat counterintuitive that siRNA #2 is more efficient at knocking down SNAT7 but yields higher levels of HIV-1 replication in terms of Gag. I assume that the statistics are always comparisons with Ctl-treated cells (C-G), but this is not entirely clear. Please unify the labeling with that of the statistics in Fig. 2I (this also applies to all other figures).
  
  We thank the reviewer for this comment. Fig. 2B shows a blot from only one of the donors analyzed. However, quantification across six donors shows that SNAT7 is more strongly depleted with siRNA #2 than with siRNA #1 (Fig. 2C), and that Gag Pr55 protein levels are consequently more reduced (Fig. 2D).
  
  We used GraphPad Prism for statistical analysis. Depending on the test, the software automatically draws the comparison bar and displays the p-value above it. We have changed the representation of the statistics as suggested.
  
  Figure 3: it is somewhat odd that the authors conclude that RT is the essential step reduced in the absence of SNAT7, yet do not provide statistical significance for this (Fig. 3F and G). One would expect RT to be much more affected, given the large effects on HIV-1 capsid and particle production shown in Fig. 2F, G and I.
  
  The reviewer is right to point out that we observed a stronger effect on the later stages of the viral cycle, from transcription of viral RNAs (Fig. 2I and Supp. Fig. 2G) to the production of viral particles in the supernatant (Fig. 2D-G), than on the earlier stage of reverse transcription (Fig. 3F, G). It is also possible that we missed the peak of ERT/LRT production, which is transient.
  
  It should be noted that SAMHD1 exhibits both dNTPase (Goldstone et al., 2011) and nuclease (Beloglazova et al., 2013) activities. The antiviral activity of SAMHD1, which is regulated by the phosphorylation status of T592, is mediated by its RNase activity (Ryoo et al., 2014) rather than by its dNTPase activity (Welbourn et al., 2013; White et al., 2013). This could explain why SNAT7 has a stronger impact on viral transcription than on reverse transcription.
  
  Figure 4: again, single-cell flow cytometry measurements of SAMHD1, pSAMHD1 and p24/SNAT7 might help to discriminate more clearly the effects that are specifically induced upon infection or that occur in virally infected cells. Alternatively, perhaps immunofluorescence?
  
  We thank the reviewer for this suggestion. As mentioned under comment #2, flow cytometry analyses are difficult to perform on strongly adherent primary human macrophages.
  
  With regard to immunofluorescence, there is a technical limitation related to the species in which the antibodies are raised. The antibody against native SNAT7, which is currently being validated in our laboratory, is raised in rabbit. An anti-CAp24 antibody raised in goat can be used. The cells would then also need to be co-labeled with anti-SAMHD1 and anti-phospho-SAMHD1 antibodies, both raised in mouse. We will look for options to co-label the cells.
  
  The Western blot shown in panel D does not really reflect the point the authors make with the quantification in panels G-I. The primary data (D) suggest that SNAT7 KD reduces HIV-1 production even in the absence of SAMHD1, whereas the quantification instead indicates that SNAT7 KD does not affect HIV-1 production in the absence of SAMHD1. This needs clarification or corroboration by orthogonal approaches.
  
  We respectfully disagree with the reviewer.
  
  Figure 4D shows a representative blot from one of the six donors analysed. As mentioned, the depletion of SNAT7 in the absence of SAMHD1 reduces the production of the viral proteins GagPr55 and CAp24 (see Fig. 4D). This is illustrated by the quantifications (Fig. 4G–I). Following treatment with Vpx, GagPr55 protein expression in SNAT7 KD macrophages is reduced by a factor of 2.6 for siRNA #1 (mean = 1.48, light grey bar) and by a factor of 1.83 for siRNA #2 (mean = 2.13, orange bar), compared with the control (mean = 3.9, pink bar) (Fig. 4G). Similarly, CAp24 protein expression is reduced by a factor of 2.2 for siRNA #1 (mean = 2.05, light grey bar) and by a factor of 1.36 for siRNA #2 (mean = 3.34, orange bar), compared with the control (mean = 4.52, pink bar) (Fig. 4H).
  
  The Western blot and the quantifications are therefore consistent. However, these values are not significantly different from those observed in Vpx-treated cells transfected with control siRNA, suggesting that the viral restriction observed in SNAT7 KD cells is primarily due to SAMHD1.
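  
  The fold reductions quoted in this response are the control mean divided by the mean for each siRNA. The short Python check below recomputes them from the means reported above for Vpx-treated cells (Fig. 4G and 4H); it introduces no new data, and the names `MEANS` and `fold_reduction` are illustrative only.

```python
# Mean values as reported in this response for Vpx-treated macrophages
# (Fig. 4G: GagPr55; Fig. 4H: CAp24).
MEANS = {
    "GagPr55": {"control": 3.9, "siRNA #1": 1.48, "siRNA #2": 2.13},
    "CAp24": {"control": 4.52, "siRNA #1": 2.05, "siRNA #2": 3.34},
}


def fold_reduction(control_mean: float, knockdown_mean: float) -> float:
    """Return how many times lower the knockdown mean is than the control mean."""
    if knockdown_mean <= 0:
        raise ValueError("knockdown_mean must be positive")
    return control_mean / knockdown_mean


for protein, groups in MEANS.items():
    control = groups["control"]
    for sirna in ("siRNA #1", "siRNA #2"):
        factor = fold_reduction(control, groups[sirna])
        print(f"{protein}, {sirna}: reduced {factor:.2f}-fold vs control")
```

  Recomputing from the rounded means gives 2.64, 1.83, 2.20 and 1.35, so small second-decimal differences from the quoted factors (e.g. 1.35 vs 1.36 for CAp24 with siRNA #2) presumably reflect rounding of the reported means.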
  
  Figure 5: show SAMHD1 and pSAMHD1 levels upon glutamine supplementation.
  
  We thank the reviewer for this comment; we will perform the suggested experiment.
  
  I think the Discussion is very thin, mainly summarizing the results, and fails to give broader context or to critically discuss the limitations and future directions.
  
  We thank the reviewer for this comment. The discussion will be modified further accordingly.
  
  Looking at the data as a whole, I think the results support a modest functional importance of SNAT7 for HIV-1 production in macrophages. I acknowledge that experiments in primary macrophages are prone to high donor-to-donor variability and that the authors have depicted their data transparently. However, I would advise the authors to tone down the extent to which they claim SNAT7 dependency, given this huge variability and the sometimes borderline statistics.
  
  We respectfully disagree with the reviewer.
  
  The cells used here entail greater variability than a cell line, but they are also more relevant.
  
  Indeed, the effects observed in the late stages of HIV-1 production are:
  
  ~80 % decrease in viral transcription compared to the control (Fig. 2I),
  
  ~85 % decrease in CAp24 protein expression compared to the control, as quantified by western blot (Fig. 2E), or ~90 % by ELISA measurement (Fig. 2F),
  
  a reduction of more than 90 % in the release of infectious particles (Fig. 2G).
  
  These results were all significant across donors, whereas SNAT7 depletion was always partial (Fig. 2C; between 31 and 62 % depletion compared with the control in infected cells).
  
  The data were therefore obtained from a mixture of depleted and non-depleted macrophages, which means that the effects may be underestimated.
  
  Together, our results show that SNAT7 is necessary for HIV-1 production.
  
  However, on reading the comments, we realized that our conclusions regarding reverse transcription were too strong: SNAT7 depletion does not affect viral fusion or reverse transcription. The manuscript has been modified accordingly.
  
  In addition, there are many optional experiments, of which I am sure the authors are aware, that should be done, at least in the future.
  
  For instance, how does HIV-1 upregulate SNAT7? Is a viral accessory protein involved? What is the mechanism of SNAT7-dependent SAMHD1 phosphorylation? Does SNAT7 (or glutamine) regulate the activity of the SAMHD1-associated kinase/phosphatase? If so, does this affect other targets of these enzymes?
  
  We thank the reviewer for these questions.
  
  To address the role of viral accessory proteins, we have already performed one experiment in which hMDM were infected with HIV-1 strains deleted for genes such as Nef, Vpr, Vpu and Vif, and found no clear effect on SNAT7 protein expression compared with WT strains. As an alternative, we could overexpress individual viral genes, such as Nef or Vpr, in HeLa cells and analyze their impact on SNAT7 expression by Western blot.
  
  It is also possible that SNAT7 expression and the recycling of lysosomal glutamine are modulated by macrophage intrinsic immunity in response to HIV-1 infection.
  
  The Thr592 residue of SAMHD1 is phosphorylated by Cyclin A2/CDK1 and regulated by type 1 IFN in non-cycling cells such as MDMs (Cribier et al., 2013). For now, the relationship between SNAT7 and SAMHD1 remains unclear. However, Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through the release of lysosomal glutamine, and Dias et al. (2024) showed that inhibiting mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn decreases SAMHD1 phosphorylation. This has been added to the Discussion to explain the relationship between the three partners.
  
  **Referees cross-commenting** I think the comments from the other referees are reasonable and consistent with my assessment.
  
  Reviewer #1 (Significance (Required)):
  
  Strengths and limitations: see above.
  
  Significance: I think this work is of high interest for virologists working on HIV-1 and infection of myeloid cells. If SNAT7 (and hence glutamine) does indeed regulate the phosphorylation of SAMHD1, this work could potentially have broad relevance. Unfortunately, however, this aspect remains underdeveloped and is not discussed.
  
  Field of expertise: HIV-1, immunology, cell biology
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  In this report, Herit and colleagues describe the role of an HIV-1 dependency factor that promotes virus replication in macrophages. The authors suggest that the lysosomal membrane-associated glutamine transporter SNAT7 is an HIV dependency factor that promotes virus replication by enhancing reverse transcription and Gag synthesis. Using transient knock-down approaches in primary macrophages, the authors show that SNAT7 depletion does not impact viral entry but inhibits early reverse transcription, which was reversed by exogenous glutamine addition. While the enhancement of reverse transcription was likely due to a selective increase in phospho-SAMHD1 expression, the mechanisms by which SNAT7 enhanced viral gene expression were not clearly defined. These are well-controlled studies that pinpoint the role of SNAT7 in the early steps of the viral life cycle and highlight the intricate interplay between macrophage metabolism and HIV-1 replication. While the question addressed is important and the hypothesis overall sound, the data presented need to be strengthened to support the conclusions. There are also numerous weaknesses in data interpretation.
  
  Figure 1: SNAT7 expression was selectively enhanced upon differentiation of monocytes into macrophages but absent in CD4+ T cells. Although the authors claim that SNAT7 expression is enhanced upon HIV-1 infection of macrophages, RT-qPCR analysis shows the opposite trend (Fig. 1E), and the changes in SNAT7 protein expression are modest. The statistical analysis in Fig. 1H needs to be revisited: the number of replicates varies between lysates harvested at different days post-infection, which might affect the statistical test. To determine whether the enhancement of SNAT7 expression depends on the establishment of virus infection, as the authors imply, control lysates from infections performed in the presence of replication inhibitors should be included.
  
  We thank the reviewer for this comment. Indeed, there is a modest but statistically significant increase in SNAT7 protein expression upon HIV-1 infection over time (Fig. 1G, H), without any change in SNAT7 gene expression (Fig. 1E). This indicates that, in this context, SNAT7 expression is regulated only post-transcriptionally (i.e. through increased translation or stabilization of the SNAT7 protein).
  
  As mentioned, Fig. 1H aggregates 3 to 7 independent experiments with different donors, depending on the infection time point. SNAT7 protein expression is increased from 1 day post-infection up to 8 days. The statistical test used here, a two-way ANOVA, compared Mock-infected and HIV-1-infected conditions at each time point, with the same number of donors in both conditions. In this figure, the difference is statistically significant only at day 6 of the time course (7 donors). We agree that increasing the number of donors at the other time points could increase the statistical power of the comparison between the control and infected conditions.
  
  We thank the reviewer for suggesting the use of replication inhibitors in this experiment. We plan to use inhibitors of reverse transcription (nevirapine) and integration (dolutegravir).
  
  The authors rely exclusively on Western blot analysis of HIV-1 Gag expression in cell lysates as a measure of the effects of SNAT7 on virus replication. Single-cell analysis, such as intracellular p24gag analysis by FACS, should be included; this will provide a better measure of the effects of SNAT7 on the establishment of HIV-1 infection.
  
  We respectfully disagree with the reviewer on this point. To evaluate the effects of SNAT7 on HIV-1 replication, we measured Gag Pr55 and CAp24 by Western blot (Fig. 2B, D and E), but we also quantified CAp24 in supernatants and lysates by ELISA, infectious particles using TZM reporter cells, and total viral transcription or, more specifically, Gag Pr55 transcription by qPCR (Fig. 2F, G and I and Supp. Fig. 2G).
  
  Regarding the quantification of CAp24 at the single-cell level, please refer to comment #2 under Reviewer #1.
  
  Knockdown of SNAT7 in MDMs was partial at best, with only a 30-50% decrease in expression (Fig. 2C), but the effects on viral gene expression (Fig. 2I), p24 release and infectious particle production are dramatic (Fig. 2F and G). This discrepancy is not addressed. Does SNAT7 knock-down negatively impact virus particle release? Please note that the representative WB in Fig. 2B does not correlate with the quantification in Fig. 2D: why are there no p55gag or p24gag bands in the SNAT7 #1 siRNA condition (Fig. 2B)? The data could also be rearranged to follow the logical sequence of the virus replication cycle (viral RNA expression, followed by Gag expression, and then release).
  
  We thank the reviewer for this comment. Our samples are indeed a mixture of SNAT7-depleted and non-depleted macrophages, and RNA interference in these cells often reduces protein expression by about 50 %.
  
  To determine whether SNAT7 is involved in particle release, we quantified CAp24 separately in cell lysates and in the cell culture medium, and normalized the results to the total protein content. The absence of SNAT7 reduced the amount of CAp24 measured by ELISA to the same extent in both samples, showing that CAp24-positive viral particles are not retained inside the infected macrophages. These data were initially pooled in one graph (Fig. 2F), but separate graphs are now provided in the new Supp. Fig. 2 E, F.
  
  Regarding the western blot shown in Fig. 2B, please refer to comment #5 under Reviewer #1.
  
  In the new version of the manuscript, we rearranged the figures, placing the later stages of the viral cycle in Fig. 2 and the earlier stages, such as fusion, reverse transcription and transcription, in Fig. 3.
  
  Data interpretation would be greatly improved by including infection controls (RT or integrase inhibitors) to confirm that measurements of viral RNA and Gag are indeed modulated by SNAT7 expression.
  
  We thank the reviewer for the suggestion to include inhibitors of viral replication as controls. In our experiments, Mock-infected cells were processed in parallel as a negative control for viral detection. We now provide these results in the new version of the manuscript to show that (i) no viral or Gag RNA is detected in the absence of the virus, and (ii) viral gene expression in HIV-1-infected SNAT7-depleted cells does not differ from that in Mock-infected cells, indicating almost complete inhibition of viral transcription (Fig. 3H and Supp. Fig. 3B), which is also confirmed at the protein level (Fig. 2B, D-F).
  
  Figure 3: Decrease in SNAT7 expression in macrophages resulted in lower levels of early reverse transcripts. But surprisingly, LRT levels were not as affected by decreases in SNAT7 expression. The authors go on to suggest that decreases in early RT are due to loss of phospho-SAMHD1 and increases in catalytically active form of SAMHD1. Mechanistically this does not make sense: LRT should be similarly affected by increase in catalytically active SAMHD1. dNTP concentrations should be measured to determine if the rescue of RT is dependent on SAMHD1 dNTPase activity.
  
  We thank the reviewer for this comment. LRT concentrations are very low in human macrophages and more challenging to detect than ERT concentrations. This might explain why the differences observed between the SNAT7-depleted and control conditions appear less pronounced for LRT than for ERT.
  
  Furthermore, we cannot rule out the possibility that SNAT7 has a cumulative effect throughout the viral cycle. Although the reduction in ERT and LRT levels in SNAT7-depleted macrophages is not statistically significant (Fig. 3 F, G), there is a significant impact on the transcription of viral RNAs (Fig. 2I) and Gag (Supp. Fig. 2G). This step may also be altered by the ribonuclease activity of SAMHD1 (Beloglazova et al., 2013; Ryoo et al., 2014).
  
  Finally, with the help of Dr Baek Kim in Atlanta, we attempted to quantify dNTP concentrations in our human macrophages. Unfortunately, it was not possible to draw any conclusions, as the concentrations of dNTPs extracted from our cells were far too low.
  
  It should also be noted that the regulation of SAMHD1 viral restriction by T592 phosphorylation correlates not with its dNTPase activity (Welbourn et al., 2013; White et al., 2013) but with its ribonuclease activity (Beloglazova et al., 2013; Ryoo et al., 2014). This supports the idea that SNAT7, by modulating the ribonuclease activity of SAMHD1, could have a greater effect on viral transcription than on reverse transcription.
  
  There is a lack of consistency in the data: p24 release upon SNAT7 depletion is highly variable. While there is a dramatic >90-95% decrease in p24 release in Fig. 2G, the effects are much more moderate in Fig. 4H (50-60% attenuation), even though siRNA-mediated depletion was similar across the data sets. The authors should comment on the variability in their findings.
  
  We thank the reviewer for this comment, but believe that the relevant comparison with Figure 4H is Figure 2E, which shows the quantification of CAp24 by Western blot, rather than Figure 2G.
  
  In Fig. 2E, we observed an average reduction of 85 % in CAp24 expression normalized to Clathrin HC expression across different donors for both siRNAs targeting SNAT7. For Fig. 4H, there was a 73 % reduction in CAp24 levels for siRNA #1 and a 56 % reduction for siRNA #2. In addition, it should be noted that the reduction in Gag levels is greater in Fig. 4G (between 77 % and 83 %) than in Fig. 2D (between 55 % and 72 %).
  
  There is therefore some variation in the results obtained with different donors, which could be explained by donor-to-donor differences in Gag cleavage, but this does not affect the conclusions drawn from either figure.
  
  SNAT7 is postulated to affect two steps in the virus life cycle: reverse transcription and viral transcription. However, Vpx-mediated SAMHD1 degradation reversed both. It is not clear to me how SAMHD1 degradation affects the role of SNAT7 in viral transcription, and no explanation is provided.
  
  We thank the reviewer for this comment. As suggested, we will perform experiments to assess the impact of Vpx-mediated SAMHD1 degradation on viral transcription.
  
  Exogenous addition of glutamine only partially restored Gag synthesis and p24 release, which could be attributed to increased cytoplasmic levels and viral protein synthesis. What about effects on reverse transcription and viral gene expression?
  
  We thank the reviewer for this comment. We will perform the suggested experiments to assess the impact of glutamine supplementation on viral transcription.
  
  Reviewer #2 (Significance (Required)):
  
  This is a novel finding, as only a limited number of studies have examined amino acid transporters and the enhancement of HIV-1 replication in macrophages; most previous work has focused on CD4 T cells. These studies on SNAT7 and the establishment of HIV-1 infection in macrophages might improve our understanding of how macrophage metabolism influences HIV-1 persistence and inflammatory responses.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  This study investigates the role of the lysosomal glutamine transporter SLC38A7/SNAT7 in HIV‑1 replication in primary human macrophages. The authors demonstrate that SNAT7 is highly expressed in macrophages and upregulated upon HIV‑1 infection. They show that SNAT7 depletion inhibits HIV‑1 production at the reverse transcription step without affecting viral fusion or global cellular translation/transcription. Mechanistically, SNAT7 knockdown reduces the inhibitory phosphorylation of SAMHD1 at T592, and degradation of SAMHD1 by Vpx fully rescues viral replication. Extracellular glutamine supplementation partially restores HIV‑1 production in SNAT7‑deficient cells. Overall, the authors report interesting observations; however, the mechanistic investigation remains preliminary, raising concerns about whether the data fully support all the conclusions drawn. Major Concerns： 1. The mechanistic depth is insufficient. The authors do not elucidate how glutamine regulates SAMHD1 T592 phosphorylation, whether through metabolite‑mediated control of kinases/phosphatases or via indirect effects.
  
  We thank the reviewer for this comment. It is worth noting that (Meng et al., 2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through release of lysosomal glutamine, and (Dias et al., 2024) showed that inhibiting mTORC1 activity using drugs decreases SAMHD1 Thr592 phosphorylation in hMDM. Therefore, we could speculate that the absence of SNAT7 down-regulates mTORC1 activity, which then leads to decreased SAMHD1 phosphorylation. This is now further discussed in the discussion section of the manuscript.
  
  The authors do not measure intracellular dNTP levels upon SNAT7 knockdown, which is the key functional substrate of SAMHD1. They also do not directly demonstrate that glutamine supplementation restores dNTP pools.
  
  We thank the reviewer for this comment. Please, refer to comment #5 under Reviewer #2.
  
  Extracellular glutamine only partially rescues viral production, implying the existence of transport‑independent functions of SNAT7 or additional pathways. This important observation is not discussed.
  
  We thank the reviewer for this comment. The discussion has been modified accordingly.
  
  It is suggested that the key findings be validated in immortalized THP‑1 cells differentiated into macrophage‑like cells by PMA.
  
  We thank the reviewer for this suggestion but don’t really understand why this would strengthen our conclusions. Indeed, despite the known variability between donors and technical limitations to transduce cells, we chose human blood monocyte-derived macrophages as a relevant non-transformed model for HIV-1 infection of macrophages. They also represent to some extent the human diversity.
  
  The Discussion section should be expanded to include the potential translational implications and limitations of the present study.
  
  We thank the reviewer for this comment. The discussion points to some elements of potential translation and limitations of the study.
  
  Reviewer #3 (Significance (Required)):
  
  General assessment: This study identifies the lysosomal glutamine transporter SLC38A7/SNAT7 as a novel host dependency factor for HIV‑1 replication in primary human macrophages. The major strengths include the use of physiologically relevant primary macrophage models, a well-organized experimental pipeline from expression profiling to functional validation, and the establishment of a link between SNAT7, glutamine metabolism, and the HIV restriction factor SAMHD1.
  
  Advance: It extends current understanding of HIV‑1 host dependency factors and immunometabolism by revealing a compartment‑specific metabolic pathway that supports viral reverse transcription.
  
  Audience:This work will primarily interest specialized researchers in HIV‑1 biology, host-virus interactions, restriction factors, and antiviral innate immunity.
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This study from the Niedergang lab establishes SNAT7 as a host-dependency factor in human macrophages that supports HIV-1 replication. They show a modest increase in SNAT7 levels HIV-1 infected macrophages and suggest that SNAT7 levels are transiently increased. Employing siRNA against SNAT7 they show reduction in HIV-1 protein levels and viral RNAs and claim that there is a block of reverse transcription in SNAT7 KD cells. Focusing on a known HIV-1 restriction factor in macrophages, SAMHD1, they interconnect the SNAT7 depletion with a reduction in phosphorylated, i.e. catalytical inactive SAMHD1 arguing that SNAT7 regulates the phosphorylation and thereby antiviral activity of SAMHD1. Since SNAT7 is a glutamine transporter that provides this AA from lysosomes, they lastly supplement glutamine and this somehow rescues the reduction of HIV-1 production in SNAT7 KD cells.
  
  Major comments:
  
  The strength of this manuscript is the clear focus on primary human macrophages that are HIV-1 infected and the interconnection of HIV-1 replication to the SNAT7 siRNA KD experiments in combination with SAMHD1 depletion and lastly glutamine supplementation. This establishes a stringent and coherent story line. The effects reported are modest; high variability is not a problem since using primary hMDM this is expected and can be addressed by testing several donors and applying stringent statistics.
  
  Having said so, I realize that while they give information on the statistical test used, i.e. one-way ANOVA they miss to explain the post-test used to assess significance (i.e. Bonferroni, Fishers LSD, whatsoever). Please add this information.
  
  We thank the reviewer for this comment. The figure legends have been updated to include more details of all the statistical tests used.
  
  Another issue that might underestimate the effects of HIV-1 infection on SNAT7 levels and vice versa of SNAT7 KD on HIV-1 replication is the non-single cell approach employed, i.e. WBlots. I assume that HIV-1 infection rates in macrophages are not super high, usually not exceeding 20-30%. So indeed the effects the authors observe could be much higher, when checking at the single cell level. I do not know about the SNAT7 ab, but all the other reagents should work via flow cytometry and could hence improve the readout a lot.
  
  We agree with the reviewer and indeed, in previous studies on HIV-1 infection of human macrophages performed in the lab, we observed via immunofluorescence that the proportion of infected cells ranged from 20 to 40 %. At the time of submission, we did not have the possibility to label the native SNAT7 protein by immunofluorescence, as the commercial antibody used only works for western blotting.
  
  In the meantime, we have been validating a new antibody (Proteintech) targeting SNAT7 for immunofluorescence. If this antibody is validated, we will be able to detect and quantify HIV-1 p24 by immunofluorescence in SNAT7-depleted human macrophages and control cells, and thus confirm our results by single-cell analysis.
  
  Flow cytometry analyses are difficult to perform on primary human macrophages because these cells are highly adherent and must first be detached, a process that induces significant cell death and damage. This is why we would prefer to carry out these analyses by immunofluorescence and microscopy on adherent cells. We will pursue this option.
  
  Furthermore, the authors do not comment on a dose-response effect in terms of HIV-1 infection levels. A MOI dependency is described for Suppl. Fig. 1C-F, but unfortunately the data are missing from the manuscript.
  
  We apologize for this omission. The figures showing the increase in SNAT7 protein expression following HIV-1 infection at MOIs ranging from 0.05 to 0.5 were added to the new version of the manuscript (Supp. Fig. 1 C-F).
  
  Figure 1: specify circulating T lymphocytes. I would expect to see levels of SNAT7 in PHA- or CD3/CD28-activated lymphocytes versus resting T cells, and a time course of SNAT7 levels upon activation. Even though SNAT7 levels in T cells might be low, they could also be increased by HIV-1 infection, and it is essential that the authors test for this. If not, the result is a valid negative control. For this, they should employ primary HIV-1 strains with a tropism for T cells, or at least the lab-adapted HIV-1 NL4-3.
  
  We thank the reviewer for this comment. Circulating T lymphocytes isolated from the blood of healthy donors are now referred to as resting lymphocytes in the new version of the manuscript, as opposed to activated T lymphocytes stimulated with IL2 and PHA-P for several days (Fig. 1A-C).
  
  The expression levels of SNAT7, at both the gene and protein levels, are lower in resting or IL2/PHA-P-activated T cells than in macrophages from the same donors. As suggested, we will perform a kinetic analysis of T-cell activation upon HIV-1 infection to investigate how SNAT7 expression varies under these conditions.
  
  Figure 2: again, single-cell measurements could reveal much more pronounced effects. It is a bit counterintuitive that siRNA #2 is more efficient at SNAT7 KD but yields higher levels of HIV-1 replication in terms of Gag levels. I assume that the statistics always compare with the Ctl-treated cells (C-G), but this is not entirely clear. Please unify the labeling with that used for the statistics in Fig. 2I (this also applies to all the other figures).
  
  We thank the reviewer for this comment. Fig. 2B indeed shows only one of the donors analyzed. However, protein quantification across six different donors shows that SNAT7 is more depleted with siRNA #2 (Fig. 2C), and that Gag Pr55 protein levels are consequently more reduced, than with siRNA #1 (Fig. 2D).
  
  We use GraphPad Prism software to perform statistical analysis. Depending on the test used, the software automatically plots the comparison bar and displays the p-value above it. We changed the representation of statistics as suggested.
  
  Figure 3: It is a bit odd that they conclude that RT is the essential step reduced in the absence of SNAT7 and then fail to show statistical significance for this (Fig. 3, panels F and G). One would expect RT to be much more affected, given the huge effects on HIV-1 capsid and particle production shown in Fig. 2F, G and I.
  
  The reviewer is right to point out that we observed a stronger effect during the later stages of the viral cycle, from transcription of viral RNAs (Fig. 2I and Supp. Fig. 2G) to the production of viral particles in the supernatant (Fig. 2D-G), than during the earlier stage of reverse transcription (Fig. 3F, G). It is also possible that we missed the peak in ERT/LRT production, which is transient.
  
  It should be noted that SAMHD1 exhibits both dNTPase (Goldstone et al., 2011) and nuclease (Beloglazova et al., 2013) activities. The ability of SAMHD1 to restrict the virus, regulated through dephosphorylation at T592, is mediated by its RNase activity (Ryoo et al., 2014) and not by its dNTPase activity (Welbourn et al., 2013; White et al., 2013). This could explain why SNAT7 has a stronger impact on viral transcription than on reverse transcription.
  
  Figure 4: again, single-cell flow cytometry measurements of SAMHD1, pSAMHD1 and p24/SNAT7 might help to discriminate more clearly the effects that are specifically induced upon infection or that occur in virally infected cells. Alternatively, perhaps IF?
  
  We thank the reviewer for this suggestion. As mentioned under comment #2, flow cytometry analyses are difficult to perform on strongly adherent primary human macrophages.
  
  With regard to immunofluorescence, there is a technical limitation related to the species in which the antibodies are produced. The antibody targeting the native SNAT7 protein, which is currently being validated in our laboratory, is produced in rabbit. An anti-CAp24 antibody produced in goat can be used. It will then be necessary to co-label the cells with anti-SAMHD1 and anti-phospho-SAMHD1 antibodies, which are produced in mouse. We will try to find options to co-label the cells.
  
  The western blot shown in panel D does not really reflect the point the authors want to make with the quantification in panels G-I. The primary data (D) suggest that SNAT7 KD reduces HIV-1 production even in the absence of SAMHD1, whereas the quantification rather indicates that SNAT7 KD does not affect HIV-1 production in the absence of SAMHD1. This needs clarification or corroboration by orthogonal approaches.
  
  We respectfully disagree with the reviewer.
  
  Figure 4D shows a representative blot of the six donors analysed. As mentioned, the depletion of SNAT7 in the absence of SAMHD1 reduces the production of the viral proteins GagPr55 and CAp24 (see Fig. 4D). This is illustrated by the quantifications (Fig. 4G–I). Following treatment with Vpx, GagPr55 protein expression in SNAT7 KD macrophages is reduced by a factor of 2.6 for siRNA #1 (mean = 1.48, light grey bar) and by a factor of 1.83 for siRNA #2 (mean = 2.13, orange bar), compared to the control (mean = 3.9, pink bar) (Fig. 4G). Similarly, CAp24 protein expression was reduced by a factor of 2.2 for siRNA #1 (mean = 2.05, light grey bar) and by a factor of 1.36 for siRNA #2 (mean = 3.34, orange bar), compared to the control (mean = 4.52, pink bar) (Fig. 4H).
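  The fold factors quoted above follow from dividing the control mean by the SNAT7 KD mean. As a worked check, the GagPr55 means given for Fig. 4G are shown below, together with the corresponding percentage reduction, one minus the KD/control ratio. These percentages are derived here from the rounded means quoted in the text and are not values reported from the underlying data.

```latex
\begin{aligned}
\text{fold reduction} &= \frac{\bar{x}_{\mathrm{Ctl}}}{\bar{x}_{\mathrm{KD}}}, \qquad
\text{reduction} = 100\,\% \times \left(1 - \frac{\bar{x}_{\mathrm{KD}}}{\bar{x}_{\mathrm{Ctl}}}\right) \\
\text{siRNA \#1:}\quad \frac{3.9}{1.48} &\approx 2.6, \qquad 100\,\% \times \left(1 - \frac{1.48}{3.9}\right) \approx 62\,\% \\
\text{siRNA \#2:}\quad \frac{3.9}{2.13} &\approx 1.83, \qquad 100\,\% \times \left(1 - \frac{2.13}{3.9}\right) \approx 45\,\%
\end{aligned}
```

  Because the means are rounded, a ratio recomputed from them may differ from the reported factor in the last digit.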
  
  These differences are therefore consistent between the western blot and the quantifications. However, they are not significantly different from those observed in cells treated with Vpx and depleted with control siRNA, suggesting that the viral restriction observed in SNAT7 KD cells is primarily due to SAMHD1.
  
  Figure 5: show SAMHD1 and pSAMHD1 levels upon glutamine supplementation.
  
  We thank the reviewer for this comment; we will perform the suggested experiment.
  
  I think the discussion is very thin: it mainly summarizes the results but fails to give broader context or to critically discuss the limitations and future directions.
  
  We thank the reviewer for this comment. The discussion will be modified further accordingly.
  
  Looking at the data as a whole, I think the results support a modest functional importance of SNAT7 for HIV-1 production in macrophages. I acknowledge that experiments in primary macrophages are prone to high variability between donors, and the authors have depicted their data transparently. However, I would clearly advise the authors to tone down the extent to which they claim SNAT7 dependency, given this huge variability and the sometimes borderline statistics.

  We respectfully disagree with the reviewer.
  
  The cells used here imply greater variability than a cell line, but are also more relevant.
  
  Indeed, the effects observed in the late stages of HIV-1 production are:
  
  ~80 % decrease in viral transcription compared to the control (Fig. 2I),
  
  ~85 % decrease in CAp24 protein expression compared to the control, as quantified by western blot (Fig. 2E), or ~90 % by ELISA measurement (Fig. 2F),
  
  a reduction of more than 90 % in the release of infectious particles (Fig. 2G).
  
  These results were all significant across donors, while SNAT7 depletion was always partial (Fig. 2C, between 31 and 62 % depletion compared to the control in infected cells).
  
  Therefore, the data were obtained from a mixture of depleted and non-depleted macrophages. This means that the results may be underestimated.
  
  Together, our results show that SNAT7 is necessary for HIV-1 production.
  
  However, on reading the comments, we realized that our conclusions regarding reverse transcription were too strong. SNAT7 depletion does not significantly affect viral fusion or reverse transcription. The manuscript was modified accordingly.
  
  In addition, there are many optional experiments, of which I am sure the authors are aware, that should be done, at least in the future.
  
  For instance, how does HIV-1 upregulate SNAT7? Is a viral accessory protein involved? What is the mechanism of SNAT7-dependent SAMHD1 phosphorylation? Does SNAT7 (or glutamine) regulate the activity of the SAMHD1-associated kinase/phosphatase? If so, does this impact other targets of these enzymes?

  We thank the reviewer for these questions.
  
  To address the role of accessory viral proteins, we have already performed one experiment infecting hMDM with HIV-1 strains deleted for genes such as Nef, Vpr, Vpu and Vif, and found no clear effect on SNAT7 protein expression compared to WT strains. As an alternative, we could overexpress individual viral genes, such as Nef or Vpr, in HeLa cells and analyze their impact on SNAT7 expression by western blot.
  
  It is also possible that SNAT7 expression and recycling of lysosomal glutamine are modulated by the macrophage intrinsic immunity in response to HIV-1 infection.
  
  The Thr592 residue of the SAMHD1 protein is phosphorylated by Cyclin A2/CDK1, and this phosphorylation is modulated by type I IFN, in non-cycling cells such as MDMs (Cribier et al., 2013). For now, the relationship between SNAT7 and SAMHD1 remains unclear. However, Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through release of lysosomal glutamine, and Dias et al. (2024) showed that inhibiting mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We could therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn leads to decreased SAMHD1 phosphorylation. This has been added to the discussion to explain the relationship between the three partners.
  
  **Referees cross-commenting** I think the comments from the other referees are reasonable and consistent with my assessment.
  
  Reviewer #1 (Significance (Required)):
  
  Strengths and limitations: see above.
  
  Significance: I think this work is of high interest to virologists working on HIV-1 and the infection of myeloid cells. If SNAT7 (and hence glutamine) indeed regulates the phosphorylation of SAMHD1, this work could potentially have broad relevance. Unfortunately, however, this aspect remains underdeveloped and is not discussed.
  
  Field of expertise: HIV-1, immunology, cell biology
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  In this report, Herit and colleagues describe the role of an HIV-1 dependency factor that promotes virus replication in macrophages. The authors suggest that the lysosomal membrane-associated SNAT7 glutamine transporter is an HIV dependency factor that promotes virus replication by enhancing reverse transcription and Gag synthesis. The authors use transient knockdown approaches in primary macrophages to show that SNAT7 depletion does not impact viral entry but inhibits early reverse transcription, which was reversed by exogenous glutamine addition. While reverse transcription enhancement was likely due to a selective increase in phospho-SAMHD1 expression, the mechanisms by which SNAT7 enhanced viral gene expression were not clearly defined. These are well-controlled studies that pinpoint the role of SNAT7 in the early steps of the viral life cycle and highlight the intricate interplay between macrophage metabolism and HIV-1 replication. While the question addressed is important and the hypothesis is overall sound, the data presented need to be strengthened to support the conclusions. There are numerous weaknesses in data interpretation as well.
  
  Figure 1: SNAT7 expression was selectively enhanced upon differentiation of monocytes into macrophages but absent in CD4+ T cells. Although the authors claim that SNAT7 expression is enhanced upon HIV-1 infection of macrophages, RT-qPCR analysis shows the opposite trend (Fig. 1E), and SNAT7 protein expression changes are modest. The statistical analysis in Fig. 1H needs to be revisited. The number of replicates varies for the lysates harvested at different days post-infection, which might have an impact on the statistical test. To determine whether the enhancement of SNAT7 expression depends on the establishment of virus infection, as the authors imply, control lysates from virus infections in the presence of replication inhibitors should be included.
  
  We thank the reviewer for this comment. Indeed, there is a modest but statistically significant increase in SNAT7 protein expression upon HIV-1 infection over time (Fig. 1G, H), without any modulation of SNAT7 gene expression (Fig. 1E). This indicates that SNAT7 expression in this context is regulated only at the post-transcriptional level (i.e. increased translation or stabilization of the SNAT7 protein).
  
  As mentioned, Fig. 1H aggregates between 3 and 7 independent experiments on different donors, depending on the infection time point. SNAT7 protein expression is already increased at 1 day post-infection and remains so until day 8. The statistical test used here, i.e. two-way ANOVA, compared the Mock-infected and HIV-1-infected conditions at each time point with the same number of donors. In this figure, the comparison is statistically significant only at day 6 of the time course (7 donors). We agree that increasing the number of donors for the other time points could strengthen the statistical comparison between the control and infection conditions.
  
  We thank the reviewer for suggesting the use of replication inhibitors in this experiment. We plan to use inhibitors of reverse transcription (nevirapine) and integration (dolutegravir).
  
  The authors rely exclusively on western blot analysis of HIV-1 Gag expression in cell lysates as a measure of the effects of SNAT7 on virus replication. Single-cell analysis, such as intracellular p24gag analysis by FACS, should be included; this would provide a better measure of the effects of SNAT7 on HIV-1 infection establishment.
  
  We respectfully disagree with the reviewer on this point. To evaluate the effects of SNAT7 on HIV-1 replication, we measured Gag Pr55 and CAp24 by western blot (Fig. 2B, D and E), but we also assessed the quantity of CAp24 in the supernatants and lysates by ELISA, the quantity of infectious particles using TZM reporter cells, and total viral transcription or, more specifically, Gag Pr55 transcription by qPCR (Fig. 2F, G and I and Supp. Fig. 2G).
  
  Regarding the quantification of CAp24 at the single-cell level, please refer to comment #2 under Reviewer #1.
  
  Knockdown of SNAT7 in MDMs was partial at best, with only a 30-50% decrease in expression (Fig. 2C), but the effects on viral gene expression (Fig. 2I), p24 release and infectious particle production are dramatic (Fig. 2F and G). This discrepancy is not addressed. Does SNAT7 knockdown negatively impact virus particle release? Please note that the representative WB in Fig. 2B does not correlate with the quantification in Fig. 2D: why are there no p55gag or p24gag bands in the SNAT7 #1 siRNA condition (Fig. 2B)? The data could also be rearranged to follow the logical sequence of the virus replication cycle (viral RNA expression, followed by Gag expression, and then release).
  
  We thank the reviewer for this comment. Our samples are indeed a mixture of SNAT7-depleted and non-depleted macrophages, and RNA interference in these cells often leads to only a 50 % decrease in protein expression.
  
  To determine whether SNAT7 is involved in particle release, we quantified CAp24 in cell lysates and in the cell culture medium separately and normalized the results to the total protein content. The absence of SNAT7 reduced the amount of CAp24 measured by ELISA in both samples to the same extent, showing that CAp24-positive viral particles are not retained inside the infected macrophages. These data were initially pooled in one graph (Fig. 2F), but separate graphs are now provided in the new Supp. Fig. 2E, F.
  
  Regarding the western blot shown in Fig. 2B, please refer to comment #5 under Reviewer #1.
  
  In the new version of the manuscript, we rearranged the figures, placing the later stages of the viral cycle in Fig. 2 and the earlier stages, such as fusion, reverse transcription and transcription, in Fig. 3.
  
  Data interpretation would be greatly improved by including infection controls (RT or integrase inhibitors) to confirm that measurements of viral RNA and Gag are indeed modulated by SNAT7 expression.
  
  We thank the reviewer for the suggestion to include inhibitors of viral replication as controls. In our experiments, cells were Mock-infected in parallel as a negative control for viral detection. We provide these results in the new version of the manuscript to show that (i) no viral or Gag RNA is detected in the absence of the virus, and (ii) the expression of viral genes measured in HIV-1-infected SNAT7-depleted cells does not differ from that in Mock-infected cells, indicating almost complete inhibition of viral transcription (Fig. 3H and Supp. Fig. 3B), which is also confirmed at the protein level (Fig. 2B, D-F).
  
  Figure 3: Decreased SNAT7 expression in macrophages resulted in lower levels of early reverse transcripts. Surprisingly, however, LRT levels were not as affected by decreases in SNAT7 expression. The authors go on to suggest that the decreases in early RT are due to loss of phospho-SAMHD1 and an increase in the catalytically active form of SAMHD1. Mechanistically this does not make sense: LRT should be similarly affected by an increase in catalytically active SAMHD1. dNTP concentrations should be measured to determine whether the rescue of RT depends on SAMHD1 dNTPase activity.
  
  We thank the reviewer for this comment. LRT concentrations are very low in human macrophages and more challenging to detect than ERT concentrations. This might explain why the differences observed between the SNAT7-depleted and control conditions appear less pronounced for LRT than for ERT.
  
  Furthermore, we cannot rule out the possibility that SNAT7 has a cumulative effect throughout the viral cycle. Although reverse transcription is not significantly altered in SNAT7-depleted macrophages, despite reduced levels of ERT and LRT (Fig. 3F, G), there is a significant impact on the transcription of viral RNAs (Fig. 2I) and Gag (Supp. Fig. 2G). This step may also be affected by the ribonuclease activity of SAMHD1 (Beloglazova et al., 2013; Ryoo et al., 2014).
  
  Finally, with the help of Dr Baek Kim in Atlanta, we attempted to quantify dNTP concentrations in our human macrophages. Unfortunately, it was not possible to draw any conclusions, as the concentrations of dNTPs extracted from our cells were far too low.
  
  Furthermore, it should be noted that SAMHD1 viral restriction through its phosphorylation at T592 is not correlated with its dNTPase activity (Welbourn et al., 2013; White et al., 2013) but with its ribonuclease activity (Beloglazova et al., 2013; Ryoo et al., 2014). This supports the idea that SNAT7, by modulating the ribonuclease activity of SAMHD1, could have a greater effect on viral transcription than on reverse transcription.
  
  There is a lack of consistency in the data: p24 release upon SNAT7 depletion is highly variable. While there is a dramatic >90-95% decrease in p24 release in Fig. 2G, the effects are much more moderate in Fig. 4H (50-60% attenuation), even though siRNA-mediated depletion was similar across the data sets. The authors should comment on the variability in their findings.
  
  We thank the reviewer for this comment, but we believe that Figure 2E, rather than Figure 2G, is the panel that quantifies CAp24 by western blot and should therefore be compared with Figure 4H.
  
  In Fig. 2E, we observed an average reduction of 85 % in CAp24 expression normalized to Clathrin HC expression across different donors for both siRNAs targeting SNAT7. For Fig. 4H, there was a 73 % reduction in CAp24 levels for siRNA #1 and a 56 % reduction for siRNA #2. In addition, it should be noted that the reduction in Gag levels is greater in Fig. 4G (between 77 % and 83 %) than in Fig. 2D (between 55 % and 72 %).
  
  Therefore, there is some variation in the results obtained with the different donors, which could be explained by variations in Gag cleavage among donors, but this does not impact the conclusions for both figures.
  
  SNAT7 is postulated to affect two steps in the virus life cycle: reverse transcription and viral transcription. However, Vpx-mediated SAMHD1 degradation reversed both. It is not clear to me how SAMHD1 degradation impacts the role of SNAT7 in viral transcription, and no explanation is provided.
  
  We thank the reviewer for this comment. As suggested, we will perform experiments to assess the impact of Vpx-mediated SAMHD1 degradation on viral transcription.
  
  Exogenous addition of glutamine only partially restored Gag synthesis and p24 release, which could be attributed to increased cytoplasmic glutamine levels and viral protein synthesis. What about the effects on reverse transcription and viral gene expression?
  
  We thank the reviewer for this comment. We will perform the suggested experiments to assess the impact of glutamine supplementation on viral transcription.
  
  Reviewer #2 (Significance (Required)):
  
  This is a novel finding, as there are a limited number of studies on amino acid transporters and the enhancement of HIV-1 replication in macrophages; most previous work has focused on CD4 T cells. These studies on SNAT7 and HIV-1 infection establishment in macrophages might better inform our understanding of how macrophage metabolism influences HIV-1 persistence and inflammatory responses.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  This study investigates the role of the lysosomal glutamine transporter SLC38A7/SNAT7 in HIV‑1 replication in primary human macrophages. The authors demonstrate that SNAT7 is highly expressed in macrophages and upregulated upon HIV‑1 infection. They show that SNAT7 depletion inhibits HIV‑1 production at the reverse transcription step without affecting viral fusion or global cellular translation/transcription. Mechanistically, SNAT7 knockdown reduces the inhibitory phosphorylation of SAMHD1 at T592, and degradation of SAMHD1 by Vpx fully rescues viral replication. Extracellular glutamine supplementation partially restores HIV‑1 production in SNAT7‑deficient cells. Overall, the authors report interesting observations; however, the mechanistic investigation remains preliminary, raising concerns about whether the data fully support all the conclusions drawn.

  Major concerns:

  1. The mechanistic depth is insufficient. The authors do not elucidate how glutamine regulates SAMHD1 T592 phosphorylation, whether through metabolite‑mediated control of kinases/phosphatases or via indirect effects.
  
  We thank the reviewer for this comment. It is worth noting that Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through release of lysosomal glutamine, and Dias et al. (2024) showed that pharmacological inhibition of mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We could therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn leads to decreased SAMHD1 phosphorylation. This is now discussed further in the Discussion section of the manuscript.
  
  The authors do not measure intracellular levels of dNTPs, the key functional substrates of SAMHD1, upon SNAT7 knockdown. Nor do they directly demonstrate that glutamine supplementation restores dNTP pools.
  
  We thank the reviewer for this comment. Please refer to comment #5 under Reviewer #2.
  
  Extracellular glutamine only partially rescues viral production, implying the existence of transport‑independent functions of SNAT7 or additional pathways. This important observation is not discussed.
  
  We thank the reviewer for this comment. The discussion has been modified accordingly.
  
  It is suggested that the key findings be validated in immortalized THP‑1 cells differentiated into macrophage‑like cells by PMA.
  
  We thank the reviewer for this suggestion but do not fully understand how it would strengthen our conclusions. Despite the known variability between donors and the technical limitations in transducing these cells, we chose human blood monocyte-derived macrophages as a relevant, non-transformed model for HIV-1 infection of macrophages. They also represent human diversity to some extent.
  
  The Discussion section should be expanded to include the potential translational implications and limitations of the present study.
  
  We thank the reviewer for this comment. The Discussion points to some potential translational implications and limitations of the study.
  
  Reviewer #3 (Significance (Required)):
  
  General assessment: This study identifies the lysosomal glutamine transporter SLC38A7/SNAT7 as a novel host dependency factor for HIV‑1 replication in primary human macrophages. The major strengths include the use of physiologically relevant primary macrophage models, a well-organized experimental pipeline from expression profiling to functional validation, and the establishment of a link between SNAT7, glutamine metabolism, and the HIV restriction factor SAMHD1.
  
  Advance: It extends current understanding of HIV‑1 host dependency factors and immunometabolism by revealing a compartment‑specific metabolic pathway that supports viral reverse transcription.
  
  Audience: This work will primarily interest specialized researchers in HIV‑1 biology, host-virus interactions, restriction factors, and antiviral innate immunity.
  
  2.15.1.0 Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This study from the Niedergang lab establishes SNAT7 as a host-dependency factor in human macrophages that supports HIV-1 replication. They show a modest increase in SNAT7 levels HIV-1 infected macrophages and suggest that SNAT7 levels are transiently increased. Employing siRNA against SNAT7 they show reduction in HIV-1 protein levels and viral RNAs and claim that there is a block of reverse transcription in SNAT7 KD cells. Focusing on a known HIV-1 restriction factor in macrophages, SAMHD1, they interconnect the SNAT7 depletion with a reduction in phosphorylated, i.e. catalytical inactive SAMHD1 arguing that SNAT7 regulates the phosphorylation and thereby antiviral activity of SAMHD1. Since SNAT7 is a glutamine transporter that provides this AA from lysosomes, they lastly supplement glutamine and this somehow rescues the reduction of HIV-1 production in SNAT7 KD cells.
  
  Major comments:
  
  The strength of this manuscript is the clear focus on primary human macrophages that are HIV-1 infected and the interconnection of HIV-1 replication to the SNAT7 siRNA KD experiments in combination with SAMHD1 depletion and lastly glutamine supplementation. This establishes a stringent and coherent story line. The effects reported are modest; high variability is not a problem since using primary hMDM this is expected and can be addressed by testing several donors and applying stringent statistics.
  
  Having said so, I realize that while they give information on the statistical test used, i.e. one-way ANOVA they miss to explain the post-test used to assess significance (i.e. Bonferroni, Fishers LSD, whatsoever). Please add this information.
  
  We thank the reviewer for this comment. The figure legends have been updated to include more details of all the statistical tests used.
  
  Another issue that might underestimate the effects of HIV-1 infection on SNAT7 levels and vice versa of SNAT7 KD on HIV-1 replication is the non-single cell approach employed, i.e. WBlots. I assume that HIV-1 infection rates in macrophages are not super high, usually not exceeding 20-30%. So indeed the effects the authors observe could be much higher, when checking at the single cell level. I do not know about the SNAT7 ab, but all the other reagents should work via flow cytometry and could hence improve the readout a lot.
  
  We agree with the reviewer and indeed, in previous studies on HIV-1 infection of human macrophages performed in the lab, we observed via immunofluorescence that the proportion of infected cells ranged from 20 to 40 %. At the time of submission, we did not have the possibility to label the native SNAT7 protein by immunofluorescence, as the commercial antibody used only works for western blotting.
  
  In the meantime, we have been validating a new antibody (Proteintech) targeting SNAT7 for immunofluorescence. If this is confirmed, we will be able to detect and quantify HIV-1 p24 by immunofluorescence in SNAT7-depleted human macrophages and control cells, thus confirming our results in single-cell analysis.
  
  Flow cytometry analyses are difficult to perform on primary human macrophages because these cells are highly adherent and must be detached first. The process induces significant cell death and damage. This is why we would prefer to carry out these analyses using immunofluorescence and microscopy on adhered cells. This option will be undoubtedly pursued.
  
  Furthermore the authors never commented about a dose-response effect in terms of HIV-1 infection levels. There is a MOI dependency described for Suppl.Fig.1 C-F, unfortunately the data is missing in the manuscript.
  
  We apologize for this omission. The figures showing the increase in SNAT7 protein expression following HIV-1 infection at MOIs ranging from 0.05 to 0.5 were added to the new version of the manuscript (Supp. Fig. 1 C-F).
  
  Figure1: specify circulating T lymphocytes. I would expect to see levels of SNAT7 in PHA or CD3/CD28 activated lymphocytes versus resting T cells and a time course of SNAT7 levels upon activation. I think even though SNAT7 levels in T cells might be low, they could also be increased by HIV-1 infection and it is essential that the authors test for this. If not, the result is a valid negative control. For this they should employ HIV-1 primary strains with a tropism for T cells, or at least lab-adapted HIV-1 NL4-3
  
  We thank the reviewer for this comment. Circulating T lymphocytes isolated from the blood of healthy donors are now referred to resting lymphocytes in the new version of the manuscript, as opposed to activated T lymphocytes stimulated with IL2 and PHA-P for several days (Fig. 1 A-C).
  
  The expression levels of SNAT7, both at the gene and protein levels, are lower in resting or IL2/PHA-P-activated T cells than in macrophages from the same donors. As suggested, we will perform a kinetic of T-cell activation upon HIV-1 infection to investigate how SNAT7 expression varies in these conditions.
  
  Figure 2: Again, single-cell measurements could reveal much more pronounced effects. It is somewhat counterintuitive that siRNA #2 is more efficient at knocking down SNAT7 yet results in higher HIV-1 replication in terms of Gag levels. I assume that the statistics always compare to the Ctl-treated cells (C-G), but this is not entirely clear. Please unify the labeling with that used for the statistics in Fig. 2I (this also applies to all the other figures).
  
  We thank the reviewer for this comment. Fig. 2B indeed shows one of the different donors analyzed. However, protein quantification across six different donors shows that SNAT7 is more depleted with siRNA #2 (Fig. 2C) and that Gag Pr55 protein levels are consequently reduced more than with siRNA #1 (Fig. 2D).
  
  We use GraphPad Prism software to perform statistical analysis. Depending on the test used, the software automatically plots the comparison bar and displays the p-value above it. We changed the representation of statistics as suggested.
  
  Figure 3: It is a bit odd that the authors conclude that RT is the essential step reduced in the absence of SNAT7, yet fail to show statistical significance for this (Fig. 3F and G). One would expect RT to be much more affected, given the large effects on HIV-1 capsid and particle production shown in Fig. 2F, G and I.
  
  The reviewer is right in pointing out that we observed a stronger effect during the later stages of the viral cycle, from transcription of viral RNAs (Fig. 2I and Supp. Fig. 2G) to the production of viral particles in the supernatant (Fig. 2D-G), than during the earlier stage of reverse transcription (Fig. 3F, G). It is also possible that we missed the peak of ERT/LRT production, which is transient.
  
  It should be noted that SAMHD1 exhibits both dNTPase (Goldstone et al., 2011) and nuclease (Beloglazova et al., 2013) activities. The ability of SAMHD1 to restrict the virus, regulated through dephosphorylation at T592, is mediated by its RNase activity (Ryoo et al., 2014) and not by its dNTPase activity (Welbourn et al., 2013; White et al., 2013). This could explain why SNAT7 has a stronger impact on viral transcription than on reverse transcription.
  
  Figure 4: Again, single-cell flow measurements of SAMHD1, pSAMHD1 and p24/SNAT7 might help to discriminate more clearly between effects that are specifically induced upon infection and those that occur in virally infected cells. Alternatively, perhaps IF?
  
  We thank the reviewer for this suggestion. As mentioned under comment #2, flow cytometry analyses are difficult to perform on strongly adherent primary human macrophages.
  
  With regard to immunofluorescence, there is a technical limitation related to the species in which the antibodies are produced. The antibody targeting the native SNAT7 protein, which is currently being validated in our laboratory, is produced in rabbit. An anti-CAp24 antibody produced in goat can be used. It will then be necessary to co-label the cells with anti-SAMHD1 and anti-phospho-SAMHD1 antibodies produced in mouse. We will explore options for co-labeling the cells.
  
  The western blot shown in panel D does not really reflect the point the authors want to make with the quantification in panels G-I. The primary data (D) suggest that SNAT7 KD reduces HIV-1 production even in the absence of SAMHD1, whereas the quantification indicates that SNAT7 KD does not affect HIV-1 production in the absence of SAMHD1. This needs clarification and corroboration by orthogonal approaches.
  
  We respectfully disagree with the reviewer.
  
  Figure 4D shows a representative blot of the six donors analysed. As mentioned, the depletion of SNAT7 in the absence of SAMHD1 reduces the production of the viral proteins GagPr55 and CAp24 (see Fig. 4D). This is illustrated by the quantifications (Fig. 4G–I). Following treatment with Vpx, GagPr55 protein expression in SNAT7 KD macrophages is reduced by a factor of 2.6 for siRNA #1 (mean = 1.48, light grey bar) and by a factor of 1.83 for siRNA #2 (mean = 2.13, orange bar), compared to the control (mean = 3.9, pink bar) (Fig. 4G). Similarly, CAp24 protein expression was reduced by a factor of 2.2 for siRNA #1 (mean = 2.05, light grey bar) and by a factor of 1.36 for siRNA #2 (mean = 3.34, orange bar), compared to the control (mean = 4.52, pink bar) (Fig. 4H).
  
  The western blot and the quantifications are therefore consistent. However, these reductions are not significantly different from the levels observed in Vpx-treated cells transfected with control siRNA, suggesting that the viral restriction observed in SNAT7 KD cells is primarily due to SAMHD1.
  
  Figure 5: show SAMHD1 and pSAMHD1 levels upon glutamine supplementation.
  
  We thank the reviewer for this comment; we will perform the suggested experiment.
  
  I think the discussion is very thin: it mainly summarizes the results but fails to give broader context or to critically discuss the limitations and future directions.
  
  We thank the reviewer for this comment. The discussion will be modified further accordingly.
  
  Looking at the data as a whole, I think the results support a modest functional importance of SNAT7 for HIV-1 production in macrophages. I acknowledge that experiments in primary macrophages are prone to high donor-to-donor variability, and the authors have transparently depicted their data. However, I would clearly advise the authors to tone down the extent to which they claim SNAT7 dependency, given this large variability and the sometimes borderline statistics.
  
  We respectfully disagree with the reviewer.
  
  The primary cells used here show greater variability than a cell line, but they are also more physiologically relevant.
  
  Indeed, the effects observed in the late stages of HIV-1 production are:
  
  ~80 % decrease in viral transcription compared to the control (Fig. 2I),
  
  ~85 % decrease in CAp24 protein expression compared to the control, as quantified by western blot (Fig. 2E), or ~90 % by ELISA measurement (Fig. 2F),
  
  a reduction of more than 90 % in the release of infectious particles (Fig. 2G).
  
  These results were all significant across donors, even though SNAT7 depletion was always partial (Fig. 2C, between 31 and 62 % depletion compared to the control in infected cells).
  
  The data were therefore obtained from a mixture of depleted and non-depleted macrophages, which means that the effects may be underestimated.
  
  Together, our results show that SNAT7 is necessary for HIV-1 production.
  
  However, on reading the comments, we realized that our conclusions regarding reverse transcription were too strong. SNAT7 depletion does not affect viral fusion or reverse transcription. The manuscript has been modified accordingly.
  
  In addition, there are many optional experiments, of which I am sure the authors are aware, that should be done, at least in the future.
  
  For instance, how does HIV-1 upregulate SNAT7? Is a viral accessory protein involved? What is the mechanism of SNAT7-dependent SAMHD1 phosphorylation? Does SNAT7 (or glutamine) regulate the activity of the SAMHD1-associated kinase/phosphatase? If so, does this affect other targets of these enzymes?
  
  We thank the reviewer for these questions.
  
  To address the role of accessory viral proteins, we have already performed one experiment infecting hMDM with HIV-1 strains deleted for genes such as Nef, Vpr, Vpu and Vif, and have found no clear effect on SNAT7 protein expression compared to WT strains. As an alternative experiment, we could overexpress individual viral genes, such as Nef or Vpr, in HeLa cells and analyze their impact on SNAT7 expression by Western blot.
  
  It is also possible that SNAT7 expression and recycling of lysosomal glutamine are modulated by the macrophage intrinsic immunity in response to HIV-1 infection.
  
  The Thr592 motif of the SAMHD1 protein is phosphorylated by Cyclin A2/CDK1 and type I IFN in non-cycling cells, such as MDMs (Cribier et al., 2013). For now, the relationship between SNAT7 and SAMHD1 remains unclear. However, Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through the release of lysosomal glutamine, and Dias et al. (2024) showed that inhibiting mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We could therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn decreases SAMHD1 phosphorylation. This has been added to the discussion to explain the relationship between the three partners.
  
  **Referees cross-commenting**: I think the comments from the other referees are reasonable and consistent with my assessment.
  
  Reviewer #1 (Significance (Required)):
  
  Strengths and limitations: see above.
  
  Significance: I think this work is of high interest for virologists working on HIV-1 and the infection of myeloid cells. If SNAT7 (and hence glutamine) does indeed regulate the phosphorylation of SAMHD1, this work could potentially have broad relevance. Unfortunately, however, this aspect remains underdeveloped and is not discussed.
  
  Field of expertise: HIV-1, immunology, cell biology
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  In this report, Herit and colleagues describe the role of an HIV-1 dependency factor that promotes virus replication in macrophages. The authors suggest that the lysosomal membrane-associated glutamine transporter SNAT7 is an HIV dependency factor that promotes virus replication by enhancing reverse transcription and Gag synthesis. Using transient knock-down approaches in primary macrophages, the authors show that SNAT7 depletion does not affect viral entry but inhibits early reverse transcription, an effect reversed by exogenous glutamine addition. While the enhancement of reverse transcription was likely due to a selective increase in phospho-SAMHD1 expression, the mechanisms by which SNAT7 enhanced viral gene expression were not clearly defined. These are well-controlled studies that pinpoint the role of SNAT7 in the early steps of the viral life cycle and highlight the intricate interplay between macrophage metabolism and HIV-1 replication. While the question addressed is important and the hypothesis overall sound, the data presented need to be strengthened to support the conclusions. There are also numerous weaknesses in data interpretation.
  
  Figure 1: SNAT7 expression was selectively enhanced upon differentiation of monocytes into macrophages but absent in CD4+ T cells. Although the authors claim that SNAT7 expression is enhanced upon HIV-1 infection of macrophages, RT-qPCR analysis shows the opposite trend (Fig. 1E), and the changes in SNAT7 protein expression are modest. The statistical analysis in Fig. 1H needs to be revisited: the number of replicates varies for lysates harvested at different days post-infection, which might affect the statistical test. To determine whether the enhancement of SNAT7 expression depends on the establishment of virus infection, as the authors imply, control lysates from virus infections in the presence of replication inhibitors should be included.
  
  We thank the reviewer for this comment. There is indeed a modest but statistically significant increase in SNAT7 protein expression upon HIV-1 infection over time (Fig. 1G, H), without any modulation of SNAT7 gene expression (Fig. 1E). This indicates that SNAT7 expression is regulated post-transcriptionally in this context (i.e., through increased translation or stabilization of the SNAT7 protein).
  
  As mentioned, Fig. 1H aggregates between 3 and 7 independent experiments on different donors, depending on the infection time point. SNAT7 protein expression is already increased at 1 day post-infection and remains so until day 8. The statistical test used here, a two-way ANOVA, compares mock-infected and HIV-1-infected conditions at each time point with the same number of donors. In this figure, the comparison is statistically significant only at day 6 of the time course (7 donors). We agree that increasing the number of donors at the other time points could help to detect a statistically significant difference between the control and infected conditions.
  
  We thank the reviewer for suggesting the use of replication inhibitors in this experiment. We plan to use inhibitors of reverse transcription (nevirapine) and integration (dolutegravir).
  
  The authors rely exclusively on western blot analysis of HIV-1 Gag expression in cell lysates as a measure of the effects of SNAT7 on virus replication. Single-cell analysis, such as intracellular p24gag analysis by FACS, should be included; this would provide a better measure of the effects of SNAT7 on the establishment of HIV-1 infection.
  
  We respectfully disagree with the reviewer on this point. To evaluate the effects of SNAT7 on HIV-1 replication, we measured Gag Pr55 and CAp24 by western blot (Fig. 2B, D and E), but we also assessed the quantity of CAp24 in supernatants and lysates by ELISA, the quantity of infectious particles using TZM reporter cells, and total viral transcription or, more specifically, Gag Pr55 transcription by qPCR (Fig. 2F, G and I and Supp. Fig. 2G).
  
  Regarding the quantification of CAp24 at the single-cell level, please refer to comment #2 under Reviewer #1.
  
  Knockdown of SNAT7 in MDMs was partial at best, with only a 30-50% decrease in expression (Fig. 2C), yet the effects on viral gene expression (Fig. 2I), p24 release and infectious particle production are dramatic (Fig. 2F and G). This discrepancy is not addressed. Does SNAT7 knock-down negatively affect virus particle release? Please note that the representative WB in Fig. 2B does not correlate with the quantification in Fig. 2D. Why are there no p55gag or p24gag bands in the SNAT7 #1 siRNA condition (Fig. 2B)? The data could also be rearranged to follow the logical sequence of the virus replication cycle (viral RNA expression, followed by Gag expression and then release).
  
  We thank the reviewer for this comment. Our samples are indeed a mixture of SNAT7-depleted and non-depleted macrophages, and RNA interference in these cells often reduces protein expression by about 50 %.
  
  To determine whether SNAT7 is involved in particle release, we quantified CAp24 separately in cell lysates and in the cell culture medium, and normalized the results to the total protein content. The absence of SNAT7 reduced the amount of CAp24 measured by ELISA to the same extent in both samples, showing that CAp24-positive viral particles are not retained inside the infected macrophages. These data were initially pooled in one graph (Fig. 2F), but separate graphs are now provided in the new Supp. Fig. 2E, F.
  
  Regarding the western blot shown in Fig. 2B, please refer to comment #5 under Reviewer #1.
  
  In the new version of the manuscript, we have rearranged the figures, placing the later stages of the viral cycle in Fig. 2 and the earlier stages, such as fusion, reverse transcription and transcription, in Fig. 3.
  
  Data interpretation would be greatly improved by including infection controls (RT or integrase inhibitors) to confirm that measurements of viral RNA and Gag are indeed modulated by SNAT7 expression.
  
  We thank the reviewer for the suggestion to include inhibitors of viral replication as controls. In our experiments, cells were mock-infected in parallel as a negative control for viral detection. We provide these results in the new version of the manuscript to show that (i) no viral or Gag RNA is detected in the absence of virus, and (ii) the expression of viral genes measured in HIV-1-infected SNAT7-depleted cells is not different from that in mock-infected cells, indicating almost complete inhibition of viral transcription (Fig. 3H and Supp. Fig. 3B), which is also confirmed at the protein level (Fig. 2B, D-F).
  
  Figure 3: Decrease in SNAT7 expression in macrophages resulted in lower levels of early reverse transcripts. But surprisingly, LRT levels were not as affected by decreases in SNAT7 expression. The authors go on to suggest that decreases in early RT are due to loss of phospho-SAMHD1 and increases in catalytically active form of SAMHD1. Mechanistically this does not make sense: LRT should be similarly affected by increase in catalytically active SAMHD1. dNTP concentrations should be measured to determine if the rescue of RT is dependent on SAMHD1 dNTPase activity.
  
  We thank the reviewer for this comment. LRT concentrations are very low in human macrophages and more challenging to detect than ERT concentrations. This might explain why the differences observed between the SNAT7-depleted and control conditions appear less pronounced for LRT than for ERT.
  
  Furthermore, we cannot rule out the possibility that SNAT7 has a cumulative effect throughout the viral cycle. Although ERT and LRT levels are reduced in SNAT7-depleted macrophages, reverse transcription is not statistically altered (Fig. 3F, G), whereas there is a significant impact on the transcription of viral RNAs (Fig. 2I) and Gag (Supp. Fig. 2G). This step may also be affected by the ribonuclease activity of SAMHD1 (Beloglazova et al., 2013; Ryoo et al., 2014).
  
  Finally, with the help of Dr Baek Kim in Atlanta, we attempted to quantify dNTP concentrations in our human macrophages. Unfortunately, it was not possible to draw any conclusions, as the concentrations of dNTPs extracted from our cells were far too low.
  
  Furthermore, it should be noted that viral restriction by SAMHD1, regulated through its phosphorylation at T592, does not correlate with its dNTPase activity (Welbourn et al., 2013; White et al., 2013) but with its ribonuclease activity (Beloglazova et al., 2013; Ryoo et al., 2014). This may explain why SNAT7, by modulating the ribonuclease activity of SAMHD1, could have a greater effect on viral transcription than on reverse transcription.
  
  There is lack of consistency in the data: p24 release upon SNAT7 depletion is highly variable. While there is a dramatic >90-95% decrease in p24 release (Fig. 2G), the effects are much more moderate in Fig. 4H (50-60% attenuation), even though siRNA-mediated depletion was similar across the data sets. The authors should comment on the variability in their findings.
  
  We thank the reviewer for this comment, but believe that Figure 2E, rather than Figure 2G, is the relevant panel here, as it shows the quantification of CAp24 by western blot and should therefore be compared with Figure 4H.
  
  In Fig. 2E, we observed an average reduction of 85 % in CAp24 expression normalized to Clathrin HC expression across different donors for both siRNAs targeting SNAT7. For Fig. 4H, there was a 73 % reduction in CAp24 levels for siRNA #1 and a 56 % reduction for siRNA #2. In addition, it should be noted that the reduction in Gag levels is greater in Fig. 4G (between 77 % and 83 %) than in Fig. 2D (between 55 % and 72 %).
  
  Therefore, there is some variation in the results obtained with the different donors, which could be explained by variations in Gag cleavage among donors, but this does not impact the conclusions for both figures.
  
  SNAT7 is postulated to affect two steps in the virus life cycle: reverse transcription and viral transcription. However, Vpx-mediated SAMHD1 degradation reversed both. It is not clear to me how SAMHD1 degradation affects the role of SNAT7 in viral transcription, and no explanation is provided.
  
  We thank the reviewer for this comment. As suggested, we will perform experiments to assess the impact of Vpx-mediated SAMHD1 degradation on viral transcription.
  
  Exogenous addition of glutamine only partially restored Gag synthesis and p24 release, which could be attributed to increased cytoplasmic levels and viral protein synthesis. What about effects on reverse transcription and viral gene expression?
  
  We thank the reviewer for this comment. We will perform the suggested experiments to assess the impact of glutamine supplementation on viral transcription.
  
  Reviewer #2 (Significance (Required)):
  
  This is a novel finding, as only a limited number of studies have addressed amino acid transporters and the enhancement of HIV-1 replication in macrophages; most previous work has focused on CD4 T cells. These studies on SNAT7 and the establishment of HIV-1 infection in macrophages might better inform our understanding of how macrophage metabolism influences HIV-1 persistence and inflammatory responses.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  This study investigates the role of the lysosomal glutamine transporter SLC38A7/SNAT7 in HIV‑1 replication in primary human macrophages. The authors demonstrate that SNAT7 is highly expressed in macrophages and upregulated upon HIV‑1 infection. They show that SNAT7 depletion inhibits HIV‑1 production at the reverse transcription step without affecting viral fusion or global cellular translation/transcription. Mechanistically, SNAT7 knockdown reduces the inhibitory phosphorylation of SAMHD1 at T592, and degradation of SAMHD1 by Vpx fully rescues viral replication. Extracellular glutamine supplementation partially restores HIV‑1 production in SNAT7‑deficient cells. Overall, the authors report interesting observations; however, the mechanistic investigation remains preliminary, raising concerns about whether the data fully support all the conclusions drawn.
  
  Major Concerns:
  
  1. The mechanistic depth is insufficient. The authors do not elucidate how glutamine regulates SAMHD1 T592 phosphorylation, whether through metabolite‑mediated control of kinases/phosphatases or via indirect effects.
  
  We thank the reviewer for this comment. It is worth noting that Meng et al. (2022) demonstrated that SNAT7 positively regulates mTORC1 activity at the lysosomal membrane through the release of lysosomal glutamine, and Dias et al. (2024) showed that pharmacological inhibition of mTORC1 activity decreases SAMHD1 Thr592 phosphorylation in hMDM. We could therefore speculate that the absence of SNAT7 down-regulates mTORC1 activity, which in turn decreases SAMHD1 phosphorylation. This is now discussed further in the Discussion section of the manuscript.
  
  The authors do not measure intracellular dNTP levels upon SNAT7 knockdown, which is the key functional substrate of SAMHD1. They also do not directly demonstrate that glutamine supplementation restores dNTP pools.
  
  We thank the reviewer for this comment. Please refer to comment #5 under Reviewer #2.
  
  Extracellular glutamine only partially rescues viral production, implying the existence of transport‑independent functions of SNAT7 or additional pathways. This important observation is not discussed.
  
  We thank the reviewer for this comment. The discussion has been modified accordingly.
  
  It is suggested that the key findings be validated in immortalized THP‑1 cells differentiated into macrophage‑like cells by PMA.
  
  We thank the reviewer for this suggestion but are not sure how this would strengthen our conclusions. Despite the known variability between donors and the technical limitations in transducing these cells, we chose human blood monocyte-derived macrophages as a relevant non-transformed model for HIV-1 infection of macrophages. To some extent, they also represent human diversity.
  
  The Discussion section should be expanded to include the potential translational implications and limitations of the present study.
  
  We thank the reviewer for this comment. The discussion points to some elements of potential translation and limitations of the study.
  
  Reviewer #3 (Significance (Required)):
  
  General assessment: This study identifies the lysosomal glutamine transporter SLC38A7/SNAT7 as a novel host dependency factor for HIV‑1 replication in primary human macrophages. The major strengths include the use of physiologically relevant primary macrophage models, a well-organized experimental pipeline from expression profiling to functional validation, and the establishment of a link between SNAT7, glutamine metabolism, and the HIV restriction factor SAMHD1.
  
  Advance: It extends current understanding of HIV‑1 host dependency factors and immunometabolism by revealing a compartment‑specific metabolic pathway that supports viral reverse transcription.
  
  Audience: This work will primarily interest specialized researchers in HIV‑1 biology, host-virus interactions, restriction factors, and antiviral innate immunity.
  
  2.15.1.0
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.03.06.709337
ethanzuckerman.com

Gramsci's Nightmare: AI, Platform Power and the Automation of Cultural Hegemony - Ethan Zuckerman

1
1. badriahajjar 03 Jul 2026
  
  in Public
  
  AIs rarely admit they don’t know something, instead they paper the absence over with something they do know. We may not be able to answer questions about how Indonesians see the world, but LLMs will happily disguise those useful absences with opinions of how Americans imagine Indonesians see the world.
  
  I think AI literacy and awareness need to be more widespread than they currently are. The burden falls on us as individuals to be vigilant about AI, yet it has been embraced so quickly across so many aspects of life. AI literacy needs to be a priority.

Annotators

badriahajjar

URL

ethanzuckerman.com/2025/12/05/gramscis-nightmare-ai-platform-power-and-the-automation-of-cultural-hegemony/
arxiv.org

If Grid Cells are the Answer, What is the Question? A Review of Normative Grid Cell Theory

2
1. Public_Reviews 03 Jul 2026
  
  in eLife (unscoped)
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This review by Dorrell and Whittington covers a number of aspects of normative modeling of grid cells. They begin by discussing key experimental insights into grid cell phenomenology. Then, they discuss how grid cells can be used to perform path integration and how they measure up as efficient codes of space. These two sections lead the authors to discuss how combining path integration and efficient coding objectives yields models of axis-aligned grid cells in a single module. They then discuss how non-linear objectives lead to multiple modules. The review ends with several outstanding questions and an optimistic outlook on how normative models (particularly task-optimized RNNs) can be used as tools for advancing understanding in neuroscience.
  
  Strengths:
  
  (1) The review is timely and covers an area that has seen a lot of recent activity. I think the discussion of the many different results (and kinds of models) will be generally helpful for the field.
  
  (2) Although I think the story could be made a little more coherently (see below), in general I enjoyed the authors' flow from efficient coding -> efficient coding + path integration -> efficient coding + path integration + non-linear objective. This framing supports the specific conclusion the authors arrive at.
  
  (3) I also really liked the review's message about how normative modeling, despite some of its challenges and limitations, can be used effectively in neuroscience. The discussion of cycling between "experimental" modeling (e.g., vanilla RNNs) and theoretically grounded models was nice, and I think it helps demonstrate the value of this approach.
  
  (4) Showing how the metric loss can be seen as a bandpass filter (Figure 3C) was a nice contribution of the review.
  
  (5) While the focus of P4 (conjunctive HD-grid cells) felt initially a little cast aside, the discussion around "brain and task-optimised RNNs with standard architectural choices use fundamentally different path-integration mechanism" was nice and I think helpful for steering the community to an interesting open problem.
  
  (6) Identifying how "non-linear functionality" can lead to multi-modules was nice and not something that I have seen as clearly presented before.
  
  Weaknesses:
  
  (1) The authors view the experimental evidence linking grid cells to path integration as "specific and strong", and state that the "key computational feature that defines entorhinal cortex [is] path-integration". I think experimentalists (at least the ones I work with) would push back on that. First, it is hard to isolate path integration in rodent experiments. So while Gil et al. (2018) did about as good a job as one could, there are still other interpretations of the results that do not depend purely on path integration. Second, as the authors point out later in the review, there is experimental work showing that grid cells are disrupted in large environments and in 3D. Path integration certainly happens (to some extent) in these spaces, which raises the question of how it is achieved with weakened grid coding. Thus, I think the claims about how strongly grid cells are experimentally linked to path integration should be toned down.
  
  (2) The authors introduce the idea of efficient coding of space and discuss how grid cells are not optimal. It is later clarified (Sec. 5.3) that multi-module codes can be efficient (even if not the most optimal). I was confused reading Section 3, because in Section 2 the multiple modules are discussed, but then in Section 3, they are dropped, and only a single module is being considered. Equation 2 was also a little confusing to me. Alpha is not defined, and I would have thought that it would be x^Tx' - g(x)^T g(x') and not x^Tx' g(x)^T g(x'). Given that there is no page limit here, I think a little more detail in Section 3 would be helpful.
  
  (3) In Section 3, the authors make use of P2 (translation invariance within a module) to rule out (or, at least, question) certain models/approaches. While this is certainly a standard assumption made in theoretical work, it is not very well supported by experimental findings. In particular, Diehl et al. (2017), Ismakov et al. (2017), and Dunn et al. (2017) all found that individual grid fields systematically vary in their peak firing rate. In addition, Redman et al. (2025) found that, within a given module, there was a small but robust diversity of grid orientations and spacings. These suggest that grid cells within a single module may actually be able to encode properties of local space and give some support to normative models that find efficient space coding with grid cells by finding non-axis-aligned grid fields. I think this is all important to mention because: a) it provides more biological nuance to the question about spatial coding; b) it provides more ways in which to test models. For instance, in Redman et al. (2025), the Sorscher et al. (2022) model was shown to produce variability in grid properties that loosely matched what was found in real data. For tests like this (e.g., how much does a model reproduce variability in grid firing field peak rates), I think it is going to be important for continuing to evaluate models.
  
  (4) The focus of the review, I know, is grid cells, but of course, grid cells are part of the MEC and the larger hippocampal network. I totally understand, at some level, you have to make a decision of what to model, but it seems that there are other functional classes of neurons (border cells, head direction cells) that all play an important role in path integration. And while the models the authors consider at the end of the review capture properties of grid cells really well, they do so at the cost of not modeling anything else. The authors mention this in the context of the models not capturing conjunctive grid-head direction cells, but I think the point is a deeper one, and more discussion of at what level it makes sense to consider grid cells only is important.
  
  (5) As I mentioned in the Strengths section, I did enjoy the flow of the paper on how path integration + efficiency is needed to get grid single modules and path integration + efficiency + non-linearity is needed to get multiple grid modules. This creates the story that adding more of these theory-driven constraints helps lead to more "accurate" models of grid cells. But one alternative view is that, if path integration + efficiency is enough to get a single grid module (but only a single grid module), then maybe the utility (or need) of multiple grid modules comes from something else. That is, instead of saying "we need more constraints to get multiple modules", it could be evidence for "we need to re-think whether multiple modules might need a different theory to explain". While I understand this is a big picture question that maybe isn't entirely fair to ask of the authors, I think: 1) the authors do a nice job of positioning their review as a kind of discussion on what normative modeling can provide to neuroscience, so having this discussion on when the failure of a model to capture ALL aspects of the biological features motivates further constraints as opposed to a new approach, would be useful; 2) this question connects with the title of the paper, i.e. "what is the question?"
  
  Review 2
2. Public_Reviews 03 Jul 2026
  
  in eLife (unscoped)
  
  Author response:
  
  We thank the reviewers for their time and attention, which will significantly improve the paper. Further, we are grateful for their appreciation of our goals and work. In sum, the reviewers point to our overstated discussion of experimental evidence, which we will tone down; some slightly confusing points of argumentation, which we will clarify; and some discussion points on the role of normative theories, which we will add text to address. We believe these changes will strengthen the paper and hope you agree!
  
  Major Concern: Experimental Support for Path-Integration is not as strong as suggested
  
  The major point raised by all reviewers (reviewer 1 comment 1, reviewer 2 comment 1, reviewer 3’s only weakness) was that our presentation of the experimental perturbation evidence for path-integration is stronger than the reality. On reflection, we agree with this evaluation. We thank the reviewers for raising it; we will moderate our writing and include the sensible caveats raised. In sum, we still think that the convergence of evidence points to path-integration: first, disruptions to grid cells lead to path-integration problems, though these perturbations admittedly aren’t perfectly precise; second, normative theories of path-integration lead to grid cells and predict grid cell behaviour; third, mechanistic models of path-integration match grid cell behaviour and predict connectivity subsequently measured in entorhinal cortex. However, the evidence is not as all-encompassing as we suggested.
  
  That said, we’d like to comment further on one point. It is argued (reviewer 1, comment 1) that there are other theories of grid cell function, and that we should discuss them. We discuss efficient-coding-only models of grid cells and emphasise strongly why we reject them. We also briefly discuss oscillatory-interference models of path-integration and our reasons for not pursuing them further. As such, the reviewer is correct that our reading of the literature strongly points us towards path-integration rather than other theories. We will slightly change the framing of the paper to make it clear that we are making a case. However, we are not aware of other theories the reviewer might be referring to. If the reviewer can point us to the other suggested theories that we do not address, we would be happy to evaluate and include them.
  
  We now turn to the remaining comments, and how we plan to address them.
  
  Reviewer 1, Comment 2 – There could be multiple roles for grid cells
  
  The reviewer is indeed right that grid cells might perform multiple functions. This could simply mean that the same computational motif (e.g. path-integration) is reused across different computations, though that introduces no change to the required normative theory. A stronger claim would be that grid cells perform both path-integration and some other function. From a normative perspective, this would most likely change how grid cells are optimally structured. We use the fact that large parts of the grid cell code can be captured with path-integration alone as an argument against additional roles for grid cells. That said, there exist properties of grid cells not well captured by path-integration, which could well be smoking guns for additional roles. The review already discusses both the discrepancies between grid cells in three and two dimensions and the inhomogeneities in the grid in complex environments, and we will add two more that we are grateful to the reviewers for raising (heading direction and peak-to-peak/angular variability); we discuss each of these in detail below.
  
  Whether these are necessarily arguments against a purely path-integration view, or reflect interesting mappings of the core path-integration mechanism onto the measurements we make, remains to be seen. We would argue that both 3D grid cells (as explained below, there appear to be 2D slices in which grid cells behave as you’d expect) and spatial inhomogeneities (as explained in the paper, mappings of the torus to the world can introduce warping) can be explained without reference to additional computational roles of grid cells; to us, this remains the most parsimonious explanation. Next we discuss the slight update to the path-integration-only view that the heading-direction story suggests. In sum, our view is that these discrepancies are likely not fatal for our path-integration-centric view of grid cells, but may well suggest some very interesting clarifications.
  
  Reviewer 1, Comment 4 – The system has two heading signals: true & internal, why?
  
  The reviewer is right to point to the puzzle of whether true or purely internal heading direction drives grid cells. We believe recent work from Abraham Vollan has effectively solved this puzzle: there appear to be two parallel circuits, one theta-modulated and following internal heading direction, the other theta-unmodulated and aligning more closely with true heading direction. We will make sure to include discussion of this exciting work in our revised submission. This is a good example of an update that we concede to the most austere version of the path-integration-only view: rather than a single path-integrator, there appear to be two parallel path-integrators working with different heading signals. The reasons for this remain unclear, but seem to be related to attention and planning (Vollan et al. 2026).
  
  Reviewer 2, Comment 3: Real Grid Cells have peak-to-peak variability & Angular variability
  
  The reviewer is right to point to the discrepancy in peak-to-peak firing rate and angles within a module, which we did not adequately address. First, it is Sorscher’s RNN models, not nonnegative PCA, that can generate a distribution of grid angles (Redman et al. 2025), which suggests that path-integration and such variability are compatible. We emphasise this point because the non-path-integration results from nonnegative PCA produce grid cells oriented at 30-degree offsets, something not observed even in careful measurements such as those of Redman et al. 2025. Thus, this becomes an interesting target for future work: perhaps such angular diversity would be recovered by theories that require path-integration only up to an error threshold, rather than perfectly. We will include this in our discussion. Further, we will include discussion of peak-to-peak variability, which, as yet, has no obvious role.
  
  Reviewer 2, Comment 1: grid cells are inhomogeneous in 3D or complex environments, doesn’t that break the theory?
  
  Disrupted grid coding in extended or 3D environments indeed deserves more discussion, which we will add. In particular, we will add recent evidence that grid cells in 3D can be understood via the correct sequence of 2D projections (Qi & Yartsev, 2026). These two phenomena seem, to us, consistent with a path-integration-only view of grid cells, as discussed above, and we hope to make this position clearer.
  
  Reviewer 2, Comment 5: Couldn’t there be other reasons for multiple modules?
  
  We have suggested a consistent normative framework in which multiple modules are explained through their role in non-linear coding. We think this is elegant, and the most parsimonious current theory. We could, of course, be wrong. The discrepancies pointed to above might be good clues for working out what else these modules might be doing, but we are not currently aware of such alternative explanations. We will add text to clarify this.
  
  Reviewer 1, Comment 3: The review confuses computational and parameter parts of normative theory
  
  We disagree with the reviewer’s dichotomisation of normative theory. We view a normative theory as the complete procedure that produces the predictions. Almost all such theories have parameters, and hence fitting a theory to data comprises both elements (a) [computational role] and (b) [specific parameters] identified by the reviewer. Occasionally theories have no parameters in the traditional sense, e.g. Rebecca et al.; instead they have heavy assumptions that play an equivalent role. It is true that, as the reviewer says, Sorscher et al.’s work was criticised for producing grid cells only for specific parameter values. We never found this as damning as Schaeffer et al. argued: it simply says that the theory is only correct within the given parameter range. Rather, arbitrating between models, parameters, or assumptions seems to be the same basic process: see what they predict, and keep working with models while they remain useful ways to understand measured phenomena. If a model with very specific parameter values remains useful, that seems okay. In fact, we argued extensively why we think the nonnegative PCA model is not a useful model, but for completely different reasons. To us, this story just reinforces the importance of hygiene in normative research: perform parameter sweeps, clarify how they constrain the claims you are making, and carefully arbitrate what models can capture. Indeed, that is the whole goal of this review. We might be misunderstanding and, if so, we welcome correction.
  
  Reviewer 2, Comment 4: Normative Models of Cells Beyond Grid Cells
  
  The reviewer is right that extending these models to other cell types is an interesting area for further work, and that other cell types do seem to be involved in aspects of navigational computations both in RNNs and the brain. We will include a discussion to this effect in the revised manuscript. That said, we think the modularity of grid cells and their tight link to path-integration calculations should also be appreciated as a win!
  
  Reviewer 2, Comment 2: Multi-modularity is not cleanly explained
  
  We thank the reviewer for the comments; we agree. We will clarify the story regarding multiple modules, and will explain the equation further.
  
  Reviewer 1, Comment 5: the early introduction of phase-shifted Grid Cells seems the perfect place to normatively argue for Path-integration!
  
  We agree with the reviewer that this point can be made both normatively (‘oh look! If I try to do this optimally, I get translations!’) or, as we did early in the paper, mechanistically (‘oh look! With these cells I can do this!’). Indeed, a large part of the point of our paper is that path-integration is what is required to normatively derive phase-shifted grid modules, something discussed by Rebecca et al., our earlier work, and RNN studies, and appreciated for two decades. The earlier part of the paper does not discuss these papers as that section is aimed at giving intuition for the solution (mechanism). Later sections then heavily discuss the normative angle. We hope that division of labour makes sense.
  
  Finally, we will refine our summary of Rebecca et al. The reviewer is right that neurons don’t have to be discrete, and we apologise for that error, but our understanding is that the only meaningful role of a neuron in Rebecca et al.’s work is the region in which it is active, effectively making every neuron a binary unit, which seems dubious. We will clarify that by “predict velocity from each current and next encoding” we mean that the normative constraint they enforce is axiom 1: sequential activity of sets of neurons i then j can be uniquely interpreted as a trajectory, i.e. a step or velocity. Their work is elegant, and we will try to do more justice to it in the revision.
  
  To conclude, we thank the reviewers for their extensive comments, and look forward to releasing a version that addresses their concerns.
  
  AuthorResponse

Tags

Review 2

AuthorResponse

Annotators

Public_Reviews

URL

arxiv.org/abs/2601.12424
www.biorxiv.org www.biorxiv.org

CROP2, a Retriever-PROPPIN Complex Mediating Protein Export from Endosomes to the Plasma Membrane

1
1. Public_Reviews 03 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  WIPI1 is a PROPPIN family protein that has been implicated in Retromer-mediated membrane fission events. Although the cargos for which it has been shown to be important are diverse, one cargo that is unaffected is beta1-integrin. This leads the authors to assess another PROPPIN family protein, WIPI2, which is a homolog of WIPI1. Knockdown (KD) of WIPI2 using siRNA is effective and has no consequences for LAMP1, EGFR or GLUT1 trafficking. Beta1-integrin, however, shows a large and significant defect in its recycling from the endosome, with a clear endosomal colocalisation. Complementation with WT WIPI2 rescued the phenotype, but complementation with various WIPI2 mutants resulted in elongated tubules, and the mutants also had a dominant negative effect. Integrin is a classic retriever cargo, so the authors reason that WIPI2 may play a role with retriever similar to the one WIPI1 plays with retromer. To assess this, they perform a set of immunoprecipitations. SNX17, the retriever-associated sorting nexin, co-IPs with WIPI2 in a VPS26C-dependent manner. VPS26C, but not VPS26, co-IPs with WIPI2, and the reciprocal holds for WIPI1. These interactions were not present for the FSSS mutant of WIPI2. WIPI2 localises mainly to Rab11 endosomes, as does retriever. Mutations of WIPI2 affected not only WIPI2 localisation but also VPS35L localisation, indicating that there is a functional relationship between the two.
  
  On the whole, I find the manuscript compelling. The manuscript is very clearly written, the results are convincing and well performed. The flow of experiments is logical, and although not comprehensive in the subsequent mechanistic understanding, the fundamental findings are important and convincing. My comments below are, on the whole, minor and are intended to support the communication of the findings to the field.
  
  We are happy that the reviewer has received our work quite positively.
  
  (1) The IP interaction data were convincing; however, for me and some others, an interaction is only convincing when performed in vitro, and understood at a structural level. I do not suggest the authors do that in this case; however, I think, at a minimum, some sensible moderation of claims would be useful here.
  
  Indeed, quantitative in vitro data on the affinities would be a nice addition. However, we have had significant trouble recombinantly expressing and purifying well-behaved WIPI2 in sufficient quantities for such studies. We are still working in this direction but are not there yet.
  
  We have now inserted a phrase into the discussion section highlighting this limitation: "Our immunoprecipitation assays cannot distinguish and more detailed structural and interaction studies with pure compounds will be necessary to elucidate the nature of this interaction". We nevertheless think that the isoform specificity of the IPs, the effect of the point mutations in WIPI2 on these interactions, and the functional effects in vivo lend significant support to the notion of a complex, even if there is no proof of direct binding of WIPI2 to Retriever.
  
  (2) I found the final localisation data and its interpretation confusing. My interpretation of those data would not be that retriever is relocalised, but rather that less of both proteins is recruited to the membrane and the remaining localisation distribution is shifted. In addition, I am not quite sure of the model here: is the idea that WIPI2 recruits retriever? If that is the case, I find it hard to reconcile with its role as a mediator of fission. Clarity would be appreciated here.
  
  We are not quite sure what "final" localisation data the reviewer refers to, but we guess it is Fig. 9. This figure primarily provides in vivo evidence supporting the connection between Retriever and WIPI2. It does this by showing that the S67 substitution shifts both proteins. In WIPI2 wildtype cells, WIPI2 and VPS35L strongly colocalize in Rab11 compartments. S67 substitutions in WIPI2 abolish this localisation; WIPI2 shifts mainly to Rab5 compartments, where VPS35L shows only a moderate increase, and to Rab7 compartments, where VPS35L shows no increase at all.
  
  We do not understand the reviewer's interpretation that less Retriever would be recruited to the membranes in the S67 variants. VPS35L remains completely associated with punctate, presumably membrane-bounded structures also in the mutants, providing no evidence for a detachment from the membrane. The same is observed in a WIPI2 knockdown. Therefore, we did not claim that WIPI2 is the main factor recruiting Retriever to the membrane, for which our experiments yield no hints. This does not exclude that the interaction of WIPI2 could strengthen membrane recruitment, or that two pools of Retriever exist, one interacting with SNX17 and another interacting with WIPI2, and that both link to each other in a coat. We did not dwell on this in the discussion because our experiments cannot distinguish these possibilities and were not designed to analyse membrane recruitment of Retriever.
  
  (3) I am concerned that the repeats being compared for statistical analysis are not biological repeats but technical repeats (cells in the same experiment). I should think the idea of the statistical comparison is to show experimental reproducibility and variability across biological repeats. Therefore, I would expect an appropriate number of biological repeats (3 or more minimum) to be the data compared in the statistical analysis and graphs. I think it is appropriate to average the technical repeats from each biological repeat. I find these to be useful resources: https://doi.org/10.1083/jcb.202401074, https://doi.org/10.1083/jcb.200611141
  
  The repeats being compared are biological repeats from independent experiments. This is described in the Methods, where the reviewer may not have seen it. To make the independent experiments more evident in the figures, we have now colour-coded the individual cell measurements from the three independent experiments. This allows us to visualize the individual data points, the average of each experiment, and the variability across the independent experiments.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript from De Leo and Mayer presents evidence that the PROPPIN protein, WIPI2, associates with the Retriever complex, and is required for the proper transport of the SNX17-Retriever cargo, beta1-integrin. This finding fits with prior papers from the Mayer lab, which showed that a related PROPPIN, WIPI1, is required for the transport of some SNX27-Retromer cargo, including GLUT1. The retromer and retriever complexes are architecturally similar. Importantly, they act at the same endosomes, and each transports cargo from endosomes to the plasma membrane. Thus, the possibility that each also requires a structurally related PROPPIN is of interest. However, the manuscript is incomplete, and the main claims are only partially supported.
  
  Strengths:
  
  The topic that PROPPIN proteins are important for the function of the Retromer and Retriever complexes expands our view of the trafficking complex.
  
  Weaknesses:
  
  Many important controls are missing. Several points that are made in the manuscript are only supported through a single approach.
  
  We made a serious effort and implemented many suggestions of this reviewer, but orthogonal approaches are not always available or accessible.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The manuscript of Mayer and colleagues analyzes the function of WIPI proteins in mammalian cells. The authors previously identified CROP as a complex consisting of WIPI1 and the retromer complex, primarily in yeast cells. In mammalian cells, both WIPI1 and WIPI2 exist, whereas retromer has a homologous complex termed retriever. They now find that WIPI2 can form a complex with retriever subunits. They named this complex CROP2. Their data further indicate that CROP2 and CROP1 have distinct substrate specificities as knockdown of CROP2 subunits affects beta1 integrin sorting, whereas knockdown of CROP1 affects EGFR and GLUT1. They further identify a similar sequence (FSSS) in both WIPI1 and WIPI2, which is required for their specific binding to retromer and retriever.
  
  Strengths:
  
  CROP1 and CROP2 seem to use similar features for their formation, and have different substrates, which is convincingly shown.
  
  Weaknesses:
  
  The analysis does not demonstrate that this is a complex, as claimed. This can be deduced from the interaction analysis but was not shown.
  
  It is of course desirable to obtain a detailed structural and in vitro characterisation of this interaction, which we have not provided because we currently do not have sufficient amounts of well-behaved source material for this. We nevertheless think that the interaction we show, which is strictly isoform-specific and dependent on single amino acid substitutions in a motif that in CROP1 is necessary for the interaction of its recombinant subunits, supports the view that CROP2 is a similar complex. We do not show a direct interaction, but we also do not claim in the manuscript that the interaction between WIPI2 and Retriever is direct and independent of additional factors.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  As you will see, the reviewers generally value the contribution to the field, but they feel that some claims require additional experimental support.
  
  (1) I have summarized the major points below.
  
  (a) Both reviewers 1 and 2 agree that the quality of the localization data presented in Figure 9 and S5-S7, and the interpretation of the data, could be improved. See comment 2 from reviewer 1 and comments 23, 24 and 25 from reviewer 2. They not only suggest ways to improve the presentation of the data, but also suggest improving the staining of the Rab11 marker and explaining the lack of the co-localization between VPS35 and Rab5 that has been reported in the literature.
  
  This impression was due to the fact that some figures showed projections of image stacks, which was not indicated clearly in the figure legend. We have changed this and now show single image planes throughout all figures.
  
  (b) Both reviewers 1 and 3 note that the evidence supporting a functional WIPI2-Retriever complex in vivo is currently weak. We agree that additional biochemical data demonstrating the presence of the CROP1 and CROP2 complexes in vivo would strengthen the central message of the paper and elevate it to a more fundamental discovery.
  
  Our understanding is that the reviewers did not ask for further in vivo evidence but would welcome a structural characterisation of the complex and quantitative binding data obtained in vitro with purified proteins. Structural characterisation is beyond the scope of our study, and in vitro binding studies have been hampered by the fact that WIPI2 is hard to express and purify and is not well behaved in vitro.
  
  (c) All reviewers agree that the authors should carefully repeat their statistical analysis to account for the number of biological replicates. Reviewer 1 suggests publications that the authors could refer to.
  
  The reviewers have probably overlooked the relevant description in the Methods section, which stated that we analysed biological replicates from independent experiments. In graphs showing measurements from individual cells, we now make this evident through colour-coded dots, in which each colour represents data points from one independent experiment. This shows that the variance from experiment to experiment is low. The means (n = 3) were generally compared using a two-tailed unpaired t-test.
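  As an illustration only, the analysis described above can be sketched in Python: per-cell values are first averaged within each independent experiment, and the resulting experiment means (n = 3 per condition) are then compared with a two-sample t statistic. The function names and the example numbers are hypothetical and are not taken from the study; the sketch assumes equal variances (Student's pooled-variance test) and returns only the t statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def experiment_means(cells_by_experiment):
    """Collapse per-cell measurements to one mean per independent experiment."""
    return [mean(cells) for cells in cells_by_experiment]


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-cell surface signals: one inner list per independent experiment.
control = [[1.0, 1.2, 0.8], [1.1, 0.9], [1.3, 1.1, 1.2]]
knockdown = [[0.4, 0.6], [0.5, 0.3, 0.4], [0.6, 0.5, 0.7]]

ctrl_means = experiment_means(control)  # one value per experiment, so n = 3
kd_means = experiment_means(knockdown)
t_stat, dof = pooled_t(ctrl_means, kd_means)
print(f"t = {t_stat:.2f}, df = {dof}")
```

  The point of averaging first is that the test then sees three independent observations per condition rather than dozens of non-independent cells. For the two-tailed p-value, `scipy.stats.ttest_ind(ctrl_means, kd_means)` performs the same pooled-variance test.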
  
  (d) Reviewer 2 additionally has various minor points that would greatly improve the readability and presentation of the work, and we recommend addressing them (comments 1, 2, 3, 4, 12, 15, 17, 20, 27, 28, 29). All reviewers, in general, provide great minor suggestions. It would be great if the CROP1 and 2 complexes could be clearly introduced in each figure. We also agree that the WIPI2 CT labelling is confusing and should be changed to "control" or similar.
  
  Many of the points raised by this reviewer were actually quite minor or questions of personal preference, not major problems as stated in the review. Nevertheless, we found a number of useful suggestions in this review and have addressed these points as detailed in the response to reviewer 2.
  
  (2) In addition to the major shared concerns laid out in the points above, reviewer 2 has some further minor suggestions:
  
  (a) Comment 6. Could the authors explain the discrepancies between the example blot shown in Figure 1D and the quantification (1E)?
  
  The two have actually been quite consistent. The reviewer might have mistaken the marker lane as the 0 min reference value to arrive at this impression. We have now removed the marker lane to avoid this.
  
  (b) Comment 9 - could the authors clarify how surface labelling experiments were carried out?
  
  This had been clearly described in the methods section, where this reviewer has probably not seen it.
  
  (c) Comment 11 - The reviewer suggests normalizing the surface levels of markers to the cell area and not per cell. This is a reasonable suggestion.
  
  The analysis had already been performed as proposed. This had been clearly described in the methods section, which the reviewer may not have looked at.
  
  (d) Comment 19 "In Figure S4, the authors observe tubular structures. The authors should perform immunofluorescence with endosomal markers such as EEA1, LAMP1 and Retromer to determine the nature of the tubulovesicular structures." The authors could try a Rab4 or Rab11 overexpression plasmid to show whether these are elongated recycling tubules.
  
  This has now been added.
  
  Reviewer #1 (Recommendations for the authors):
  
  Minor comments:
  
  (1) The figures are not colourblind friendly, and should be changed to be so. Additionally, single colour images should be grayscale.
  
  That was a good learning opportunity. We adapted the colour schemes of the images to make them more colourblind friendly, now using magenta, green, and white for the overlaps. In doing so we have relied on published recommendations, but we have not found a colourblind colleague to check the efficacy of this change.
  
  (2) WIPI2^CT labels are confusing, as people may think they are a mutant. I suggest changing to "control" or similar.
  
  These have been changed.
  
  (3) "The effect was comparable to that of a knockdown of SNX17 (Figure 3 A, B)." On page 6. Based on this sentence, I was expecting to see a comparison to SNX17 KD, but it was not there as far as I can tell.
  
  This statement referred to a publication by P. Cullen and collaborators. We have changed the wording and inserted the (missing) reference to make this clear.
  
  Reviewer #2 (Recommendations for the authors):
  
  The manuscript is modest. In addition, many of the claims should be better supported by the addition of orthogonal data. Moreover, the quality of some of the data presented needs to be improved. Overall, the manuscript requires better descriptions of the methods. In many figures, it was not clear how the experiments were performed.
  
  The experimental descriptions that the reviewer refers to had been provided in the Methods section, where this reviewer may have overlooked them.
  
  The paper should also be better organized. Some less important findings are in the main figures, whereas some critical results are in the supplemental figures. In addition, there were multiple issues with the readability of the paper, and the authors should consider using a professional editor to make the paper easier to read.
  
  We had given the paper to colleagues who found it clear, and Reviewer 1 has also underlined its clarity. Nevertheless, we have rephrased some parts of the manuscript to optimise it.
  
  One of the main claims in the paper is that the FSSS motif of WIPI2, as well as a conserved amphipathic helix, is critical for WIPI2 function in the CROP2 complex. It is notable that these are the same regions that are also critical for the role of WIPI2 in autophagy (Gubas et al., 2024 PMID: 39152217). The authors should include this information in the manuscript and cite the paper.
  
  Indeed. We mention this now in the introduction of the revised version.
  
  Additional Major Issues:
  
  While some of the issues raised below are actually minor and/or matters of personal preference, several comments led us to improve and correct the figures and we thank this reviewer for the constructive suggestions.
  
  (1) In Figure 1, it appears from the representative images that WIPI2 KD cells have higher levels of EGFR (Figure 1A and 1B). Is this correct?
  
  To some degree. This increase is not systematic. A moderate increase has been observed only in 2 experiments out of 4. Therefore, we did not investigate this.
  
  (2) Also in Figure 1, the colocalization is difficult to see. The authors should add the separate channels in addition to the merged images. Since the point is supposed to be that there is no impact on EGFR, all of this data could go into the supplement.
  
  We had already considered this for the original version but dismissed the idea. The overlap is quantified in Fig. 1C, which provides the relevant values from four experiments. Fig. 1A/B provide only sample pictures, which also show the overlap (yellow) at 0 and 5 min after the induction of degradation; this overlap vanishes at later timepoints. Separating the channels would quadruple the space that this figure occupies, which would not be practical and would not change the point to be made.
  
  (3) The scale bars for each panel differ from each other. To better assess the data, the exact same magnification should be shown for each panel.
  
  Corrected
  
  (4) Figure 1C is confusing. The authors should explain which lines correspond to EEA1 and LAMP1.
  
  Corrected
  
  (5) In Figure 1D, the authors show different blots for control and WIPI2 KD. Could the authors compare WIPI2 and EGFR in the same blot? Without a comparison on the same blot, it is impossible to know whether the starting levels of EGFR are the same. Moreover, the quantitation in Figure 1E sets the value for each cell line to 100%. Instead, the starting levels in each cell line should be compared. The authors should use the amount of EGFR at zero time in the control cells to define 100%, and then indicate the relative initial EGFR levels in the WIPI2KD cells.
  
  A new blot is shown now and the quantification has been performed as proposed.
  
  (6) The quantification in Figure 1E does not match the representative blot shown in Figure 1D. According to the graph, the rate of degradation of EGFR is similar in both cell lines. But the representative blot shows that there are large differences.
  
  We do not understand this comment. The representative blot shows similar kinetics for both. Perhaps the reviewer got confused by the fact that a marker lane was still present on the left blot and not labelled as such. The new version of the figure corrects this.
  
  (7) The blot showing the WIPI2 knockdown in Figure 1D has a lot of background. However, the blot of the WIPI2 knockdown in Figure S1 looks very good. The authors should make sure that they load enough sample and use a good antibody for the experiments in Figure 1.
  
  The new blot that we added in response to comment 5 corrects this.
  
  (8) In Figure 2 and Figure 3A, the cells are too confluent. This is an issue because the cells might not be metabolically active. In addition, the signal is saturated. The authors should make sure that all of the data is collected on cells that are not too confluent.
  
  The confluency of the culture cannot be judged from single frames, which were selected to show several cells. We had controlled confluency and underlined in the Methods section that “For microscopy, the cells were plated on 18-mm-diameter glass coverslips on 24-well plates and grown for 2 or 3 days according to the protocol of DNA or siRNA transfection by reaching a confluency of 70-80%”. The reviewer may not have seen this.
  
  (9) One main issue with these figures, especially the non-permeabilized cells, is that it is impossible to assess how much of the signal is on the cell surface. The authors should provide the methods that they used to prevent inadvertent permeabilization of the cells. Were these experiments performed at 4 degrees? The authors should include a control of an antibody to a protein that is not found on the cell surface.
  
  There is an internal control: the non-permeabilised WIPI2KD cells, which were treated with the same antibody, show much less staining than the control cells (Fig. 3A). In WIPI2KD cells, integrin becomes accessible for antibody staining only upon detergent permeabilization. This demonstrates that our procedure does not lead to significant inadvertent permeabilization of the cells.
  
  (10) The authors should perform surface biotinylation assays as an orthogonal approach to determine GLUT1 levels and beta1-integrin levels at the cell surface, respectively.
  
  There is a strong, qualitative difference in the surface labelling of beta1-integrin that is not observed for GLUT1. Given that, it is not obvious to us what additional argument would be provided by surface biotinylation or subfractionation experiments.
  
  (11) In quantifying surface levels of GLUT1 or beta1-integrin by microscopy, the authors should normalize to the cell area, rather than per cell.
  
  The reviewer has probably not seen that the Methods section states that the cell area has been used for normalisation.
  
  (12) In Figure 3, the nuclear DAPI stain in the KD cells is much less bright than in the control cells. The authors should make sure to choose representative images.
  
  The nuclear DAPI signal was visible in all cells. Depending on the position of the nucleus, its shape and its dimension in the z-direction, individual nuclei can show different degrees of staining. The images shown are representative. We have now adjusted the settings to make the nuclei in the WIPI2KD cells easier to spot.
  
  (13) For the immunofluorescence studies, the authors should be using single z planes rather than maximum projection.
  
  Images have been replaced with single-plane images.
  
  (14) For the experiments in Figure 3, the authors should check the total levels of EEA1 and LAMP1 by western blot to test whether WIPI2 KD affects the levels of these proteins. If these organelle marker proteins are impacted, this could impact the colocalization measurements shown in Figures 3C and D.
  
  We have measured the total fluorescence intensity of EEA1 and LAMP1 in the images. It shows no significant difference between control and WIPI2 knockdown cells (new Fig. 3F, H).
  
  (15) In Figure 4A, the helical representation is rotated in the WIPI2-Sloop; the orientation of the residues that are not mutated should stay the same.
  
  Yes. Done.
  
  (16) In Figure 4B and 4C, cells that were not transfected with WIPI2 WT or WIPI2 Sloop should be shown.
  
  Since the transfection efficiency is limited, the fields contain both non-transfected (lacking green fluorescence) and transfected cells (showing green fluorescence). We have now marked transfected cells with an asterisk.
  
  (17) The cells in the lower panel of 4B have an unusual morphology and are much more round. The authors should choose cells that are representative of each experimental condition.
  
  We now provide another field.
  
  (18) In Figure 4C, it looks like the magnification of the top panels is different from the bottom panels. The same magnification for all the panels should be shown (and the size of the scale bars should be the same).
  
  Corrected
  
  (19) In Figure S4, the authors observe tubular structures. The authors should perform immunofluorescence with endosomal markers such as EEA1, LAMP1 and Retromer to determine the nature of the tubulovesicular structures.
  
  We have done this (new Fig. S4). Rab4 is on the tubules, and Rab5 is on the structures from which the tubules emanate.
  
  (20) In Figure 5A, the top scale bar is missing.
  
  Corrected.
  
  (21) In Figure 5B, the confluency is too high.
  
  See our response above. A single field does not permit judging this. Confluency was controlled for all cultures, and the cultures were not confluent.
  
  (22) The IP studies shown in Figures 6, 7 and 8, should be accompanied by colocalization studies.
  
  Colocalization measurements have now been integrated into the manuscript (Figs. S5, S6). They are consistent with the IP data.
  
  (23) Figure 9 was very confusing and should be broken up into multiple figures. Data showing that localization did not change in any of the cell lines can be put in figures that are distinct from figures that show that localization changed in the various mutants. Figures that show no change can go in the supplement.
  
  Since every panel of Fig. 9 shows a statistically significant difference we left the figure unchanged.
  
  (23) Representative figures should be shown in the same figure as the corresponding graph. In addition, the order of the colocalization data shown in the graphs and figures should match the order described in the text.
  
  We consider the graphs of Fig. 9 as the relevant information. Representative images serve only as illustrations. Integrating them with the graphs would make it necessary to split everything up into multiple figures, making it harder to compare the different combinations. Therefore, we left the figures unchanged.
  
  (24) In Figure S7, the Rab11 signal looks continuous, which makes the colocalization analysis meaningless. The authors should determine how to take images that can be evaluated. On a more minor note, the zoomed panels should be labeled as well.
  
  This resulted from showing a projection of multiple planes. The images have now been replaced with single-plane images. Zoomed panels have been labelled and scale bars added.
  
  (25) The low colocalization of VPS35L with Rab5 is surprising, as SNX17 has been previously shown to co-localize with early endosomes positive for EEA1. This result may have occurred due to overexpression because the authors chose to utilize plasmids that express a tagged protein. There are antibodies to each of the endogenous proteins, and this is what should be used for this set of experiments.
  
  This comment prompted us to check the analysis performed for these images, which by mistake had been performed on z-projections rather than on single planes. This distorted the values. The re-analysed data show a higher colocalisation with Rab5, but it remains lower than the colocalisation with Rab11.
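
  The problem with measuring colocalization on z-projections can be illustrated with a toy two-channel stack. The sketch assumes a pixelwise Pearson coefficient as the colocalization measure (the response does not name the metric used) and uses made-up intensities. Projections can shift coefficients in either direction. This toy case shows one direction: within each plane the two channels never overlap, yet a maximum-intensity projection stacks signals from different planes onto the same pixels and creates apparent colocalization.

```python
import numpy as np

def pearson(a, b):
    """Pixelwise Pearson correlation between two images of equal shape."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Toy two-channel z-stack with shape (planes, rows, cols). Each channel is bright
# at two xy positions, but never at the same position within the same plane.
ch1 = np.zeros((2, 4, 4))
ch2 = np.zeros((2, 4, 4))
ch1[0, 0, 0] = ch1[1, 1, 1] = 1.0
ch2[0, 1, 1] = ch2[1, 0, 0] = 1.0

single_plane = pearson(ch1[0], ch2[0])                 # no overlap within plane 0
projected = pearson(ch1.max(axis=0), ch2.max(axis=0))  # projection superimposes planes
print(round(single_plane, 4), round(projected, 4))
```

  In plane 0 the coefficient is slightly negative (-1/15), whereas the projected images are identical and give a coefficient of 1.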
  
  (26) The authors should determine whether β1-integrin colocalizes with WIPI2 in endosomal compartments.
  
  This was done. WIPI2 colocalizes with beta1-integrin on EEA1- and SNX17-positive structures but not on LAMP1-positive structures (Fig. 3E/F).
  
  Minor points
  
  (27) In one of the panels in Figure 1A, "30 min" is duplicated.
  
  Removed
  
  (28) In Figures 5C and 5D, the y-axis should indicate that this is surface β1-integrin.
  
  Changed and added “surface”
  
  (29) In Figure 9 there is a typo in panel A. It is VPS35L and not VPS35.
  
  Corrected
  
  Reviewer #3 (Recommendations for the authors):
  
  This is an overall convincing study, which shows that the two complexes, CROP1 and CROP2 function at different membranes and serve different substrates. While I agree with their localization analysis, I have one key issue. The authors claim that each of the two forms a complex and base this on their specific pull-down and western blot analyses.
  
  I find it important that they show that both indeed form stable complexes in vivo, using pull-down and mass spectrometry approaches. They have all the necessary tools in hand and could use WIPI1 and WIPI2 to demonstrate the existence of the two complexes. The FSSS mutants of each are good controls for such an analysis.
  
  The manuscript actually presents the requested in vivo experiments. Figs. 6 to 8 show pull-downs of WIPI1 and WIPI2 from cells, including the FSSS mutant. While we have not analysed this interaction by mass spectrometry, the Western blot analysis confirms the interaction. Cooperation of these proteins is further supported by the in vivo phenotypes, where the S67A substitution in WIPI2 produces a similar phenotype on integrin beta1 localisation as inactivation of Retriever.
  
  A second aspect is the general presentation. The paper would be a lot more accessible if the subunits of each complex (CROP1 and CROP2) were also introduced in the figures of each part. For readers, a final model is helpful to put the data into context and show where each complex operates in the cell.
  
  We have introduced a scheme of the respective complexes, including the names of their subunits, in Figs. 6 and 7 to avoid confusion.
  
  Finally, it is not clear how the statistics compare to repeats in their data. This should be clarified.
  
  This had been described in the Methods. Statistics were always performed on biological replicates from independent experiments. We have also added a cartoon (Fig. 10) depicting the trafficking pathways affected by CROP1 and CROP2.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.10.08.681146v2
www.biorxiv.org

Early visual cortex supports one-shot episodic memory via spatially tuned reactivation

1
1. Public_Reviews 03 Jul 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper reports the findings of a neuroimaging experiment that tested the hypothesis that the cortex, specifically early visual areas, reinstates the content from single events during our lives. The researchers tested this hypothesis by presenting to-be-remembered pictures of objects at spatial locations on the computer screen and then testing subjects with both recall and recognition. They show that during memory testing, the spatial location of the object can be decoded from the pattern of cortical BOLD responses measured with fMRI. They go on to show that the spatial tuning is higher during recognition than recall, that the tuning is correlated with memory retrieval accuracy, and that the retrieved precision is predicted by the encoded precision, particularly in the higher-level visual areas. Thus, the paper finds evidence of cortical reinstatement of details from a single event in a human life.
  
  Strengths:
  
  This is a strong manuscript that I have had the luxury of commenting on during a round of review at another prestigious journal. As a result, the authors have already made changes to address previous comments about highlighting the complementary learning systems approach more to motivate the alternative prediction that the cortex should only show evidence of reinstatement after repeated presentations. In addition, the authors have fleshed out the discussion of working memory in this task. They also revised their review of the literature to include citations suggesting spatial locations are normal parts of our episodic representations, likely obligatory in nature, as my group and others have argued in completely unrelated work. I applaud the authors for being responsive to a previous round of review and using the comments to address relatively minor issues with the paper, even though they moved on to a different journal. Thus, I found the paper even stronger than at first approach, and at first blush, the results were intriguing and the paper well written.
  
  Weaknesses:
  
  There is a logical perspective in the narrative that seems to unnecessarily weaken the paper. The paper shows evidence consistent with the conclusion that mnemonic representations are contained in early visual cortex, but then argues that those representations are not actually stored therein. For example, the first half of the last sentence of the conclusions (see page 19 of the manuscript). I understand the perspective that subcortical mechanisms must be involved in the act of retrieval, given the neuropsychology and other evidence. But if storage is elsewhere with the same fidelity so as to code this information, then how would such a memory system work? The MTL neurons would need to have the real, precise representation of all the orientations encoded at all the retinotopic locations, a mirror to V1 in terms of precision, because that's the actual memory representation being retrieved, so its fidelity will be limited by what is stored in the file, so to speak. Then, at retrieval, the paper proposes that the brain just reactivates the encoding context in V1 to help with the response output and ensure the precision of the behavioral responses. This must mean that the hippocampus/MTL has cells and networks with tuning functions that match the precision in all the cortical sensory systems that they are integrating context across, given the episodic memory models like Polyn and colleagues (2009, Psych Rev). So, there are little MTL maps that are completely redundant with V1, M1, A1, S1, etc.? Why such redundancy?
  
  Why not propose that what the subcortical systems do is to encode a unique pattern for that episode, that is separated from others, that just links (or provides pointers to, in computer science jargon) the contextual details stored in the cortical networks themselves? In this way, we can explain why neglected patients also neglect their memories of the town square. This has always been my interpretation of the results of the Polyn et al. (2006, Science) paper and the models tested with those whole-brain results. That is, you see widespread cortical context reinstatement during (one-shot) free recall events that included visual selective cortex for faces when faces were being recalled, but included a broad network, probably V1, and activating sounds in A1, body posture in M1, etc., though the latter three examples did not discriminate between categories of memoranda, in their experiments. Given that you show that activity in V1 during retrieval looks like it is being used, you should propose that the early cortex really participates in memory storage functions. V1 neurons are wired up to neurons of other selectivities in a competitive network with plastic synaptic connections. How would experience be prevented from changing activity in the cortex? Yes, cortical changes slow after the critical periods, as studied in the classic eye suturing experiments to study ocular dominance, but changes in cortical representations do not stop with maturity, with the pinwheel centers looking like they are context sensitive, thus, changing rapidly to events across time (Okamoto, Ikezoe, et al., 2011, Sci Reports). The brain would need a no-plasticity mechanism, and instead, it looks like the cortex can completely rewire even in adulthood (Buonomano & Merzenich, 1998, Annu Rev Neuro).
  
  I believe that the paper needs to describe the strong/radical interpretation of the current findings: that they are consistent with the view that the entire brain may be a memory structure, with encoding linking representations across sensory cortices but also activating semantic and lexical systems and the emotional networks that encode those aspects of context which we know can sometimes strongly drive effects. This would be a nice prediction to make in the discussion/conclusions. Here you are looking at how precise the visual reinstatement is in V1 during retrieval following one exposure. One parsimonious mechanism to explain this effect is that the brain stores details of events using the neurons that do the high-fidelity perception of the event. Given that our goal is to stimulate thinking among fellow scientists so that this paper can be a citation classic, I think the paper should be revised so that it paints a complete picture of the theoretical possibilities of its findings.
  
  Review 1

Tags

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.04.647327v5
www.biorxiv.org

Trpv4 links environmental temperature to testicular differentiation in hermaphroditic ricefield eel

1
1. Public_Reviews 02 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This preprint investigates the molecular mechanism by which warm temperature induces female-to-male sex reversal in the ricefield eel (Monopterus albus), a protogynous hermaphroditic fish of significant aquacultural value in China. The study identifies Trpv4 - a temperature-sensitive Ca²⁺ channel - as a putative thermosensor linking environmental temperature to sex determination. The authors propose that Trpv4 causes Ca²⁺ influx, leading to activation of Stat3 (pStat3). pStat3 then transcriptionally upregulates the histone demethylase Kdm6b (aka Jmjd3), leading to increased dmrt1 gene expression and ovo-testes development. This work aims to bridge ecological cues with molecular and epigenetic regulators of sex change and has potential implications for sex control in aquaculture.
  
  Strengths:
  
  (1) This study proposes the first mechanistic pathway linking thermal cues to natural sex reversal in adult ricefield eel, extending the temperature-dependent sex determination paradigm beyond embryonic reptiles and saltwater fish
  
  (2) The findings could have applications for aquaculture, where skewed sex ratios apparently limit breeding efficiency
  
  Weaknesses:
  
  Although the revised manuscript represents an improvement over the original version, substantial weaknesses remain.
  
  We thank you for the critical comments. We have responded to your concerns point by point; please see the details below.
  
  Scientific Concerns
  
  (1) Western blot normalization and exposure: The loading controls (GAPDH) in Fig. S3C appear overexposed, as do several Foxl2 blots. Because these signals are likely outside the linear range, I am not convinced that normalization is reliable. This raises concerns about the validity of the quantified results.
  
  We thank you for the concerns. We have repeated the experiments, and new blots are shown in Fig. S3C.
  
  (2) Antibody validation and referencing (Line 776): The authors need to refer explicitly to figures demonstrating antibody validation. At present, these data are provided only as a supplementary file that is not cited in the manuscript. In addition, the Sox9a antibody appears to yield indistinguishable signals in control and RNAi conditions, suggesting that it may not recognize eel Sox9a. This issue is not addressed by the authors. Furthermore, antibody validation Western blots should be quantified.
  
  We thank you for the comments. We have repeated the siRNA experiments to show the specificity of the antibodies used. The validation data are provided as supplementary file 1, which is now cited in the “WB analysis” subsection of the Materials and Methods. As requested, the WB antibody validation data are included in supplementary file 1 and are now quantified; please see the new Figure 3 and Supplementary Figure 3.
  
  (3) Unclear sample sizes (N values): Sample sizes remain unclear for several figures:
  
  (a) Fig. 3F - No N value is provided. Each graph shows three data points; does this indicate that only three samples were quantified? If ten samples were collected, why were all not quantified?
  
  We apologize for the confusion. The three data points previously shown represented 3 replicates. In the new Figure 3F, 10 randomly selected sections were imaged, and the data are shown. In the revised manuscript, the sample numbers (N values) are added, and all the information can be found in the figure legend.
  
  (b) Fig. 4 - No N values are reported.
  
  Now N values are added. Please see the figure legend.
  
  (c) Fig. 5A - Again, only three data points are shown per group, despite the apparent availability of twelve samples. The rationale for this discrepancy is not explained.
  
  We apologize for the wrong data representation. Now all the data points are shown in Figure 5.
  
  (4) qRT-PCR normalization: The manuscript does not specify the reference gene(s) used for qRT-PCR normalization. Although expression levels are reported as "relative," neither the identity of the reference gene(s) nor the justification for their selection is provided.
  
  We now specify the reference gene in the “Quantitative real-time PCR (qPCR) experiments” subsection of the Materials and Methods.
  
  (5) Specificity of key antibodies: While the authors have made some effort to validate anti-Amh, anti-Sox9, and anti-Dmrt antibodies, the results remain incomplete. The Amh and Dmrt antibodies detect reduced protein levels following knockdown of their respective targets, which is encouraging. However, the Sox9a antibody shows no difference between control and RNAi conditions, suggesting it does not recognize eel Sox9. This is not acknowledged in the manuscript. In addition, no validation data are presented for Foxl2. Antibody validation data must be clearly referenced in the main text and presented in an interpretable and quantitative manner.
  
  Antibody specificity is very important. For that reason, we have generated at least two different antibodies for each target protein, using full-length proteins or short peptides as antigens. We have repeated the experiments for key antibodies such as Dmrt1 and Sox9a. IF and WB results clearly showed the specificity of the antibodies.
  
  Author response image 1.
  
  Foxl2 antibody has also been reported in ricefield eel (Hu et al. SCIENTIFIC REPORTS | 4: 6884 | DOI: 10.1038/srep06884, Molecular cloning and analysis of gonadal expression of Foxl2 in the ricefield eel Monopterus albus).
  
  After short-term warm-temperature exposure, only a small portion of somatic cells in the ovary may be induced to express the male markers. Because different techniques differ in sensitivity, some detect this change more readily than others. For instance, qPCR and WB readily detect it, whereas obtaining good-quality IF data is more difficult.
  
  (6) Immunofluorescence data quality: The immunofluorescence images remain difficult to interpret. I strongly encourage the authors to enlarge the image panels and to present monochrome images (white signal on black background). The current presentation severely limits interpretability.
  
  We thank you for the comments. We think that our IF images are of decent quality. Because of the limited figure space (Figure 3 is already busy), enlarging the image panels or presenting additional monochrome images would compromise the presentation of other data. Alternatively, if you still have concerns about their quality, we can provide them in the supplementary material.
  
  Author response image 2.
  
  (7) Unreferenced supplementary figure: Fig. S4 is included in the submission but is not referenced anywhere in the manuscript text.
  
  We have now renamed the supplementary figures and double-checked the text to make sure all figures are correctly referenced. Figure S4 has been removed, as it is not necessary.
  
  (8) Fig. 5B image resolution: The micrographs in Fig. 5B are too small to allow meaningful evaluation of the data.
  
  New, higher-resolution images are now shown in Figure 5B.
  
  (9) Unexplained data inclusion (Fig. 5E): Fig. 5E includes a pERK blot that is not mentioned in the Results section. The rationale for including these data is unclear.
  
  Previous work has shown that FGF/ERK signaling may play a role in sex change of ricefield eel (in Chinese). We therefore examined Erk activity to explore whether it is involved in sex reversal. The results showed that pErk was comparable between ovary and ovotestis. At your suggestion, we have removed these data.
  
  (10) Poor blot quality (Fig. S3C): The blots in Fig. S3C exhibit high background and overexposure. I am concerned about the reliability of the quantification shown in panel D.
  
  The experiments have been repeated at least three times, and similar results were obtained. We have now replaced the WB images that showed high background or overexposure.
  
  (11) Poor blot quality (Fig. S5G): The Stat3 blots in Fig. S5G contain numerous white artifacts, raising concerns about their suitability for normalization in panel H.
  
  We have now repeated the experiments and uploaded a new representative blot of better quality.
  
  (12) Missing controls (Fig. 6E): Fig. 6E lacks controls for HO-3867 and Colivelin treatments alone. Without these controls, it is not possible to determine whether the reported effects are meaningful.
  
  We thank you for the comments. We have now added the requested data (HO-3867 and Colivelin treatments alone).
  
  (13) Graphical presentation: The use of a light blue-to-pink gradient in bar graphs throughout the manuscript does not aid interpretation. I recommend using more distinct colors (e.g., red, orange, green, blue, purple, gray, black) to improve clarity.
  
  We thank you for the comments. We have now replaced the blue-to-pink gradient with a more distinct color scheme to better present the data. Please see the revised figures for details.
  
  In summary, the interpretation of the study remains limited by persistent issues related to data presentation, image quality, and reagent specificity.
  
  We thank you for the critical comments about our data, in particular on antibody specificity and image quality, and for the detailed instructions on how to better present the data. Answering your questions has greatly improved the quality of the manuscript. We acknowledge that, owing to technical challenges (different conditions and different doses of small molecules) and the higher cost of animal experiments, some of the WB or IF experiments may not be of the highest standard.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Editorial Concerns
  
  (1) Overstatement of conclusions: In lines 16-18, the authors state that Trpv4 "mediates" warm temperature-driven sex reversal. This claim is too strong given the data and should be toned down.
  
  We agree with your editorial comment about the overstatement. It now reads “Trpv4 links environmental temperature to testicular differentiation in ricefield eel”.
  
  (2) Misuse of statistical language (Line 213): The term "significant" is used where statistical significance was not measured. The wording should be revised.
  
  We thank you for the point and have replaced “significant” with “marked”.
  
  (3) Terminology (Line 238): The term "co-expression" is inaccurate in this context. I suggest replacing it with "co-upregulation."
  
  We thank you for the point, and have changed it accordingly.
  
  (4) Drug description errors (Lines 241-242): The manuscript incorrectly identifies which drug functions as an agonist and which as an antagonist. This caused considerable confusion and must be corrected.
  
  We have carefully checked the sentence, and it was correct, as RN1734 and GSK1016790A are known Trpv4 specific antagonist and agonist, respectively.
  
  (5) Gene examples missing (Lines 247-250): The authors should explicitly name the testis-biased and ovary-biased genes referred to in this section.
  
  We thank you for the point, and now it reads “warm temperature exposure increased the expression of testicular differentiation genes such as dmrt1 and gsdf, accompanied by moderately decreased expression of ovarian differentiation genes such as cyp19a1a and foxl2”.
  
  (6) Lack of experimental context (Lines 322-324): Rather than simply listing the drugs used, the authors should briefly explain what each compound inhibits or activates and why it was employed.
  
  We have described this in the manuscript. The information of pStat3 activator and inhibitor has been described in Lines 305-309, as “HO-3867, a curcumin analogue, is a selective pStat3 inhibitor, which blocks pStat3 activity by directly binding to Stat3 DNA binding domain, and Colivelin is a potent synthetic peptide activator of pStat3, which increases pStat3 levels by acting through the GP130/IL6ST complex”, and the rationale has been stated in lines 32--322 as “To functionally demonstrate that pStat3 signaling is downstream of Trpv4, rescue experiments were performed by injecting into ovaries with individual and combined small molecules”.
  
  (7) Discussion of evolutionary differences: The Discussion misses an important opportunity to address why Stat3 activates kdm6b in ricefield eel but represses it in turtles. It is difficult to reconcile how the same transcription factor could exert opposite effects on the same gene during sex determination without additional context. A comparison of kdm6b regulation and sequence conservation between turtles and ricefield eel would strengthen this section.
  
  We have downloaded the promoter sequences of red eared turtle and ricefield eel. Based on the DNA sequences (Author response image 3), the similarity (conservation) was low between the two species.
  
  Author response image 3.
  
  It appeared that the DNA around the Stat3 binding sites in turtle is GC-rich (a CpG island), which may be subject to DNA methylation, whereas the corresponding DNA in ricefield eel is not GC-rich. These observations imply that pStat3 promotes the repression of kdm6b in turtle but the activation of kdm6b in ricefield eel.
  
  Moreover, our unpublished data showed that Trpv4-controlled calcium signaling is required to remove the repressive histone modification H3K27me3 at the kdm6b gene. If pStat3 is downstream of Trpv4 in this case, this again supports the view that the Trpv4-pStat3 axis activates kdm6b in ricefield eel.
  
  Warm temperature promotes female sex in turtle but male sex in ricefield eel. If pStat3 mediates Trpv4 signaling, it is therefore not surprising that it represses kdm6b in turtle but activates it in ricefield eel.
  
  Based on above, we have added some sentences in the discussion part, and it reads “We reasoned that a yet-unidentified co-factor may determine whether Stat3 is a transcriptional repressor or activator. A comparison of promoter sequences of kdm6b between turtle and ricefield eel supported this”.
  
  (8) Supplementary figure formatting: Supplementary figures should be provided in accordance with eLife formatting guidelines.
  
  We have now formatted the supplementary figures in accordance with the eLife formatting requirements. Please see the newly uploaded supplementary figures.
  
  In sum, the interpretations are still limited by the above concerns regarding data presentation and reagent specificity.
  
  We thank the editor for the inspiring comments. We believe we have addressed all the major concerns raised by the editor.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.06.09.658756v5
www.medrxiv.org

Mood computational mechanisms underlying increased risk behavior in suicidal patients

1
1. Public_Reviews 01 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  eLife Assessment
  
  This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.
  
  We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S⁺ and S⁻ groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions explaining gambling behaviour, the value-insensitive approach parameter (β_gain), and mood sensitivity to certain rewards (β_CR), with group as a predictor (1 for the S⁺ group and 0 for the S⁻ group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we also performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.
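
  The covariate analysis described above (PCA on correlated questionnaire scores, followed by a regression of each outcome on group plus the components) can be sketched with NumPy. This is a minimal sketch, not the authors' code, and it rests on several assumptions. The data are synthetic. Scores are z-scored before the PCA. Ordinary least squares is used. P-values are omitted; a package such as statsmodels would be needed for those. The function names are hypothetical.

```python
import numpy as np

def pca_components(scores):
    """PCA on z-scored questionnaire scores via SVD.

    Returns (component scores, explained-variance ratios), ordered by variance.
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return z @ vt.T, explained

def group_effect(y, group, covariates):
    """OLS coefficient of a 0/1 group predictor, adjusting for the covariates."""
    X = np.column_stack([np.ones(len(y)), group, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

# Synthetic data: four questionnaires driven by one shared severity factor,
# and group membership that is confounded with severity.
rng = np.random.default_rng(0)
n = 200
severity = rng.normal(size=n)
scores = severity[:, None] + 0.3 * rng.normal(size=(n, 4))
group = (severity + rng.normal(size=n) > 0).astype(float)
y = 0.5 * group + scores @ np.array([0.4, 0.3, 0.2, 0.1])  # noise-free, true group effect 0.5

components, explained = pca_components(scores)
print(np.round(explained, 3))                        # first component dominates
print(round(group_effect(y, group, components), 6))  # recovers the true effect, 0.5
```

  When all four components are entered together, they span the same space as the four standardized scores. The group coefficient is therefore identical whether the raw scores or all components are used as covariates. The orthogonal basis mainly avoids unstable coefficients for individual, highly correlated covariates.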
  
  Moreover, as Reviewer 3 pointed out, an absence of evidence (p > 0.05) is not evidence of absence. We had median-split patients by general symptom scores (depression- and anxiety-related questionnaires) and found no significant differences between the resulting subgroups (Figure S11); we have now additionally computed Bayes factors for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is the Bayes factor comparing the null model (M<sub>0</sub>), which assumes no group difference, to the alternative model (M<sub>1</sub>); BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As shown in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
  
  Beyond these specific findings, this work highlights the broader utility of computational modelling and mood ratings for understanding behavioral effects, showing how mood and choice data can be combined to better understand a psychiatric issue.
  
  Please see Tables S7, S8, and S9 and our revisions below:
  
  Page 17:
  
  “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as for general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given the high correlations among anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed a Principal Components Analysis (PCA) to extract the main components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. To further control for anxiety and depression, a linear regression using these components as covariates revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”
  
  Pages 18-19:
  
  “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as for general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). A linear regression using the PCA components as covariates revealed that the group effect on the approach parameter remained significant (p = 0.027; Table S9).”
  
  Page 21:
  
  “Within patients, this group effect on β<sub>CR</sub> remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.032), as well as for general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). A linear regression using the PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”
  
  Page 27:
  
  “Beyond these specific findings, this work highlights the broader utility of computational modelling and mood ratings for understanding behavioral effects, showing how mood and choice data can be combined to better understand a psychiatric issue.”
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.
  
  Strengths:
  
  (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.
  
  (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.
  
  (3) Replication of the results in an online cohort increases confidence in the findings.
  
  (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.
  
  (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.
  
  Weaknesses:
  
  (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.
  
  We thank the Reviewer for raising this concern. We agree that the sample size of the S<sup>-</sup> group (n=25) is modest, and that the prior studies we cited also acknowledged limited power. We note that this sample size is comparable to that of a prior study. In the revision, we therefore updated the justification of this sample size and acknowledge the limited power of our study in the Limitations section. Please see our clarification below:
  
  Page 32:
  
  “Third, despite the replication of our main results in an independent dataset (n=747), the modest size of the S<sup>-</sup> subgroup (n=25) limits statistical power.”
  
  (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.
  
  Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, with the aim of providing a mechanism-based target for suicide prevention. Therefore, the dependent variable in our mediation model was gambling behavior. We also agree that a clinically relevant question is whether suicidality can be predicted from task-derived behavior and parameters. We therefore used gambling behavior and the model-derived parameters to predict suicidal symptom scores. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, predicted suicidal symptom scores among patients (gambling behavior: β = 9.189, t = 2.004, p = 0.048; approach parameter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, neither was predictive (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach parameter: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards; patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is now reasonable. Please see our revisions below:
  
  Page 17:
  
  “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”
  
  Page 19:
  
  “Furthermore, linear regression showed that the approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that the value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”
  
  Page 21:
  
  “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”
  
  (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.
  
  Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset, we conducted a large number of inferential tests (χ<sup>2</sup> tests, t-tests, ANOVAs, and regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded more than 150 p-values, and the FDR correction was applied across all of them. Please see Supplementary Note 8.
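  The response does not name the specific FDR procedure. Assuming the standard Benjamini-Hochberg step-up procedure, pooling all p-values into a single family can be sketched as follows (the function name `bh_adjust` is hypothetical, not the authors' code); a test "survives" if its adjusted p-value stays below the chosen FDR level, e.g., 0.05.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest p-value downward
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

q = bh_adjust([0.01, 0.04, 0.03, 0.005])  # ≈ [0.02, 0.04, 0.04, 0.02]
```

  Because every adjusted value depends on the total number of tests m, pooling demographic checks and sanity checks together with the primary hypotheses makes the correction stricter for the primary tests than a correction within a narrower family would be.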
  
  (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.
  
  Thank you for these important comments. This study focused on the cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. A group comparison, rather than a dimensional measure of suicidal symptoms from the Beck Scale for Suicidal Ideation, therefore addresses our research questions directly.
  
  To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a × b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
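  The mediation results above are reported as an indirect effect a×b with a 95% CI. One common way to obtain such estimates is a percentile bootstrap of the product of OLS path coefficients; the sketch below illustrates this on synthetic data only. The authors' exact estimation method and software are not stated here, and the function names and simulated effect sizes are assumptions.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def indirect_effect(x, m, y):
    a = ols_coefs(x, m)[1]                        # path a: group -> mediator
    b = ols_coefs(np.column_stack([x, m]), y)[2]  # path b: mediator -> outcome, adjusting for group
    return a * b

def bootstrap_ci(x, m, y, n_boot=500, seed=0):
    """Percentile bootstrap 95% CI for a*b, resampling participants with replacement."""
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        boots[i] = indirect_effect(x[idx], m[idx], y[idx])
    return np.percentile(boots, [2.5, 97.5])

# Synthetic data: group (1 = S+, 0 = merged S-/HC) -> approach bias -> gambling rate
rng = np.random.default_rng(1)
n = 1000
x = rng.integers(0, 2, n).astype(float)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + 0.1 * x + rng.normal(size=n)
ab = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y)
```

  With these simulated paths the true indirect effect is 0.5 × 0.4 = 0.2; a CI excluding zero is the usual criterion for a significant mediation effect under this approach.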
  
  We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:
  
  Page 15:
  
  “We next verified our results in an independent dataset that used the same task and the BDI questionnaire in 747 participants from the general population (500 females; age: 20.90±2.41) [46]. One BDI item measures STB: in item 9, participants chose the option that best described them: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. Consistent with the definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, participants choosing Option 2, 3, or 4 were assigned to the S<sup>+</sup> group, and those selecting Option 1 to the S<sup>-</sup> group.”
  
  Page 19:
  
  “Given significant correlations among group, the approach parameter, and gambling rate in gain trials (ps < 0.017), we further conducted a mediation analysis to test whether approach motivation mediates the effect of suicidality on risk behavior. Because we aimed to test the effect of STB with S<sup>-</sup> and HC as controls, and because S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), consistent with suicidal thoughts and behaviors increasing risk behavior through stronger approach motivation.”
  
  Page 26:
  
  “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replication may reflect sample-specific or unstable effects, so this link should be interpreted with caution.”
  
  (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?
  
  Thank you for pointing out this ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> versus controls. We have clarified this point in the computational modeling section.
  
  Pages 12-13:
  
  “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. In economics, risk attitude is conceptualized as the curvature of the utility function (i.e., the subjective value) of objective outcomes, with concave curves associated with risk aversion and convex curves with risk seeking [54,56]. By contrast, the approach and avoidance biases apply regardless of option values. A possible interpretation of the approach bias is that participants approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the option with the highest potential loss (the lottery) in the loss frame.”
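  To make the distinction between valuation and value-insensitive bias concrete, the sketch below assumes the approach-avoidance formulation described for this task by Rutledge et al. (2015): a prospect-theory utility (curvature ρ, loss aversion λ) enters a logistic choice rule with inverse temperature μ, after which a bias β in [-1, 1] shifts the gamble probability toward 1 (β > 0) or toward 0 (β < 0) in gain or loss trials only. The parameter names, the single shared ρ, and the absence of a bias in mixed trials are assumptions and may differ from the manuscript's cM3.

```python
import math

def utility(x, rho, lam):
    """Prospect-theory-style utility; losses are scaled by loss aversion lam."""
    return x ** rho if x >= 0 else -lam * (-x) ** rho

def p_gamble(certain, high, low, trial_type, rho, lam, mu, beta_gain, beta_loss):
    """P(choose the 50/50 gamble between `high` and `low` over the `certain` amount)."""
    u_gamble = 0.5 * utility(high, rho, lam) + 0.5 * utility(low, rho, lam)
    u_certain = utility(certain, rho, lam)
    p = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    if trial_type == "gain":
        beta = beta_gain
    elif trial_type == "loss":
        beta = beta_loss
    else:
        return p  # mixed trials: no value-insensitive bias in this sketch
    # Value-insensitive bias: applied after valuation, whatever the option values
    return p + beta * (1 - p) if beta > 0 else p + beta * p

# Gain trial with equal utilities: the bias alone moves choice away from 0.5
neutral = p_gamble(1, 2, 0, "gain", rho=1, lam=1, mu=1, beta_gain=0.0, beta_loss=0.0)
approach = p_gamble(1, 2, 0, "gain", rho=1, lam=1, mu=1, beta_gain=0.5, beta_loss=0.0)
avoid = p_gamble(-1, 0, -2, "loss", rho=1, lam=1, mu=1, beta_gain=0.0, beta_loss=-0.5)
```

  Because β acts after the logistic step, a larger β<sub>gain</sub> raises gambling even when ρ and λ, and thus risk attitude in the economic sense, are unchanged.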
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).
  
  Strengths:
  
  (1) Large sample size.
  
  (2) Replication of their own findings.
  
  (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.
  
  Weaknesses:
  
  I can't really see any major weakness, but I have a few questions:
  
  (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show that not only the diagonal (i.e., the same data as the scatter plots in S3) is high, but that the off-diagonals are low.
  
  Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).
  
  As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrices showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see Supplementary Pages 2-3.
  
  “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass correlation coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass correlation coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrices showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
  
  Page 10:
  
  “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies [34,35].”
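  A minimal sketch of the cross-correlation check requested by the reviewer, on synthetic parameters only: the "fitted" values below are simply the true values plus noise, standing in for refitting the model to simulated choices, and the function name is hypothetical. The diagonal of the true-by-fitted correlation matrix measures recovery; the off-diagonal entries reveal trade-offs between parameters.

```python
import numpy as np

def recovery_cross_correlation(true_params, fitted_params):
    """Pearson r between each generating parameter (rows) and each recovered parameter (columns)."""
    k = true_params.shape[1]
    full = np.corrcoef(true_params, fitted_params, rowvar=False)  # (2k x 2k) matrix
    return full[:k, k:]  # block of true-vs-fitted correlations

rng = np.random.default_rng(2)
true_params = rng.normal(size=(200, 5))  # e.g., 5 choice parameters for 200 simulated agents
fitted_params = true_params + 0.3 * rng.normal(size=(200, 5))  # stand-in for an actual refit
R = recovery_cross_correlation(true_params, fitted_params)
diagonal = np.diag(R)
off_diagonal = R[~np.eye(5, dtype=bool)]
```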
  
  (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score, is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment? (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].
  
  Thank you for pointing out the lack of clarity. We performed the group-difference analysis and the correlation with suicidal ideation scores separately: we first tested group differences to address our hypothesis about the effects of STB, and then ran the correlational analysis to further characterize these findings.
  
  (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?
  
  Thank you for this insightful suggestion. As suggested, we integrated mood into the choice models by adding mood-bias components, in line with previous literature (Vinckier et al., 2018). The first model (mcM1), built on cM3 (the winning choice model), assumes that mood biases choice; cmM2 further separates the mood-bias parameter into two components according to participants’ choices.
  
  However, model comparison using BIC favored cM3 (Table S6), that is, the model without mood in the choice process. This may reflect the absence of a block design in our task, unlike, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Note 6.
  
  (4) In the large online sample, you split all participants into S+ and S-. I would have imagined that instead, you would do analyses that control for other clinical traits. Or, for example, you have in the S- group only participants who also have high depression scores, but low suicide items.
  
  Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. The results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend in the online sample, it remained significant with depression-related covariates in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.
  
  Please see our clarifications below:
  
  Page 26:
  
  “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend here, it remained significant with depression-related covariates in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”
  
  Reviewer #3 (Public review):
  
  This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.
  
  (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.
  
  Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To test whether the effects are specific to suicidality beyond general symptom severity, we performed separate linear regressions on gambling behavior, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group, 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps ≤ 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we also performed a Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps ≤ 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood are specific to suicidality.
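  A minimal sketch of the covariate-control logic described above, using synthetic data only (the data, sample size, and the function names `pca_scores` and `group_effect_with_covariates` are hypothetical, not the authors' analysis code): questionnaire scores are standardized, projected onto orthogonal principal components, and entered as covariates alongside a 0/1 group dummy in an ordinary least-squares regression.

```python
import numpy as np

def pca_scores(X):
    """Standardize columns of X; return principal-component scores and explained-variance ratios."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; squared singular values give component variances
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ Vt.T, explained

def group_effect_with_covariates(y, group, covariates):
    """OLS of y on intercept + 0/1 group dummy + covariates; returns (beta_group, t_group)."""
    n = len(y)
    X = np.column_stack([np.ones(n), group, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se

# Synthetic example: four correlated "questionnaires" driven by one severity factor
rng = np.random.default_rng(0)
n = 120
group = rng.integers(0, 2, n).astype(float)
severity = rng.normal(size=n) + group
questionnaires = np.column_stack([severity + 0.3 * rng.normal(size=n) for _ in range(4)])
y = 0.5 * group + 0.2 * severity + rng.normal(size=n)

components, explained = pca_scores(questionnaires)
b_pca, t_pca = group_effect_with_covariates(y, group, components)
b_raw, t_raw = group_effect_with_covariates(y, group, questionnaires)
```

  In this sketch, the four components span the same covariate space as the four standardized questionnaires, so entering all of them yields the same group coefficient and t-value as entering the raw scores; orthogonalization mainly changes the result when only a subset of components is retained.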
  
  As pointed out, an absence of evidence (p > 0.05) is not evidence of absence. We had median-split patients by general symptom scores (depression- and anxiety-related questionnaires) and found no significant differences between the resulting subgroups (Figure S11); we have now additionally computed Bayes factors for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is the Bayes factor comparing the null model (M<sub>0</sub>), which assumes no group difference, to the alternative model (M<sub>1</sub>); BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As shown in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
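  The response does not specify how BF<sub>01</sub> was computed, and dedicated software usually relies on default priors on effect size. As a rough illustration only, the sketch below uses the BIC approximation BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> − BIC<sub>0</sub>)/2) to compare a common-mean model (M<sub>0</sub>) with a separate-means model (M<sub>1</sub>) for two groups; the function names are hypothetical, and values will not match exact default-prior Bayes factors.

```python
import numpy as np

def bic_gaussian(rss, n, k):
    """BIC of a Gaussian linear model with k mean parameters (shared variance term omitted)."""
    return n * np.log(rss / n) + k * np.log(n)

def bf01_two_groups(x, y):
    """BIC approximation to BF01: M0 = one common mean, M1 = separate group means."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    pooled = np.concatenate([x, y])
    n = pooled.size
    rss0 = np.sum((pooled - pooled.mean()) ** 2)
    rss1 = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
    bic0 = bic_gaussian(rss0, n, 1)
    bic1 = bic_gaussian(rss1, n, 2)
    # BF01 > 1 favors "no group difference"
    return float(np.exp((bic1 - bic0) / 2))

same = bf01_two_groups([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])          # identical groups
shifted = bf01_two_groups([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])  # clearly different groups
```

  With identical group means, the only difference between the models is the BIC penalty for the extra mean, so BF<sub>01</sub> = √n (here √10 ≈ 3.16); with clearly separated groups, BF<sub>01</sub> falls far below 1.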
  
  Please see Tables S7, S8, and S9 and our revisions below:
  
  Page 17:
  
  “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as for general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given the high correlations among anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed a Principal Components Analysis (PCA) to extract the main components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. To further control for anxiety and depression, a linear regression using these components as covariates revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”
  
  Pages 18-19:
  
  “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as for general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). A linear regression using the PCA components as covariates revealed that the group effect on the approach parameter remained significant (p = 0.027; Table S9).”
  
  Page 21:
  
  “Within patients, this group effect on β<sub>CR</sub> remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.032), as well as for general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). A linear regression using the PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”
  
  (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.
  
  Thank you for this important suggestion. As suggested, an interesting question is whether mood responses influence subsequent gambling choices, and how to model this. First, we median-split mood ratings (excluding the final rating) and compared gambling rates. Results showed a trend toward a lower gambling rate at higher mood (t = -1.971, p = 0.050), but no significant group difference (F = 0.680, p = 0.507). Second, assuming that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of a negative correlation between mood sensitivity to certain rewards and gambling rate in the S<sup>+</sup> group, we further split the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC favored cM3 (Table S6), that is, the model without mood in the choice process. This may reflect the absence of a block design in our task, unlike, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Note 6.
  
  (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.
  
  We apologize for the unclear statement. The approach bias is implemented in choice as a continuous, value-independent effect ranging from -1 to 1.
  
  It is true that the mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, both the mood parameters and the approach bias are continuous.
  
  We also attempted to integrate mood into choice modeling. See our Response 2 to Reviewer 3 for details.
  
  (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.
  
  Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:
  
  Pages 3-4:
  
  “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory [18,19], that is, by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), over and above the difference in option values. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion) [20].”
  
  Page 5:
  
  “Although mood is thought to persist for hours, days, or even weeks [30–33], momentary mood, measured over laboratory timescales, represents the accumulated impact of multiple events at the scale of minutes [30,32,34–38]. The external validity of momentary mood is supported, e.g., by its association with depression symptoms [37]. Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear) [31–33,39].”
  
  We have corrected grammatical errors throughout the manuscript.
  
  (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.
  
  Thank you for this comment. We agree that we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, which is outside the scope of this study, and it is indeed possible that the parameter estimates are somewhat noisy. We have therefore toned down the clinical relevance of our results. Please see our revision below:
  
  Page 32:
  
  “Next, we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, and it is indeed possible that the parameter estimates are somewhat noisy.”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".
  
  Thank you for this suggestion. We have now corrected it.
  
  (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".
  
  Thank you for this suggestion. We have now corrected it.
  
  (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.
  
  Thank you for this comment. Group assignment was based on medical records and on information that the researcher and psychiatrists obtained from the patients' family and friends: patients with suicidal thoughts and behaviors were assigned to the suicidal group (S<sup>+</sup>), and patients without suicidal thoughts and behaviors were assigned to the control group (S<sup>-</sup>). Note that the medical records were based on clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and asked both the patients and their families about suicide-related thoughts and behaviors. Therefore, the current grouping procedure may be comparable to assessment with the Columbia Suicide Severity Rating Scale.
  
  (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).
  
  Thank you for this suggestion. We have now corrected it.
  
  (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).
  
  Thank you for this suggestion. We have now corrected it. Please see below for our revision:
  
  Page 12:
  
  “Please note that V<sub>loss</sub> is 0 in gain-only trials and V<sub>gain</sub> is 0 in loss-only trials.”
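  The sketch below shows how the zero convention works in practice. It computes option values under a common prospect-theory parameterization, in which the absent gamble outcome enters the calculation as 0. The power utility with curvature `rho`, the loss-aversion weight `lam`, the 50/50 gamble probabilities, and the function names are all illustrative assumptions. They are not a transcription of the manuscript's Equation 1.

```python
def utility(x, rho, lam):
    """Prospect-theory utility: x**rho for gains, -lam * (-x)**rho for losses."""
    if x >= 0:
        return x ** rho
    return -lam * (-x) ** rho


def option_values(certain, gamble_gain, gamble_loss, rho=1.0, lam=1.0):
    """Return (V_certain, V_gamble) for one trial (illustrative form).

    The gamble has two equiprobable outcomes. The absent outcome is passed
    as 0: gamble_loss=0 in gain-only trials, gamble_gain=0 in loss-only trials.
    """
    v_gain = utility(gamble_gain, rho, lam)
    v_loss = utility(gamble_loss, rho, lam)
    v_gamble = 0.5 * v_gain + 0.5 * v_loss
    return utility(certain, rho, lam), v_gamble


# Gain-only, loss-only and mixed trials with hypothetical amounts.
print(option_values(30, 70, 0))              # (30.0, 35.0)
print(option_values(-30, 0, -70, lam=2.0))   # (-60.0, -70.0)
print(option_values(0, 50, -50, lam=2.0))    # (0.0, -25.0)
```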
  
  (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.
  
  Thank you for this suggestion. We have now removed plots for model prediction.
  
  (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.
  
  Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate), but not earnings, predicted the current suicidal ideation score (BSI-C; gambling rate: β = 9.189, t = 2.004, p = 0.048; earnings: β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work using learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:
  
  Page 32:
  
  “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, given that feedback for the gamble option was random. Future work using learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”
  
  (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].
  
  Thank you for this suggestion. We have now corrected it to make it clear.
  
  (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1+ exp(-mu(U_gamble + beta_gain - U_certain)). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).
  
  Thank you for this suggestion. The winning choice model we used here is consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), over and above the difference in option values.
  
  As suggested, we also fitted a choice model using the traditional additive bias formulation. Model comparison did not favor this model. Please see Supplementary Page 4.
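  The sketch below contrasts the two formulations discussed above. The first is a probability-mixing bias, in which beta pulls the softmax gamble probability toward 1 or 0, bounding it from below or above as the reviewer describes. The second is the reviewer's additive bias on the gamble's utility. The exact equations, the inverse temperature `mu`, and the function names are illustrative assumptions rather than the manuscript's cM3.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_mixing_bias(v_gamble, v_certain, mu, beta):
    """Value-insensitive bias mixing the softmax probability with 1 (beta > 0)
    or with 0 (beta < 0); beta is restricted to [-1, 1]."""
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must be in [-1, 1]")
    p = _sigmoid(mu * (v_gamble - v_certain))
    if beta >= 0:
        return p + beta * (1.0 - p)   # raises the floor of p(gamble) to beta
    return p + beta * p               # lowers the ceiling of p(gamble) to 1 + beta


def p_additive_bias(v_gamble, v_certain, mu, beta):
    """Reviewer's alternative: beta is added to the gamble's utility
    (reward-equivalent units, unbounded)."""
    return _sigmoid(mu * (v_gamble + beta - v_certain))
```

  With the mixing form, beta = 1 makes gambling certain whatever the values are. With the additive form, a bias of 5 exactly offsets a 5-unit value disadvantage. This is why only the additive form's bias can be read in reward units.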
  
  (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.
  
  Thank you for this suggestion. We have now simplified it. Please see our revision below:
  
  (11) It is not clear what equations are applied to mixed trials in cM3.
  
  Sorry for the confusion. We have now clarified this point.
  
  Page 12:
  
  “Approach/avoidance parameters are not applied in mixed trials.”
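  Points (10) and (11) together amount to choosing one bias parameter per trial type. A minimal sketch follows. It assumes a probability-mixing bias and the hypothetical trial-type labels `"gain"`, `"loss"` and `"mixed"`. Gain trials use beta_gain, loss trials use beta_loss, and mixed trials use no bias.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_gamble(trial_type, v_gamble, v_certain, mu, beta_gain, beta_loss):
    """Gamble probability with a single bias equation selected by trial type."""
    p = _sigmoid(mu * (v_gamble - v_certain))
    if trial_type == "mixed":
        return p                      # no approach/avoidance bias in mixed trials
    if trial_type == "gain":
        beta = beta_gain
    elif trial_type == "loss":
        beta = beta_loss
    else:
        raise ValueError(f"unknown trial type: {trial_type!r}")
    # Same equation for both biases: beta > 0 pulls p toward 1, beta < 0 toward 0.
    return p + beta * (1.0 - p) if beta >= 0 else p + beta * p
```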
  
  (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?
  
  Thank you for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Because likelihood ratio tests apply only to nested pairs of models, we used BIC to compare all models in our model space within a single framework.
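  The sketch below illustrates the distinction. A likelihood ratio test is valid for a nested pair that differs by one free parameter, such as mM1 and mM3 if mM3 equals mM1 with beta_EV = beta_RPE. BIC can also compare non-nested pairs such as mM3 and mM5. The log-likelihoods, parameter counts and observation count in the usage lines are hypothetical. They show how BIC can prefer the smaller model when the LRT would not.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def lrt_one_df(log_lik_restricted, log_lik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    The statistic 2 * (LL_full - LL_restricted) is referred to a chi-square
    distribution with 1 degree of freedom, whose survival function is
    erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (log_lik_full - log_lik_restricted)
    if stat < 0:
        raise ValueError("the full model must fit at least as well as the nested one")
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical fits: restricted model (4 params) vs full model (5 params), 200 ratings.
stat, p = lrt_one_df(-100.0, -98.0)
print(stat, round(p, 4))                           # 4.0 0.0455
print(bic(-100.0, 4, 200) < bic(-98.0, 5, 200))    # True: BIC prefers the smaller model
```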
  
  (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.
  
  Sorry for the confusion. We have now corrected this phrase.
  
  (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?
  
  Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had been established in the clinical dataset. Please see our clarification below:
  
  Page 15:
  
  “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”
  
  (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?
  
  Thank you for this suggestion. We have now corrected it to make it clear.
  
  (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.
  
  Thank you for flagging this. For illustration only, we binned the data in proportion to group sizes: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).
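  The sketch below shows one way to bin data for display only. Subjects are sorted by the x variable and split into k bins of nearly equal size, and each bin's mean x and mean y are plotted. The equal-count rule and the function name are assumptions, because the response does not state how the bin edges were chosen.

```python
def equal_count_bins(xs, ys, n_bins):
    """Return (mean_x, mean_y) for each of n_bins bins after sorting by x.

    Bin sizes differ by at most one subject when len(xs) is not divisible by
    n_bins. Inferential statistics should still use the raw, unbinned data.
    """
    if len(xs) != len(ys) or not 0 < n_bins <= len(xs):
        raise ValueError("need equal-length inputs and 1 <= n_bins <= n")
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    means = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        means.append((sum(x for x, _ in chunk) / len(chunk),
                      sum(y for _, y in chunk) / len(chunk)))
    return means
```

  With n = 25 and 3 bins, this gives bins of 8, 8 and 9 subjects. With n = 58 and 6 bins, it gives bins of 9 or 10 subjects.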
  
  (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.
  
  Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.
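  The argument rests on a simple fact: dividing a group's column by its sample size is a positive rescaling, and a positive rescaling cannot change the order of models within that column. The check below uses hypothetical ΔBIC values, not the values in Table 2.

```python
def rank_models(delta_bic):
    """Return model names ordered from best (lowest delta BIC) to worst."""
    return sorted(delta_bic, key=delta_bic.get)


# Hypothetical group-level delta BICs (not the values reported in Table 2).
group_totals = {"cM1": 120.0, "cM2": 45.0, "cM3": 0.0}
n_subjects = 58  # any positive divisor gives the same ranking
per_subject = {m: v / n_subjects for m, v in group_totals.items()}

print(rank_models(group_totals))  # ['cM3', 'cM2', 'cM1']
print(rank_models(per_subject))   # ['cM3', 'cM2', 'cM1']
```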
  
  (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2017, https://pubmed.ncbi.nlm.nih.gov/28678984/)
  
  Thank you for this comment. Indeed, Rutledge et al. (2017) found evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, because adolescence may offer a developmental window for early intervention in suicidality. It is therefore possible that the current winning model is specific to adolescents. Please see our clarifications below:
  
  Page 28:
  
  “It is also possible that the current winning model is specific to adolescents. Given that Rutledge et al. (2017) supported the “CR-EV-RPE model” in adults with depression, our findings in adolescents may suggest a developmental change in mood sensitivity.”
  
  (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?
  
  Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.
  
  For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. This procedure is appropriate because the winning model is selected on the population-level dataset rather than for individual subjects.
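  The population-level procedure can be sketched with two toy models that stand in for the choice and mood models: an intercept-only Gaussian regression and an intercept-plus-slope Gaussian regression. In each simulation, a population of subjects is simulated from each generating model and every candidate model is fit to every subject. BICs are then summed across subjects, and the model with the lowest total wins. The toy models, effect sizes and sample sizes are assumptions for illustration only.

```python
import math
import random


def fit_bic(model, xs, ys):
    """Least-squares fit of toy model 0 (intercept) or 1 (intercept + slope).

    Returns BIC = k * ln(n) - 2 * lnL under Gaussian noise, counting the noise
    variance as a free parameter.
    """
    n = len(ys)
    my = sum(ys) / n
    if model == 0:
        resid = [y - my for y in ys]
        k = 2
    else:
        mx = sum(xs) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        k = 3
    sigma2 = max(sum(r * r for r in resid) / n, 1e-12)
    log_lik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return k * math.log(n) - 2.0 * log_lik


def simulate(model, rng, n_trials):
    """Generate one subject's data from toy model 0 (no slope) or 1 (slope 1)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_trials)]
    slope = 1.0 if model == 1 else 0.0
    ys = [0.5 + slope * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def population_confusion(n_models=2, n_sims=100, n_subjects=20, n_trials=50, seed=0):
    """Row i, column j: fraction of simulations in which model j wins (lowest
    BIC summed over subjects) when the population was generated by model i."""
    rng = random.Random(seed)
    counts = [[0] * n_models for _ in range(n_models)]
    for _ in range(n_sims):
        for gen in range(n_models):
            data = [simulate(gen, rng, n_trials) for _ in range(n_subjects)]
            totals = [sum(fit_bic(m, xs, ys) for xs, ys in data) for m in range(n_models)]
            winner = min(range(n_models), key=totals.__getitem__)
            counts[gen][winner] += 1
    return [[c / n_sims for c in row] for row in counts]
```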
  
  In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see Supplementary Pages 2-3:
  
  “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”
  
  “Parameter recovery: Figure S3 shows good parameter recovery for both the winning choice and mood models (choice: rs > 0.91, ps < 0.001; intraclass correlation coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass correlation coefficients > 0.86). Moreover, we computed cross-correlations between all generating and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
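  The cross-correlation check can be sketched as a table of Pearson correlations between every generating parameter and every recovered parameter. Good recovery appears as a strong diagonal, where a parameter is matched with itself, and weaker off-diagonal entries. The parameter names and values in the test are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_correlation(generating, recovered):
    """generating, recovered: dicts mapping parameter name -> per-subject values.

    Returns {(generating_name, recovered_name): r} for every pair; entries with
    matching names form the diagonal.
    """
    return {(g, r): pearson(generating[g], recovered[r])
            for g in generating for r in recovered}
```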
  
  Typos:
  
  (1) Line 90: original → originate
  
  (2) Line 596-598 - the same phrase is repeated twice.
  
  (3) Line 616: on the other word → hand.
  
  Sorry for the mistakes. We have now corrected them throughout the manuscript.
  
  Reviewer #2 (Recommendations for the authors):
  
  For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.
  
  Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:
  
  Pages 4-5:
  
  “Contemporary theories of suicide converge on the idea that STB initially arises from low mood. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood [25]. The motivational–volitional model [26] and the three-step theory [27,28] similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Official bodies such as the National Institute of Mental Health have also listed mood problems as warning signs [8]. Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance [29], providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”
  
  We have also refined the wording and corrected typos throughout the manuscript.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.
  
  Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are defined at the computational level as a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), over and above the difference in option values. We have cited these papers in the manuscript and now define the approach and avoidance parameters explicitly in the Abstract and Introduction. Please see our revisions below:
  
  Page 2 (Abstract):
  
  “Using a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>.”
  
  “Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to greater propensity for gambling in the gain domain regardless of the lottery's expected value.”
  
  Page 3 (Introduction):
  
  “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory [18,19], that is, by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), over and above the difference in option values. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion) [20].”
  
  (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.
  
  Thank you for this suggestion. We have now corrected it.
  
  (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.
  
  Thank you for this suggestion. We follow previous work and operational definitions of mood (Rutledge et al., 2014; Eldar & Niv, 2015; Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., a rapid shift from surprise to fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify this point in the main text. Please see our revision below:
  
  Page 5:
  
  “Although mood is thought to persist for hours, days, or even weeks [30–33], momentary mood, measured over laboratory timescales, represents the accumulated impact of multiple events at the scale of minutes [30,32,34–38]. The external validity of momentary mood is supported, e.g., by its association with depression symptoms [37]. Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear) [31–33,39].”
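  The operational definition above treats mood as a discounted summary of recent safe and gamble outcomes, and models of this kind are commonly written in the style of Rutledge et al. (2014). The sketch below implements a CR + GR model of that kind. It assumes a baseline `w0`, weights `w_cr` and `w_gr`, and a forgetting factor `gamma`, and it sets each trial's unused term to 0. The names are illustrative, and the manuscript's mM3 may differ in detail.

```python
def predicted_mood(cr, gr, w0, w_cr, w_gr, gamma):
    """Predicted momentary mood after each trial.

    cr[j], gr[j]: certain reward and gamble outcome on trial j (0 when the
    other option was chosen). Mood at trial t weights trial j by
    gamma ** (t - j), so recent outcomes count most.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    moods = []
    acc_cr = acc_gr = 0.0
    for c, g in zip(cr, gr):
        # Recursive form of the exponentially discounted sums.
        acc_cr = gamma * acc_cr + c
        acc_gr = gamma * acc_gr + g
        moods.append(w0 + w_cr * acc_cr + w_gr * acc_gr)
    return moods
```

  In this form, a small `w_cr` means that mood barely responds to certain rewards, which is the kind of reduced CR sensitivity reported for S<sup>+</sup>.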
  
  (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.
  
  Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:
  
  Page 4:
  
  “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”
  
  (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.
  
  Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities, namely expected value (EV) and reward prediction error (RPE), in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.
  
  (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:
  
  (a) Abstract: regardless the lotteries -> regardless of the lotteries'.
  
  (b) Line 78: it remains whether.
  
  (c) Line 80: can each -> each can.
  
  (d) Line 90: may original from.
  
  Sorry for the mistakes. We have now corrected them throughout the manuscript.
  
  (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.
  
  Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).
  
  (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.
  
  Thank you for pointing out this mistake. The corrected inclusion criteria are given on Page 7:
  
  “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”
  
  (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.
  
  Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have clarified this point in Supplementary Note 9.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2023.10.31.23297870v5
www.biorxiv.org

Large-scale synthetic data enable digital twins of human excitable cells

1
1. Public_Reviews 01 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study presents an interesting approach for finding electrophysiological models that match experimental patch-clamp data. The authors develop a new method for deriving optimized current clamp protocols by training a neural network on synthetic data. This optimized current clamp is then used on both computational training data and on experimental data to predict current gating and conductance parameters that correctly reconstruct the electrical phenotype.
  
  Strengths:
  
  (1) The fitting of gating variables through an optimized patch clamp protocol is interesting.
  
  (2) The inclusion of experimental data is important, and the approach is shown to be effective in fitting them.
  
  Weaknesses:
  
  (1) Some clarity is necessary on the generation and selection of variable iPSC models. With such a large variation in so many parameters, I would expect some resulting parameters to generate non-realistic phenotypes, quiescent cells, etc. Are all 200,000 or 1,100,000 generated cells viable? Or are they selected somehow for realistic cell properties?
  
  Thank you for this important point. We agree that broad parameter variation can generate non-physiological model behavior. Indeed, with the ±40% perturbation range, some simulated cells produced non-realistic outputs, including quiescence and failure to generate a complete action potential. These cases were excluded from the dataset, so only cells exhibiting physiologically meaningful and numerically stable behavior were retained for further analysis. We have clarified this selection procedure in the Methods section. We applied a large variation so that a broad range of parameter combinations and action potential morphologies was represented in the training and test data, allowing the model to ingest new data and perform robustly.
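  The generate-then-filter procedure can be sketched as follows. Each parameter is scaled by a factor drawn uniformly from [0.6, 1.4], which corresponds to the ±40% range. Each parameter set is then simulated, and only sets whose output passes a viability check are kept. The `simulate` and `is_viable` callables are hypothetical placeholders. The actual iPSC-CM model and exclusion rules are those described in the revised Methods.

```python
import random


def sample_parameter_sets(baseline, n, spread=0.4, seed=0):
    """Draw n parameter dicts, scaling each baseline value by U(1 - spread, 1 + spread)."""
    rng = random.Random(seed)
    return [{name: value * rng.uniform(1.0 - spread, 1.0 + spread)
             for name, value in baseline.items()}
            for _ in range(n)]


def keep_viable(param_sets, simulate, is_viable):
    """Simulate each parameter set and keep (params, output) pairs that pass is_viable.

    simulate and is_viable are supplied by the caller (e.g. a cell model and a
    check for a complete action potential); both are placeholders here.
    """
    kept = []
    for params in param_sets:
        output = simulate(params)
        if is_viable(output):
            kept.append((params, output))
    return kept
```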
  
  (2) The error shown in Figure 4 between different population sizes is not completely explained in the text - there seems to be a minimal difference between a population of 1,000 and 10,000, followed by a very good fit at 200,000. Is there a particular threshold that needs to be crossed where the error drops off? Related, how was the 200,000 number chosen?
  
  Thank you for this observation. We agree that the decrease in error reflects a gradual performance improvement as the population size increases, rather than a strict cutoff. As shown in Figure 4, the difference between 1,000 and 10,000 samples is small, but as the population size increases further to around 200,000 samples, the error drops substantially. This indicates how much training data is needed for good model performance. The improvement reflects better coverage of the high-dimensional parameter space, which helps the network learn the nonlinear relationships between the parameters and outputs.
  
  We tested a range of training set sizes and found that above 200,000 samples, the model consistently produced low, stable errors and good agreement between training and test errors. The test error decreased together with the training error as the population size increased, indicating better generalization and suggesting that the model accurately predicts unseen data rather than overfitting to the training set.
  
  (3) Related to the point above, the 1,100,000 population for fitting experimental data also needs a more complete explanation: how was this number chosen, and how does the error compare with the other population sizes shown in Figure 4?
  
  Thank you for this question. We found that a training set of 1,100,000 samples covered the large parameter space induced by ±40% parameter perturbation. iPSC-CM measurements are known to exhibit high variability, and we wanted to capture this full range in the training set so that the model could ingest a wide range of experimental data. It is straightforward to generate new training data, for example, to capture different experimental conditions such as temperature differences, mutations, drugs, or ionic variability, and we view this flexibility as a substantial strength of the approach. The large perturbations used in this study (±40%) allow the generation of a very broad range of cellular phenotypes while maintaining physiologically realistic ionic current properties and action potential behavior. Consistent with Figure 4, increasing population size reduces prediction error and improves generalization. The larger dataset provided more stable, accurate predictions when fitting experimental data, without evidence of overfitting.
  
  (4) Why are the optimized current clamp protocols different between panels A and B in Figure 5? Are they somehow informed by experimental data?
  
  Thank you for this question. The stimulation protocol used in panels A and B is identical. Panels A and B show whole-cell currents recorded under the same stimulation conditions as in Figure 3. The differences reflect variability in the underlying whole-cell ionic currents of the model cells rather than differences in the applied protocol. This is exactly the idea: the exact same protocol will generate different whole-cell currents in individual cells, but the model can find parameter sets for all of them.
  
  (5) Figure 6D: Is the EAD risk in panel D specific to cell 1, 2, or the pooled variants of both?
  
  Thank you for this question. We have clarified this point in the revised manuscript. The EAD risk shown in panel D is computed from the pooled variants of both Cell 1 and Cell 2, rather than being specific to either cell individually.
  
  (6) How sensitive is the fitting to minor parameter variation? Further, if one were to pick, let's say, the next-best-fitting value, would that fall close to the best one? Is the solution found unique, or are there multiple sets with good fits?
  
  Traditional optimization methods, such as Nelder–Mead, directly fit the model to the observed data by iteratively minimizing the error for each dataset. As a result, the solution can depend on the initial parameter guess and may converge to different local minima. In contrast, our approach trains a deep learning model on synthetic data generated from the baseline model, learning a mapping from whole-cell currents to the corresponding 52-parameter sets by minimizing prediction error. The mean squared error (MSE) decreases from approximately 10⁻² to below 10⁻³, with training and test errors overlapping closely, indicating stable training, good generalization, and accurate reproduction of the observed signals.
  
  The model achieves very low MSE and reproduces the electrophysiological outputs with high fidelity. However, accurate reproduction of the outputs does not imply a unique parameter solution. This is illustrated in Figure S1, where baseline and predicted parameter values show close agreement overall, yet small deviations persist across parameters. This indicates that different parameter combinations can yield similar whole-cell behaviors due to parameter correlations and compensatory effects. In such cases, the model learns to predict a representative parameter set that is most consistent with the training data and loss function, rather than converging to a single unique solution within a fixed numerical tolerance.
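  The compensation argument above can be seen in a toy example. This is not the Kernik-Clancy model. It is a hypothetical ohmic current that depends only on the product of a maximal conductance and a gating term. Any two parameter sets with the same product produce identical currents, so no fit to that current alone can tell them apart. The function name and values are illustrative assumptions.

```python
def toy_current(g_max, gate, v, e_rev=-85.0):
    """Hypothetical ohmic current I = g_max * gate * (V - E_rev); arbitrary units."""
    return g_max * gate * (v - e_rev)

voltages = [-80.0, -40.0, 0.0, 40.0]  # hypothetical test potentials (mV)

# Two different parameter sets sharing the same product g_max * gate = 0.5.
set_a = [toy_current(1.0, 0.5, v) for v in voltages]
set_b = [toy_current(0.5, 1.0, v) for v in voltages]

# A third set with a different product, for contrast.
set_c = [toy_current(1.0, 1.0, v) for v in voltages]

print(set_a == set_b)  # True: the current cannot distinguish A from B
print(set_a == set_c)  # False
```

  In a real model the trade-offs are usually partial rather than exact. The same logic still explains why outputs can be reproduced closely while individual parameters deviate.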
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors present a computational framework for generating "cell-specific" digital twins of human iPSC-CMs from a single optimized voltage clamp recording. Using deep learning trained on > 1 million artificial cells, the authors demonstrate that the model can infer 52 biophysical parameters governing 6 major ionic currents, and the resulting digital twins can reproduce experimentally recorded action potentials.
  
  Strengths:
  
  The framework has clear potential for understanding cellular heterogeneity in iPSC-CMs, predicting individual drug responses, and reducing the experimental burden of multiple patch clamp protocols.
  
  Weaknesses:
  
  There are several concerns about the validation of the model and its clarity. First, the biological variability being modeled in this manuscript is not defined well. It is unclear whether the framework addresses cell-to-cell differences within a single differentiation batch, variability across iPSC lines, or donor-to-donor differences. This ambiguity makes it difficult to interpret what the "digital twin populations" actually represent biologically. Second, the main claim, "the digital twins enable drug testing and arrhythmia prediction that would be impractical experimentally", is not experimentally validated. For example, the E-4031 simulations predict EAD rates, but no direct experimental head-to-head comparison is provided to confirm that these predictions are accurate. Third, technical reproducibility and biological representativeness are not assessed. Single voltage clamp recordings are inherently noisy. Without knowing how much variability comes from the recording process (technical variation) vs true biological differences, it is difficult to judge whether observed "cell-specific" parameter differences are meaningful. In addition, the optimized protocol is claimed to be superior to conventional approaches, but again, no experimental comparison is shown.
  
  The authors should address these concerns, with particular emphasis on clarifying the biological context and providing direct experimental validation. Below are detailed specific points:
  
  (1) Ambiguous definition of iPSC-CM heterogeneity. The authors model "typical iPSC-CM heterogeneity" by varying 52 parameters +/- 40% around a baseline model (Figure 1), generating > 1 million synthetic cells. However, the manuscript does not clearly state what biological variability this model is intended to capture. Is this modeling within-line, cell-to-cell variability (e.g., cells from the same dish or differentiation batch that differ due to stochastic gene expression or maturation state)? Or is this modeling between-line or between-donor variability (e.g., genetic background differences, reprogramming efficiency)? This distinction is critical for interpretation. If the goal is to understand why different cells in the same dish behave differently, then training data should reflect that. If the goal is to compare patient lines or disease models, the framework needs validation across multiple donors or lines.
  
  For example, the experimental validation in Figure 5 uses a single iPSC line (iPS-6-9-9T.B), but it is unclear how many differentiation batches or dishes were tested, or whether the cells came from the same preparation. Another example is that the wide AP diversity in the training population (Figure 1A) is impressive, but there is no demonstration that real experimental cells actually fall within this assumed range of +/- 40%.
  
  From a biological perspective, iPSC-CMs are known to be highly heterogeneous within lines (maturation state, metabolic differences, epigenetic variation, spatial differences within the same dish, etc.) and between lines (different donor/genetic background). Thus, please explicitly state whether the +/- 40% variation is intended to model within-line or between-line heterogeneity. Please justify this choice with wet-lab data (or references to experimental literature on iPSC-CM variability). Please clarify how many dishes, differentiation batches, and time points post-differentiation were used for experimental recordings (Figures 5-6). If the framework is intended to generalize across lines from different donors, please test the model on multiple independent iPSC lines (from different donors).
  
  Thank you for this important and insightful comment. The selected ±40% range was chosen to broadly explore all physiologically plausible electrophysiological behaviors, not to match a specific experimental distribution. Our goal was to cover enough behaviors for the model to learn a reliable mapping between responses and ionic parameters.
  
  We recognize that this approach does not explicitly account for variability between lines or donors. We have a current project focused on extending the framework to include multiple iPSC-CMs from patient donors, but given that the model framework successfully reproduces such a broad range of cell phenotypes, we feel confident that it will readily apply to different genetic backgrounds from patient-specific cells. This study is underway.
  
  We have updated the manuscript to clarify how the modeled variability is interpreted and added a discussion of these limitations. Furthermore, we clarified the experimental conditions, such as the number of differentiation batches and recording settings, in the revised Methods section.
  
  (2) Biological representativeness of single-cell measurements.
  
  The framework generates digital twins from single voltage clamp recordings. The patch clamp recordings in iPSC-CMs are subject to substantial technical variability. The manuscript does not address a fundamental question: "How representative are the measurements from a single cell on the dish (or line)?" In other words, if I measure one cell from a dish of a million cells, does that cell's digital twin tell me something about the dish as a whole, or just about that one cell? The manuscript presents Cell 1 and Cell 2 (Figures 5-6) as distinct individuals, but it's unclear whether these differences reflect true biological heterogeneity or simply sampling variability. I think the authors should perform replicate recordings on multiple cells (e.g., > 10 cells) from the same dish (same differentiation batch) and quantify how much the inferred parameters vary, and then compare between lines.
  
  Thank you for this important comment. We agree that the representativeness of single-cell measurements and the impact of technical variability are important considerations in interpreting the results. In this study, the framework is designed to generate digital twins that reflect the electrophysiological properties of individual recorded cells, rather than to directly represent the behavior of the entire cell population within a dish.
  
  As such, differences observed between Cell 1 and Cell 2 are intended to reflect variability at the single-cell level, which may arise from a combination of biological heterogeneity and experimental variability. We agree that systematic replicate recordings across multiple cells are valuable to quantify the relative contributions of biological and technical variability, and to assess the consistency of inferred parameters. However, this is beyond the scope of the current study. We have added clarification in the manuscript to explicitly state this limitation and to outline this as an important direction for future work.
  
  (3) No experimental validation of the main claim that in silico populations can replace wet experiments.
  
  The most exciting claim in the manuscript is that digital twins enable drug testing and arrhythmia prediction "at scale" without requiring hundreds of patch clamp experiments. Specifically, the authors show that in silico populations derived from two experimental cells (Figure 6C) predict dose-dependent EAD incidence for the IKr blocker E-4031 (Figure 6D), with ~3% of cells showing EADs at 50 nM.
  
  However, this prediction is not validated experimentally. If I actually patch 20-30 real iPSC-CMs and apply 50 nM E-4031, will ~3% of them show EADs, as the model predicts? Without this validation, I think the drug testing framework is purely hypothetical. The model may be internally consistent (e.g., Cell 1's twin behaves differently from Cell 2's twin), but there is no evidence that these in silico populations reflect real biological variability in drug response. Please provide experimental validation that justifies the prediction by digital twins.
  
  Thank you for this important comment. We agree that experimental validation of population-level drug response will be valuable for establishing the quantitative accuracy of the predicted EAD incidence. The E-4031 simulations are intended as a proof-of-concept illustrating how the framework can identify susceptible subpopulations and quantify relative proarrhythmic risk in silico. We agree that direct comparison with large-scale experimental datasets is a key next step, and we are working hard to get the study funded so that we can perform those experiments and bring this technology to scale.
  
  (4) Experimental validation and head-to-head comparison of optimized protocol.
  
  The authors claim that their deep learning-optimized voltage clamp protocol (Figure 3, Figure 4A) is superior to conventional approaches, but they have not validated this experimentally by doing a head-to-head comparison. The manuscript does not compare the optimized protocol to any published voltage clamp designs. If the optimized protocol is genuinely easier to implement and more informative than existing approaches, this would be a major practical advance. But without side-by-side comparison, it is impossible to judge whether the optimization made a real difference.
  
  Thank you for your comment. We agree that comparing directly with traditional voltage-clamp protocols through experiments would be useful. In this study, our main aim was to show that the optimized protocol enhances parameter inference within the modeling framework, not to prove experimental superiority. We have clarified this point in the revised version.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This work uses a convolutional neural network to optimize a voltage clamp protocol to identify features and parameters from human pluripotent stem cell-derived cardiomyocytes.
  
  Yang et al. introduce an innovative experimental framework that integrates computational modeling and deep learning to generate a digital twin of human pluripotent stem cell-derived cardiomyocytes (hPSC-CMs).
  
  Strengths:
  
  The major strength is the methodology used to bridge in silico prediction of cell behavior and mechanistic insights from the experimental dataset.
  
  The approach used in this study represents a significant step toward precision medicine by enabling in silico prediction of cellular behavior and mechanistic insight from experimental datasets. The study addresses an important and timely challenge in stem cell-based and personalized medicine. The authors compellingly leverage state-of-the-art methods alongside strong expertise in computational modeling and cardiac electrophysiology.
  
  Weaknesses:
  
  While the overall approach is highly compelling and the potential impact is substantial, there are two areas where clarification and refinement, particularly in the phrasing and framing used throughout the manuscript, would further strengthen the work.
  
  (1) While the overall goal of the study is compelling, the manuscript would benefit from clearer articulation of how the proposed framework is intended to be used in practice. In particular, it is not entirely clear whether the authors envision this approach as:
  
  (a) a method to extract population-level trends that, when paired with biological data, enhance statistical power and interpretability, or
  
  (b) a strategy capable of constructing a population-based model from limited single-cell recordings. If the latter is intended, additional guidance on the number of action potentials required per cell and the assumptions underlying this extrapolation would greatly clarify the scope and applicability of the method.
  
  Thank you for this thoughtful comment. We agree that the intended use of the framework should be more clearly articulated. In this study, we generate a large synthetic population of iPSC-CM models by varying 52 biophysical parameters governing key ionic currents. A neural network is trained on simulated whole-cell current responses to learn a mapping between current profiles and model parameters. Experimental recordings are then used as inputs to this trained model to infer ionic parameters, rather than directly fitting the model to data. This enables individual recordings to be interpreted within a large, physiologically plausible parameter space and supports population-level analysis of electrophysiological variability. The primary goal of the framework is therefore to facilitate mechanistic interpretation of variability and relate experimental observations to underlying ionic currents. But the longer-term intended goal is to develop digital twins from patient-derived cell lines and then use populations constructed from patient-specific digital twins to screen therapeutics and identify arrhythmia marker vulnerability in a very thorough and high-throughput way. We have clarified this in the revised manuscript.
  
  (2) The manuscript would also benefit from a clearer explanation of how electrophysiological heterogeneity observed in hPSC-CMs is linked to inter-patient variability. Although the authors state that this framework can be generalized to compare patient-specific hiPSC-CM lines, it remains unclear how this generalization is achieved, given the substantial sources of variability intrinsic to hiPSC-CMs (e.g., batch effects, reprogramming strategy, differentiation protocol, and maturation state). As acknowledged by the authors, addressing this level of variability likely requires large datasets; further clarification of how the proposed approach mitigates or accommodates these challenges would strengthen the translational claims.
  
  Below are my suggestions that could help strengthen the claims in the manuscript:
  
  (1) Adding a dedicated section describing the electrophysiological phenotype of the hPSC-CMs used in this study would help justify the choice of the underlying ionic model and the selection of the six ion currents analyzed. These currents are not only developmentally regulated but may also vary substantially across different hPSC-CM lines, which has implications for generalizability.
  
  Thank you for this important suggestion. We agree that providing additional context on the electrophysiological phenotype of the hPSC-CMs strengthens the rationale for both the underlying ionic model and the selection of currents analyzed.
  
  We have expanded the Methods section to clarify this point. Briefly, the ionic currents were selected based on the Kernik-Clancy iPSC-CM model developed in our prior work, which was specifically designed to capture the range of electrophysiological variability observed within an iPSC-CM cell line using a population-based framework. In this model, variation in key ionic conductances is sufficient to reproduce the diversity of action potential morphologies, spontaneous activity, and repolarization dynamics commonly reported experimentally, while avoiding non-physiological behaviors.
  
  Accordingly, we focused on six primary ionic currents that are known to play dominant roles in shaping action potential characteristics and variability in iPSC-CMs. This selection reflects a balance between model parsimony and physiological relevance, enabling the framework to capture the expected spectrum of variability within a given cell line. We also note that the framework is extensible, and additional currents or alternative parameterizations can be incorporated to account for differences across cell lines, donors, or experimental conditions in future studies. See updated discussion.
  
  (2) If feasible, inclusion of patch-clamp data from an additional hPSC-CM line would significantly strengthen the claim that this framework can harmonize and generalize across datasets and cell sources.
  
  Thank you for this helpful suggestion. We agree that adding data from more hPSC-CM lines would improve the framework's generalizability. In this work, our goal was to show that the digital twin framework is data-driven and can easily be expanded to include more hPSC-CM lines, allowing for cross-line comparisons in future studies. We have clarified this and included a discussion of this limitation in the revised manuscript. We are currently seeking funding for patient-specific lines as well to allow scalability.
  
  (3) The authors note that the experimental cells exhibited high variability in action potential morphology. This is an important observation that directly supports the motivation for the study and should be explicitly presented, even if only in the supplementary materials.
  
  Thank you for this suggestion. We agree that explicitly showing the variability in experimental action potential morphology strengthens the motivation for this study. We have now added a section to the Discussion that addresses this. It cites the many prior studies that focused on iPSC-CM variability, including the studies upon which our initial model (Kernik-Clancy) was based.
  
  (4) In the hERG-blocker experiments, further clarification is needed regarding the biological relevance of the reported 3% incidence of early afterdepolarizations (EADs). Additionally, an interrupted sentence in this section makes the goal unclear. It may be to demonstrate that the digital twin can capture rare arrhythmic risk events, or to argue that the digital twin is necessary to determine whether this level of risk is clinically meaningful.
  
  Thank you for this important comment. We agree that more clarification is needed on the ~3% EAD incidence and the digital-twin role. This analysis aims to show that electrophysiological variability can create a small, susceptible subpopulation under drug effects, not to set a clinical risk threshold. The observed ~3% EAD incidence reflects the emergence of such a susceptible subpopulation under hERG block. While relatively small, this fraction is important because it arises from modest, physiologically plausible variation in ionic properties and would be difficult to capture using single-cell or small-sample approaches. As described in the Discussion, this variability-driven emergence of EADs provides a quantitative measure of proarrhythmic risk at the population level. The digital-twin framework enables systematic identification and quantification of these rare events, linking cell-level variability to population-level responses. We have revised the manuscript to clarify this point.
  
  (5) The manuscript states that some action potentials were excluded from the experimental dataset. A brief explanation of the exclusion criteria, along with guidance on how to distinguish high-quality from low-quality recordings, would improve transparency and reproducibility.
  
  Thank you for this comment. We agree that the definition of failed recordings should be clarified. We have now specified the exclusion criteria in the Methods section.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) It would be helpful if the network cartoon in Figures 2 and 3 were replaced with a simplified sketch of the actual neural network used.
  
  Thank you. We have now provided new Figures 2 and 3.
  
  (2) Subsection title for the Introduction has a typo.
  
  Thank you. We have fixed it.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Technical quality control criteria are not specified.
  
  The Methods section states that "any incomplete or failed recordings were excluded," but does not define what constitutes a failed recording. The criteria could be subjective.
  
  Thank you for pointing this out. We agree that the definition of failed recordings should be clarified. We have now specified the exclusion criteria in the Methods section.
  
  “Recordings were excluded if they exhibited no spontaneous firing, abnormally slow firing rates, or failed to capture a complete action potential waveform. These criteria were applied consistently across all recordings.”
  
  (2) "Cell-specific" may overstate the claim.
  
  The term "cell-specific digital twins" (title, throughout) implies that the inferred parameters reflect the true biological state of each cell. However, parameters are derived only from curve-fitting to electrophysiological data and do not reflect other biological components (e.g., gene expression, contractility, calcium handling, metabolism, etc). Please consider rephrasing to "electrophysiology-based digital twins", "voltage clamp-matched digital twins", etc.
  
  Thank you for this important comment. We agree that the term “cell-specific” could be interpreted as implying a complete representation of the biological state of each cell. We have also adjusted the wording in relevant sections to avoid over-interpretation.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) I would add the list of the 52 parameters in the method section/SI and not just in the reference. Additional justification of why the perturbation was set as +/- 40% for the 52 parameter or +/- 20% for the EAD population would also help.
  
  Thank you for this helpful comment. We have included model equations and highlighted the 52 parameters in the Supplementary Information and provided additional justification in the Methods.
  
  (2) In Figure 1B, it might be helpful to add a Vm axis instead of the dotted line indicating 0 mV, to show differences in the diastolic potential.
  
  Thank you! We have now updated Figure 1B.
  
  (3) In Figure 1C-I, it might be more impactful to show traces from the AP shown in Figure 1B to reinforce the impact of a single current on the AP shape.
  
  We have now updated Figure 1C-I to include traces from the AP shown in Figure 1B.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.03.674034v5
socialsci.libretexts.org

6.1: Principles of Interpersonal Communication

1
1. LeAndraOnys 01 Jul 2026
  
  in Public
  
  In short, you are testing the compatibility of your schemata with the new people you encounter. Although storytelling will continue to play a part in your relational development with these new people, you may be surprised at how quickly you start telling stories with your new friends about things that have happened since you met.
  
  Storytelling is one of my favorite ways to bond with a new friend or coworker. I find it fun not just to share but also to listen. If I think a story is on topic or that they would find it interesting, I like to share it. I find this helps us know each other more deeply, even if we don't get to spend that much time together. As a listener, I get a chance to see what they are like in different scenarios.

Annotators

LeAndraOnys

URL

socialsci.libretexts.org/Bookshelves/Communication/Introduction_to_Communication/Communication_in_the_Real_World_-_An_Introduction_to_Communication_Studies/06:_Interpersonal_Communication_Processes/6.01:_Principles_of_Interpersonal_Communication
www.biorxiv.org

Wound-Induced Syncytia Outpace Mononucleate Neighbors during Drosophila Wound Repair

1
1. Public_Reviews 01 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  This study aims to understand how cell fusion contributes to wound healing using a laser-induced injury in the notum epithelium of a developing fruit fly. The authors meticulously characterize the epithelial fusion events using a live imaging approach and report that syncytia arise by 'border breakdown' and 'cell shrinking'. The syncytial epithelial cells also appear to outcompete mononucleated cells and preferentially dissolve their tangential borders, which correlates with the accumulation of actin at the leading edge.
  
  Strengths:
  
  The strength of this study is the authors' live imaging approach to capture these dynamic fusion events that are a fundamental, yet poorly understood biological process.
  
  Weaknesses:
  
  A major weakness is that all the authors' conclusions are based on descriptive studies, in which the role of cell fusion is not directly tested. This is particularly important because other models of wound-induced polyploidization have demonstrated that another cytoskeletal protein, myosin, was upregulated and dependent on endoreplication, not cell fusion. Therefore, it remains unclear to what extent cell fusion, endoreplication, or both are required to outcompete mononucleated cells and to pool actin as described in this study.
  
  We thank the reviewer for appreciating our live imaging and meticulous approach. In this revision we have identified that the gene Atg1 is required for wound-induced fusion in the pupal notum: when Atg1 is knocked down, there is a reduction in wound-induced cell fusions, both border breakdown and cell shrinking. Analysis of Atg1 knockdown shows that the wounds close more slowly. This is a direct test of the role of cell fusion in speeding wound closure, presented in new Fig. 4.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  Overall, this study provides a thorough description of the formation of syncytia following wounding of the proliferation-competent diploid epithelium of the pupal notum. The Galko lab has already described this phenomenon briefly for this particular tissue (Wang et al. 2015). The authors provide a much more detailed description and characterisation of the process and offer some novel insights (radial versus tangential border breakdown, cell shrinkage, timings, syncytia outcompeting mononucleated cells, etc.).
  
  Strengths:
  
  This paper provides an elegant, thorough, descriptive characterisation of syncytia-driven wound closure using state-of-the-art confocal live imaging of the pupal notum. The authors show that laser-induced wounding of this diploid, proliferation-competent epithelium results in the formation of syncytia of various sizes in the first few cell rows around the wound edge, which progressively become bigger as healing proceeds. This results in ~50% of cells becoming part of these syncytia. The cell fusion events were convincingly demonstrated by showing the disappearance of p120ctn-RFP and E-Cadherin-GFP from cell-cell borders as well as cytoplasmic GFP mixing of GFP-positive cells with a GFP-negative cell.
  
  Apart from cell-cell fusion by border breakdown, which mostly happens in the first 2 h after wounding, the authors also found that cell shrinkage following cytoplasmic mixing contributed to syncytia formation at later stages of wound healing.
  
  Next, the authors provided some convincing evidence that syncytia outcompete mononuclear cells for being positioned in the first cell row around the wound.
  
  The authors then show that radial border breakdown occurs much less frequently than tangential border breakdown. They suggest that radial border breakdown reduces the requirement for cell-cell intercalations. They also hypothesise that tangential border breakdown might allow fused cells to share resources and provide more resources to be used near the wound edge, e.g. for actomyosin cable formation. To test this, the authors generate single-cell clones that overexpress Actin-GFP. They then show convincingly how a single Actin-GFP-positive cell in the second cell row fuses with one GFP-negative cell in the first cell row. The Actin-GFP signal then spreads in the fused cell and labels some previously unlabelled actin-rich structure near the wound edge which most likely is the actomyosin cable. This provides some evidence for resource sharing by cytoplasmic mixing following fusion.
  
  Weaknesses:
  
  The authors provide some convincing evidence that syncytia outcompete mononuclear cells for being positioned in the first cell row around the wound. The authors suggest that the syncytial cells might be better able to close the wound. However, some genetic studies would need to be done to establish this more convincingly. E.g. Could the authors genetically block syncytia formation and then show that these wounds now heal slower?
  
  We now present such data in new Fig. 4, which describes knocking down Atg1, previously shown by the Leptin lab to promote wound-induced fusions in larval epidermis. We quantify the resulting reduction in fusion in the pupal notum and show that the leading edge advances more slowly to heal the wound.
  
  The authors suggest that radial border breakdown reduces the requirement for cell intercalation. While this might be true it also raises the question of how the various syncytia facing the wound border change shape to allow the shrinkage of the first cell row over time to allow wound closure. None of the four movies included in the study shows the whole wound healing process until the later stages, making it hard to assess this. It would be good to include one such movie showing the syncytia in the whole wound and comment on this point.
  
  In response to the reviewer's request, we now extend Supplemental Video S1 out through 8 hours after wounding (same video as included previously but extended longer). In this video, as in many of the wounds, it is hard to determine the exact moment of closure because a syncytium extends across the wound whereas the nuclei do not. However, during the process of closure, one can clearly observe the large syncytia becoming more wedge-shaped – drastically reducing the section of their perimeter remaining in contact with the wound’s leading edge.
  
  In addition, we now explore how syncytia reduce the need for intercalation in a computational model, presented in new Fig. 7 and Supplemental Videos S5 and S6. One can observe the modeled syncytia becoming similarly wedge-shaped. The modeling shows that the presence of syncytia and their ability to reshape can speed closure by about 1/3 even if the syncytia have no special properties aside from their relative size.
  
  In both the experiments and models, some syncytia are also removed from the leading edge by intercalation, but the presence of syncytia reduces the total number of intercalations needed.
  
  The authors hypothesise that tangential border breakdown might allow fused cells to share resources and provide more resources to be used near the wound edge, e.g. for actomyosin cable formation. They show convincingly through the fusion of a single Actin-GFP-positive cell in the second cell row with a GFP-negative cell in the first cell row that Actin-GFP spreads in the fused cell and labels the previously unlabelled actomyosin cable. While the hypothesis of resource sharing to improve healing is intriguing and makes sense, this experiment doesn't necessarily prove the benefit of resource sharing. It does show cytoplasmic mixing following fusion, now allowing the GFP-labelled actin to diffuse and be incorporated into the actomyosin cable. In a wild-type condition, fusion would not increase the total concentration of resources, although it would increase the total amount of resources within this bigger fused cell. The question is whether resource sharing without increasing the protein concentration is beneficial and increases the efficiency of certain wound healing mechanisms. There might be a benefit of cell fusion if, for example, certain resources were only present in limited amounts or if protein transport could increase the concentration locally. To provide better evidence for the hypothesis that resource sharing improves wound healing, maybe the authors could look at the actomyosin cable in a wounded epithelium (such as in Figure 4E, F), in which all cells express MyoII-GFP. The authors could compare the average intensity of the actomyosin cable at the wound edge in mononucleated cells versus in syncytia. If resource sharing is indeed beneficial, the actomyosin cable might be stronger/brighter in syncytia or might form more quickly.
  
  We agree with the reviewer that we have not "proved the benefit of resource sharing". Because we cannot inhibit resource sharing while still allowing cell fusion, we can think of no rigorous way to test this hypothesis. We appreciate the reviewer's suggestion of quantifying the myosin at the leading edge cable, but we can imagine too many caveats to the interpretation to make it worthwhile. Rather, we accept the limitation that this is an untested, perhaps untestable, hypothesis -- but nevertheless intriguing.
  
  We do want to clarify ideas about the concentration of resources after fusion. We agree that the overall concentration of a given resource (mass/volume) throughout a syncytium would be the same as the overall concentration in the unfused progenitor cells; however, a syncytium would have a larger total resource mass to direct subcellularly, allowing for local subcellular concentration to be greater in a syncytium vs. an unfused cell. We demonstrate this subcellular localization of actin in a syncytium twice, in Fig. 7C and E (previously Fig. 6C,E), which we think is evidence for increased local concentration.
  
  The biggest limitation of this study is that the authors don't address how the formation of these syncytia is regulated. While the manuscript in its current form provides some valuable new insights into syncytial-driven wound closure, it would be much more informative if it also provided some mechanistic details. The authors could test whether some of the mechanisms shown to regulate syncytial formation in other types of syncytia-driven wound healing are also involved here. For example, Yorkie was shown to negatively regulate cell fusion in adult syncytial-driven wound closure (Losick et al 2013). The authors could test for the effect of Yorkie-RNAi in the epithelium on wound closure and syncytia formation. Expression of the dominant-negative RacN17 also blocked cell fusion in adult syncytial-driven wound closure (Losick et al 2013).
  
  Moreover, JNK activation was shown to be needed in larval syncytial-driven wound closure (Galko and Krasnow 2004). The authors could use JNK pathway reporters to assess pathway activation, or test whether the JNK pathway is needed for syncytial-driven wound closure by expressing a dominant-negative form of Basket (JNK) in the epithelium.
  
  Or could syncytia formation be regulated by changes in Integrin-mediated adhesion as shown by the Galko lab in Wang et al 2015? They show that wounding provoked a striking relocalization of PINCH and ILK, indicating the disassembly of functional FA complexes concomitant with syncytium formation. Maybe the authors could investigate some of these.
  
  We investigated the role of JNK in fusion by expressing dominant-negative bsk (bsk^DN) on one side of the wound. Comparing the numbers of border-loss fusions on each side, we did not find a significant difference in our seven-sample cohort (see Author response image 1). With a larger sample size we might have detected a significant difference with a small effect size, but because the difference in fusions between the two sides was small, we did not think this was worth pursuing. Instead, we include data showing that the autophagy gene Atg1 is required for cell fusion in new Fig. 4, which begins to address mechanism and relates the wound-induced fusion described here in pupae to wound-induced fusion shown in larvae. A complete mechanism for wound-induced fusion is outside the scope of this paper, as we focus on the function of syncytia in healing wounds.
  
  Author response image 1.
  
  Another general question that the authors raise but don't address enough is whether syncytia-driven wound closure in proliferation-competent epithelia is any different from the one in post-mitotic, polyploid epithelia. Since the mechanism regulating the former is not known, this remains unclear.
  
  We now include a paragraph on this question in the discussion.
  
  Finally, it is not clear, whether syncytia in these proliferation-competent epithelia get resolved after wound healing. Do they get removed and replaced by mononucleated proliferation-competent cells or do the syncytia stay in the epithelium like a scar? The authors should provide some images of wound areas a few hours after wound closure is complete and comment on this.
  
  To answer the reviewer’s question: some but not all syncytia do get removed during wound closure by remarkable apoptotic/extrusion events. This will be the subject of a future manuscript, as it is outside the scope of this paper focusing on the function of syncytia in promoting wound healing.
  
  Minor points:
  
  Figure 3: It would be better to have the microscopy images alongside the quantifications.
  
  The images in Figs. 1 and 2 show the border breakdown and shrinking cells, and we do not see benefit in adding them in Fig. 3.
  
  Figure 4A: The syncytium at the wound edge here doesn't look straight but wavy. Does it not form an actomyosin cable that straightens the front? Or are there lamellipodia/filopodia?
  
  We assume the reviewer is asking about the wavy edge outlined at 400 min after wounding (now Fig. 5A). As shown by Jacinto and colleagues in the first pupal wounding paper (JCB 2013), the actin cable forms quickly, within 15 minutes; much later actin protrusions extend from the leading edge to close the wound. This result is consistent with the wavy edge 400 min after wounding.
  
  248: The authors suggest an interesting hypothesis that mitochondria or ER could be pooled in fused cells. It would be nice to see some evidence: e.g. by labeling mitochondria and assessing where they are in syncytia versus mononucleated cells and whether they are concentrated around the wound edge.
  
  Although we don't think that exploring mitochondria or ER is central to this manuscript, we agree it would be an interesting question for the future.
  
  141-145 (Figure 4B and C) This example is not completely convincing. First, it is hard to see where the wound edge is. Second, it would be good to include an even later time point when the cell is clearly no longer at the wound edge.
  
  We have revised this figure, now Fig. 5B,C, to include a later image at 360 min after wounding; this additional panel clarifies that the smaller cell leaves the wound edge. As noted in the text, the wound edge is indicated by the cell borders lacking p120ctn.
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  White et al. described laser-induced wound healing of the Drosophila pupal notum. They found that the epithelial monolayer is dynamically induced to form syncytia by cell-cell fusion as an important part of repair. They reveal two processes: cell shrinking and border breakage that occur as part of syncytia formation. Expression of GFP in the cytoplasms of some epithelial cells reveals that cytoplasmic contents mix following injury and the GFP rapidly diffuses between cells. Using live imaging they observe that syncytia expand towards the wound, maintain their positions close to the leading edge, and apparently displace smaller cells. They propose that syncytia redistribute cellular components towards the wound facilitating repair and show that labelled actin becomes concentrated at the leading edge.
  
  Strengths:
  
  The manuscript is interesting and on an important and emerging topic of wound healing in a genetically tractable organism. The manuscript is very well written.
  
  Weaknesses:
  
  There are three major issues that the authors must address: (1) Is cell-cell fusion sufficient to enhance/facilitate wound healing? (2) Characterization of "border breakdown": is this phenomenon the disassembly of apical junctions following membrane fusion? (3) Are cells really shrinking, or is it only the apical domains that "shrink" as the cells join the syncytium?
  
  We thank the reviewer for recognizing the importance of this topic. Our responses to the specific weaknesses are below.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  Major Components:
  
  (1) For syncytia measurements the nuclei are labeled with histone-GFP which is expressed in all cell types. How do you know the nuclei within the cell junctions are epithelial and not another cell type, such as immune cells recruited to the injury site? It would be helpful to verify the number of nuclei per cell using an epithelial-specific nuclear marker as well. This could be via epithelial Gal4-specific expression of a UAS-nls-GFP.
  
  This is an interesting point. In response to the reviewer's question, we investigated by doing the converse experiment, labeling immune cells with hml-Gal4, UAS-GFP, and observing what they do after wounding (analyzing six wounded pupae). They do get recruited to the wound, but they remain either in the wound center or at the basal side of the leading edge. Because they are labeled with cytoplasmic GFP, we would be able to ascertain whether they fused with epithelial cells because they would share their GFP with epithelial cells in the epithelial plane, and they did not. Thus we are confident that the many syncytial nuclei are not derived from immune cells. Our live tracking throughout the manuscript, and specifically of GFP-labeled clones, also supports our interpretation that syncytial nuclei derive from epithelial cells.
  
  (2) The manuscript focuses on cell fusion, but other mechanisms of cell enlargement have been observed to occur during wound healing via endoreplication. To what extent do epithelial cells in pupae notum endocycle or endomitosis post injury? It is unclear if the increase in syncytia size during a 1-2hr period could also be due to endomitosis, which would also increase nuclear number.
  
  Since the first submission of this manuscript, we have published results demonstrating limited wound-induced endoreplication after this type of explosive laser injury to the pupal notum (White et al, 2024, PMID: 38495588). We chose to publish this work separately because we could not offer the same degree of depth for endoreplication as we could for fusion: our pupal notum injury model is extremely well suited to analyzing cell fusion and wound closure by live imaging, but it is not particularly well suited to analyzing endoreplication in fixed tissue. With respect to the reviewer's question about endomitosis (i.e. nuclear divisions that are not accompanied by cell divisions), even after many years we have not observed an endomitosis event, which would be visible by live imaging, whereas we frequently and easily observe mitosis of diploid cells.
  
  (3) One of the major conclusions of this study is that cell fusion is necessary to pool resources at the leading edge. Therefore it is critical that authors identify a mechanism to inhibit cell fusion to test this assumption.
  
  We now include new Fig. 4, an analysis of the role of Atg1 in promoting wound-induced fusion and wound closure. These results build on the finding of the Leptin lab (Kakanj et al, 2022) that autophagy genes are required for fusion. Our results are consistent with the model that syncytia speed wound closure.
  
  (4) There is evidence that myosin increases in endoreplicating cells during wound healing hence it is, maybe equally - if not more - probable that the increase in resources (here actin-GFP) at the leading edge is dependent on endoreplication instead of cell fusion.
  
  Among the new data we provide in this manuscript is a correlation between cell size and distance traveled, showing that larger cells travel farther within the wound (Fig. 4F,G). Endoreplication would certainly be expected to contribute to increasing cell size, and our published 2024 data indicate that these types of wounds can induce one extra S-phase. Doubling the genome, however, contributes little to cell size compared with the tens of nuclei we observe in syncytia formed by fusion. Nevertheless, we do not claim that actin is the only important resource that can be pooled subcellularly for the benefit of the cell; we use it only as a proof of principle. Finally, we discuss the work on myosin in wound-induced endoreplicating cells (Losick and Duhaime, 2021).
  
  Reviewer #3 (Recommendations For The Authors):
  
  Major comments
  
  (1) Can induction of epithelial fusion enhance wound healing?
  
  Different epithelial cell-cell fusion processes have been well characterized: i) Trophoblast fusion in the placenta mediated by Syncytins. ii) Virus-induced cell-cell fusion mediated by diverse viral glycoproteins (e.g. gp41 from HIV, Hemagglutinin from Influenza, GP from Ebola, and G glycoprotein from VSV). iii) Epidermal, myoepithelial, and other epithelial cell-cell fusion in C. elegans mediated by EFF-1 and AFF-1. iv) Cell-cell fusion in the eye lens (unknown fusogens). The authors may want to compare and discuss the temporal dynamics and intermediates observed in the diverse processes of epithelial cell-cell fusion with the characterization of syncytia formation during wound healing of the Drosophila pupal notum. Since some of these characterized cell-cell fusogens can fuse heterologous cells, including Drosophila S2 cells (Shilagardi et al., 2013; https://pubmed.ncbi.nlm.nih.gov/23470732/), the authors may consider expressing these fusogens in the Drosophila pupal notum before, during and after injury. This could determine whether syncytia formation is sufficient to stimulate efficient wound healing.
  
  We thank the reviewer for the suggestion of comparing and discussing temporal dynamics and intermediates observed in the many types of epithelial fusion that are well understood. Regretfully, we do not think this article is the right venue for such a complex discussion, especially since we have little by way of comparison in our own wound-induced fusion data. As for overexpression of fusogens, it is an intriguing idea to force cell fusion with a heterologous fusogen such as EFF-1 and then investigate any resulting changes in wound healing. However, since half the cells within 70 µm of the wound already fuse even without a heterologous fusogen, it seems unlikely we could meaningfully increase the level of cell fusion unless we expressed the fusogen universally, forcing the fusion of nearly all the epithelial cells as well as other cells throughout the body that express pnr-Gal4. Because the overexpression of EFF-1 in C. elegans results in lethality (PMID: 26854231), a widespread induction of fusion would be expected to cause other types of physiological problems that would interfere with the interpretation of wound closure rates. Further, the conditional expression tools in Drosophila allow excellent spatial control, but temporal control is still somewhat low-resolution, so we would have difficulty expressing EFF-1 before, during, and after wounding at times that would be relevant to understanding wound healing.
  
  (2) The phenomenon of "border breakdowns" described here is not clear. The authors are probably studying the disassembly of the apical junctions following the initiation of membrane fusion and pore expansion. This should be clarified by using membrane labels to directly observe membrane fusion. Researchers have used electron microscopy and membrane fluorescent probes to follow cell-cell fusion. For example, GPI-mCherry, FM4-64, lipid-modified-GFPs (e.g. PH-domain fluorescently labeled proteins) DiO, DiI, and many others. See for example: Markosyan et al., 2016; https://pubmed.ncbi.nlm.nih.gov/26730950/; Mohler et al., 1998; https://pubmed.ncbi.nlm.nih.gov/9768364/; Meng et al., 2020; https://pubmed.ncbi.nlm.nih.gov/32668210/.
  
  We agree completely with the reviewer, that border breakdowns represent the disassembly of apical junctions following initiation of membrane fusion and pore expansion. Direct evidence for this order of events is found in the video stills of Figure 1 panel I and video S2, which show that cytoplasmic GFP is transferred to the fusion partner 14 minutes before there is a visible decrease in the apical adherens junction marker p120ctn. The reproducibility of this order of events is documented in Fig. 3: among 107 GFP-labeled cells, 30 of them first visibly shared GFP with a fusion partner, and then 11/30 displayed border breakdown, 16/30 displayed cell shrinking, and 3/30 did not fuse. This last category is consistent with a fusion pore that closed rather than expanded productively. Although we have obtained TEM images of wound-induced fusion pores, these are included in another manuscript currently in revision and so cannot be included here, and further these EM images do not shed light on border breakdown per se, as only live imaging can establish the relationship between border breakdown and pore formation (GFP-sharing).
  
  (3) The observation of cell shrinking may be misleading. The process the authors describe as "cell shrinking" may involve shrinking of the apical domain, maintaining the cell volume. To clarify this process, the authors may simultaneously label the apical and basolateral domains. It is possible that fusion pore formation occurs in the basolateral, apical, or both domains. The apical shrinking could reflect the migration of the apical junctions following fusion. A similar process has been described in epidermal and vulval cells of C. elegans and other nematodes (Mohler et al., 1998; https://pubmed.ncbi.nlm.nih.gov/9768364/; Sharma-Kishore et al., 1999; https://pubmed.ncbi.nlm.nih.gov/9895317/; Kolotuev and Podbilewicz 2008; https://pubmed.ncbi.nlm.nih.gov/18031720/).
  
  We thank the reviewer for pointing out these examples of cell fusion in nematodes, and we now compare our findings to Mohler et al, 1998. In Fig. 2D, we specifically investigated what happened to the cell volume of these shrinking cells, and we hope we have now clarified both the text and the figure annotations to make our findings clearer. In the X-Z plane, the entire cell volume of two shrinking cells is visible from cytoplasmic GFP labeling. For both cells, the cytoplasmic volume moves laterally into the neighboring syncytia, apparently starting from the basal-most area of the cell, so that 150 minutes after wounding both cells have a reduced apical footprint and only a wisp of apically oriented cytoplasm, with the remainder of the cytoplasm having moved into the syncytia. These images make it clear that fusion is occurring, and that when the apical area disappears the corresponding cytoplasm has also moved into the territory of the neighboring syncytium. In response to the reviewer's suggestion, we did try labeling basolateral domains, but the fluorescent proteins we examined are not restricted to the basolateral domain and are difficult to interpret.
  
  Minor comments
  
  (1) Lines 40-43. Repair of injuries has also been observed in non-proliferative syncytial epidermal cells and involves cell-cell fusogens. The authors may want to include this reference: Meng et al., 2020; https://pubmed.ncbi.nlm.nih.gov/32668210/.
  
  We thank the reviewer for the suggestion, and we have included this reference in the Discussion paragraph about fusogens.
  
  (2) Lines 128-130. Is "Shrinking fusion" an "artefact"?
  
  The apical junction shrinks, not the cell. I suggest following basolateral membranes to see whether the cell is indeed shrinking as it fuses. The authors may want to report whether the cell volume is maintained but spills into an existing syncytium, with the apical junction shrinking because it disappears/disassembles (see also Major comment 3).
  
  As discussed in Major comment 3, we do provide evidence that the cell cytoplasm spills into an existing syncytium. Perhaps the reviewer finds the term "shrinking cell" to be misleading, as we all agree that the cell contents do not disappear. We have updated the manuscript to use the term "apical shrinking" throughout.
  
  (3) Lines 157-159. Are these small cells or instead they are small apical junctions? The interpretation should include basolateral domains of the small cells to determine their size! It is also possible that some small cells have fused with the syncytia but on the basolateral domain without apical junction disassembly.
  
  We appreciate the reviewer's rigor. As noted above, we were not able to analyze the basolateral domains of these cells. Because all our analyses are based on live-imaging videos, we are able to identify the cells that are undergoing apical shrinking and clearly distinguish them from stable diploid cells. We now realize that the term "small cells" is confusing and can be mixed up with apical shrinking. These cells are not "small" but normal-sized, small only in comparison with the gigantic syncytia around them. We have removed the term "small" from this description.
  
  (4) Lines 204-206. Many genes required for myoblast fusion in Drosophila have been shown to play a role in different stages of cell-cell fusion. Do they play roles in epithelial fusion during wound closure in the pupal notum? For example, actin polymerization? Dynamin? Ig-domain and integrin cell adhesion machineries?
  
  We now provide a new Fig. 4 showing that knocking down the autophagy gene Atg1 reduces wound-induced cell fusion, as it does in larvae (Kakanj et al, 2022), and, importantly, that these wounds close more slowly. We have not analyzed mutants in actin polymerization because we are confident they would interrupt many aspects of wound healing. The Galko lab has found that integrins suppress wound-induced cell fusion in larval epidermis, but we have not tested these. We have a manuscript in revision demonstrating a requirement for Dynamin and other endocytosis genes in wound-induced fusion; without dynamin-mediated fusion, these wounds close more slowly.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.06.25.546442v3
www.biorxiv.org

In vitro sexual dimorphism establishment in schistosomes

1
1. Public_Reviews 01 Jul 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.
  
  On behalf of the co-authors, we thank the three reviewers and the editors for their complimentary remarks as well as the major and minor comments/concerns. Addressing these concerns has led to revisions that improved the manuscript. In particular, further analyses have generated updated Figures 3 and 4 and Supplementary Tables S1 and S4-S6.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantage of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.
  
  We thank the reviewer for highlighting the positive aspects of the work. Regarding the low in vitro pairing efficiency, we have edited the manuscript to clarify a misleading statement related to the 7% figure. We have removed this value, which corresponds to the percentage of experiments in which couples were observed, because it does not accurately represent the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:
  
  Results, lines 230 ff.:
  
  “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”
  
  We also agree with the reviewer that the extended culture periods required to obtain fully sexually dimorphic parasites remain a limitation. As elaborated in the Discussion (see below), key factors, probably derived from the host, are missing from the in vitro system, which would explain both the slow in vitro development and the low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing [35].”
  
  A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.
  
  While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.
  
  We agree that deciphering the role of human red blood cell (hRBC) supplementation is critical. Regarding the influence of hRBCs on long-term culture stability and parasite development, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture [Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. production of infertile eggs by worm pairs cultured from cercariae. J Parasitol 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)]. Our team is currently investigating the molecular pathways that underlie development, sexual differentiation and pairing and that are modulated by hRBCs in culture. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.
  
  The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.
  
  We thank the reviewer for highlighting the manuscript’s strengths.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.
  
  Strengths:
  
  The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.
  
  We thank the reviewer for highlighting the manuscript’s strengths.
  
  Weaknesses:
  
  The way the data are presented sometimes makes it difficult to follow where the values come from. For example:
  
  (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.
  
  We thank the reviewer for pointing this out. For the sake of clarity, we have now converted Supplementary Tables S1 and S6 to long format. Accordingly, the Results and Methods sections and the indicated supplementary tables were edited as follows:
  
  Results, lines 142 ff.:
  
  “No morphological differences were observed between parasites cultured either in FBS or HS within the first week in culture; in both conditions most parasites were classified as early schistosomula [category 1: 76% ± 30 (average ± SD) in FBS and 73% ± 29 (average ± SD) in HS] with few lung (category 2) and early liver schistosomula (category 3) (Figure 1B, week 1; Supplementary Figure S1). The mean mortality (category 0) at week 1 was slightly higher, but not statistically significant (P= 0.42), in worms cultured in HS [9.75% ± 2.76 (average ± SD)] compared to the mortality registered in FBS-cultured parasites [5.52% ± 5.18 (average ± SD), Supplementary Table S6], consistent with previous findings [39].”
  
  Methods, lines 463-465:
  
  “To evaluate differences in mortality between HS- and FBS-cultured parasites, data from 5 experiments were combined and analysed using a Shapiro-Wilk normality test to test normality of the data and a non-parametric Wilcoxon rank sum exact test (Supplementary Tables S1 and S6).”
  
  Supplementary Tables:
  
  Supplementary Table S1. “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at indicated stage development (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”
  
  Supplementary Table S6. “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU positive cells comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”
  
  For clarity, below we provide the R script used to perform the statistical tests on the data shown in Supplementary Table S6 (column ‘Raw count of parasite developmental category per image and experiment’):
  
  Author response image 1.
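  The R script itself is available only as an image. To illustrate what an exact Wilcoxon rank-sum test computes, the Python sketch below enumerates every possible assignment of the pooled ranks to the two groups and applies the same two-sided rule as R's `wilcox.test` (double the smaller tail, capped at 1). It is not the authors' code. The function names are hypothetical, the sketch assumes small samples with no tied values (R falls back to a normal approximation when ties are present), and the Shapiro-Wilk step is not reproduced.

```python
from itertools import combinations


def rank_sum_statistic(x_ranks, n_x):
    """W as reported by R's wilcox.test: sum of the x ranks minus n_x(n_x+1)/2."""
    return sum(x_ranks) - n_x * (n_x + 1) / 2


def wilcoxon_rank_sum_exact(x, y):
    """Two-sided exact Wilcoxon rank-sum (Mann-Whitney) test.

    Hypothetical helper for illustration only. Assumes no ties, so the
    pooled ranks are simply 1..N without averaging.
    """
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_x, n = len(x), len(pooled)
    w_obs = rank_sum_statistic([rank[v] for v in x], n_x)

    # Null distribution: under H0, every subset of n_x ranks is equally likely
    # to belong to group x, so enumerate them all.
    null_ws = [rank_sum_statistic(c, n_x)
               for c in combinations(range(1, n + 1), n_x)]
    total = len(null_ws)
    p_low = sum(w <= w_obs for w in null_ws) / total
    p_high = sum(w >= w_obs for w in null_ws) / total
    # Two-sided p-value: double the smaller tail, capped at 1 (as in R).
    return w_obs, min(1.0, 2 * min(p_low, p_high))
```

  For example, with two hypothetical groups of five values in which every value of the first group is smaller than every value of the second, `wilcoxon_rank_sum_exact` returns W = 0 and a two-sided p-value of 2/252 (about 0.0079), the smallest p-value attainable with five observations per group. Full enumeration is only practical for small groups, which is the situation in which an exact test is normally requested.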
  
  (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this reviewer's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the EdU+ area to the DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the number of proliferative cells in the animals.
  
  As the reviewer correctly pointed out, the data were treated as ordinal because counting EdU+ cells in worms with more than 60 of them became extremely difficult and highly inaccurate. Therefore, we decided to group all worms showing more than 60 EdU+ cells into a single category, “60 EdU+ cells”. We have now updated Figure 4 to show medians instead of mean values, Supplementary Table S5 to provide more comprehensive access to the raw counts, and Supplementary Table S6 to indicate that the data for EdU+ cells per worm were considered ordinal. Accordingly, we have revised the corresponding sections as follows:
  
  Results, lines 211 ff:
  
  “HS-cultured schistosomula showed higher numbers of proliferating stem cells, with a median of >48 and >60 EdU+ cells per worm at days 8 and 15, respectively (Figure 4). On the other hand, most FBS-cultured parasites displayed no more than an average of 20 EdU+ cells per worm (Figure 4).”
  
  Methods, lines 520 ff:
  
  “EdU+ cells per parasite were counted for an average of 100 parasites across three independent experiments (Supplementary Table S5). Worms were grouped based on the number of cells per individual, but all those showing ≥ 60 EdU+ cells were counted in the same group named ‘60 EdU+ cells’. Therefore, the data were treated as ordinal. Statistical analysis was performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 considered significant (Supplementary Table S6).”
  
  Figure 4 legend, lines 830 ff:
  
  “A. Violin plots showing the number of EdU+ cells per worm at indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added in the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the big black point indicates the median of EdU+ cells per worm. All worms showing ≥ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal and statistical analysis performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 (*) considered significant (Supplementary Tables S5 and S6).”
  
  We thank the reviewer for the very interesting suggestion to quantify cell proliferation by calculating the ratio of EdU+ area to DAPI+ area in maximum intensity projection images. Measuring the fluorescence area for each worm in maximum projections is an excellent idea. However, given the number of EdU+ cells present in some samples, we think this technique would not provide additional information or more detailed data than our analysis when the number of EdU+ cells exceeds 60 per worm. We will certainly consider this approach for future studies.
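  The ordinal treatment described above can be sketched in two steps. First, counts at or above 60 are collapsed into a single top category. Second, groups are compared with the Kruskal-Wallis H statistic, which uses only ranks. The Python below is a minimal sketch under stated assumptions (per-worm counts as plain lists, hypothetical function names). It is not the authors' analysis code, and it omits both the p-value (a chi-squared approximation with k − 1 degrees of freedom for k groups) and Dunn's post-hoc comparisons.

```python
from collections import Counter

CAP = 60  # counts at or above this value share a single top category


def cap_counts(counts, cap=CAP):
    """Collapse every count >= cap into the single top category `cap`."""
    return [min(c, cap) for c in counts]


def average_ranks(values):
    """Rank values 1..N; tied values receive the mean of their positions."""
    first_pos = {}
    for i, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, i)
    multiplicity = Counter(values)
    # A run of t ties starting at position p occupies p..p+t-1; its mean is p + (t-1)/2.
    return [first_pos[v] + (multiplicity[v] - 1) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with the standard tie correction.

    Hypothetical helper for illustration. Undefined (division by zero) if
    every pooled value is identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        rank_term += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: C = 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))
```

  Capping gives every worm at the top of the scale the same rank, which creates many ties. The tie correction in the last step compensates for the reduced variance of the ranks that these ties cause. A rank-based test is a natural choice here because the capped values keep their order but no longer carry reliable distances.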
  
  There are some minor issues as well:
  
  (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.
  
  We thank the reviewer for pointing this out; we have now replaced ‘schistosomes’ with ‘Schistosoma mansoni’ (current line 131).
  
  (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.
  
  This is a very good point. We thank the reviewer for bringing it up and have replaced ‘EdU pulse-chase’ with ‘EdU pulse’ experiments in lines 37, 204, and 321.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.
  
  Strengths:
  
  This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.
  
  We thank the reviewer for highlighting the manuscript’s strengths.
  
  Weaknesses:
  
  As the authors mentioned, “pairing between male and female parasites was rare. Pairing was observed in approximately 7% of the experiments, usually after day ~80 in culture.” Egg production was also not achieved with this protocol.
  
  To address the reviewer’s point, we have removed the value of 7%, which corresponds to the percentage of experiments in which couples were observed. This value does not accurately reflect the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:
  
  Results, lines 230 ff:
  
  “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The manuscript is well-written overall. However, there are some minor revisions that would further improve the clarity and presentation of the data.
  
  (1) At the beginning of the manuscript, it would be helpful to clearly state three to four specific aims or objectives. This would help readers better understand the expected outcomes and the broader methodological contribution of the study.
  
  We agree with the reviewer and accordingly have stated the overall goals of the study, as follows:
  
  Introduction, lines 106 ff:
  
  “We aimed at optimising a platform to study intra-mammalian schistosomes that supports in vitro sexual dimorphism establishment, consequently leading to an overall positive impact in the 3Rs (Reduction, Replacement, Refinement) for animal research (https://nc3rs.org.uk/) [42]”.
  
  (2) In the abstract, you highlighted the relevance of the work according to the 3R principles of reduction in animal experimentation. However, this point is not clearly introduced in the Introduction section. Including a short discussion of this aspect would improve continuity and context.
  
  Following this item and the previous one, we have now clarified the potential impact of our research outcomes on the 3Rs and included the link to the NC3Rs website and a representative reference [Louis-Maerten E, Rodriguez Perez C, Cajiga RM, Persson K and Elger BS (2024). Conceptual foundations for a clarified meaning of the 3Rs principles in animal experimentation. Animal Welfare, 33, e37, 1–11].
  
  (3) In line 43, please italicize Schistosoma spp.
  
  Edited accordingly.
  
  (4) When discussing the importance of "interfering with sexual development," in line 52, please specify the life cycle stages being referred to.
  
  Revised accordingly as follows:
  
  Introduction, lines 54-56:
  
  “This suggests that interfering with the sexual development of schistosome intra-mammalian stages could potentially restrict human pathology.”
  
  (5) Between lines 56-58, please rephrase this sentence for clarity.
  
  We thank the reviewer for this editorial suggestion. The text has been revised as follows:
  
  Introduction, lines 58 ff :
  
  “Therefore, novel control strategies are urgently needed, and new targets for drug/vaccine development became a priority. A better understanding of the mechanisms underlying schistosome development, including sexual dimorphism establishment, will pave the way to achieve this goal.”
  
  (6) In lines 66-68 & line 88, please clarify whether the transcriptomic studies cited were performed in vivo, in vitro, or ex vivo, and indicate the developmental stages analyzed.
  
  We have now included the information suggested by the reviewer as follows:
  
  Introduction, lines 69-70:
  
  “Transcriptomic studies, at both bulk [7-11] and single cell [12-14] levels for intra-mammalian stages in vivo and ex vivo,...”
  
  (7) Please indicate, in line 110, the day of culture for reference. Without this information, the conversion rates per life cycle stage are difficult to interpret and reproduce. Overall, please try to give an overview in the text of these rates of conversion for context, wherever possible.
  
  Following the reviewer’s question, we have clearly indicated the in vitro and in vivo timings for ‘conversion’ (understood as the establishment of sexual dimorphism). We have written:
  
  Introduction, lines 117-120:
  
  “Finally, while most of the FBS-cultured parasites did not progress beyond lung and early liver stage, HS-cultured parasites reached sexually dimorphic stages by week 6, albeit at a slightly delayed rate compared to in vivo development. In the mouse model, parasites become dimorphic by day 21 post-infection (~3 weeks) [12].”
  
  (8) The section beginning with "Furthermore, phenotypic...cell proliferation" (line 110) may be easier to follow if moved earlier in the Introduction.
  
  Following the reviewer’s suggestion, we have moved and slightly rewritten the sentence to current line 112, as follows: “First, phenotypic differences between FBS- and HS-cultured parasites became evident as early as 48 hours in culture, with HS-cultured parasites exhibiting higher rates of cell proliferation resulting in larger worms in the HS condition.”
  
  (9) In line 126, please remove the DOI and add the citation.
  
  Edited accordingly.
  
  (10) When referring to 10-week-old parasites, in line 130, please indicate the developmental stage at which they stalled and relate this to the phenotypic scoring shown in Figure 1.
  
  Based on this suggestion, we have now revised the third paragraph of the Results section (‘Sexually dimorphic schistosomes developed entirely in vitro from cercariae’), as follows:
  
  Results, lines 137 ff.:
  
  “The development of schistosomula derived from mechanically transformed cercariae was assessed in at least 15 independent experiments, five of which were maintained over a period of at least 10 weeks to assess parasite survival and ability to mate and produce fertile eggs (Figure 1A; Supplementary Table S1).”
  
  Lines 151 ff.:
  
  “Differences in parasite development between the two conditions became apparent by week 2 (Figure 1B). At this time point, 14.8% ± 24.9 (average ± SD, excluding dead worms) or 36% ± 33.6 (average ± SD, excluding dead worms) of the parasites cultured in FBS or HS, respectively, have reached category 3, i.e., early liver schistosomulum. Parasites in FBS rarely progressed beyond this stage during the 10-week experiment, with very few parasites (<0.1% ± 0.2, average ± SD) reaching category 4, i.e., late liver schistosomulum. In contrast, worms cultured in HS developed over time across all categories, achieving marked sexual dimorphism by week 6 (13.4% ± 18.6, average ± SD) (Figure 1B; Supplementary Figure S3A), as confirmed by PCR (Supplementary Figure S3B; Supplementary Table S2). No differences in the timing for sexual dimorphism establishment were observed between male and female parasites. The mortality rate of FBS-cultured parasites reached an average of 76.24% ± 23.46 (average ± SD) by week 10, after which the experiments under this condition were stopped as most parasites were dead (Supplementary Figure S2). From that time point onwards only parasites in HS were kept in culture. As previously described for the in vivo development of schistosomes [12], in vitro cultured parasites showed developmental asynchrony in agreement with Basch’s observations [33]; however, by week 10 most of the worms in HS (73.7% ± 25.4, average ± SD) acquired an evident sexual dimorphism (Figure 1B).”
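  Summary figures such as “14.8% ± 24.9 (average ± SD, excluding dead worms)” could be derived from per-image counts like those in Supplementary Table S1 roughly as sketched below. This is a hypothetical reconstruction, not the authors' procedure. It assumes that each row lists the counts for categories 0 to 5 for one image, that dead worms (category 0) are removed from the denominator, and that percentages are averaged across images; the text does not state the unit of aggregation, and the function names are illustrative.

```python
from statistics import mean, stdev


def percent_in_category(row, category):
    """Share (%) of living worms in `category` for one image.

    Assumption: row = [count_cat0, count_cat1, ..., count_cat5], where
    category 0 means dead and is excluded from the denominator.
    """
    living = sum(row[1:])
    if living == 0:
        return None  # no living worms on this image: nothing to summarise
    return 100 * row[category] / living


def summarise(rows, category):
    """Mean and sample SD (n - 1 denominator, as in R's sd) across images."""
    values = [p for p in (percent_in_category(r, category) for r in rows)
              if p is not None]
    return mean(values), stdev(values)
```

  For two hypothetical images with 5 of 10 and 2 of 10 living worms in category 3, `summarise` returns a mean of 35.0% and a sample SD of about 21.2. Large SDs relative to the means, as in the figures quoted above, are expected when development is as asynchronous across images and experiments as described.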
  
  (11) In line 142, please provide a standard deviation value for the reported average of 14.8%, if available. Please also provide the absolute numbers of these parasites, or indicate them in the supplementary material. Otherwise, it is difficult to understand the true conversion rate.
  
  We followed the reviewer’s suggestions and have now rewritten the text (see above, item 10). In addition, Supplementary Table S1 was converted to long format (see our answer to item 1, Reviewer #2).
  
  (12) Please explain, in line 144, why all cultures were maintained for 10 weeks and provide the rationale for this experimental design.
  
  We thank the reviewer for this opportunity to clarify this point and hence improve the manuscript. The experimental condition stopped at week 10 included only FBS-cultured worms, not HS-cultured parasites. This is relevant as most of the parasites in FBS were dead by this time, unlike the HS-developed schistosomes. Indeed, some experimental groups consisting of parasites cultured in HS were maintained for up to 22 weeks. We have now updated the text to clarify this point, as follows:
  
  Results, lines 160 ff.:
  
  “The mortality rate of FBS-cultured parasites reached an average of 76.24% ± 23.46 (average ± SD) by week 10, after which the experiments under this condition were stopped as most parasites were dead (Supplementary Figure S2). From that time point onwards only parasites in HS were kept in culture.”
  
  (13) In lines 146-151, please streamline the timelines of culture conditions and observed outcomes in FBS versus HS media, as the current wording makes interpretation difficult.

  Following the reviewer’s suggestion, we have streamlined the culture timelines and observed outcomes, as follows:
  
  Results, lines 137 ff.:
  
  “The development of schistosomula derived from mechanically transformed cercariae was assessed in at least 15 independent experiments, five of which were maintained over a period of at least 10 weeks to assess parasite survival and ability to mate and produce fertile eggs (Figure 1A; Supplementary Table S1).”
  
  Results, lines 151 ff.:
  
  “Differences in parasite development between the two conditions became apparent by week 2 (Figure 1B). At this time point, 14.8% ± 24.9 (average ± SD, excluding dead worms) or 36% ± 33.6 (average ± SD, excluding dead worms) of the parasites cultured in FBS or HS, respectively, have reached category 3, i.e., early liver schistosomulum. Parasites in FBS rarely progressed beyond this stage during the 10-week experiment, with very few parasites (<0.1% ± 0.2, average ± SD) reaching category 4, i.e., late liver schistosomulum. In contrast, worms cultured in HS developed over time across all categories, achieving marked sexual dimorphism by week 6 (13.4% ± 18.6, average ± SD) (Figure 1B; Supplementary Figure S3A), as confirmed by PCR (Supplementary Figure S3B; Supplementary Table S2). No differences in the timing for sexual dimorphism establishment were observed between male and female parasites. The mortality rate of FBS-cultured parasites reached an average of 76.24% ± 23.46 (average ± SD) by week 10, after which the experiments under this condition were stopped as most parasites were dead (Supplementary Figure S2). From that time point onwards only parasites in HS were kept in culture. As previously described for the in vivo development of schistosomes [12], in vitro cultured parasites showed developmental asynchrony in agreement with Basch’s observations [33]; however, by week 10 most of the worms in HS (73.7% ± 25.4, average ± SD) acquired an evident sexual dimorphism (Figure 1B).”
  
  (14) In lines 153-159, please clarify comparisons between worms cultured in FBS and HS at equivalent time points (e.g., 2 weeks FBS vs 2 weeks HS), rather than comparing only 10 week cultures.
  
  Following the reviewer’s comment, we have now rewritten the whole third paragraph in Results, under the heading “Sexually dimorphic schistosomes developed entirely in vitro from cercariae”; the changes are detailed in our answers to items 10 and 13 (above).
  
  (15) It would also be helpful to include information on male versus female development in the context of sexual dimorphism.
  
  This is a relevant point that we had not clarified in the original submission. We have now indicated in the text that no differences were detected in the timing of male and female dimorphism establishment. New text included as follows:
  
  Results, lines 159-160:
  
  “No differences in the timing for sexual dimorphism establishment were observed between male and female parasites.”
  
  (16) In line 163, please resolve the editing marks and punctuation.
  
  Resolved accordingly.
  
  (17) In lines 169 and 172, when referring to stages such as "early liver stage," please indicate the corresponding time in culture (e.g., 3 weeks, 7 weeks + 3 days), or define these stage classifications earlier in the manuscript.
  
  Following the reviewer’s suggestion we have now included the developmental category after stating ‘early liver stage’, as follows:
  
  Results, line 187:
  
  “Even though few parasites in FBS reached the early liver stage (category 3)…”
  
  (18) Please indicate, in line 173, the developmental stage of worms used when assessing hRBC digestion in HS and FBS cultures. Additionally, it would be useful here to discuss how hRBC supplementation may influence worm development beyond culture conditions, including possible molecular mechanisms. As part of the revision, you could include data, either already available or from new experiments, showing the effect of adding or omitting hRBCs, even in HS-cultured worms.
  
  We thank the reviewer for highlighting this important item, which warrants further clarification. As stated in the Results, washed human red blood cells (hRBCs) were added to the culture at day 13. Pilot experiments in which hRBCs were added at different time points had been performed previously; no hemoglobin digestion was apparent when hRBCs were added at days 4, 5 and 6, consistent with previous findings (Correnti JM, Jung E, Freitas TC, Pearce EJ. Transfection of Schistosoma mansoni by electroporation and the description of a new promoter sequence for transgene expression. Int J Parasitol. 2007 Aug;37(10):1107-15. doi: 10.1016/j.ijpara.2007.02.011. Epub 2007 Mar 18. PMID: 17482194.).
  
  Following this observation, we have added a line to clarify this point, as follows (lines 181-187): “Based on both previous reports [45], and pilot experiments in which adding human Red Blood Cells (hRBCs) to the culture before day ~10 did not show obvious haemoglobin digestion, we decided to supplement the culture media with hRBCs at day 13. The addition of hRBCs allowed the parasites to feed and thus continue their development [19]. At this point, they began to swallow and degrade erythrocytes, producing hemozoin, a black pigment derived from host haemoglobin degradation and visible in the worms' intestines.”
  
  Regarding the specific effect of adding hRBCs to the culture, this is a very good point. First, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture (see, for example, Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. Production of infertile eggs by worm pairs cultured from cercariae. J. Parasitol. 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)). Second, we are currently analysing transcriptomic data from parasites cultured in different conditions, including in the presence or absence of hRBCs. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.
  
  (19) In line 183, please clarify whether the referenced single-cell transcriptomic data were obtained from adult worms.
  
  We have now clarified this point in the manuscript as follows:
  
  Results, lines 199 ff:
  
  “In schistosomes, a complex stem cell system consisting of both somatic and germline stem cells has been described by leveraging recent single cell transcriptomic data across different developmental stages, including schistosomula and adult worms [47].”
  
  (20) In lines 210 and 213, please indicate the absolute number of worms used for these observations, rather than only percentages. If possible, also report any sex bias in pairing.
  
  Following this item and a similar one raised by Reviewer #3 (public review), we decided to remove the mention of 7%. This percentage corresponds to the percentage of experiments in which couples were observed; it does not accurately reflect the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:
  
  Results, lines 230 ff.:
  
  “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”
  
  (21) In the final results section, please clarify whether pairing enhances sexual maturation of already mature worms or whether maturation occurs primarily after pairing.
  
  This is a very relevant point, and we thank the reviewer for giving us the opportunity to clarify it in the manuscript. As described in the manuscript, sexual dimorphism was established in vitro, and the developed male and female parasites were capable of pairing. Moreover, enlarged oocytes in the posterior section of the ovary of in vitro developed female parasites became apparent after pairing. This observation (Figure 6E, F and Supplementary Video S6) suggests that these female parasites, fully developed in HS-supplemented culture media, were capable not only of pairing but also of initiating full maturation. We have clarified this aspect in the manuscript as follows:
  
  Results, lines 243 ff.:
  
  “Moreover, in vitro developed females coupled with ex vivo collected mature males displayed signs of primordial ovary maturation with larger oocytes towards the posterior region of the ovary (Figure 6E, F; Supplementary Video S6). On the other hand, females developed in vitro but not paired with ex vivo collected males remained immature.”
  
  (22) Further in the Materials and methods sections, please clarify, isn't 8000 schistosomula/well of a 6-well plate really a confluent culture condition, and does it contribute to NTS mortality in that way, as shown in previous in vitro transformation publications? Please clarify, at least with relative values, percentages of parasite transformation in such a concentrated system.
  
  No formal titration experiments were carried out, but based on empirical observations during pilot experiments, we decided to add no more than 8,000 schistosomula per well. This is something to investigate further in the future. We have now added the following sentence in Methods:
  
  Methods, lines 423-426:
  
  “The number of parasites cultured per well (~8,000 schistosomula) was determined empirically, as no formal titration experiments were performed. At higher densities (>10,000 per well), more frequent media changes were required, and parasite development appeared to be impaired.”
  
  (23) Also, what was the rationale for adding hRBCs as early as 13 days post-transformation, when the parasites are in the lung and early liver stage, just forming their guts? Is it possible that this contributed to the observation of fewer parasites digesting hRBCs? Also, were the hRBCs replenished at each media change? This was not clear.
  
  We thank the reviewer for these questions. The rationale of adding hRBCs at day 13 has been elaborated above (question 18). In addition, in the mouse model, parasites have already migrated through and left the lungs by day 13 post-infection, as described by Nation et al [Nation CS, Da’dara AA, Marchant JK, Skelly PJ (2020) Schistosome migration in the definitive host. PLoS Negl Trop Dis 14(4): e0007951] as follows: “In the mouse, S. mansoni schistosomula begin to arrive in the lungs between 2 and 3 days post-infection, peaking at around day 7 and lasting until around day 11”. Hence, we do not think that adding hRBCs at day 13 contributed to the observation of fewer parasites digesting hemoglobin, because this was only seen in parasites cultured in FBS, not in HS.
  
  The hRBCs were replaced every two weeks, or sooner if their numbers decreased due to consumption. We have now clarified this point in Methods as follows (lines 427-430): “LTC medium was replaced twice a week and washed human red blood cells (hRBCs) added to a final concentration of 0.02% v/v at 13 days after transformation. Washed hRBCs were replaced every two weeks, or sooner if their numbers decreased due to consumption.”
  
  (24) In the Discussion, please address the limitations related to the relatively late onset and low frequency of pairing in vitro.
  
  Following the reviewer’s suggestion and comments from reviewer #1, we have now included a section in Discussion highlighting the limitations of the study and avenues to overcome these in the future.
  
  Discussion, line 360 ff.:
  
  “Considering these elements in future experiments will help overcome the limitations encountered in this study, including the low rate of spontaneous pairing between in vitro-developed male and female worms and the requirement for extended culture periods (>70 days). In addition, further research is needed to assess the role of host- and parasite-derived cues in schistosome development.”
  
  (25) Figure 1: Please consider adding arrows or markers indicating which parasites correspond to the representative developmental stages used for classification.
  
  We thank the reviewer for the suggestion; however, we respectfully consider that this may not be necessary, as (1) the images shown in Figure 1 are representative pictures of each time point, included for illustrative purposes; and (2) Supplementary Figure S1 clearly depicts representative images of worms in each developmental category, together with specific morphological descriptions. For greater clarity, we have now added the following text at the end of the Figure 1 legend:
  
  Figure 1 legend, lines 810-811:
  
  “A detailed description of the developmental categories and representative images are provided in Supplementary Figure S1.”
  
  (26) Figure 2: This plot is somewhat misleading in showing that the HS cultured worms grew significantly more than the FBS worms, where the latter did not grow at all, as also shown by the blue bars all over the plot.
  
  We appreciate the reviewer’s observation. Critically, the data shown in Figure 2 represent measurements of worm area, which means that some worms may have become longer but thinner while maintaining the same area. Most of the FBS-cultured worms did not develop beyond the lung or early liver stages, in which the parasites are long and thin or shorter and wider, respectively. Therefore, the overall area of these FBS-cultured worms barely changed (please see the raw data and statistical analyses in Supplementary Tables S3 and S6). We believe that, as presented, Figure 2 is sufficiently clear and self-explanatory. However, we would be happy to consider any suggestions to further clarify this point in the manuscript.
  
  (27) Figure 3: For panel A, what is the worm percentage corresponding to? The context is missing. Please clarify in the text.
  
  Following the reviewer’s question and for clarity, we have now (1) modified the axis-legend in Figure 3 as “Percentage of worms displaying or not Black Guts - BG (%)”, and (2) slightly edited the legend as follows:
  
  Figure 3 legend, lines 820-823:
  
  “Bar Plot representing the percentage of Human Serum (HS)- or Foetal Bovine Serum (FBS)-cultured schistosomula with (blue bar) or without (light brown bar) black guts (BG) due to the presence of intestinal hemozoin.”
  
  Reviewer #2 (Recommendations for the authors):
  
  The authors need to clarify their presentation of data. The raw data needs to be more clearly labeled/explained, and the representation of the data in Figure 4A needs to be explicitly described or changed.
  
  We thank the reviewer for highlighting this issue with the data presentation and have followed their advice by editing Figures 3 and 4 and improving the data presentation in Supplementary Tables S1 and S4-S6. In particular:
  
  Figure 3. We have now modified the axis-legend as “Percentage of worms displaying or not Black Gut - BG (%)”, and slightly edited the legend as follows:
  
  Figure 3 legend, lines 820-823:
  
  “Bar Plot representing the percentage of Human Serum (HS)- or Foetal Bovine Serum (FBS)-cultured schistosomula with (blue bar) or without (light brown bar) black guts (BG) due to the presence of intestinal hemozoin.”
  
  Figure 4. We have edited this figure to show medians instead of mean values and updated the legend as follows (lines 830 ff.):
  
  “A. Violin plots showing the number of EdU+ cells per worm at the indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added to the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the large black point indicates the median number of EdU+ cells per worm. All worms showing ≥ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal, and statistical analysis was performed by Kruskal-Wallis test with Dunn's multiple comparison post-hoc test, with P ≤ 0.05 (*) considered significant (Supplementary Tables S5 and S6).”
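  The legend pools every count of 60 or more into a single category and then compares groups with a Kruskal-Wallis test. The pure-Python sketch below illustrates that pooling step and the tie-corrected H statistic; it is not the authors' analysis script, and the function names and example counts are hypothetical. Pooling at the cap creates many tied values, which is why the tie correction matters here. In practice, `scipy.stats.kruskal` computes the same tie-corrected statistic together with its p-value, and Dunn's post-hoc comparisons are available in packages such as scikit-posthocs.

```python
from collections import defaultdict

CAP = 60  # assumption taken from the legend: counts >= 60 are pooled into one top category


def cap_counts(counts, cap=CAP):
    """Pool every count at or above `cap` into a single ordinal category (value = cap)."""
    return [min(c, cap) for c in counts]


def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for two or more non-empty groups."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted(
        ((value, g) for g, values in enumerate(groups) for value in values),
        key=lambda item: item[0],
    )
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Sum the ranks belonging to each group.
    rank_sums = defaultdict(float)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r

    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(values) for g, values in enumerate(groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:
        return 0.0  # every observation is tied: there is no rank information
    return h / correction


# Hypothetical EdU+ counts per worm for two conditions (not the study's data).
fbs = cap_counts([3, 8, 12, 20, 65])
hs = cap_counts([25, 40, 60, 72, 90])
print(kruskal_wallis_h(fbs, hs))
```

  H is then compared against a chi-square distribution with k - 1 degrees of freedom (k groups) to obtain a p-value; the Dunn post-hoc step is not shown.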
  
  Supplementary Table S1. We have clarified the data presentation by converting the table to long format and updated the legend accordingly as follows (lines 864-867): “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at the indicated developmental stage (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”
  
  Supplementary Table S4. We have clarified the table by converting it to long format, simplified the data presentation, and updated the legend accordingly as follows (lines 873-874): “Percentage of parasites displaying either black positive (hemozoin) or black negative (no hemozoin) intestine.”
  
  Supplementary Table S5. We have simplified the table by converting it to long format and explained the naming of elements in columns C (‘Group’) and D (‘Replicate’). We have updated the legend accordingly as follows (line 876 ff.): “Raw counts of EdU-positive cells per parasite for the indicated experimental group, replicate and experiment, in long format. The worms were classified by group (column C) and replicate (column D), using the following code: E (‘early’), M (‘medium’) and L (‘late’), corresponding to days 2, 8 and 15, respectively. R and W correspond to conditions with (R) or without (W) human red blood cells, and HS and FBS to the culture medium employed.”
  
  Supplementary Table S6. We have incorporated a new section with the statistical analyses for parasite mortality estimation and updated the legend accordingly as follows (lines 882-887): “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU-positive cell comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”
  
  Reviewer #3 (Recommendations for the authors):
  
  The study was well conducted, and the data presented clearly support the conclusions. The protocol is well described, making it reproducible. The pairing experiments could be improved.
  
  Specific Questions.
  
  (1) "Male and female adult worms that developed in vivo and recovered from mice by portal perfusion on day 42 post-infection were sorted by sex and placed in culture with worms of the opposite sex developed in vitro (>70 days). Within 24 hours of initiating the co-culturing of in vitro developed worms with ex vivo collected worms, couples were observed".
  
  In the interest of clarity, and considering that stating ‘worms developed in vivo were collected from infected mice’ is redundant, we have now shortened and edited these lines as follows (lines 238-242): “Male and female adult worms were recovered from mice by portal perfusion on day 42 post-infection, sorted by sex and placed in culture with worms of the opposite sex developed in vitro. Within 24 hours of initiating the co-culturing of in vitro developed worms with ex vivo collected worms, couples were observed (Figures 6C, D; Supplementary Video S5).”
  
  (2) Have the authors conducted experiments with in vitro female and male parasites under the same experimental conditions as the in vitro/ex vivo pairing experiments? Is it possible that the tissue culture medium used for the development of sexually dimorphic forms is inhibiting pairing?
  
  The reviewer raises an interesting point that warrants clarification. First, the experimental conditions tested for in vitro developed parasites were the same as for the pairing experiments, as the ex vivo collected worms were washed and placed in HS-supplemented medium. Second, because the culture conditions (culture protocol and medium) were the same in the in vitro pairing and in vitro/ex vivo pairing experiments, we do not think that the tissue culture medium used for developing sexually dimorphic parasites inhibited pairing. As elaborated in the Discussion (see below), key factors, probably derived from the host, are missing from the in vitro system, which would explain the low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing [35].”
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.20.700612v3
www.biorxiv.org

In-cell structural insights into fungal ER stress responses

1. EMBOpress 01 Jul 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  General Statements
  
  We thank the reviewers for thoroughly reading our manuscript and for their constructive feedback. We have considered each comment carefully and drawn up a revision plan, which is described below.
  
  1. Description of the planned revisions
  
  Response to Reviewer #1, #2 and #3 concerning alternative assays to measure the effects of ER stress on ribosome translation:
  
  Reviewer 1: 4. The modest reduction in the translation upon ER stress induction could be supported by alternative biochemical assays such as polysome profiling and amino acid incorporation.
  
  Reviewer 2: Is there a difference in the number of polysomes in the non-stressed vs stressed yeast cells? A functional assay, such as polysome profiling combined with nascent-chain labeling, might help to support the observation that an increased level of hibernating ribosomes exists in DTT- or Tm-treated cells. Has this been considered? Perhaps there is evidence available in the literature?
  
  Reviewer 3: 5. The study mainly provides structural snapshots and population distributions of ribosomal states, without direct functional measurements of translation activity. Could the authors provide orthogonal biochemical or functional evidence supporting reduced translation under these exact stress conditions?
  
  We thank the reviewers for this suggestion. Other studies have shown a decrease in protein synthesis rate (Geronimo RAC et al., 2025 (PMID: 40959222), Pincus et al., 2014 (PMID: 25275008)) under similar conditions, consistent with our findings. However, we agree that confirming a reduction in translation under our specific conditions will strengthen our findings.
  
  We have previously performed polysome profiling on mammalian cells in our lab and will adapt this protocol to yeast cells to measure changes in monosome abundance and in the polysome-to-monosome ratio. We will perform this experiment for our main strain (Ire1c-GFP) after 0 h, 45 min and 4 h of DTT treatment. Given the increase in hibernating ribosomes upon prolonged ER stress, we hypothesize that polysome profiles will reveal an increase in monosomes, assuming that the technique is sensitive enough to detect moderate changes in translational activity. In case polysome profiling is not sensitive enough to detect the moderate change in hibernation, we also aim to quantify the decrease in protein synthesis using 35S labeling, as we have previously done in a collaboration (Fedry et al, Mol. Cell 2024 (PMID: 38340715)).
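  The planned readout is the polysome-to-monosome (P/M) ratio. A common way to quantify it is to integrate the absorbance trace of the gradient under the 80S monosome peak and under the polysome peaks, then divide the two areas. The sketch below illustrates this under stated assumptions: the window boundaries, the constant baseline and all function names are hypothetical and not the authors' protocol. A drop in P/M after DTT treatment would be consistent with the hypothesized shift of ribosomes into the 80S (monosome) fraction.

```python
def trapezoid_area(xs, ys, start, end):
    """Trapezoidal integral of ys over xs, using only segments lying within [start, end]."""
    area = 0.0
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 >= start and x1 <= end:
            area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def polysome_to_monosome_ratio(xs, absorbance, monosome_window, polysome_window, baseline=0.0):
    """Ratio of polysome area to 80S monosome area in an absorbance gradient trace.

    `monosome_window` and `polysome_window` are hypothetical (start, end) positions
    along the gradient; picking them from the trace is left to the analyst.
    A constant `baseline` is subtracted and negative values are clipped to zero.
    """
    corrected = [max(y - baseline, 0.0) for y in absorbance]
    monosome = trapezoid_area(xs, corrected, *monosome_window)
    polysome = trapezoid_area(xs, corrected, *polysome_window)
    if monosome == 0.0:
        raise ValueError("monosome window has zero area")
    return polysome / monosome


# Hypothetical trace (gradient position, A260); not data from the study.
positions = [0, 1, 2, 3, 4, 5, 6]
a260 = [0.1, 0.9, 0.3, 0.5, 0.4, 0.3, 0.1]
print(polysome_to_monosome_ratio(positions, a260, (0, 2), (2, 6), baseline=0.1))
```

  A fixed baseline and hand-picked windows keep the sketch short; real profiles usually need a baseline fitted to the trace and windows placed at the minima between peaks.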
  
  Reviewer #1 continued:
  
  For identification of hibernating ribosomes, the authors rely mainly on the presence of empty ribosomes along with eEF2, eIF5A, and eEF3. Whether these particles indeed possess known dormancy factors or are a different subclass of empty ribosomes is unclear. A similar analysis in the absence of dormancy factors would strengthen the authors' claims.
  
  While empty 80S ribosomes (lacking tRNAs, elongation or hibernation factors) have been described in purified ribosome samples, these appear to be an in vitro reassociation artifact, as they are not observed in cells (bacteria: Xue et al. Nature 2022 (PMID: 36171285), yeast: Cheng et al. 2025 (PMID: 39789210), mammals: Xing et al. 2023 (PMID: 37410833), Fedry et al. 2024 (PMID: 38340715), etc.).
  
  Instead, in cells ribosomes are found in three possible states:
  
  individual subunits (40S and 60S in eukaryotes),
  
  translating 80S ribosomes; featuring a tRNA in the P-site
  
  hibernating 80S ribosomes; lacking a tRNA in the P-site and hence non-translating. These ribosomes are typically bound by eEF2 interacting with the dormancy factor bound in the mRNA channel, as well as possible additional factors (eIF5A, Dap1, SNOR, etc.). Our Hib class, observed in cells, lacks a P-site tRNA and is bound by eEF2; we can therefore unambiguously assign this class as hibernating ribosomes.
  
  While doing a similar analysis of the translation landscape under ER stress in the absence of hibernation factors may yield some interesting insights, it would also alter the overall stress response and therefore may not help with the interpretation of our current structures. It would require us to repeat our complete workflow with new yeast strains (with Stm1 and/or Lso2 knocked out), and in our opinion the time these experiments would require does not justify the additional confidence that would be gained from the results. We have therefore decided that these experiments are beyond the scope of this study. We will add a supplementary figure showing a density in the mRNA channel, further supporting the presence of a dormancy factor interacting with eEF2.
  
  The authors suggest that ER induces modest level of increase in hibernating ribosomes. Adding controls such as glucose deprivation and nitrogen starvation would have provided more strength in relative comparison of these ribosomal sub populations.
  
  To our knowledge, no cryo-ET study has examined the effect of glucose deprivation or nitrogen starvation on the abundance of different translational states, making it an interesting and relevant experiment. However, it is unknown what change(s) in translational state abundance(s) these low-nutrient conditions might cause, so we are unsure whether they could serve as control conditions. Collecting data on yeast under different stresses would require extensive resources and would not directly address the translational response to ER stress. Therefore, we consider these suggested experiments beyond the scope of this work. We think this is an interesting future research direction and will comment on it in our discussion.
  
  We will include the suggested conditions in our polysome profiling experiments (proposed above in the first part of our revision plan) and analyse their monosome-to-polysome ratios. These can serve as positive controls for strong translation shutdown. We are grateful for the reviewer's suggestion.
  
  The authors show the retention of dormant ribosomes on the ER surface. As usual notion of ribosome association with ER membrane to be dependent on nascent translation, retention of dormant ribosomes on ER membrane is interesting and puzzling. Analysis using strains deleted for dormancy factors may provide more insights on this mechanism.
  
  We agree with the reviewer that the presence of hibernating ribosomes on the ER surface is an interesting observation, but we do not consider it surprising. For yeast and mammals, idle ribosomes bound to Sec61 are well established in vitro (e.g., Becker et al. (PMID: 19933108)), indicating that the interaction between these two components does not depend on active translation. Furthermore, ER-bound hibernating ribosomes have been found on microsomes derived from human cells, and they become the prevalent form upon DTT treatment, which strongly suggests that hibernating ribosomes can remain bound to the ER (Gemmer et al., 2023 (PMID: 36697828)). To clarify this point, we will refer to these previous findings in our revised manuscript. As our observation is consistent with current knowledge in the field, we do not believe that additional analysis is necessary on this point.
  
  Reviewer #2 continued:
  
  The new Dec3 state might be clarified a bit further by zooming in to the corresponding areas in the Dec1 and Dec2 structures. This is a point of novelty in the paper and should be emphasized for future reference. Does an additional classification algorithm, such as cryo-DRGN-ET, verify the various states, especially the new Dec3 state? The structures should of course be uploaded to EMDB or another suitable server.
  
  We will provide an additional supplementary figure zooming in on the eIF5A region in Dec1/2/3. Dec3 was found in 3 separate classification runs, and we will therefore not perform classification with an alternative algorithm. We will upload the novel structures (Dec3, Hib and the ER-bound ribosome) to EMDB; they will be released upon publication of this manuscript.
  
  There is generally a lack of supporting quantification, which will bother a number of readers. For example, a "high confidence rigid body fit" shows additional density in the hibernating state, but what is the confidence? Even the resolution measures of 7-8 Angstrom are simply stated. Presumably they come from a WARP report. There should be some specification for the evaluation. How many lamellae were used, and how many tomograms? Were they taken from different biological experiments, or all collected from the same grid, for each condition?
  
  We agree with the reviewer that this additional information is required to properly judge our conclusions. We will provide a confidence score for the eEF3 fit. We will provide FSC curves as supplementary data, specifying where each FSC curve was obtained from. Local resolution estimates are derived from Relion. Table 4 indicates the number of tomograms collected per sample, and we will add the number of lamellae/grids used.
  
  Reviewer #3 continued:
  
  Major comments:
  
  For the analysis of ER-bound ribosomes, the authors applied an ellipsoidal mask during subtomogram averaging. However, this masking strategy may not be sufficient because the relative orientation of ribosomes with respect to the ER membrane can be variable, and membrane density may influence particle alignment. The authors may consider including an additional masking step to exclude membrane density and minimize potential alignment bias.
  
  We thank the reviewer for pointing out this unclear point in our manuscript. The ellipsoidal mask was only used in the image classification step aimed at separating ER-bound ribosomes from soluble ribosomes. The ER-bound ribosomes were subsequently aligned with a mask comprising the large ribosomal subunit and the membrane. This was crucial to keep the alignment from going astray. We will clarify this in the text:
  
  “Alignment and averaging of these particles using a mask comprising the ribosomal large subunit and the membrane region yielded a ribosome with a clear membrane bilayer and an additional density at the exit tunnel”
  
  The signal coming from the ribosomal RNA is very strong (unlike in single-particle studies of smaller membrane proteins) and typically much stronger than the signal coming from the ER membrane. This strategy is well established in the field (Pfeffer et al. 2014 (PMID: 24407213), 2015 (PMID: 26411746), Braunger et al. 2018 (PMID: 29519914), Gemmer et al. 2023 (PMID: 36697828)).
  
  Supplementary Figures 1B and 1C appear to suggest that the Ire1i-GFP and Ire1i-NG strains exhibit stronger HAC1 splicing upon DTT treatment. Given this apparent increase in UPR activation, it would be interesting to analyze these strains as well to determine whether they display more pronounced changes in translational states.
  
  We thank the reviewer for raising this interesting point. All strains display ~25% hibernating ribosomes under ER stress; the corresponding analysis can be found in Supplementary Figures 4 and 5. We will clarify this point by adding a sentence about this, with a reference to the corresponding Supplementary Figures: “First, we observed an increase in the relative abundance of hibernating ribosomes (from 3 to 25%) at the expense of some of the major elongating states, like Dec2 and Pre (from 22% to 16% and from 38% to 17%, respectively; Fig. 3E-F, Supp. Fig. 4D-E). A similar effect was observed in the Ire1i-GFP and Ire1i-NG strains (Supp. Figs. 4, 5). This increase in inactive 80S complexes is indicative of reduced translation activity in the cell, which is typically caused by the inhibition of translation initiation.”
  
  There is a difference in the magnitude of HAC1 splicing, but HAC1 splicing in all strains is high enough to robustly activate the ER stress response. This can explain why they all show a similar abundance of hibernating ribosomes.
  
  Minor comments:
  
  "FOV" should be defined as "Field of View" upon first use
  
  We will correct the corresponding sentence to: “Cells were then imaged with cryo-ET at an intermediate magnification (6.32-7.09 Å/pix, Field of View (FOV): ~9 µm²), allowing us to laterally capture near-complete cellular ultrastructure in each tomogram (Figure 1B-D).”
  
  In Supplementary Figure 7A, the image quality appears insufficient to clearly resolve structures within the autophagic bodies. As a result, it is difficult to determine whether ER-derived membranes are present within these structures. If ER-like membranes are observed, this could suggest induction of ER-phagy under ER stress conditions, consistent with previous reports (e.g., Mizuno et al., PLoS Genetics, 2020).
  
  The tomogram in Supplementary Figure 7A does not contain obvious ER-derived membranes. We have observed membranes in other tomograms, but our cryo-ET approach does not allow us to identify their origin (ER or other organelles). Therefore, we refrain from making any claims about ER-phagy in our manuscript and limit our discussion to autophagy in general.
  
  In the sentence "Using this approach, we identified 7 distinct ribosome states," the authors should clearly specify which strains and treatment conditions were analyzed. Similarly, statements such as "A similar increase in Dec3 was seen for the other conditions" and "Overall, we observed a consistent, stress-independent increase of the Dec3 state at the ER for all Ire1c-GFP conditions" should explicitly define the corresponding conditions in the text.
  
  We agree that these statements are too vague and we will specify the corresponding conditions in each of these sentences in the revised manuscript.
  
  In the sentence "Finally, like for cytosolic ribosome states, we observed that upon ER stress the abundance of hibernating states at the ER increased over time at the expense of other translating states (Dec2 and Pre)," the authors should explicitly reference the corresponding figures.
  
  Agreed, we will refer to Figure 4D for explicit comparison.
  
  In the References section, "Elife" should be corrected to "eLife" for the citation of van Anken et al.
  
  Agreed, we will correct this citation.
  
  The enrichment of eEF3 on inactive ribosomes leads the authors to propose a possible role for eEF3 in establishing or maintaining yeast ribosome hibernation. However, this interpretation currently appears speculative because the map resolution for the external density is relatively limited (~9-15 Å). Could the authors strengthen this claim by performing focused refinement/classification of the eEF3 density, testing eEF3 mutants or depletion strains, or examining whether eEF3 occupancy changes quantitatively during stress progression?
  
  Based on the abundance and function of eEF3, we deemed it the most likely candidate to fit this external density. However, we agree that the claim is currently stronger than the evidence supports. We will try to improve the local resolution by performing a focused refinement on the eEF3 density. That said, eEF3 is a small density for cryo-ET, and in this resolution range we are uncertain whether focused refinement will improve the quality of the map in this region. We will quantify the quality of the fit.
  
  Regarding mutant/depletion strains, eEF3 is essential in yeast and is also required for translation elongation. Hence, depletion strains or interaction mutants would also perturb the role of eEF3 in translation. This strongly limits the possibilities to specifically investigate the functional importance of eEF3 for ribosome hibernation.
  
  The current study only examines translational states during ongoing stress exposure and does not investigate whether these changes are reversible after stress resolution.
  
  We thank the reviewer for this interesting point. We hypothesize that the modest translation decrease we observe upon ER stress is most likely reversible. We will check this by including a recovery condition in our polysome profiling experiment.
  
  2. Description of the revisions that have already been incorporated in the transferred manuscript
  
  No revisions have been carried out yet.
  
  3. Description of analyses that authors prefer not to carry out
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  The authors mainly focus on 80S particles in their analysis for suggesting the different states of ribosomes. However, there is a possibility of free subunits being stored under specific condition. Can the authors comment on free 40S and 60S subunits?
  
  The reviewer is correct and several factors binding free ribosomal subunits have been proposed to play a role in translation inhibition and ribosomal subunit hibernation (Saba et al. EMBO J 2024 (PMID: 39533057)). While we appreciate this interesting perspective, we believe that it is beyond the scope of the present work and that incorporating it would distract from the central focus of the manuscript, namely the effect of ER stress on translation elongation dynamics.
  
  However, if the polysome profiling experiments, now planned based on the suggestions of the reviewers, highlight a significant change in 40S and/or 60S subunit abundance relative to 80S, we will try to re-analyze our data, focusing on 40S and 60S subunits.
  
  A previous study has reported the storage of dormant ribosomes on the mitochondrial surfaces. Analysis of mitochondria associated dormant ribosomes in S. cerevisiae would shed more light on this phenomenon.
  
  We thank the reviewer for raising this interesting point. Because of our focus on ER stress, our data collection was targeted at the ER, hence only a few of our tomograms contain mitochondria. On the few mitochondria that we did image, we do not observe lattice-like tethering of ribosomes, as described upon glucose starvation in S. pombe (Gemin and Gluc et al. 2024 (PMID: 39379376)). This tethering is a novel observation, and its function remains to be explored. Indeed, earlier experiments also indicated that glucose deprivation can induce ribosome binding to mitochondria in S. cerevisiae spheroplasts (Kellems et al., 1975 (PMID: 1092698)). However, initial experiments should first confirm whether this phenomenon also occurs without conversion to spheroplasts before moving on to ER stress.
  
  The structural analysis of mitochondria-associated ribosomes upon ER stress would require new sample preparation of lamellae of control and DTT-treated yeast cells and data collection targeted at mitochondria. The outcome is unlikely to differ substantially from the modest effect we describe in the cytosol and at the ER membrane. Finally, it would distract from the main message of our manuscript, which centers on the impact of ER stress on translation dynamics. Hence, we consider these experiments beyond the scope of our current work.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Were the cryo-FM lamellae maps shown in Fig. S1F-I used to target the tomogram acquisitions? A correlation between the FM and the EM could provide hints about where the Ire1p clusters are located. The puncta are curious somehow, although established in the literature. I'm wondering if they appear somewhere in the tomograms. I would not insist on new experiments to find them, but it would make sense to show if they are already present in the data. Fig S1D does not show a lamella, and it is hard to conclude that the puncta are really absent there. As a general/historical comment, is it clear that the GFP does not affect the protein condensation?
  
  We thank the reviewer for this highly relevant and interesting question. Indeed, the cryo-FM data was used to collect tomograms targeted at Ire1p oligomers. However, none of the conditions (the 3 different cell lines, different timing and different type of stressors) showed detectable clusters in the tomograms.
  
  The absence of clusters can be explained by at least three possible reasons. First, as pointed out by the reviewer, it is possible that the fusion of a fluorescent protein (GFP or NeonGreen) affects the assembly of Ire1p clusters. We think that this is unlikely, as these clusters have been visualized by cryo-CLEM using similar fusion constructs in mammalian cells (Tran et al. Science 2021 (PMID: 34591618)). The second possible explanation is a technical limitation. To detect enough fluorescent signal, our cryo-FM data was collected on ~400 nm thick lamellae prior to polishing the lamellae down to […] Since it is very challenging to convincingly determine which explanation is correct, and it remains largely speculative to us, we decided not to elaborate on this part of the research effort.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  For the description of ER volume changes under ER stress, the manuscript currently presents only tomograms from Ire1c-GFP cells treated with DTT. To strengthen this observation, it would be helpful to also include representative tomograms and corresponding segmentations from additional treatments and strains in the supplementary figures.
  
  Indeed, we collected these low-magnification tomograms on a single strain only. Similar to HAC1 splicing, ER expansion in response to ER stress is a widely accepted phenomenon (Bernales et al. 2006 (PMID: 17132049), Schuck et al. 2009 (PMID: 19948500)). There is likely only limited potential for new insights from reproducing these data, and we do not feel that the required resource investment is justified.
  
  While cryo-ET enables structural analysis of small cellular volumes at high resolution, volume EM approaches such as FIB-SEM can provide complementary large-scale ultrastructural information. In particular, samples prepared by high-pressure freezing and freeze substitution generally preserve membrane morphology well and closely resemble native membrane architecture. Incorporating such approaches could further support and complement the cryo-ET observations.
  
  We think that additional volume EM experiments cannot be justified here, as the enlargement of ER volume upon ER stress has already been well established through various volume EM approaches (e.g., Sriburi et al. 2004 (PMID: 15466483), Bernales et al. 2006 (PMID: 17132049), Schuck et al. 2009 (PMID: 19948500), Heinz et al. 2025 (PMID: 40795978)). Here, we only collected additional low-magnification tomograms and quantified the ER volume in these to confirm that our experimental conditions lead to an ER stress response similar to that previously described. We will add further references to this volume EM work in the text to clarify this.
  
  Minor Comments
  
  Figure 1: the number of biological replicates (N = 3) is relatively small, particularly considering that yeast samples are generally not difficult to prepare.
  
  The sample size here was not limited by yeast preparation but by cryo-ET data collection time. Because ER expansion upon ER stress is well established in the literature, we only collected a few low magnification tomograms to confirm this effect in our samples, and dedicated most of our microscope time to the collection of high magnification tomograms for the analysis of translation elongation dynamics.
  
  In addition to ER volume expansion, were there any detectable changes in nuclear size or nuclear envelope morphology? Since the nuclear envelope is continuous with the ER network, this could provide additional insight into the cellular response to ER stress.
  
  Only a few of our low-magnification tomograms contained parts of the nucleus. The few nuclei that we observed may show an increased distance between the nuclear membranes, but we collected too few examples to reliably quantify this effect. Hence, we refrained from discussing this in our manuscript, which focuses on translation elongation.
  
  The discussion of alternative pathways remains underdeveloped. Specifically, the authors briefly mention Gcn2p and PKA signaling as potential contributors. Yet no experiments directly test whether the observed ribosome hibernation depends on these pathways. Could the authors clarify: whether eIF2α phosphorylation was induced under their stress conditions, whether Gcn2-deficient strains alter the hibernation phenotype and how much of the observed effect is truly UPR-specific rather than a generic integrated stress response?
  
  We touched upon alternative pathways in the Discussion to explain why the observed hibernation is plausible. Since the PERK pathway does not exist in yeast, we expect that some readers might be surprised by our findings. We will adjust this paragraph to improve its readability.
  
  The suggested experiments would show which pathways are activated; however, the activation of these pathways has already been described in the literature (Pincus et al., 2014 (PMID: 25275008); Patil et al., 2004 (PMID: 15314660)). To really explain which factors directly trigger hibernation, various additional biochemical experiments would have to be performed. This is definitely an interesting direction for future research, but beyond the scope of this cryo-ET-focused paper.
  
  We agree that we cannot directly attribute the observed changes to the UPR; that is why we focus on ER stress instead of the UPR. We will double-check that this is done consistently throughout the paper.
  

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.04.10.717691
www.biorxiv.org

DUAL: deep unsupervised simultaneous simulation and denoising for cryo-electron tomography

1
1. Public_Reviews 01 Jul 2026
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public Review):
  
  Zeng et al.’s work, inspired by the CycleGAN model, links several key issues in cryo-electron tomography in ways that reinforce each other, leading to very positive results across several benchmark datasets. The related topics include tomogram cleaning and simulation (two crucial areas in the field), with “spin-off” outcomes in automatic annotation and missing-wedge completion. The manuscript covers nearly all essential topics in tomography, making it very comprehensive and potentially critical in the field. The generalization capabilities on the SHREC 2021 dataset are very interesting, although difficult to quantify. I appreciate the approach, but I have serious concerns about some of the limitations of the results presented by the authors.
  
  We thank the reviewer for the encouraging assessment of our work and for recognizing the potential importance of integrating tomogram denoising and simulation within a unified unsupervised framework. We appreciate the reviewer’s thoughtful evaluation and the concerns raised regarding the limitations of the current results. We address these concerns in detail below and have revised the manuscript to clarify the scope, evaluation strategy, and practical applicability of DUAL.
  
  (1) Simplified data versus today's challenging tomography data. The difficulty of making general tests is acknowledged. In this work, the method shows excellent results on potentially simple datasets: SHREC 2021, which was used as a benchmark in ET several years ago but has not been used much since, and, even more so, the old RELION dataset for particle picking.
  
  We appreciate the reviewer raising this important point regarding dataset difficulty and relevance. The SHREC 2021 dataset was selected because it is currently the most widely used benchmark simulated dataset for cryo-electron tomography and originates from the last SHREC contest specifically designed for evaluating cryo-ET analysis methods. It provides standardized simulated tomograms with known ground truth structures, which enables objective and reproducible quantitative comparison between different methods. The RELION ribosome dataset is also a commonly used experimental benchmark for evaluating particle detection performance. Nevertheless, we agree that demonstrating performance on additional recent and challenging datasets will further strengthen the evaluation of the method. In response to this comment, we have expanded the experimental evaluation in the revised manuscript by applying DUAL to additional recent cryo-ET datasets to further demonstrate its effectiveness on recent tomograms with more complex biological structures and imaging conditions.
  
  Specifically, we added an evaluation on the CZII Cryo-ET Object Identification dataset, a popular competition in 2025 with more than 1,000 participants. This experiment complements the original SHREC 2021 and RELION ribosome benchmark results and shows that DUAL can also be successfully applied to more recent cryo-ET data. The quantitative results and representative visual comparisons (shown above in Figures 1 and 2) are provided in the new Section 2.6.
  
  (2) Reproducibility by the average user. I have found many cases in which a specific software package produces excellent results when run by the authors, yet the average user is lost among the parameters and cannot reproduce these promising results. I propose that the authors address this issue by involving some experimental colleagues and asking them to repeat the work. This is a general concern that applies not only to this work but to many others. I think this consideration is crucial for a field that is growing very quickly and where method development happens at an extraordinary pace... but are all of these methods generally useful?
  
  We fully agree with the reviewer that reproducibility and usability are critically important for computational methods in cryo-ET. In response to this concern, we substantially improved the accessibility and reproducibility of the DUAL framework and revised the accompanying documentation to make the implementation easier to inspect and use; two experimental colleagues have used it and reproduced the results. The updated software repository now includes improved documentation, a clearer README, practical tutorials, a method-to-implementation description, a code reference, and example workflows demonstrating how to reproduce the experiments described in the manuscript. We also provide pretrained models together with the configuration files used to generate the results reported in the paper. In addition, the revised documentation clarifies the data interface, domain convention, training workflow, model outputs, and the interpretation of the trained translators. We believe that these improvements will significantly facilitate reproducibility and make it easier for users to apply the method to their own datasets.
  
  Reviewer #2 (Public Review):
  
  This study introduces DUAL (Deep Unsupervised simultAneous denoising and simuLation), an unsupervised deep learning framework that jointly addresses denoising and realistic data simulation for cryo-electron tomography (cryo-ET). By leveraging a cyclic, unpaired learning strategy, DUAL avoids reliance on paired clean ground-truth tomograms, which represents a practical advantage over many existing supervised approaches.
  
  We thank the reviewer for the positive summary of our work and for recognizing the advantages of the unsupervised framework in avoiding reliance on paired ground-truth data.
  
  Through extensive quantitative evaluations on benchmark datasets, together with qualitative and downstream analyses on diverse experimental tomograms, the authors show that DUAL performs robustly across both denoising and simulation tasks.
  
  We appreciate the reviewer’s recognition of the robustness of the framework and the evaluation strategy presented in the manuscript.
  
  If feasible, a limited quantitative or qualitative comparison with one or more recently published deep learning approaches for cryo-ET denoising or simulation, such as CryoSamba or DeepDeWedge, would further strengthen the evaluation and help contextualize DUAL’s performance.
  
  We thank the reviewer for this helpful suggestion. As also recommended by the editor, we extended the experiments to include comparisons with recently proposed methods CryoSamba and DeepDeWedge. These comparisons were performed using the same evaluation metrics used in the current experiments so that the results remain directly comparable. The additional comparisons have been added to Section 2.6.
  
  Specifically, DUAL was compared with CryoSamba for denoising and with DeepDeWedge for missing-wedge compensation on the CZII Cryo-ET Object Identification dataset, a popular competition in 2025 with more than 1,000 participants. The results are shown above in Figures 1 and 2.
  
  Reviewer #3 (Public Review):
  
  The paper is titled “DUAL: Deep Unsupervised Simultaneous Simulation and Denoising for Cryo-Electron Tomography.” The authors provided two closely related code branches: one for denoising and one for missing-wedge correction. However, I did not find the simulation component. This is important, as the authors state that “the simulation branch provides learning-based cryo-ET simulation to generate synthetic tomograms indistinguishable from experimental ones.”
  
  We thank the reviewer for carefully examining the released code and for pointing out this source of confusion. We would like to clarify that, in the DUAL framework, simulation and denoising are the two simultaneous branches that are trained jointly, rather than separate sequential modules. The simulation branch learns the transformation from clean/simulated tomograms to realistic experimental cryo-ET tomograms, while the denoising branch learns the reverse transformation from experimental tomograms to the clean domain. Together, these two translators form the cyclic unsupervised learning framework described in the manuscript.
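  For readers unfamiliar with cyclic unpaired training, the relationship between the simulation and denoising translators can be illustrated with the standard CycleGAN-style cycle-consistency term. This is a minimal sketch under stated assumptions, not code from the DUAL repository: the translators below are hypothetical toy affine maps on flattened volumes standing in for the learned 3D networks, and DUAL's full training objective (including any adversarial terms) is not reproduced here.

```python
from typing import Callable, List

# A flattened tomogram, for illustration only (real inputs are 3D volumes).
Volume = List[float]


def l1(a: Volume, b: Volume) -> float:
    """Mean absolute error between two equally sized volumes."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    clean: Volume,
    experimental: Volume,
    simulate: Callable[[Volume], Volume],
    denoise: Callable[[Volume], Volume],
) -> float:
    """Generic CycleGAN-style cycle loss: each domain must survive a round trip."""
    # clean -> experimental-like -> clean
    clean_cycle = l1(denoise(simulate(clean)), clean)
    # experimental -> clean -> experimental-like
    exp_cycle = l1(simulate(denoise(experimental)), experimental)
    return clean_cycle + exp_cycle


# Hypothetical toy translators: an affine "noise model" and its exact inverse.
def toy_simulate(v: Volume) -> Volume:
    return [0.5 * x + 0.1 for x in v]


def toy_denoise(v: Volume) -> Volume:
    return [(x - 0.1) / 0.5 for x in v]


if __name__ == "__main__":
    clean = [0.0, 1.0, 0.5, 0.2]
    experimental = [0.3, 0.4, 0.9, 0.1]
    # Near zero, because the toy translators are exact inverses of each other.
    print(cycle_consistency_loss(clean, experimental, toy_simulate, toy_denoise))
```

  Because neither term needs a clean/experimental pair of the same specimen, this kind of objective is what lets the two branches be trained jointly on unpaired data; replacing `toy_denoise` with an identity map makes the loss clearly positive.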
  
  In the original repository release, the organization of the code may not have made this relationship sufficiently clear, which likely led to the impression that only denoising and missing-wedge correction components were provided. To address this issue, we have substantially revised the repository structure and documentation. The updated repository now explicitly documents the two simultaneous branches of DUAL, explains how the simulation and denoising translators interact during training, and provides clear instructions for reproducing both functionalities. We have also added a dedicated method-to-implementation guide, code reference, and tutorial examples that describe the usage of the simulation component and its role in generating realistic synthetic tomograms that are statistically and visually consistent with experimental cryo-ET data.
  
  We believe these revisions clarify the implementation of the simulation branch and make the correspondence between the manuscript and the released code substantially easier to understand and reproduce.
  
  In addition, no pre-trained models were provided. Given that the authors indicate that all training data are publicly available, sharing trained models together with references to the corresponding datasets would significantly facilitate evaluation of the reported performance.
  
  We agree with the reviewer that providing pretrained models will greatly facilitate reproducibility and evaluation by other researchers. In the revised release of the repository, we have provided pretrained models corresponding to the experiments described in the manuscript together with clear references to the datasets used for training.
  
  The provided instructions are quite minimal and do not currently support reproduction of the reported findings.
  
  We appreciate the reviewer highlighting this issue. We have expanded the documentation substantially and provided detailed instructions describing the full workflow required to reproduce the experiments presented in the manuscript. In the revised repository, we added documentation that more explicitly connects the method described in the manuscript with the released implementation. The README summarizes the repository scope and data interface, the tutorial describes the practical workflow for preparing data and running training, and the method and code reference documents describe the mapping between the DUAL formulation and the main implementation files. We believe these additions will make the workflow clearer for users who wish to reproduce or adapt the experiments.
  
  After many hours of trial, debugging, and experimentation, I was able to train a model for missing-wedge correction using the default parameters, although the process was slow and memory-intensive.
  
  We thank the reviewer for investing significant effort to test the software and for reporting this observation. Training large 3D deep learning models on cryo-ET volumes can indeed be computationally demanding. We have clarified the computational requirements in the revised manuscript and provide guidance for efficient training and inference.
  
  Once these points are addressed, I would return to my original request that the authors provide (3) a fully solved and functional tutorial based on their updated notebooks, with all intermediate results.
  
  We agree that a comprehensive tutorial will be extremely helpful for users. In the revised repository we have provided a complete end-to-end tutorial demonstrating the workflow from raw tomograms to the final outputs including simulated tomograms, denoised tomograms, and missing-wedge-corrected tomograms.
  
  We once again thank the editor and reviewers for their insightful comments and suggestions, which have helped us significantly improve the manuscript and the accompanying software.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.03.02.583135v1
Jun 2026
comapeo-tagging-taxonomy.pages.dev

Open questions for the team

1
1. rudo 30 Jun 2026
  
  in Public
  
  Should the app keep this fully hidden, or are there moments where naming the kind to the user would actually help (for instance, framing a screen as “what happened” versus “what’s still there”)?
  
  When thinking about this question, my mind goes to pedagogy and ways to cue the user into understanding the new data structure (and even data structures in general!)
  
  That is: in addition to asking "how do we minimize the amount of stuff people in the field have to think about?", an interesting parallel question is "how can we offer (optional?) learning opportunities for people to understand how data works in CoMapeo?"
  
  Part of the reason for suggesting this is that this taxonomy definitely introduces a learning curve! And it may also impact field teams who have to intuit what observations to create in order to set up a good entity causal chain. ("OK, first let me map the creek, so I can then map the contamination in the creek...")
  
  See my comments in "What & why" and "CoMapeo data model" for more of me wrestling with instantiating entity types, for reference.

Annotators

rudo

URL

comapeo-tagging-taxonomy.pages.dev/open-questions/
www.biorxiv.org

Phasic and tonic pain serve distinct functions during adaptive behaviour

1
1. Public_Reviews 30 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Strengths:
  
  This is an ambitious study that provides a quantitative dissociation of the roles of phasic and tonic pain in adaptive behavior, by integrating ecological neuroscience, motivational theory, and computational modeling. The use of immersive VR combined with a free-operant foraging task offers a more ecologically valid context to study pain-related behavior compared to traditional paradigms. Furthermore, the study employs a multimodal approach by combining behavioral data, computational frameworks, physiological signals, and EEG. In particular, one of the main strengths of the study is the use of sophisticated computational modeling to capture phasic and tonic pain effects. The experiment code is available on GitHub, increasing reproducibility.
  
  We appreciate the reviewer’s recognition of the study’s ambition, the integration of ecological and computational approaches, and our efforts to support reproducibility through open code.
  
  Weaknesses:
  
  The main limitation of this article is that it provides insufficient detail on the VR implementation. The design of the VR environment is, at this stage, under-described. Crucial information is missing, such as the number of pineapples per block, timing precision, and how motion is mapped to virtual movement. This aspect strongly limits the reproducibility of the experiments.
  
  We thank the reviewer for highlighting the importance of detailed reporting to ensure reproducibility. In response to this valuable feedback, we have taken the following steps:
  
  (1) Open Access to Software and Data: We have now uploaded the full software and hardware specifications used in our study to a public GitHub repository: https://github.com/ShuangyiTong/PineappleStudy2025ReplicationSoftware. This includes the complete VR implementation, allowing readers to directly experience the task using a commercially available VR headset. The repository also contains the raw data and analysis scripts to facilitate full replication of our results. These links have been updated in the “Data and Code Availability” section.
  
  (2) Expanded Methodological Details: We have revised the Methods section to include the specific details requested, such as:
  
  (a) The number of pineapples presented per block,
  
  (b) The temporal resolution and precision of the data collection,
  
  (c) The mapping between physical motion and virtual movement within the VR environment.
  
  Specifically, the revised paragraph reads as follows: “At the beginning of each one-minute block, a total number of 150 virtual pineapples of varying heights from 0.33 to 1 m were randomly generated in a circle centred around the participant with a diameter of 6.67 m. Five identical baskets were placed within the space. Spatial locations of trees and vegetation were generated using the game engine's default tree painting tool (Unity Technologies, San Francisco, US).”
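  The spawning rule quoted above can be sketched as follows. This is a minimal illustration in Python, not the study's Unity implementation (which is in the authors' repository). The quoted text says only that pineapples were "randomly generated", so the uniform-over-area placement, the uniform height draw, the interpretation of "height" as a per-fruit value, and the names `spawn_block` and `Pineapple` are all our assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import List

# Values quoted in the revised Methods; the sampling distributions are assumptions.
N_PINEAPPLES = 150
DIAMETER_M = 6.67
MIN_HEIGHT_M = 0.33
MAX_HEIGHT_M = 1.0


@dataclass
class Pineapple:  # hypothetical record type
    x: float       # metres from the participant, horizontal axis
    z: float       # metres from the participant, depth axis
    height: float  # metres (assumed to be a per-fruit height value)


def spawn_block(rng: random.Random, n: int = N_PINEAPPLES) -> List[Pineapple]:
    """Place n pineapples uniformly over a disc centred on the participant."""
    radius = DIAMETER_M / 2.0
    fruits = []
    for _ in range(n):
        # Taking the square root of a uniform draw gives uniform density over
        # the disc's area instead of clustering fruit near the centre.
        r = radius * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        height = rng.uniform(MIN_HEIGHT_M, MAX_HEIGHT_M)
        fruits.append(Pineapple(r * math.cos(theta), r * math.sin(theta), height))
    return fruits


if __name__ == "__main__":
    block = spawn_block(random.Random(2025))
    print(len(block), max(math.hypot(p.x, p.z) for p in block))
```

  The square-root radius matters for this kind of task: a naive uniform radius would place more fruit close to the participant, shortening typical travel distances and thereby affecting any distance-based measure such as choice distance bias.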
  
  We hope these updates address the reviewer’s concerns and significantly improve the transparency and reproducibility of our experimental design.
  
  A second limitation lies in the lack of clarity regarding the study hypotheses. Although two overarching hypotheses can be inferred, they are not explicitly formulated. As a result, it is unclear which analyses were merely exploratory, especially for the physiological and EEG outcomes.
  
  We thank the reviewer for this constructive feedback. We agree that making the hypotheses more explicit—particularly regarding the computational framework and the role of physiological measures—strengthens the manuscript. We have significantly revised the final section of the Introduction to explicitly formulate our two primary hypotheses and operationalise the associated behavioural and neurophysiological measures.
  
  (1) Phasic Pain Hypothesis: We hypothesised that phasic pain serves as a discrete valuation signal that updates the state-action value of specific actions. We predicted this would be evidenced behaviourally by reduced choice probability and increased ‘distance bias’ for pain-associated targets. Neurally and physiologically, we predicted that these aversive values would be tracked by skin conductance responses (SCRs) and the amplitude of pain event-related potentials (ERPs), which serve as established markers for the encoding of aversive magnitude and salience.
  
  (2) Tonic Pain Hypothesis: We hypothesised that tonic pain acts as a coefficient modulating the trade-off between opportunity cost and vigour cost. This was tested by applying tonic pain to the non-dominant (non-task) limb to ensure that any observed changes were motivational rather than mechanical. We predicted a global reduction in motivational vigour, operationalised as decreased movement velocities and foraging rates.
  
  By framing the study this way, we clarify that the physiological and EEG outcomes were used to quantitatively test whether the brain and body implement the computations (valuation and vigour-regulation) defined by our model. We have updated the text in the Introduction (see below) to reflect these explicit formulations.
  
  Updated paragraphs: “Our first hypothesis was that phasic pain provides a distinct valuation signal that updates the value of specific actions within complex environments. In our task, this was implemented by associating specific fruit (distinguishable by colour) with a brief electrical stimulus to the grasping hand, emulating thorns. In our computational model, this was defined as an aversive utility term incorporated into the state-action value evaluation process. We predicted that this computational mechanism would manifest behaviourally as a reduction in choice probability for pain-associated targets and an increase in ‘choice distance bias’ (the willingness to travel further for pain-free options). Neurally and physiologically, we predicted that these aversive values would be tracked by skin conductance responses (SCRs) and the amplitude of nociceptive event-related potentials (ERPs), specifically the N1-P2 complex (Favero et al., 2023).
  
  Second, we hypothesised that tonic pain acts as a coefficient modulating the tradeoff between opportunity cost and vigour cost, thereby serving a recuperative function. To test this in Experiment 2, we delivered continuous tonic pressure to the non-dominant arm via an inflated cuff to emulate a background state of injury. Within our free-operant framework, tonic pain was modelled as a weighting factor that shifts the optimal balance toward reduced energy expenditure. Because the stimulus was applied to the non-task limb, we specifically predicted a global reduction in motivational vigour—operationalised as decreased movement velocities and foraging rates—rather than a direct mechanical impairment. By applying this formal computational approach, we move beyond exploratory observations to provide a rigorous, mechanism-based explanation for how distinct pain states adaptively govern choice and action.”
  
  In Experiment 2, the reduction in vigor during tonic pain could plausibly reflect attentional load rather than pain per se. As recognized by the authors, there is no control condition involving an innocuous salient stimulus to rule out non-specific effects of distraction. Perhaps a tonic non-painful but salient somatosensory stimulus (e.g., a strong vibrotactile stimulus applied on the same arm) could have been used as a control stimulus.
  
  We agree that examining the potential role of attentional load on the interaction between tonic and phasic pain is an important area of future investigation. The inclusion of additional control conditions matched for attentional salience with additional experiments is possible but introduces other confounds related to their different qualities (e.g. a salient vibrotactile stimulus might invigorate behaviour). More fundamentally, attentional processes are a core part of pain function, and should not necessarily be viewed as a confound (i.e. the way that pain mediates some of its core functional effects may directly be through its salient attentional nature). This view is formalised in Wall and Melzack’s classical tripartite model of pain, and distinguishes pain from purely sensory systems such as somatosensation, vision and so on.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Computational models may be difficult to follow without prior familiarity. Including simplified explanations could make the approach more accessible.
  
  We thank the reviewer for this constructive suggestion. To make the computational framework more accessible to a broader audience, we have added two new schematic diagrams (Figure 2 and Figure 8) that provide a visual overview of the models used in Experiment 1 and Experiment 2, respectively. These figures illustrate the state-action transitions and provide a clear decomposition of the payoff components—including reward, pain, and temporal costs. We believe these additions significantly clarify the modelling logic and help ground the mathematical descriptions in a more intuitive visual context.
  
  (2) Lines 220-222: I don't think it is possible to talk about "objective measures of pain" as pain is, by definition, subjective. I suggest rephrasing the sentence.
  
  We thank the reviewer for this thoughtful observation regarding our terminology. We recognise that the phrase ‘objective measures of pain’ may be misinterpreted. Our intention was to highlight the distinction between the internal, reported experience and the behavioural manifestations of pain that our computational method reveals.
  
  To avoid ambiguity and to better align the text with the core focus of our study, which is the motivational function of pain, we have rephrased the sentence as suggested. We have shifted the emphasis from ‘measuring pain’ to quantifying its specific impact on behaviour.
  
  Original lines 220-222 have been revised as follows:
  
  "Taken together, this indicates the composite nature of overall aversiveness and highlights the benefit of combining subjective ratings with model-based measures of its motivational impact on behaviour."
  
  We believe this revision more accurately reflects our approach of using choice and movement as objective indices of the motivational value of pain.
  
  (3) The explanation for choosing the foraging task is very interesting, but should be provided in the Introduction rather than in the Methods section. In contrast, the Methods section should include the details of the VR implementation.
  
  We thank the reviewer for these constructive suggestions regarding the manuscript structure.
  
  Regarding the rationale for the foraging task: We agree that providing the theoretical justification for the task earlier in the manuscript improves the narrative flow. We have revised the Introduction to explicitly outline why a foraging paradigm was chosen by adding the following sentences:
  
  “A foraging paradigm provides a robust, free-operant framework that captures the core components of adaptive behaviour: it is goal-directed, involves complex movement, and requires the learning of an optimal strategy to maximise rewards. This allows us to computationally dissociate how different types of pain influence the control of action.”
  
  We believe this addition clarifies the link between our computational hypotheses and the experimental design.
  
  Regarding the VR implementation: We have updated the Methods section to include the specific experimental parameters requested in the reviewer's previous comments (e.g., timing precision, stimulus counts, and motion mapping) to ensure full reproducibility. However, we have opted not to include the exhaustive engineering details of the underlying software architecture and communication protocols. To ensure complete transparency, the full software and firmware source code, which allows for the exact replication of the environment, is available in our public GitHub repository shown in the code and data availability section.
  
  (4) It is unclear how the sample size was determined. This information should be included.
  
  We thank the Reviewer for this comment. For the present study, an a priori power analysis was not conducted due to the novelty of the investigation and the complexity of the analyses. Standard power analyses are not commonly conducted for studies where computational modelling is the primary focus, as results would be potentially misleading. Instead, we based our sample size estimate of N ≈ 30 participants on previous studies using computational modelling of neurophysiological data [6], as well as EEG, SCR and pain studies [7, 8] and studies in our group using combined neurophysiological recordings and VR [9]. This approach represented a pragmatic balance which ensured the credibility of our results and the stability of our model estimates while accounting for the high per-subject cost and the depth of the data collected from each individual. This has now been described more accurately in the Methods section:
  
  “An a priori power analysis was not conducted due to the novelty of the investigation and the complexity of the analyses. Instead, we based our target sample size (N ≈ 30 per experiment) on previous studies using computational modelling of neurophysiological data (Mahajan et al., 2025), as well as EEG, SCR, and pain studies (Schulz et al., 2015; Zhang et al., 2018), and studies from our group using combined neurophysiological recordings and VR (Hewitt et al., 2026). This approach represents a pragmatic balance that ensures the credibility of the results and the stability of model estimates while accounting for the high per-subject cost and depth of data collected from each individual.”
  
  (5) Please clarify how / when the monetary performance incentive was provided.
  
  We thank the reviewer for the opportunity to clarify the incentive structure. The monetary performance incentive is detailed below:
  
  Participants were informed at the start of the study that they would earn a performance-based bonus of up to £10, determined by the points they collected during the foraging task. To ensure that motivation remained consistent across the entire session for all individuals, regardless of their baseline foraging speed, the specific exchange rate between points and currency was not disclosed. This prevented potential 'ceiling effects', where a high-performing subject might stop exerting effort after reaching the maximum bonus early, or 'floor effects', where a subject might perceive the reward for an individual action as too small to be motivating.
  
  Following the completion of the experimental session, all participants were compensated with the full £10 bonus in addition to their base payment for participation.
  
  We have updated the Methods section to reflect these details:
  
  “Participants were informed at the start of the experiment that their total points would be rewarded with a monetary incentive of up to £10. To maintain a constant level of motivation throughout the task, the exact point-to-currency exchange rate was not specified. Upon completion of the session, all participants were awarded the maximum bonus of £10.”
  
  Reviewer #2 (Public review):
  
  Strengths:
  
  Overall, this study aims to address an important topic and is generally well written.
  
  We thank the Reviewer for the generally positive evaluation of our work.
  
  Weaknesses:
  
  First, phasic pain was induced using electrical stimulation, which typically elicits somatosensory evoked potentials (SEPs). These responses may not reflect pain-specific processes and thus complicate interpretation. This issue bears directly on the study's conclusions, especially when discussing interactions between phasic and tonic pain. For example, tonic pain is known to reduce perceived intensity or cortical responses to phasic pain stimuli delivered elsewhere on the body - an effect not expected for SEPs elicited by electrical stimuli.
  
  We acknowledge the reviewer’s concern regarding the specificity of evoked potentials elicited by electrical stimulation. We agree that traditional SEPs, particularly those evoked by large surface electrodes, primarily reflect activation of non-nociceptive A-beta fibres and thus may not reliably index pain-specific processes or be modulated by tonic pain via descending nociceptive control. However, we would like to clarify that phasic pain was administered in the present study using small-diameter concentric ‘Wasp’ electrodes. These are comparable to intraepidermal electrodes shown to preferentially activate nociceptive A-delta fibres, thereby eliciting ERPs more closely associated with nociceptive processing rather than mixed somatosensory input [1, 2]. Accordingly, our ERP results demonstrated a reliable increase in N1-P2 amplitude with higher phasic pain intensity, suggesting that the evoked responses captured stimulus-evoked nociceptive processing.
  
  We acknowledge that these ERPs may still reflect mixed sensory processing and thus may not be fully modulated by tonic pain. Previous studies have shown that ERPs elicited by nociceptive electrical stimulation can be attenuated during tonic pain using cold-water immersion in CPM paradigms [3, 4]. However, these studies typically employ passive tasks, whereas our paradigm involved continuous voluntary behaviour during sustained tonic pressure pain. This difference in task context may engage distinct modulatory systems, possibly prioritising behavioural adaptation over sensory gating.
  
  We have revised the Discussion and Methods sections to explicitly clarify the electrode design and address the lack of ERP modulation by tonic pain in the context of active behaviour:
  
Discussion: “Although we utilised concentric ‘Wasp’ electrodes designed to selectively activate nociceptive A-delta fibres, and confirmed that the resulting ERPs (N1-P2) were significantly modulated by phasic intensity (Fig. 6E, F), we observed no such attenuation by tonic pain (Fig. 6G, H).”
  
  Methods: “These electrodes preferentially activate nociceptive A-delta fibres, thereby eliciting ERPs that more accurately reflect nociceptive processing compared to standard bipolar stimulation (Inui et al., 2002; Mørch et al., 2011).”
  
Second, additional control experiments are necessary to rule out alternative explanations. For instance, we suggest that the authors deliver phasic pain to the contralateral arm (e.g., at 1-2 Hz), which might also reduce action velocity. Similarly, tonic pain applied to the grasping hand should be tested to disentangle hand-specific effects.
  
  We thank the reviewer for these suggestions regarding the spatial configuration of stimuli. The decision to deliver phasic pain to the grasping hand and tonic pain to the contralateral arm was a deliberate feature of our experimental design.
  
  First, delivering phasic pain to the grasping hand ensured spatial congruency between the virtual stimulus (the fruit) and the physical consequence (the pain). This congruency is essential for subjects to form a coherent representation of the 'painful' object; a contralateral delivery would have introduced a sensory-motor mismatch that could complicate the interpretation of the learning and choice data.
  
  Second, tonic pain was applied to the contralateral arm specifically to avoid mechanical interference with the grasping action. Applying sustained pressure to the ipsilateral limb would likely have impeded the manual dexterity and fine motor control required to operate the controller buttons. This would have introduced a physical confound, making it difficult to determine if changes in behaviour were due to motivational vigour or simply the mechanical difficulty of performing the grasp while the arm was under pressure.
  
  We agree that exploring the spatial generalisation of these effects is an important future direction, and we have added a paragraph to the Discussion to clarify these design choices:
  
  “It is also important to consider the spatial configuration of the stimuli used in this study. Phasic pain was delivered to the grasping hand to maintain spatial congruency with the virtual fruit, ensuring a coherent nociceptive feedback signal for the interactive task. Additionally, tonic pain was applied to the contralateral arm to prevent mechanical interference with motor execution, which would have occurred if pressure were applied to the ipsilateral limb used for grasping the controller. Whilst this design promotes spatial congruency and avoids mechanical confounds, future studies might explore how these effects generalise across different body parts, for which VR experiments serve as a promising tool to test relevant hypotheses (Hewitt et al., 2026).”
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) First, the abstract mentions only EEG, yet Experiment 1 employed skin conductance response (SCR) measures while Experiment 2 utilized EEG. Also, the rationale for using SCR in Experiment 1 and EEG in Experiment 2 is not provided and should be explicitly stated.
  
  We thank the reviewer for identifying the discrepancy between the physiological signals reported in Experiment 1 and Experiment 2. We have revised the Abstract and Methods section to clarify the rationale for these measures.
  
In the Abstract, we replaced “EEG” with “physiological and neural responses” in the following sentence: “This could be explained by a free-operant computational framework that formalises and quantifies the function of tonic and phasic pain in terms of motivational vigour and decision value, and model parameters correlated with physiological and neural responses.”
  
  Regarding the rationale for the measurements, the following sentences were inserted into the Methods section: “Experiment 1 was designed to establish the robust behavioural effects of the foraging task while ensuring the collection of reliable physiological data. We chose SCR as it is a well-validated index of autonomic arousal that we were confident would provide a clear peripheral measure of pain-related processing in this novel VR paradigm.”
  
  For Experiment 2, we aimed to build on these findings by adding EEG. This was intended as a complementary piece of neural evidence to provide insights into the underlying central neural mechanisms of phasic and tonic pain interactions.
  
  (2) Second, the quality of both SCR (Figure 3A) and EEG/ERP data (Figure 5A-D) appears compromised by low SNR. For instance, ERP signals show baseline drift at low frequencies, potentially due to movement-related artifacts. The authors are encouraged to enhance data quality and provide cleaner, more interpretable results.
  
We thank the reviewer for this observation. We acknowledge that our recordings exhibit a lower SNR than conventional, stationary EEG studies. This is a recognised characteristic of Mobile Brain-Body Imaging (MoBI), particularly in immersive VR experiments in which participants are physically active [10]. However, previous research has demonstrated that valid, interpretable neural signals can be recovered in active settings using modern cleaning methods, including trained ICA component labelling, which we adopted for artefact cleaning [11]. We also believe that EEG data should not be over-cleaned, as argued by Delorme in ‘EEG is better left alone’ [12]. Therefore, we have added a new paragraph to the Discussion:
  
  “It is important to acknowledge that the signal-to-noise ratio in both our physiological and neural recordings is lower than that typically observed in conventional, stationary laboratory experiments (Gramann et al., 2011). This is primarily due to the motion artefacts inherent in an immersive and active virtual reality environment. Whilst we utilised robust cleaning and artefact-correction methods (Klug and Gramann, 2021), the elevated noise floor may limit our capacity to detect more subtle neural effects or interactions. These challenges highlight a critical area for future methodological research, particularly in the development of hardware and signal-processing tools designed to isolate neural signals during complex, mobile behavioural tasks.”
  
Another factor contributing to the appearance of the raw signal is the "free-operant" nature of our task. Unlike conventional neurophysiological paradigms with fixed and sufficiently long intervals between trials, our participants were free to move and interact with fruit at their own pace. This means that neurophysiological responses to successive actions (e.g., picking up one fruit followed quickly by another) can overlap. For the SCR analysis, we addressed this by using a canonical response function (CRF) within a GLM to model and "unfold" the overlapping responses [13]. While we did not perform a similar deconvolution for the EEG data, we focused our analysis on the early, salient components (N1-P2 and early time-frequency changes < 500 ms), which are less susceptible than the much slower SCR to overlap from subsequent actions.
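To illustrate how a GLM can "unfold" overlapping SCRs, the sketch below builds one regressor per event type by convolving event onsets with a response kernel and then estimates the amplitudes by ordinary least squares. It is a minimal stdlib-only sketch under stated assumptions: the biexponential kernel is a hypothetical stand-in for the canonical SCR response function (it is not the CRF used in [13]), and the sampling rate, onsets and amplitudes in the usage example are invented.

```python
import math


def kernel(n_samples, fs, tau_rise=0.5, tau_decay=3.0):
    """Hypothetical biexponential stand-in for a canonical SCR response function."""
    return [math.exp(-(i / fs) / tau_decay) - math.exp(-(i / fs) / tau_rise)
            for i in range(n_samples)]


def build_design(onsets_by_type, n_samples, h):
    """One column per event type (onset sticks convolved with h) plus an intercept."""
    columns = []
    for onsets in onsets_by_type:
        col = [0.0] * n_samples
        for onset in onsets:
            for k, hk in enumerate(h):
                if onset + k < n_samples:
                    col[onset + k] += hk  # overlapping responses simply add up
        columns.append(col)
    columns.append([1.0] * n_samples)  # intercept (tonic level)
    return columns


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_glm(columns, y):
    """Ordinary least squares via the normal equations X^T X beta = X^T y."""
    p = len(columns)
    xtx = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(ci * yi for ci, yi in zip(columns[i], y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    fs, n = 10, 300  # 10 Hz, 30 s (invented)
    h = kernel(150, fs)
    # Fixation and shock events 1.5 s apart, so their responses overlap.
    cols = build_design([[20, 60, 140], [35, 75, 200]], n, h)
    y = [0.8 * a + 2.0 * b + 0.1 for a, b in zip(cols[0], cols[1])]
    print(fit_glm(cols, y))  # approximately [0.8, 2.0, 0.1]
```

Because the overlapping responses are modelled jointly, each event type receives its own amplitude estimate even when its response has not returned to baseline before the next event.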
  
In summary, although we applied state-of-the-art MoBI analysis approaches to minimise the contribution of noise, residual noise remains in the final data. We combined robust preprocessing with model-based analytical methods to account for the complexities of a free-operant task. We believe these results represent the best achievable balance between signal clarity and the ecological validity of an active foraging task, and we have called for future research to continue improving these tools for immersive VR environments.
  
  (3) Third, although the authors state that time-frequency analysis was conducted on the EEG data, no corresponding results are presented in Figure 8 or elsewhere. Furthermore, the statistical maps shown appear noisy and require further clarification and possible denoising.
  
  We thank the reviewer for pointing this out. The time-frequency results are indeed presented in Figure 8 (now Figure 10); however, they are depicted as topographic maps of the t-statistics derived from our LMM rather than raw power change plots.
  
The application of EEG to a novel, free-operant task represents a significant methodological development in this study. Unlike conventional EEG experiments where variables are strictly controlled and a "clean" pre-stimulus baseline is easily obtained, our task involves continuous participant engagement and movement. In this context, a stable baseline for the decision-making event is unattainable because multiple variables, most notably head movements, vary continuously.
  
Therefore, we believe that presenting the LMM statistical maps in the main text is the most appropriate and rigorous way to report the time-frequency results, as these maps represent the signal after accounting for these complex fixed and random effects. This approach has also been adopted in previous pain studies [7]. We have also updated the figure legend to state explicitly that the figure shows the association between band power and the variables under investigation.
  
In addition, for more salient stimuli such as phasic pain stimulation, an interpretable time-frequency analysis can be obtained without further LMM analysis. We have added the induced oscillatory responses to phasic pain stimuli to the Supplementary Material (section: Induced oscillatory responses to phasic pain stimuli). Consistent with our ERP findings, the intensity of phasic pain significantly modulated the induced responses, whereas the background tonic pain state did not significantly alter the induced oscillatory response to the phasic pain stimulus.
  
Regarding the SNR and denoising strategy, we acknowledge that the statistical maps appear noisier than those from stationary studies. This is a direct consequence of the lower SNR inherent in mobile VR: moving EEG from strictly controlled laboratory settings to ecologically valid, "real-world" VR scenarios introduces higher levels of noise, which we believe represents a key frontier for future methodological research. The maps in the main text show the data after our full pipeline (including ICA-based artefact rejection and high-pass filtering), and we deliberately chose not to apply further spatial or temporal smoothing [12]. It is also important to note that the LMM framework itself acts as a statistical "filter": by including head movement velocity as a regressor and accounting for random intercepts across subjects, the model partitions out variance that is unrelated to the task conditions.
  
  Reviewer #3 (Public review):
  
  Strengths:
  
  The experimental paradigm is highly innovative. Assessing human behaviour in a naturalistic yet highly controlled setting represents a promising approach to pain research. Notably, assessing pain magnitude implicitly, via its motivational value, offers insights about the overall pain experience that are not usually accessible via common pain ratings.
  
  Weaknesses:
  
  Despite these strengths, the manuscript would benefit significantly from more precise definitions of key concepts and an overall clearer, more coherent presentation of its main arguments. The writing, in its current form, often presents claims that are too vague or insufficiently connected with the experimental findings. Moreover, certain aspects of the computational modeling and statistical analysis appear flawed or inadequately justified.
  
  We thank the Reviewer for the generally positive evaluation of the manuscript.
  
  Reviewer #3 (Recommendations for the authors):
  
(1) The analyses presented in the section "Results/Additional cost of effort associated with movement" require clearer explanations. The intention here appears to be to assess the association between moving distances and pain intensity to test the hypothesis that the higher the average pain ratings within blocks, the longer the distances moved (i.e., the higher the effort to avoid pain). It is unclear why and how exactly "egocentric distance differences between painful and non-painful fruits" were computed.
  
  We thank the reviewer for pointing out the need for a clearer definition of the egocentric distance calculation. As the reviewer correctly identified, this analysis tests the hypothesis that subjects would trade off physical effort (distance) for pain avoidance. To compute this, we used a blockwise approach: for each one-minute block, we calculated the average egocentric distance travelled to pick up non-painful fruits and subtracted the average distance travelled to pick up painful fruits. This difference (labelled as "Choice Distance Bias" in Figure 3B) represents the additional effort subjects were willing to exert to reach a pain-free option. We have clarified the computation method and our motivation for using it in the revised text:
  
  “As shown in Figure 3B, the vertical axis represents the 'choice distance bias', calculated as the difference between the average egocentric distance to non-painful fruits and the average egocentric distance to painful fruits within each block. The egocentric distance is the fruit distance relative to the participant. This metric was computed to test whether subjects would trade off physical effort for pain avoidance; specifically, a positive bias indicates that subjects were willing to bypass closer painful fruits to reach more distant pain-free ones. As hypothesised, we found that as the pain intensity (VAS) of the aversive fruits increased, this distance bias grew significantly, confirming that subjects exerted greater movement effort to avoid higher levels of pain.”
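The blockwise computation described above can be sketched as follows. This is a minimal illustration, assuming each pickup is recorded as a (block, painful, distance) tuple and that egocentric distance is the Euclidean distance between participant and fruit at the moment of the pickup; the record layout and helper names are hypothetical, and skipping blocks without both fruit types is our assumption.

```python
import math
from collections import defaultdict
from statistics import mean


def egocentric_distance(participant_pos, fruit_pos):
    """Euclidean distance of the fruit relative to the participant (assumed definition)."""
    return math.dist(participant_pos, fruit_pos)


def choice_distance_bias(pickups):
    """Per-block choice distance bias.

    pickups: iterable of (block_id, is_painful, egocentric_distance).
    Returns {block_id: mean distance to non-painful fruits
                       - mean distance to painful fruits}.
    Blocks without both fruit types are skipped (hypothetical handling).
    """
    painful = defaultdict(list)
    non_painful = defaultdict(list)
    for block, is_painful, dist in pickups:
        (painful if is_painful else non_painful)[block].append(dist)
    return {block: mean(non_painful[block]) - mean(painful[block])
            for block in non_painful if painful.get(block)}


if __name__ == "__main__":
    pickups = [(1, False, 2.0), (1, False, 4.0), (1, True, 1.0),
               (2, False, 3.0), (2, True, 3.5), (3, False, 5.0)]
    print(choice_distance_bias(pickups))  # prints {1: 2.0, 2: -0.5}
```

A positive value (block 1 in the toy data) corresponds to walking further for pain-free fruit; the per-block values would then be related to the block's VAS rating.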
  
We have also updated the text at the beginning of the "Avoidance increases with increasing phasic pain intensity" section to emphasise that the analysis was performed at the block level:
  
  “For this analysis, both aversive choice probabilities and subjective pain ratings were estimated at the block level.”
  
  (2) In its current form, the explanation of the first optimality equation lacks precision and transparency. Consider the following improvements:
  
  (a) Precisely define the features that characterize a state/decision point: e.g., i) memory of available options (= set of 7 fruits that were seen but not picked up) and ii) subject's current position, iii) pain intensity associated with green fruit in the current block.
  
  (b) Precisely define the set of values the action variable a can assume.
  
  (c) Precisely define the function u(a) in mathematical notation, including its hyperparameters. The fact that a is likely a categorical variable, while u(a) is later described as a sigmoid function (i.e., as a function of a continuous variable), is confusing. In my understanding (see Figure 2F), u is actually a function of the stimulus intensity associated with a given fruit. Since the stimulus intensity depends on the current state s (and varies from block to block), the phasic pain utility function technically also depends on s.
  
  (d) Precisely define the function d(a) in mathematical notation, including its hyperparameters.
  
  (e) Precisely describe how the separate horizontal and vertical components of C_m enter the equation.
  
  (f) Provide a summary of all parameters and hyperparameters being optimized. Are parameters and hyperparameters optimized jointly? What distinguishes parameters and hyperparameters practically?
  
  We thank the reviewer for this insightful critique. We agree that the original presentation of the optimality equation was insufficiently formal. We have now added a dedicated subsection, "Experiment 1 model summary", which includes a comprehensive table (Table 2) and supporting text to address these points with mathematical precision.
  
  Specifically, we have implemented the following clarifications in the revised manuscript:
  
  State and Action Space (a, b): We have formally defined the state s as an ordered memory list M_s of up to 7 items, governed by a FIFO principle. The action a is now explicitly defined as a one-to-one mapping from these memory items to physical reach trajectories.
  
  Utility and Cost Functions (c, d, e): We have provided the full mathematical notation for the phasic pain utility u(a) and the effort cost d(a). We have clarified that while the choice of fruit (a) is categorical, it serves as an indicator variable that determines the application of a continuous sigmoid utility function based on the block-level pain intensity (x_stim). We have also explicitly decomposed the effort cost into its horizontal (C_h) and vertical (C_v) egocentric components.
  
  Parameters and Hyperparameters (f): We have clarified that because our model focuses on steady-state motivational trade-offs rather than online learning, the hyperparameters listed are the only variables subject to optimisation. These are fixed for each subject across the duration of the experiment.
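The definitions above can be made concrete with a short sketch. It is illustrative only and rests on several assumptions not confirmed by the response: the sigmoid form -C_p / (1 + exp(-k(x_stim - x_0))), a linear weighting h·|horizontal| + v·|vertical| for the effort cost, and a constant fruit reward of 1. The names Fruit, phasic_utility, effort_cost and decision_values are hypothetical; Table 2 in the revised manuscript is the authoritative definition.

```python
import math
from collections import deque
from dataclasses import dataclass

MEMORY_SIZE = 7  # the state holds up to 7 remembered fruits, first in, first out


@dataclass
class Fruit:
    painful: bool      # green pineapple = aversive
    horizontal: float  # horizontal egocentric displacement (assumed units: m)
    vertical: float    # vertical egocentric displacement (assumed units: m)


def phasic_utility(x_stim, k, x0, c_p):
    """Assumed sigmoid utility of the block-level intensity x_stim (negative = aversive)."""
    return -c_p / (1.0 + math.exp(-k * (x_stim - x0)))


def effort_cost(fruit, h, v):
    """Assumed linear weighting of the horizontal and vertical reach components."""
    return h * abs(fruit.horizontal) + v * abs(fruit.vertical)


def decision_values(memory, x_stim, params, reward=1.0):
    """Q(s, a) for each remembered fruit with gamma = 0 (no future-value term).

    The categorical choice only decides whether the pain term applies.
    """
    k, x0, c_p, h, v = params
    values = []
    for fruit in memory:
        u = phasic_utility(x_stim, k, x0, c_p) if fruit.painful else 0.0
        values.append(reward + u - effort_cost(fruit, h, v))
    return values


if __name__ == "__main__":
    memory = deque(maxlen=MEMORY_SIZE)
    for i in range(8):  # the 8th fruit pushes out the oldest one
        memory.append(Fruit(painful=(i % 2 == 0), horizontal=float(i), vertical=0.0))
    print(len(memory))  # prints 7
    params = (1.0, 5.0, 2.0, 0.5, 0.3)  # (k, x0, C_p, h, v), invented values
    print(decision_values([Fruit(True, 1.0, 0.0), Fruit(False, 1.0, 0.0)], 5.0, params))
    # prints [-0.5, 0.5]
```

A bounded deque gives the FIFO behaviour of the memory list directly, and treating the fruit type as an indicator shows how a categorical action can still enter a continuous sigmoid of x_stim.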
  
  We believe these additions, centred around the new Table 2, provide the transparency and precision requested.
  
  Furthermore, we would like to clarify a subtle caveat regarding the assumption of a fixed x_stim for the entirety of a block. While participants were aware that green pineapples were aversive, the specific stimulation intensity for a given block was only fully revealed upon picking up the first green pineapple.
  
  To ensure our model-fitting remains robust despite this 'information lag', we considered several computational alternatives:
  
  (1) Prior Estimation Modelling: Modelling a participant’s prior estimation of pain stimulation based on previous blocks. We found this unsuitable due to the independent block design and the limited number of trials available to establish a stable prior.
  
(2) Data Trimming: Excluding all decisions made before the first green pineapple pickup. While theoretically 'cleaner', this approach introduces significant data imbalance and ignores blocks in which a participant, dissuaded by high pain, picked up only a single green fruit before stopping (approx. 8.75% of blocks).
  
Crucially, we performed a sensitivity analysis by re-running the model-fitting procedure using only the data collected after the first green pineapple was harvested in each block. This analysis yielded the same qualitative statistical results as the full-block model presented in the main text. We have added a detailed discussion of this caveat and of the alternative study designs we explored (such as pre-block stimulation or stochastic choice paradigms) to the Supplementary Material (section: Discussion of pain intensity information and model robustness). We believe this confirms that our current approach provides a faithful representation of the underlying motivational trade-offs.
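The trimming step of this sensitivity analysis can be sketched as a single pass over the decisions in temporal order. This is a minimal sketch, assuming each decision is a (block, painful) pair and that the first painful pickup itself is excluded, because that choice was made before the block's intensity was known; the function name and record layout are hypothetical.

```python
def trim_before_first_painful(decisions):
    """Keep only decisions made after the first painful pickup in each block.

    decisions: list of (block_id, is_painful) in temporal order.
    The first painful pickup reveals x_stim and is itself dropped.
    """
    revealed = set()
    kept = []
    for block, is_painful in decisions:
        if block in revealed:
            kept.append((block, is_painful))
        elif is_painful:
            revealed.add(block)  # this pickup reveals the block's intensity
    return kept


if __name__ == "__main__":
    decisions = [(1, False), (1, True), (1, False), (1, True),
                 (2, True), (2, False), (3, False)]
    print(trim_before_first_painful(decisions))
    # prints [(1, False), (1, True), (2, False)]
```

Block 3 in the toy data contributes nothing after trimming, which illustrates the data imbalance mentioned in alternative (2).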
  
  (3) The statistical method selected for assessing the association between decision values and pain ratings is problematic (Figure 2G): Since there are multiple data points from multiple subjects, which introduces dependence between data points, a multilevel instead of a single-level linear regression should be employed.
  
  We appreciate the reviewer’s suggestion to utilise a multilevel modelling approach. We agree that a single-level regression does not fully account for the nested structure of our data.
  
  In response, we re-analysed the association using a linear mixed-effects model with a maximal random effects structure. Specifically, we included both random intercepts and random slopes for Ratings grouped by Subject (in R syntax: PainFunc ~ Ratings + (1 + Ratings | Subject)).
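Written out, the R formula above corresponds to the following model, in which the term (1 + Ratings | Subject) estimates a correlated random intercept and random slope per participant. Here i indexes observations (our reading is blocks) and j indexes participants; the reported significance presumably refers to the fixed slope \beta_1.

```latex
\mathrm{PainFunc}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\mathrm{Ratings}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim \mathcal{N}\!\left( \mathbf{0},
\begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix} \right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Compared with a single-level regression, the subject-specific terms u_{0j} and u_{1j} absorb the dependence between repeated observations from the same participant.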
  
The results of this mixed-effects model are consistent with our original findings, showing a significant relationship between decision values and pain ratings (p = .001). We have updated the Figure caption (now Figure 3G) to reflect these multilevel model statistics. We believe this addition addresses the concern regarding data dependence and provides a more rigorous validation of our conclusions.
  
  (4) The statistical method selected for assessing how decision values/pain ratings relate to SCR coefficients is problematic (Figures 3B and C): Again, a multilevel regression method should be used.
  
  We thank the reviewer for this important point. We agree that a multilevel approach is more appropriate for our nested data structure, and that the interpretation of the SCR data required more explicit justification in the context of the divergence between decision values and ratings.
  
We have now re-analysed the relationship between SCR coefficients (both fixation-evoked and shock-evoked), decision values, and subjective ratings using a multilevel (mixed-effects) regression model. This model included random intercepts and random slopes for each participant to account for individual variability. We have updated the Figure 4 (previously Figure 3) caption and the corresponding Results and Discussion sections to reflect these findings (the revised text is reproduced in our response to comment (5) below). This more rigorous approach provided a clearer and more nuanced picture of the data. Specifically, whereas the simple regression previously suggested that both measures correlated with fixation-evoked SCR, the multilevel model reveals a dissociation: fixation-evoked SCR is significantly associated with decision values, but not with subjective ratings.
  
  (5) The interpretation of the skin conductance analysis results as evidence of "dissociation between expected and experienced utility" is vague and not well-supported given the presented data and statistical shortcomings. The low R2 in Figure 2G already indicates divergence between decision values and pain ratings. It is unclear what the decision values' differential association with shock-evoked SCR coefficients adds to this insight.
  
  The reviewer correctly notes that the low R^2 in the correlation between decision values and pain ratings (Figure 3G) already suggests a divergence between these two measures. We agree that this is one of the key findings, as it highlights that decision values provide a dimension of pain assessment that is not fully captured by subjective report. However, we believe the SCR results add crucial physiological evidence to explain why and how these measures diverge. The updated multilevel results provide a more concrete double dissociation that aligns with the distinction between decision utility and experienced utility:
  
  Experienced Utility (Shock-evoked SCR): This measure of physiological arousal during the painful event was significantly predicted by subjective pain ratings (beta = 0.0154, p = .006) but not by decision values (p = .672). This suggests that ratings are more closely tied to the immediate, experienced aversiveness of the stimulus.
  
  Decision Utility (Fixation-evoked SCR): In contrast, arousal during the period of evaluation/fixation was a significant predictor of decision values (beta = -0.0739, p = .009) but was not significantly associated with subjective ratings (p = .105).
  
By using a more rigorous statistical method, we found that decision values are actually a more robust predictor of anticipatory/evaluative arousal (fixation) than subjective ratings are. This supports our interpretation that decision values and ratings capture different temporal and functional aspects of pain processing, specifically the evaluation of potential outcomes (decision utility) versus the reaction to the outcome itself (experienced utility). We have revised the Discussion to be more conservative regarding the strength of this evidence while clearly articulating how these physiological results provide a mechanistic grounding for the divergence observed in the behavioural data.
  
  Summary of changes in the manuscript:
  
  Figure 4 Caption: Updated to report multilevel regression statistics (beta, 95% CI, t, and p-values) instead of R^2 from simple linear regression.
  
  Results Section: Updated the text to describe the mixed-effects model results, highlighting the dissociation between fixation-evoked and shock-evoked SCRs. Revised text:
  
  “Analysis using a multilevel linear mixed-effects model revealed a clear dissociation in the relationship between physiological responses and motivational parameters. Fixation-evoked SCR coefficients were significantly associated with decision values, but not with subjective pain ratings (Fig. 4B). Conversely, shock-evoked SCR coefficients showed a significant association with subjective pain ratings, while the association with decision values was not significant (Fig. 4C). This double dissociation suggests a notable divergence between the physiological correlates of expected utility (at the decision level) and experienced utility (the actual pain experience). Taken together, these findings highlight the composite nature of the overall aversiveness of pain and underscore the benefit of combining subjective ratings with model-based measures to capture its distinct impacts on behaviour.”
  
  Discussion Section: Revised the paragraph discussing decision versus experienced utility to include the "further hint" provided by the divergent SCR correlations.
  
  Revised text:
  
  “In our task we get a further hint of this in the SCR measures in experiment 1, whereby a discrepancy exists between decision values and pain ratings in their respective associations with fixation-evoked SCRs and phasic pain-evoked (shock) SCRs. Taken together, this indicates the composite nature of overall aversiveness of pain, and highlights the benefit of combining subjective ratings with model-based measures of its motivational impact on behaviour.”
  
  (6) When investigating the effects of tonic pain on the neural processing of phasic pain (Figure 5), why were only ERPs analyzed and not induced oscillatory responses?
  
  We thank the reviewer for this insightful suggestion. We initially focused our analysis on Event-Related Potentials (ERPs) because the N1-P2 amplitude is an established and robust marker in pain research, providing a clear and reliable metric for comparing phasic pain processing across conditions.
  
  However, we agree that induced oscillatory responses provide a more comprehensive view of cortical dynamics. Following your suggestion, we have performed a Time-Frequency Representation (TFR) analysis at electrode Cz. These results, now included in the Supplementary Material (Figure S4, S5), are entirely consistent with our ERP findings. Specifically:
  
  Phasic Modulation: Both ERP amplitudes and induced oscillatory power (notably in the theta and gamma bands) were significantly modulated by the intensity of the phasic pain stimulus.
  
  Tonic Independence: Consistent with the ERP results, the presence of background tonic pain did not significantly modulate the induced oscillatory responses to phasic stimuli.
  
  We believe this additional analysis significantly strengthens the manuscript by demonstrating that the observed effects are consistent across both phase-locked and non-phase-locked neural domains. We have amended the ERP results section to reflect the addition of induced oscillatory responses in supplementary materials: “We focused our neural analysis of phasic pain on ERPs as phasic stimuli are well characterised by these time-locked evoked potentials. Nevertheless, to ensure a comprehensive assessment of the neural response, we also examined induced oscillatory responses. These results were consistent with the ERP findings and are detailed in the Supplementary Materials (Fig. S4, S5).”
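The distinction between phase-locked (evoked) and non-phase-locked (induced) activity can be illustrated with one common approach: subtracting the trial-averaged ERP from every trial before estimating power. The sketch below is an assumption-laden simplification; it uses a single-window, single-frequency DFT instead of a wavelet time-frequency decomposition, and the response does not state which method was used at Cz.

```python
import cmath
import math


def bin_power(signal, fs, freq):
    """Power at one frequency from a single-bin discrete Fourier transform."""
    n = len(signal)
    coef = sum(x * cmath.exp(-2j * math.pi * freq * t / fs) for t, x in enumerate(signal))
    return abs(coef) ** 2 / n


def total_and_induced_power(trials, fs, freq):
    """Total power averages per-trial power; induced power first removes the
    trial-averaged ERP (the phase-locked part) from every trial."""
    n_trials = len(trials)
    erp = [sum(samples) / n_trials for samples in zip(*trials)]
    total = sum(bin_power(trial, fs, freq) for trial in trials) / n_trials
    induced = sum(bin_power([x - e for x, e in zip(trial, erp)], fs, freq)
                  for trial in trials) / n_trials
    return total, induced


if __name__ == "__main__":
    fs, n, f = 100, 100, 10  # invented sampling rate, length and frequency
    # Identical phase on every trial: all power is phase-locked, induced is ~0.
    locked = [[math.sin(2 * math.pi * f * t / fs) for t in range(n)] for _ in range(4)]
    # Phase differs across trials: the ERP cancels, so induced is ~total.
    jittered = [[math.sin(2 * math.pi * f * t / fs + k * math.pi / 2) for t in range(n)]
                for k in range(4)]
    print(total_and_induced_power(locked, fs, f))
    print(total_and_induced_power(jittered, fs, f))
```

Reporting both quantities is what allows a statement that effects hold in both the phase-locked and the non-phase-locked domains.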
  
  (7) The explanation of the second optimality equation (involving motivational vigour) requires substantial clarification. Besides the points mentioned for the previous optimality equation, specific opportunities to improve the explanations include the following:
  
  - In the provided formula, C_v and C_m appear indistinguishable given they are multiplied together, rendering this an ill-posed optimization problem. This should be clarified.
  
  - In my understanding, d(a)/V_speed corresponds to the temporal delay associated with picking fruit a. Then, what is tau, and why compute the sum tau + d(a)/V_speed?
  
  - V* is not introduced properly. Is V*(s') = Q*(s', a, tau)? If so, why introduce V*? Moreover, the notational similarity between V_speed and V* is confusing.
  
  - Gamma = 0 still holds?
  
  - Summarize all parameters and hyperparameters that are optimized to model the data and more precisely describe the method used for optimization.
  
  We thank the reviewer for these insightful comments. We agree that the transition from a standard reinforcement learning framework to one incorporating motivational vigour requires precise definitions to ensure the model is well-posed and interpretable. We have addressed these points as follows:
  
(1) Clarification of C_v and C_m: We have clarified C_m and d(a) in the newly added Experiment 1 model summary table. Specifically, C_v is the scalar vigour constant and C_m is a unit vector representing the horizontal and vertical components. Because C_m is constrained to unit length, the overall magnitude of the product is carried by C_v alone; C_v cannot be scaled up while C_m is scaled down by the same factor, so the product is identifiable and the optimisation is well-posed.
  
(2) Bridging Theory to Practice (tau and Total Delay): In the theoretical framework of Niv et al. (2007), "delay" is an abstract sum encompassing both waiting and execution. In practice, when fitting to real-world VR data with variable execution times, we must distinguish between the waiting time tau (time spent stationary or searching) and the execution time (||d(a)|| / V_speed). This is necessary because participants take time to look around the forest to search for fruits before deciding to commit to an action. The sum tau + ||d(a)|| / V_speed represents the total delay between two actions, which directly aligns with the notion of opportunity cost of time. We have added a table (Table 3) and a new Figure 8 to clarify these distinctions.
  
(3) V*, Q*, and gamma: The reviewer is correct that V*(s') = max_{a', tau'} Q*(s', a', tau'). We previously used V* for simplicity. Since the notation of V* and V_speed was confusing, we have replaced V*(s') with max_{a', tau'} Q*(s', a', tau') in the optimality equation. We confirm that gamma = 0 (a greedy policy) still holds for the Experiment 2 framework to maintain focus on steady-state motivational trade-offs. We have added this statement to the Methods section.
  
  (4) Summary of Parameters and Optimization: We have summarized the hyperparameters {k, x_0, C_p, C_v, h, v} in the new summary table for Experiment 2.
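Points (1) and (2) can be illustrated with a short sketch. It assumes the general form of Niv et al. (2007), in which the vigour cost of an action is C_v / tau and the opportunity cost is the average reward rate times the delay; minimising C_v / tau + R_bar·tau gives tau* = sqrt(C_v / R_bar), so a larger C_v predicts longer delays. The authors' exact equation additionally involves C_m and d(a) and is not reproduced here; all function names and values are hypothetical.

```python
import math


def identified(c_v, c_m):
    """Fold the magnitude of C_m into C_v so that C_m becomes a unit vector.

    (c_v, c_m) and (2 * c_v, c_m / 2) give the same product; after this step
    both map to the same (C_v, unit C_m) pair, removing the ambiguity.
    """
    norm = math.hypot(*c_m)
    return c_v * norm, tuple(c / norm for c in c_m)


def total_delay(tau, reach_vector, v_speed):
    """Waiting/search time tau plus execution time ||d(a)|| / V_speed."""
    return tau + math.hypot(*reach_vector) / v_speed


def opportunity_cost(avg_reward_rate, tau, reach_vector, v_speed):
    """Average reward forgone over the whole delay between two actions."""
    return avg_reward_rate * total_delay(tau, reach_vector, v_speed)


def optimal_tau(c_v, avg_reward_rate):
    """Minimiser of C_v / tau + R_bar * tau in the Niv et al. (2007) form."""
    return math.sqrt(c_v / avg_reward_rate)


if __name__ == "__main__":
    print(identified(2.0, (3.0, 4.0)))  # prints (10.0, (0.6, 0.8))
    print(identified(4.0, (1.5, 2.0)))  # prints (10.0, (0.6, 0.8))
    print(total_delay(1.0, (3.0, 4.0), 2.5))  # prints 3.0
    print(optimal_tau(4.0, 1.0), optimal_tau(9.0, 1.0))  # prints 2.0 3.0
```

The last line shows the qualitative prediction used in the Discussion: a higher vigour constant under tonic pain implies longer delays between actions.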
  
  (8) It is not clear what the results of the modelling approach presented in Figure 7a+b concretely add to the comparison of movement velocities and collection rates in Figure 6.
  
  We appreciate the reviewer's comment regarding the relationship between the raw behavioral metrics and the computational results. While both sets of findings support the argument for reduced motivational vigour in the tonic pain condition, we believe the modeling approach provides distinct and essential value:
  
  (1) Finer-Grained Analysis Tool: The computational model acts as a more sophisticated analysis tool than simple velocity or rate averages. Unlike Figure 9a+b (in the revised manuscript, previously Figure 7), which summarizes overall performance, the model accounts for the trial-by-trial trade-off between opportunity costs, movement effort, and choice values. This allows us to isolate vigour from other confounding components.
  
  (2) Direct vs. Indirect Measurement: If we assume that motivational vigour in a free-operant task can be quantified through an RL framework, as established in animal studies, then the model's vigour constant (C_v) serves as a direct, concrete estimate of that internal state. In contrast, overall speed and collection rates are indirect markers that can be influenced by multiple factors, such as the different choice sets available to participants, because fruit locations are randomly generated.
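  Why a larger vigour constant implies longer delays can be seen in the simplified single-action form of the Niv et al. (2007) framework, where acting with latency tau costs C_v / tau (vigour cost) plus R_bar * tau (opportunity cost at average reward rate R_bar). Setting the derivative -C_v / tau^2 + R_bar to zero gives tau* = sqrt(C_v / R_bar). The sketch below is an assumption-laden illustration of that textbook form only; the authors' fitted model also includes execution time and other terms.

```python
import math


def latency_cost(tau, c_v, avg_reward_rate):
    """Vigour cost C_v / tau plus opportunity cost avg_reward_rate * tau."""
    return c_v / tau + avg_reward_rate * tau


def optimal_latency(c_v, avg_reward_rate):
    """Latency that minimises latency_cost: tau* = sqrt(C_v / R_bar)."""
    if c_v <= 0 or avg_reward_rate <= 0:
        raise ValueError("c_v and avg_reward_rate must be positive")
    return math.sqrt(c_v / avg_reward_rate)


# Quadrupling C_v at a fixed reward rate doubles the optimal latency.
tau_low = optimal_latency(1.0, 0.25)   # 2.0
tau_high = optimal_latency(4.0, 0.25)  # 4.0
```

  Because C_v enters the optimal latency directly, a fitted C_v can be read as a vigour estimate in a way that raw speed, which also depends on where the fruits happen to be, cannot.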
  
  In summary, the computational approach provides a rigorous, parameterized bridge between observable behavior and the underlying neuro-computational mechanisms of recuperative pain. We have updated the Discussion section to state more explicitly that the computational approach provides a controlled measure that is isolated from the other confounders of the task. The added Discussion text reads:
  
  “Compared to overall speed and collection rate, which can be influenced by multiple factors, such as the different choice sets available to participants as the fruit locations are randomly generated, the model's fitted parameters (e.g. the vigour constant C_v) in theory serve as direct, concrete estimates of the underlying motivational state.”
  
  (9) Claims made in the discussion should be more thoroughly and closely linked to the results presented previously. Specifically, experimental outcomes supporting the following claims should be directly referenced:
  
  - "tonic and phasic pain serve different motivational functions".
  
  - "phasic pain provides a punishment teaching signal that directs avoidance".
  
  - "tonic pain reduces motivational vigour".
  
  - "these two functions [punishment teaching signals and reduction of motivational vigour?] can be formally distinguished and quantified".
  
  - "We did not see interactions between tonic and phasic pain".
  
  We have revised the Discussion to more explicitly link these claims to our experimental results. Revised text:
  
  “The experiments show that tonic and phasic pain serve different motivational functions during adaptive behaviour, in line with ecological and evolutionary theories of pain (Bolles and Fanselow, 1980; Walters and Williams, 2019). Specifically, our findings point towards phasic pain providing a punishment teaching signal that directs avoidance through value-based learning, balancing the cost of future harm alongside potential reward. This is supported by the observation that increasing phasic pain intensity significantly reduced choice probability and increased distance bias between choices, whereby participants were willing to travel further to reach a pain-free fruit. In contrast, we found that tonic pain reduces motivational vigour, which supports energy conservation and recuperation in the context of bodily damage. This claim is directly evidenced by the reduction in task-related movement velocities and fruit collection rates during tonic pain blocks. The experiments are the first to show that these two functions can be formally distinguished and quantified during ongoing behaviour. By utilising a free-operant RL computational framework, we were able to dissociate these roles: phasic pain was quantified as a generally negative utility term affecting choice values, while tonic pain was formalised as a change in vigour constants, which were significantly higher (increasing delays between actions) in the tonic pain condition. This illustrates how pain simultaneously acts in different ways to serve self-protection.”
  
  “One notable aspect of our results is that we did not see interactions between tonic and phasic pain at either the behavioural or neural level. Behaviourally, we observed that average aversive choice probabilities remained similar regardless of the presence of tonic pain, with no significant interaction effect on punishment sensitivity. Furthermore, our model-fitting confirmed that tonic pain did not significantly modulate the fitted phasic pain utility values. There are two contexts in which such interactions might be predicted. First, in ‘conditioned pain modulation’ paradigms (Kennedy et al., 2016), a tonic pain stimulus is sometimes seen to reduce both the perceived intensity and the cortical evoked responses to phasic pain stimuli delivered somewhere else on the body (Höffken et al., 2017; Enax-Krumova et al., 2020). Although we utilised concentric ‘Wasp’ electrodes designed to selectively activate nociceptive A-delta fibres (Inui et al., 2002), and confirmed that the resulting ERPs (N1-P2) were significantly modulated by phasic intensity, we observed no such attenuation by tonic pain. Indeed, neither subjective pain ratings nor the N1-P2 amplitude showed a significant modulation by the tonic pressure pain stimulus. In contrast, our results were more compatible with a trend in the other direction.”
  
  (10) The paragraph in the discussion "A concern that is sometimes raised..." (lines 243 - 254) raises interesting points, but its particular relevance to the study at hand is unclear.
  
  We appreciate the reviewer's feedback. The motivation for including this discussion is to address a common critique we received for the study: whether the observed reduction in vigour under tonic pain is "simply" due to distraction or cognitive load, rather than being a specific functional output of the pain system. We have revised this paragraph to link the concern to our paper’s specific finding.
  
  Our central argument is that for tonic pain, distraction is not a confounding "side-effect" but rather the primary mechanism of action. By being inherently "distracting," tonic pain successfully withdraws resources from ongoing tasks (like foraging) to promote the energy conservation required for recuperation.
  
  (11) The clinical perspective of the methodological framework presented at the end of the discussion is interesting and could be expanded.
  
  We thank the reviewer for this encouraging comment. We have expanded the final paragraph of the Discussion to more explicitly state the clinical utility of our framework. Specifically, we now contrast our approach with standard clinical assessments such as Quantitative Sensory Testing (QST). We highlight that while QST is a valuable tool, it can lack ecological validity; in contrast, our VR-based task allows for a more realistic, behaviourally sensitive assessment of how pain impacts a patient’s daily functional activities and motivational state. We believe this represents a significant step towards more objective and "real-world" clinical pain phenotyping.
  
  (12) The statistical analyses part in the methods section should provide a clear definition of dependent and independent variables and clearly state which test was used for which analysis, e.g., by referencing the corresponding subfigure in the main text.
  
  We agree that a more structured summary of the statistical approach would improve the clarity of the Methods section. We have now included a comprehensive summary table (Table 1) in the Statistical Analysis subsection. This table explicitly defines the dependent and independent variables for each analysis, identifies the specific statistical model used (e.g. Linear Mixed Models or repeated measures ANOVA), and directly maps these to the corresponding figures in the results section.
  
  Minor comments:
  
  (1) Introduction:
  
  (a) The introduction should elaborate more on the advantages of employing an "ecologically meaningful context".
  
  We thank the reviewer for suggesting further elaboration on the advantages of employing an "ecologically meaningful context". We have updated the introduction to provide additional reasoning for choosing an ecologically valid context for the study:
  
  “One of the challenges in studying adaptive functions of pain is the difficulty of embedding experiments within ecologically meaningful contexts. To solve this, we designed an immersive foraging task using virtual reality (VR), in which humans search a forest to collect fruits from the low-lying bushes at varying heights. A foraging paradigm provides a robust, free-operant framework that captures the core components of adaptive behaviour: it is goal-directed, involves complex movement, and requires the learning of an optimal strategy to maximise rewards. This allows us to computationally dissociate how different types of pain influence the control of action.”
  
  (b) It would be helpful to clarify why tonic pain applied to a limb not involved in the task is expected to influence the motivational vigour with respect to the task.
  
  We thank the reviewer for requesting additional clarification on why tonic pain was applied to the non-dominant arm. We have added the following text to the introduction clarifying our hypothesis and why the stimulus was applied to the limb not involved in the task:
  
  “Second, we hypothesised that tonic pain acts as a coefficient modulating the tradeoff between opportunity cost and vigour cost, thereby serving a recuperative function. To test this in Experiment 2, we delivered continuous tonic pressure to the non-dominant arm via an inflated cuff to emulate a background state of injury. Within our free-operant framework, tonic pain was modelled as a weighting factor that shifts the optimal balance toward reduced energy expenditure. Because the stimulus was applied to the non-task limb, we specifically predicted a global reduction in motivational vigour—operationalised as decreased movement velocities and foraging rates—rather than a direct mechanical impairment.”
  
  (2) Results/Experiment 1:
  
  (a) How were monetary rewards implemented exactly? How much money per fruit?
  
  We thank the reviewer for the opportunity to clarify the incentive structure. Participants were informed at the start of the study that they would earn a performance-based bonus of up to £10, determined by the points they collected during the foraging task. To ensure that motivation remained consistent across the entire session for all individuals, regardless of their baseline foraging speed, the specific exchange rate between points and currency was not disclosed. This prevented potential 'ceiling effects', where a high-performing subject might stop exerting effort after reaching the maximum bonus early, or 'floor effects', where a subject might perceive the reward for an individual action as too small to be motivating.
  
  Following the completion of the experimental session, all participants were compensated with the full £10 bonus in addition to their base payment for participation. We have updated the Methods section to reflect these details:
  
  “Participants were informed at the start of the experiment that their total points would be rewarded with a monetary incentive of up to £10. To maintain a constant level of motivation throughout the task, the exact point-to-currency exchange rate was not specified. Upon completion of the session, all participants were awarded the maximum bonus of £10.”
  
  (b) A green pineapple is not ripe and, in a naturalistic context, possesses some aversive value, even in the absence of phasic pain stimuli. Why was the color coding not counterbalanced across individuals? To what degree could this have confounded the results?
  
  We thank the reviewer for this insightful point. We acknowledge that the lack of counter-balancing for fruit colour (green vs. yellow) is a limitation of the current study design. However, we believe the potential confounding effect of "unripe" green pineapples on the final analysed data is minimal due to the principles of associative learning.
  
  While a naturalistic heuristic (green = unripe) might establish a weak prior bias, fundamental associative learning [14] and reinforcement learning models [15] demonstrate that extensive training with a highly salient unconditioned stimulus (such as pain) rapidly overrides mild initial priors. The task objective focused strictly on maximizing reward points, and participants underwent extensive training (10 blocks in Experiment 1; 6 blocks in Experiment 2) before the analysed sessions began. During this time, the strong, explicit contingencies (green = pain, yellow = safe) were learned and verbally verified. Therefore, by the time the main experimental data was collected, any weak baseline aversion to green had been overshadowed by the explicit task contingencies, making the learned associative value the primary driver of behaviour. We have added a statement acknowledging this limitation and outlining this theoretical rationale in the Methods section.
  
  “While the colour association (green for painful, yellow for pain-free) was not counter-balanced across subjects, any inherent aversive value of green pineapples (e.g., as 'unripe' fruit) is expected to have a minimal confounding effect on the analysed data. In associative learning frameworks, while mild prior biases may influence initial value estimations, extensive training with a highly salient unconditioned stimulus (e.g. phasic pain) rapidly updates these values, driving them toward an asymptote determined entirely by the explicit task contingencies (Rescorla & Wagner, 1972; Sutton & Barto, 2018). Because participants underwent extensive training (10 blocks in Experiment 1 and 6 blocks in Experiment 2) to establish the explicit pain associations prior to the analysed sessions, the observed avoidance behaviour was predominantly driven by the learned phasic pain contingencies rather than baseline colour preferences.”
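  The argument that a weak prior is overridden by training can be illustrated with the Rescorla-Wagner rule cited above, V <- V + alpha * (lambda - V). Two learners receiving the same outcomes differ after n trials by their initial difference times (1 - alpha)^n, so a small prior aversion to green fades geometrically. The learning rate, the number of trials, and the coding of a painful outcome as lambda = -1.0 are illustrative assumptions, not values from the study.

```python
def rescorla_wagner(v0, outcomes, alpha):
    """Value of a single cue after each trial under the Rescorla-Wagner rule.

    v0       -- initial (prior) value of the cue
    outcomes -- outcome lambda on each trial (here -1.0 = painful outcome)
    alpha    -- learning rate, 0 < alpha <= 1
    """
    values = [v0]
    v = v0
    for lam in outcomes:
        v += alpha * (lam - v)  # prediction-error update
        values.append(v)
    return values


# Hypothetical settings: 50 painful trials, learning rate 0.2.
pain_trials = [-1.0] * 50
with_prior = rescorla_wagner(-0.1, pain_trials, alpha=0.2)  # weak 'unripe' aversion
no_prior = rescorla_wagner(0.0, pain_trials, alpha=0.2)     # neutral start
# The gap between the two trajectories shrinks by (1 - alpha) each trial.
gap = abs(with_prior[-1] - no_prior[-1])  # 0.1 * 0.8 ** 50, about 1.4e-6
```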
  
  (c) In the "Avoidance increases with increasing phasic pain intensity" section, clarify upfront that pain ratings and choice probabilities were estimated at the block level. This information is provided only in a later section.
  
  We agree with the reviewer that this information should be stated earlier for clarity. We have updated the beginning of the "Avoidance increases with increasing phasic pain intensity" section to specify that these metrics were estimated at the block level:
  
  “For this analysis, both aversive choice probabilities and subjective pain ratings were estimated at the block level.”
  
  (3) Results/Experiment 2:
  
  (a) ERP visualizations (Figure 5) should include standard error indicators.
  
  We have updated Figure 5 (now Figure 6) to include 95% confidence intervals, based on the standard error of the mean across subjects, for all ERP traces. This provides a clearer visualization of the variance in the neural response.
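  A pointwise interval of this kind can be sketched as mean plus or minus z times the SEM at each time sample, where SEM is the across-subject standard deviation divided by the square root of the number of subjects. The sketch assumes one equal-length trace per subject and a normal approximation (z = 1.96); the authors' exact procedure is not specified here.

```python
import math
import statistics


def sem_ci(subject_traces, z=1.96):
    """Pointwise (lower, mean, upper) across subjects: mean +/- z * SEM.

    subject_traces -- list of equal-length lists, one ERP trace per subject
    """
    n = len(subject_traces)
    if n < 2:
        raise ValueError("need at least two subjects")
    bands = []
    for samples in zip(*subject_traces):  # one tuple per time sample
        m = statistics.fmean(samples)
        sem = statistics.stdev(samples) / math.sqrt(n)
        bands.append((m - z * sem, m, m + z * sem))
    return bands


# Three hypothetical subjects, two time samples each.
bands = sem_ci([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

  With few subjects, replacing 1.96 by the critical value of Student's t with n - 1 degrees of freedom gives a somewhat wider, more conservative band.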
  
  (b) In the section "A unified model...", clarify what is meant by saying that the unified model is "validated by the behavioural data", since behavioral data is what is being modeled in the first place.
  
  We clarify that "validation" in this context refers to the consistency between the parameters estimated by our generative unified model and the results obtained from the independent, model-free regression analysis of the raw behavioural data. While both approaches use the same source data, the unified model provides a finer-grained analysis of latent internal states (like motivational vigour), whereas the regression provides a direct empirical benchmark (more details were discussed in the response to major comment (8)). We have rephrased this section to better describe this as a consistency check against empirical regression results.
  
  (c) In the context of Figure 8a, the term "correlations" is misleading if referring to pairwise comparisons.
  
  We appreciate the opportunity to clarify our terminology. The results presented in Figure 8a (and the associated text) are derived from a Linear Mixed Model (LMM) where the tonic pain condition was treated as a binary independent variable. The term "correlation" was used to describe the statistical association (represented by the t-values) between the presence of tonic pain and EEG band power, accounting for subject-level random effects. It does not refer to simple pairwise comparisons (like t-tests). However, we agree that "correlation" can be ambiguous when applied to a binary predictor. We have revised the text and figure legends to use the terms "associated with" or "predicted by" to more accurately reflect the LMM framework.
  
  (d) Based on the presented data, there is no evidence for the section headings claim "Neural activities link to vigour".
  
  We agree with the reviewer that our results primarily provide evidence for a significant neural association with the tonic pain condition rather than a direct, statistically robust correlation with the vigour parameter itself (after Bonferroni correction). While tonic pain is associated with reduced vigour behaviourally, the EEG markers we identified are more accurately described as signatures of the pain state. We have revised the section heading and the corresponding text to focus on the characterisation of the tonic pain state to ensure our claims are strictly supported by the statistical evidence.
  
  (4) Methods:
  
  In the supplementary materials, the headings pertaining to different LMMs are confusing and not consistent with the Figure labeling in the manuscript (e.g., 4(ii)b likely corresponds to Figure 4d).
  
  We thank the reviewer for identifying these inconsistencies in the supplementary material. We apologize for the confusion caused by labelling errors introduced while reformatting the manuscript. We have now thoroughly audited the supplementary headings and updated them to ensure they correspond directly and consistently with the figure labels in the main manuscript.
  
  References
  
  (1) Inui, K., Tran, T. D., Hoshiyama, M., & Kakigi, R. (2002). Preferential stimulation of Aδ fibers by intra-epidermal needle electrode in humans. Pain, 96(3), 247–252. https://doi.org/10.1016/S0304-3959(01)00453-5
  
  (2) Mørch, C.D., Hennings, K. & Andersen, O.K. Estimating nerve excitation thresholds to cutaneous electrical stimulation by finite element modeling combined with a stochastic branching nerve fiber model. Med Biol Eng Comput 49, 385–395 (2011). https://doi.org/10.1007/s11517-010-0725-8
  
  (3) Höffken, O., Özgül, Ö.S., Enax-Krumova, E.K. et al. Evoked potentials after painful cutaneous electrical stimulation depict pain relief during a conditioned pain modulation. BMC Neurol 17, 167 (2017). https://doi.org/10.1186/s12883-017-0946-7
  
  (4) Enax-Krumova, E., Plaga, A.-C., Schmidt, K., Özgül, Ö. S., Eitner, L. B., Tegenthoff, M., & Höffken, O. (2020). Painful Cutaneous Electrical Stimulation vs. Heat Pain as Test Stimuli in Conditioned Pain Modulation . Brain Sciences, 10(10), 684. https://doi.org/10.3390/brainsci10100684
  
  (5) Enrico Schulz, Elisabeth S. May, Martina Postorino, Laura Tiemann, Moritz M. Nickel, Viktor Witkovsky, Paul Schmidt, Joachim Gross, Markus Ploner, Prefrontal Gamma Oscillations Encode Tonic Pain in Humans, Cerebral Cortex, Volume 25, Issue 11, November 2015, Pages 4407–4414, https://doi.org/10.1093/cercor/bhv043
  
  (6) Mahajan Pranav, Tong Shuangyi, Lee Sang Wan, Seymour Ben (2024) Balancing safety and efficiency in human decision making eLife 13:RP101371 https://doi.org/10.7554/eLife.101371.2
  
  (7) Enrico Schulz, Elisabeth S. May, Martina Postorino, Laura Tiemann, Moritz M. Nickel, Viktor Witkovsky, Paul Schmidt, Joachim Gross, Markus Ploner, Prefrontal Gamma Oscillations Encode Tonic Pain in Humans, Cerebral Cortex, Volume 25, Issue 11, November 2015, Pages 4407–4414
  
  (8) Suyi Zhang, Hiroaki Mano, Michael Lee, Wako Yoshida, Mitsuo Kawato, Trevor W Robbins, Ben Seymour (2018) The control of tonic pain by active relief learning eLife 7:e31949
  
  (9) Hewitt, D., Tong, S., Schreiber, S., & Seymour, B. (2026). Tonic pain modulates neural correlates of associative phasic pain memories. PAIN. DOI: 10.1097/j.pain.0000000000003917
  
  (10) Gramann, K., Gwin, J. T., Ferris, D. P., Oie, K., Jung, T.-P., Lin, C.-T., Liao, L.-D., and Makeig, S. (2011). Cognition in action: imaging brain/body dynamics in mobile humans. Reviews in the Neurosciences, 22(6):593–608.
  
  (11) Klug, M. and Gramann, K. (2021). Identifying key factors for improving ICA-based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12):8406–8420.
  
  (12) Delorme, A. EEG is better left alone. Sci Rep 13, 2372 (2023). https://doi.org/10.1038/s41598-023-27528-0
  
  (13) Bach, D. R., Flandin, G., Friston, K. J., and Dolan, R. J. (2010). Modelling event-related skin conductance responses. International Journal of Psychophysiology, 75(3):349–356.
  
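  (14) Rescorla, R. A. and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds.), Classical Conditioning II: Current Research and Theory (pp. 64–99). Appleton-Century-Crofts, New York.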
  
  (15) Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction, 2nd ed. Adaptive computation and machine learning. The MIT Press, Cambridge, MA, US.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.02.10.637253v3
www.biorxiv.org www.biorxiv.org

Economic and Social Modulations of Innate Decision-Making in Mice Exposed to Visual Threats

1
1. Public_Reviews 30 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This study by Li and colleagues examines how defensive responses to visual threats during foraging are modulated by both reward level and social hierarchy. Using a naturalistic paradigm, the authors test how the availability of water or sucrose, with sucrose being more rewarding than water, shapes escape behavior in mice exposed to looming stimuli of different intensities, which are used to probe perceived threat level and defensive responses. In parallel, the study compares dominant and subordinate animals to assess how social rank biases the trade-off between reward seeking and threat avoidance. By combining detailed behavioral analyses with computational modeling, the work addresses how reward level and social context jointly influence escape decisions in an ethologically relevant setting.
  
  Across the different experimental conditions, perceived threat level is the main determinant of behavior. The authors show that looming stimuli associated with higher threat (higher contrast) consistently elicit faster and more robust escape responses than lower threat stimuli. This effect is particularly evident during early exposures, when animals are highly vigilant and have not yet habituated to the looming stimulus (i.e., have not yet learned that it is not dangerous). The authors further describe that, as animals gain experience and habituate, behavior becomes more flexible, and reward level begins to exert a graded modulation of the escape response. Importantly, the authors show that under high threat conditions, increasing reward value leads to more frequent and faster escape rather than greater reward pursuit. This finding is particularly relevant: it suggests that highly valued rewards can heighten vigilance and thereby enhance responsiveness to threat. Reward therefore does not simply compete with defensive behavior but can also reshape it depending on the perceived level of danger, in contrast to low threat conditions, where threat can be more easily outweighed by reward. Thus, an important conceptual contribution of the study is the introduction of vigilance as a useful framework to interpret these effects. Vigilance is treated as a behavioral state reflecting heightened attention to potential danger. In line with what is known from natural foraging, mice initially maintain high vigilance when confronted with an innate threat. This perspective helps clarify a finding that might otherwise appear counterintuitive. One might expect higher rewards to motivate animals to tolerate risk, explore more, and habituate faster in any scenario. Instead, the data suggest that highly rewarding outcomes can elevate vigilance, making animals more responsive to threat and leading to faster or more frequent escape under high threat conditions. In this sense, reward does not simply compete with threat but can also amplify sensitivity to it, depending on the internal state of the animal.
  
  The social results are particularly interesting in this context as well. Dominant mice consistently prioritize avoidance over reward, showing stronger escape responses and slower habituation than subordinates. This behavior is well captured by the vigilance framework proposed by the authors: dominant animals appear to maintain higher vigilance, which biases decisions toward threat avoidance. The authors further suggest that stable social relationships sustain high vigilance and slow habituation, framing this as an evolutionarily conserved strategy that may enhance survival. This interpretation provides a valuable perspective on how social structure shapes defensive behavior beyond immediate physical interactions. At the same time, there are important limitations to this interpretation. All experiments were conducted in male mice, and it is possible that the relationship between social hierarchy, vigilance, and defensive behavior would differ substantially in females. In addition, the idea that stable social relationships maintain elevated vigilance does not straightforwardly align with broader views of social stability as protective for mental health and as a buffer against anxiety and stress. These points do not undermine the findings but suggest that the social effects described here should be interpreted with caution and within the specific context of the task and sex studied.
  
  We thank the reviewer for raising this important point. In the context of repeated looming exposure, slower habituation reflects more sustained vigilance over time. Compared to individually housed mice, group-housed mice exhibit slower habituation (Lenz et al., 2022), and pair-housed mice showed even slower habituation in our current work. Importantly, this pattern does not indicate that pair-housed mice have higher overall vigilance than individually housed animals. Although individually housed mice habituate more quickly, they display higher initial vigilance, as reflected by their increased probability of escaping in response to looming stimuli (Lenz et al., 2022). Thus, pair-housed mice exhibited reduced defensive responses compared to individually housed animals, consistent with a social buffering effect.
  
  Furthermore, in a separate study (Rank- and Threat-Dependent Social Modulation of Innate Defensive Behaviors; Li, Gao, Li, 2026, eLife 15:RP109571), we directly compared responses to looming stimuli when mice were tested alone versus in the presence of a social partner and observed clear evidence of social buffering.
  
  Another important limitation is that the neural mechanisms underlying these effects remain speculative. The manuscript includes an extensive discussion of candidate circuits, particularly involving the superior colliculus and downstream structures, but this section is necessarily based on prior literature rather than on data presented in the study. Given the complexity of the circuits involved in integrating internal state, reward, social context, and vigilance, the current work should be viewed as providing a strong behavioral and conceptual framework rather than direct insight into underlying neural mechanisms.
  
  We fully agree that the proposed neural mechanisms remain speculative and that the circuits involved in integrating internal state, reward, and social context are likely far more complex. We have revised the manuscript to acknowledge this limitation.
  
  Methodologically, the behavioral paradigm is well suited for studying escape decisions in socially housed animals, and the machine-learning-based classification of defensive responses is a clear strength. The computational model provides a useful formalization of how threat level, reward level, and vigilance interact and may be valuable for other laboratories studying escape, approach-avoidance, or conflict situations, particularly as a way to classify behavioral outcomes after pose estimation. More generally, the work will be of interest to the neuroethology community for its detailed characterization of escape behavior under naturalistic conditions.
  
  Given the ethological nature of the study and the high inter-individual variability reported by the authors, clarity and precision in the methods are especially important for reproducibility. While the revised manuscript addresses many earlier concerns, some aspects remain slightly difficult to follow. For example, the main text states that animals were not water deprived to avoid differences in internal state, whereas parts of the methods describe conditions in which animals were water deprived, suggesting that internal state manipulation may differ across experiments. Clearer separation and explanation of these conditions would further strengthen confidence in the work.
  
  To improve clarity, we have revised the Methods section to clearly distinguish between experimental conditions that involved water deprivation and those that did not.
  
  Overall, this study provides a rich and thoughtful analysis of how reward level and social hierarchy modulate defensive behavior through changes in vigilance. It offers a useful conceptual advance for thinking about escape behavior in naturalistic settings and lays a solid foundation for future work aimed at linking these behavioral states to underlying neural circuits.
  
  Reviewer #2 (Public review):
  
  Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.
  
  The main contribution of this work is quantifying how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find that the interpretation that vigilance plays a major role in this process is not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.
  
  (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.
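  The point that reaction time can change with stimulus strength at a fixed decision criterion can be illustrated with a standard drift-diffusion model. For a process starting at 0 with drift v, noise sigma and absorbing bounds at +a and -a, the mean decision time is (a / v) * tanh(a * v / sigma^2) and the probability of reaching the upper bound first is 1 / (1 + exp(-2 * a * v / sigma^2)). The sketch below holds the bound fixed and varies only the drift; it omits non-decision time and is not the model variant fitted in the study.

```python
import math


def ddm_mean_decision_time(drift, bound, noise=1.0):
    """Mean time for a diffusion started at 0 to reach +bound or -bound."""
    if bound <= 0 or noise <= 0:
        raise ValueError("bound and noise must be positive")
    if drift == 0:
        return bound ** 2 / noise ** 2  # pure diffusion limit
    return (bound / drift) * math.tanh(bound * drift / noise ** 2)


def ddm_p_upper(drift, bound, noise=1.0):
    """Probability that the upper bound (e.g. 'escape') is reached first."""
    return 1.0 / (1.0 + math.exp(-2.0 * bound * drift / noise ** 2))


# Same bound (same decision criterion), increasing stimulus strength.
times = [ddm_mean_decision_time(v, bound=1.0) for v in (0.5, 1.0, 2.0)]
# roughly [0.92, 0.76, 0.48]: shorter decision times with stronger evidence
```

  Because decision times shorten here without any change to the bound, a shorter latency on its own cannot distinguish stronger sensory evidence from a change in the animal's criterion or state.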
  
  (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (eg: Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water-deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.
  
  (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100 cm. Presumably, the 20 cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?
  
  (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition, the r parameter is exactly 0; was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether any differences in parameters are statistically significant.
  
  These are the public reviews for the original submission. The corresponding authors' responses are provided below.
  
  (1) We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e. latencies to flee) were substantially shorter under high-contrast conditions (Figure 3E). However, even under the same high-contrast condition, reaction times were significantly shorter in the water condition compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.
  
  Upward-directed attention includes rearing, up-stretching, and upward head orientation, which will be clarified in the Methods section. To address concerns about statistical validity, we will quantify these behaviors across the first 10 trials rather than limiting the analysis to the first two.
  
  As for the dominance-related results, we interpret them as reflecting both enhanced vigilance and reduced reward-seeking behavior. Time spent in the reward zone is not a measure of vigilance but an indicator of reward-seeking motivation. We will clarify this in the revised manuscript.
  
  (2) In Figure 3B, the difference between water and sucrose conditions did not reach statistical significance (p = 0.08). We plan to collect additional data to determine whether this is due to limited statistical power. It is also possible that some behavioral readouts are more sensitive to the differences between water and sucrose conditions. For example, Figure 3F shows that escape speed was significantly higher in the sucrose than in the water condition under high-contrast stimulation.
  
  Thank you for pointing this out. To control for the potential confounds related to internal state, mice were not water-deprived under any of the three conditions in Figures 3A-3H. We will clarify this in the main text and Methods. For Figures 3I-3M, which compare decision-making under no-reward and water conditions, we will conduct additional experiments using non-deprived mice in the water condition.
  
  (3) Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it as distance under threat to better capture its meaning. The reward zone was defined as the region within 15 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.
  
  All definitions and additional details on behavioral quantification will be included in the revised Methods section.
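  A minimal sketch of how the definitions above could be computed from a tracked trajectory is given here. It assumes 2D positions in cm sampled at a fixed frame interval `dt`, reads "locomotion exceeding 10 cm at a speed greater than 10 cm/s" as a contiguous run of above-threshold frames whose summed path length exceeds 10 cm, and models the reward zone as all points within 15 cm of a single port coordinate. The function names, the `in_safe_zone` predicate and the port position are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) position in cm


def step_lengths(xy: List[Point]) -> List[float]:
    """Distance (cm) moved between consecutive frames."""
    return [math.dist(a, b) for a, b in zip(xy, xy[1:])]


def latency_to_flee(xy: List[Point], dt: float, onset: int,
                    min_speed: float = 10.0, min_dist: float = 10.0) -> Optional[float]:
    """Time (s) from stimulus onset to the start of the first flight state.

    ASSUMPTION: a flight state is a contiguous run of frames faster than
    min_speed (cm/s) whose summed path length exceeds min_dist (cm)."""
    steps = step_lengths(xy)
    run_start, run_dist = None, 0.0
    for i in range(onset, len(steps)):
        if steps[i] / dt > min_speed:
            if run_start is None:
                run_start, run_dist = i, 0.0
            run_dist += steps[i]
            if run_dist > min_dist:
                return (run_start - onset) * dt
        else:
            run_start = None  # run broken by a slow frame
    return None  # no flight state, e.g. no reaction or freezing


def hiding_latency(xy: List[Point], dt: float, onset: int,
                   in_safe_zone: Callable[[Point], bool]) -> Optional[float]:
    """Time (s) from stimulus onset to the first frame inside the safe zone."""
    for i in range(onset, len(xy)):
        if in_safe_zone(xy[i]):
            return (i - onset) * dt
    return None


def distance_under_threat(xy: List[Point], onset: int, offset: int) -> float:
    """Path length (cm) between stimulus onset and offset frames."""
    return sum(math.dist(a, b) for a, b in zip(xy[onset:offset], xy[onset + 1:offset + 1]))


def time_in_reward_zone(xy: List[Point], dt: float, onset: int, port: Point,
                        radius: float = 15.0, window: float = 20.0) -> float:
    """Seconds spent within `radius` cm of `port` during `window` s after onset.

    ASSUMPTION: the zone is a disc around a single port coordinate."""
    frames = xy[onset:onset + round(window / dt)]
    return sum(1 for p in frames if math.dist(p, port) <= radius) * dt
```

`latency_to_flee` returns `None` when no flight state occurs, which matches the no-reaction and freezing trials for which distance fled was renamed distance under threat.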
  
  (4) We appreciate the comment and agree that further clarification is needed. We will provide a more detailed description of the model fitting procedure in the revised Methods section. Specifically, the drift rate parameter (r), which reflects the perceived reward value, was constrained to zero in the no-reward condition. To enable statistical comparison across conditions, we will report uncertainty measures for all fit parameters.
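  One common way to attach uncertainty to fitted parameters is a nonparametric bootstrap: refit on resampled data and report percentile intervals. The sketch below is a generic illustration under that assumption, not the authors' procedure; `fit` is a hypothetical stand-in for whatever function returns the parameter estimate, and with repeated trials per mouse the resampling unit should arguably be the mouse rather than the trial. A parameter that is fixed rather than estimated, such as r = 0 in the no-reward condition, has no sampling distribution, so its interval collapses to the fixed value.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple


def bootstrap_ci(data: Sequence[float], fit: Callable[[Sequence[float]], float],
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap interval.

    ASSUMPTION: observations are exchangeable units that can be resampled
    with replacement; `fit` maps a sample to one parameter estimate."""
    rng = random.Random(seed)
    estimate = fit(data)
    boots = sorted(fit([rng.choice(data) for _ in data]) for _ in range(n_boot))
    lower = boots[int((alpha / 2) * n_boot)]
    upper = boots[int((1 - alpha / 2) * n_boot) - 1]
    return estimate, lower, upper
```

Here `mean` is only a placeholder estimator; in practice `fit` would rerun the model fit on each resample.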
  
  Comments on the revised manuscript:
  
  The manuscript has been revised and improved significantly by the addition of methodological details and new analysis. I remain, however, unconvinced by the argument that increased vigilance in the presence of reward leads to heightened escape behaviour.
  
  In response to my criticism that the work does not measure vigilance directly, the authors have included measures of foraging interval and foraging speed, which they state are "two direct behavioral analyses of vigilance". I disagree: like reaction time, foraging speed and foraging interval can be modulated, for example, by changes in threat sensitivity. Increased threat sensitivity comes with diverse behavioral changes that may well include increased vigilance, but foraging interval and foraging speed can certainly change without the animal expressing increased vigilance behaviors. A bigger issue I still have, though, is with the conclusion that the presence of reward increases "direct escape behaviors". Comparing the no reward, water and sucrose groups indeed shows a difference (which is now clear after the split into early and late phases), but the issue is that these are different mice. As the text is written, it sounds like introducing reward will acutely increase escape. But if we look at the raw data shown in Figure 2C, what I think is happening is that the presence of reward is decreasing habituation to the stimulus. The data for trials 1 and 10 in the three conditions show this: there is habituation with no reward (reaction times are all shifting to the right), a bit less with water and very little with sucrose. This is interesting in its own right and we can speculate why it might be happening, but I think this is conceptually different from what the authors are proposing.
  
  We agree that vigilance is not directly observable as a single variable. Our intent was not to claim that foraging speed and foraging interval provide a direct measure of vigilance, but rather to suggest that they may serve as indirect behavioral correlates.
  
  We also considered an alternative interpretation: these two measures could reflect perceived reward value under high-threat conditions across distinct reward types. If that were the case, animals would be expected to exhibit shorter intervals and faster speeds across no reward, water, and sucrose conditions. However, our data do not support this interpretation (Figures 3L and 3M), suggesting that these measures are more likely correlated with vigilance.
  
  Furthermore, it is unlikely that changes in foraging interval and speed are driven by altered threat sensitivity, as animals could not see the threat during most of the foraging bout and only encountered it at the end.
  
  Regarding the conclusion that the presence of reward increases direct escape behaviors, our interpretation is that increased reward value reduces habituation, thereby maintaining higher vigilance during the late phase. This was discussed in the second-to-last paragraph of the "Economic and social modulations of innate decision-making under threat" subsection in the Discussion.
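  The habituation reading discussed above could be made testable by fitting, for each mouse, a slope of latency against trial number and comparing the mean slope across conditions; a positive slope means responses slow down over trials. The sketch below is a hypothetical analysis, not one reported by the authors or the reviewer. It assumes latencies are stored per condition and per mouse in trial order, uses ordinary least squares, and ignores trials without a flight response, which would need separate handling.

```python
from typing import Dict, List


def ols_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def habituation_slopes(latencies: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Mean per-mouse latency slope (s per trial) for each condition.

    ASSUMPTION: latencies[condition][mouse] lists per-trial latencies,
    trial 1 first, with no missing trials."""
    result = {}
    for condition, mice in latencies.items():
        slopes = [ols_slope([float(t) for t in range(1, len(trials) + 1)], trials)
                  for trials in mice]
        result[condition] = sum(slopes) / len(slopes)
    return result
```

Because each mouse contributes one slope, the per-condition slopes could then be compared with a between-group test, which respects the fact that the conditions used different mice.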
  
  Reviewer #3 (Public review):
  
  Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually, using an elegant automated tunnel (see videos for clarity).
  
  The additional changes made to the paper clarify the work done. While there are some limitations (male mice, weird stimulus), the general results are interesting and a valuable addition to the experimental literature. The main claim of the paper is that the different rewards (none, water, sucrose) did not change the escape properties early in learning, but did late, particularly that in the late (already experienced) conditions, reward value (assuming sucrose > water > no reward) interacted with the salience of the looming stimulus (light gray, dark gray). (Panels 3D, 3G, 3K, 3N).
  
  For readers, I want to note that one of the most interesting results is actually in Figure S2, where they find that a looming stimulus behind the mouse still makes a mouse run to the nest. In these conditions, the mouse runs past the looming stimulus to get to safety! (I also do love the video of the mouse running around the barriers like a snake to get home.)
  
  I have a few minor clarification questions and a few notes that I think would be useful additions for authors and readers to think about.
  
  Dominance: What does the mouse social science literature say about the "test tube" test? What can we conclude from this test? This would be useful when trying to understand what is causing the dominance/submissive difference in responses. Figure 4 shows that the dominant mice are more risk-averse than the submissive mice. Is "dominance" in the test tube actually a measure of risk-seeking? Is the issue that the submissive mice don't think they can get back to the food site easily, so they are less willing to sacrifice the current (if dangerous) foraging opportunity? Is the issue that the submissive mice can't get back to the nest? As I understand it, the nest was always available to all the mice, so I suspect inability to get to the nest is an unlikely hypothesis. Is the issue that the submissive mice also don't feel safe in the nest?
  
  The tube test is a widely used assay in the rodent social behavior literature to assess dominance hierarchies, operationally defined by the ability of one animal to force its opponent to retreat from a narrow tube. Importantly, this assay does not directly measure risk-seeking or anxiety-related traits, but rather competitive outcomes during social conflict. Furthermore, our data indicate that the behavioral responses of subordinate mice to looming stimuli are primarily driven by the visual threat itself rather than by social avoidance. This point was elaborated in the second paragraph of the “Social modulation of innate decision-making” subsection in the Results section.
  
  Limitations of the study: There is an acknowledged limitation to male mice, and the limitations of the small data sets that are typical of such experiments. In addition, however, it is also worth noting the strangeness of the looming stimulus, which is revealed clearly in the videos. The stimulus is a repeating growing circle, growing in a single location within the environment. The stimulus repeats 10 times, once per second. This is not what an attacking hawk or owl would look like. (I now have this image of an owl diving down, and then teleporting up and diving down again.) Note - I am fine with this stimulus. It produces an interesting experiment and interesting results. I do not think the authors need to change anything in their paper, but readers need to recognize that this is not a "looming predator".
  
  These "limitations" are better seen as "caveats" when folding these results in with the rest of the literature that has gone before and the literature to come. (Generally, I do not believe that science works by studies making discoveries that change how we think about problems - instead, science works by studies adding to the literature that we integrate in with the rest of the literature.) Thus, these caveats should not be taken as problems with the study or as fixes that need to be done. Instead, they are notes for future researchers to notice if differences are found in any future studies.
  
  Thus, my only suggestion is that the authors could write a more careful paper by using the past and subjunctive tenses appropriately. Experimental observations should be in the past tense, as in "the influence of reward was context-dependent and emerged in the late phase" instead of "the influence of reward is context-dependent and emerges in the late phase". It emerged in the late phase this once; it might not in future experiments, not due to any fault in this experiment nor due to replicability problems, but rather due to unexpected differences between this and those future experiments. At that point, it will be up to those future experiments to determine the difference. Similarly, large conclusions should be in the subjunctive, as in "these data suggest that threat intensity is likely to be the primary determinant of decision making" rather than "threat intensity is the primary determinant of decision making", because those are hypotheses, not facts.
  
  We thank the reviewer for the helpful suggestions and have revised the Abstract accordingly.
  
  Recommendations for the authors:
  
  Reviewer #3 (Recommendations for the authors):
  
  Figure 5: The points in panel 5G and 5H are unreadable. What are these stars and symbols supposed to mean? They are also too small to see without zooming way in.
  
  We have increased the symbol size.
  
  Figure 5: What is the final panel of 5J? I did not understand this panel at all. The first three panels of 5J (threat-based detection, reward-based detection, vigilance-based detection) are, I believe, three patterns we should look for in the data. But then what is the "experimental results" section? It contains all three, but they don't overlap? Shouldn't we have an experimental results section for each condition?
  
  Panel 5J was to compare three hypothesized decision patterns with the experimentally observed data. To make this distinction explicit, we have revised the panel titles to: “H1: Threat-based decisions,” “H2: Reward-based decisions,” “H3: Vigilance-based decisions,” and “Experimental results.”
  
  Thank you for including the videos. They made the task construction and the stimulus much clearer.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.05.12.653401v3
www.biorxiv.org

Distinct gradients of cortical architecture capture visual representations and behavior across the lifespan

1
1. Public_Reviews 30 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Weaknesses:
  
  While the evidence in favour of the two gradients largely supports the claims, the evidence for a new visual field map cluster in the anterior temporal lobe falls short of the level used historically when identifying visual field maps in the visual cortex and is, at present, not convincing. More specifically, the progressions of polar angle within the putative anterior lobe cluster are highly variable across subjects. Few subjects have convincing polar angle reversals at either the horizontal or vertical meridians. In other cases, a putative border is shown that spans different polar angles, which does not align with the accepted definitions for visual field maps in the cortex.
  
  We agree with the reviewer that more evidence could be provided in support of retinotopic representations within the anterior temporal lobe. We have performed a number of new analyses to further explicate the receptive field properties of this anterior temporal lobe visual representation, and have updated Figure 2e-i accordingly. We have added participants, increasing the total from N=12 to N=21. In panel g, we show that in this larger group we still observe pRFs that are about 3x larger than those in early visual cortex, and that the relationship between their size and eccentricity shows the expected steeper slope compared to these early representations. In this larger group, we also illustrate the visual field coverage of the left and right anterior temporal lobe representations (panel h). As expected, left hemisphere pRFs largely sample the right visual field, and right hemisphere pRFs largely sample left visual space. Both the upper and lower visual fields are also sampled quite evenly, consistent with the hemifield representation of visual field maps observed in earlier visual cortex. To quantify whether there is a left-right contralateral bias in the sampling of visual space (and to test whether such a bias differs significantly between hemispheres), we calculated for each pRF a laterality index as previously defined by Sheremata and Silver (2015) according to the equation below:
  
  Here, a value of 1 means the pRF is contralateral, 0.5 means no laterality bias, and 0 means an ipsilateral bias. Additionally, we input pRF sigma values that were adjusted for the non-linearity exponent as defined by Kay et al. (2013). For visual comparison, we subtracted 0.5 from the index values so that the resulting laterality scores were centered on 0, representing the center of the visual field, and then inverted the values with a -1 scalar so that left hemisphere pRF laterality index values are plotted on the right side of space and right hemisphere values on the left, as shown in panel i. The laterality index was calculated for each pRF of a given participant and then averaged within that participant, yielding a single mean laterality index for the participant's left hemisphere pRFs and a single index for their right hemisphere pRFs. The histograms in panel i depict the density of participants (kernel smoothed). We find a significant difference between laterality indices, with left AT pRFs showing significantly more rightward index values than right AT pRFs (paired-samples t-test, t(20) = 7.6, p = 2.7 × 10⁻⁷). These data thus offer stronger evidence of a hemifield representation with a contralateral bias. It should also be noted that these high-level visual pRFs show stronger ipsilateral coverage than earlier visual field maps such as V1, which is consistent with visual field maps in later stages of the visual processing hierarchy as quantified by Mackey et al. (2017).
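  The laterality equation itself is not reproduced in this response. As a hedged sketch, assuming the index is the fraction of a circular Gaussian pRF's mass that falls in the contralateral hemifield (which gives 1, 0.5 and 0 for contralateral, unbiased and ipsilateral pRFs, as described above) and that the Kay et al. (2013) adjustment divides sigma by the square root of the compressive exponent n, the per-participant averaging, the centering and sign flip for plotting, and the paired-samples t statistic could be computed as below. Applying the sign flip only to right-hemisphere values is an assumption chosen to reproduce the described left/right layout; the authors' exact convention may differ, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def css_size(sigma: float, n: float) -> float:
    """pRF size adjusted for the compressive exponent n (sigma / sqrt(n))."""
    return sigma / math.sqrt(n)


def laterality_index(x0: float, sigma: float, n: float, hemisphere: str) -> float:
    """ASSUMPTION: fraction of the pRF's Gaussian mass in the contralateral field.

    x0 is the horizontal pRF centre in degrees (positive = right visual field).
    The horizontal marginal of a circular Gaussian is a 1D Gaussian, so the
    mass right of the vertical meridian is Phi(x0 / sigma)."""
    frac_right = normal_cdf(x0 / css_size(sigma, n))
    return frac_right if hemisphere == "left" else 1.0 - frac_right


def plot_value(li: float, hemisphere: str) -> float:
    """Centre on 0; ASSUMPTION: flip right-hemisphere values to the left side."""
    centred = li - 0.5
    return centred if hemisphere == "left" else -centred


def participant_mean(prfs: List[Tuple[float, float, float]], hemisphere: str) -> float:
    """Average index over one participant's (x0, sigma, n) pRFs in one hemisphere."""
    return mean(laterality_index(x0, s, n, hemisphere) for x0, s, n in prfs)


def paired_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1
```

With one mean plotting value per hemisphere for each of the 21 participants, `paired_t` returns 20 degrees of freedom, matching the t(20) reported above.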
  
  Lastly, we note that the progression of polar angle values on the cortical surface is certainly not as strikingly topographic as in visual field maps V1 through hV4. This is perhaps a result of the strong ipsilateral visual field coverage: pRFs whose centers were near or within the ipsilateral field (especially those near the fovea) are not visualized appropriately with a contralateral colormap. It is also possible that, at this very late stage of visually responsive cortex within entorhinal cortex, retinotopic topography becomes less clear, as is the case in higher stages of the dorsal visual stream. To improve visualization, we have created a new Supplemental Figure 6 using a binary colormap that colors the lower and upper visual fields separately and extends into the ipsilateral visual field (pasted below for convenience). We hope that this colormap helps to show the upper and lower visual field coverage. While there is a clear radial eccentricity gradient within these AT pRF clusters, and while most participants do show a polar angle gradient running perpendicular to this radial eccentricity gradient, as expected for a visual field map, we agree that polar angle traversals are harder to observe here than in earlier visual cortex. Nonetheless, the presence of these pRF clusters, which show their own distinct eccentricity representation (i.e., a foveal confluence) and a full sampling of the contralateral visual space, is still consistent with our anatomical model’s prediction that PC2 anchor points predict foveal representations shared by visual field map clusters.

  While the topographic clarity of these representations on the cortical surface is lower than in earlier visual cortex, the existence of contralateral representations of visual space, with a full eccentricity gradient spanning the upper and lower visual field, is strongly supported by the data and consistent with our anatomical model’s prediction of a distinct eccentricity gradient. These findings are also consistent with work showing that the human hippocampus is sensitive to contralateral visual space (Silson et al., 2021), and suggest that the hippocampus may inherit this contralateral bias from this entorhinal visual representation. We have updated the manuscript to incorporate these new findings, and now refer to these AT clusters as contralateral visual representations, remaining agnostic as to whether they can be fully defined as topographic maps; this question can be the focus of future work using smaller voxel sizes to better capture small topographic gradients.
  
  We have revised the manuscript to incorporate these points in the following sections.
  
  Line 466: “We performed pRF mapping on 21 participants with high-contrast, …”
  
  Line 601-625: “To produce maps of visual field coverage (Figure 2h) similar to previous work, … The histograms illustrated in Figure 2i depict density of participants (kernel smoothed).”
  
  Line 236-246: “We find that consistent with its high position within the processing hierarchy, … We find a significant difference in laterality indices between left and right AT pRFs (paired-samples t-test, t(20) = 7.6, p = 2.7 × 10⁻⁷).”
  
  Line 373-383: “The organization of polar angle in anterior temporal cortex was not as orderly as earlier visual cortex, … in more posterior portions of ventral occipitotemporal cortex.”
  
  Reviewer #2 (Public review):
  
  Weaknesses:
  
  (1) The neurobiological model does not take into consideration present knowledge about the microstructural organization of the visual system. This limits the way the results are interpreted correctly. Critical information on the layer-specific myeloarchitecture and cytoarchitecture (and their relation to cortical thickness), as explored for example by Sereno et al. 2013 Cereb Cortex, is missing. There is no information given with respect to how different visual areas differ in their microstructural profile. It is also not mentioned that cortical parcellation is indeed characterized by sharp boundaries between areas, rather than structural gradients, so it remains unclear why focusing on a gradient is of interest. The authors cite the parcellation atlas by Glasser et al. 2016, but do not discuss the rationale of this publication, which was not the definition of gradients, but the definition of sharp boundaries for cortex parcellation. Indeed (as explained below), the results of the authors seem to a large extent to be driven by cortex parcellation, but instead of acknowledging this fact, the authors write (line 179) that "we hypothesize that these local deviations from the canonical thickness and density of cortex underlie the finer-scale division of visual cortex into categorically distinct regions. That is, does the realization of the cortex into distinct regions involve these regions becoming more distinct from a prototypical cortical sheet (i.e., gradient 1)?" - While the first sentence is reasonable, the second sentence is pure speculation ignoring present knowledge on cortical parcellation of this area according to which there is no "prototypical cortical sheet", but each area has its distinct microstructural profile.
  
  We thank the reviewer for this important comment. We first want to point out that we believe there is a conceptual misunderstanding on the part of the reviewer, which we address in our detailed response below. In this response, we explain that our findings capture what we believe is novel: variation across participants in the cortical sheet is not random across the spatial expanse of cortex but respects its functional boundaries. We view this finding as complementary to current knowledge about the microstructure of visual cortex. It was not our intention to ignore or gloss over this knowledge, but rather to show that variation in these cortical microstructures across brains is not random.
  
  We agree that incorporating current knowledge about the microstructural organization of visual cortex, including its laminar architecture and sharp areal boundaries, is critical for situating our findings within the broader literature. In response, we have added key background information on the relationships among cytoarchitecture, myeloarchitecture, and cortical thickness, as described in previous studies (for example, Maingault et al., 2021; Sereno et al., 2013; Shafee et al., 2015). While our study does not aim to capture layer-specific properties per se, which would require different imaging modalities and higher-resolution data, we focus on spatial properties tangential to the cortical surface.
  
  We first address the concern that the particular parcellation might be driving the effects, with an analysis showing that our finding is robust to this concern. Given the overall negative covariance observed between cortical thickness and tissue density, we confirmed this relationship not only across larger visual ROIs, which could potentially reflect effects of arealization, but also within individual ROIs at a finer spatial scale. To avoid potential circularity in ROI definition, we used a visual ROI atlas derived from population-level retinotopy based on independent datasets (Abdollahi et al., 2014). At the global level, cortical thickness and T1w/T2w ratio showed a strong negative correlation across visual ROIs (Fig. 3, revised Supp. Fig. 3a & b). Although only a portion of the visual cortex is clearly delineated in this atlas, we replicated similar results across the entire visual cortex using the MMP atlas (Glasser et al., 2016). At the within-ROI level, we found robust negative correlations between cortical thickness and T1w/T2w ratio across most visual ROIs in both hemispheres, with the notable exceptions of V1, V2 and VO1, which exhibited a positive relationship, consistent with prior work (for example, Maingault et al., 2021; Sereno et al., 2013; Shafee et al., 2015). These results highlight both common and distinct microstructural profiles across the visual cortex and provide important context for interpreting our data-driven findings.
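  The two levels of analysis described above differ only in what is correlated: ROI means across ROIs, or vertex values within each ROI. A minimal NumPy sketch follows, assuming one thickness value, one T1w/T2w value and one ROI label per vertex for a single hemisphere; the function name and inputs are hypothetical. Vertex-wise values are spatially autocorrelated, so the significance of the within-ROI correlations would arguably need a spatially aware test rather than a naive p-value.

```python
import numpy as np


def thickness_density_correlations(thickness, t1t2, labels):
    """Across-ROI and within-ROI Pearson correlations of thickness vs T1w/T2w.

    ASSUMPTION: all three inputs are per-vertex arrays of equal length."""
    thickness = np.asarray(thickness, dtype=float)
    t1t2 = np.asarray(t1t2, dtype=float)
    labels = np.asarray(labels)
    rois = np.unique(labels)
    # Global level: one mean value per ROI for each measure.
    mean_thickness = np.array([thickness[labels == r].mean() for r in rois])
    mean_t1t2 = np.array([t1t2[labels == r].mean() for r in rois])
    across = float(np.corrcoef(mean_thickness, mean_t1t2)[0, 1])
    # Finer level: vertex-wise correlation inside each ROI.
    within = {r.item(): float(np.corrcoef(thickness[labels == r], t1t2[labels == r])[0, 1])
              for r in rois}
    return across, within
```

A negative `across` value together with mostly negative `within` values, apart from a few ROIs, would correspond to the pattern reported above.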
  
  We also want to address what we think is a conceptual misunderstanding by the reviewer, which likely resulted from a lack of clarity on our part. The confusion likely arises because we “transposed” the typical PCA analysis so that we obtain a subject-wise contribution (PC loading) for each participant (also see our response to the next point); this is how we are able to relate inter-participant variability in these loadings to behavior in Figure 3. It is also why we refer to a “typical” cortex or cortical sheet: the surface maps visualized for PC2 can be thought of as maps of the variance of deviation orthogonal to PC1, which captures the primary relationship between thickness and T1/T2. Because PC2 is orthogonal to PC1, it captures the spatial pattern in which participants deviate from the primary (i.e., typical) relationship. Therefore, if a given participant is far from the PC1 vector and has a high PC2 loading, their cortical sheet is either thicker or more myelinated than predicted by the PC1 relationship, and is therefore more distinct from the “typical” or “average” cortical sheet values captured by PC1. We want to emphasize that PCA is agnostic to spatial structure across the cortex. Thus, the fact that deviation from the primary thickness-myelination relationship captured by PC1 (i.e., PC2) had any spatial structure at all is interesting. Furthermore, the fact that the spatial structure of PC2 across the cortical sheet seems to separate visual cortex into its constituent processing streams is also interesting. We are therefore not speculating but rather describing the PCA model itself, whereby a participant’s loading on PC2 describes their deviation or distinctness from the PC1 relationship. The fact that PC2 has spatial structure on the cortical sheet (which did not have to be true), and that this structure seems to capture broad borders between visual processing streams and field maps, is what we find interesting and quantify within the paper. We hope this additional explanation clarifies the broader theoretical thrust of the paper. We view these findings as complementary to present knowledge of the microstructural organization of the visual system. Our findings suggest that variability in these microstructural features across participants (PC2) does not occur randomly across cortex but seems to respect the functional borders of the neural populations of the underlying cortical sheet.
  
  Regarding the concern that our gradient approach may contradict established knowledge of cortical arealization, we would like to clarify that the primary goal of our gradient analysis is not to redefine visual areas or to argue against cortical arealization, but to explore the continuous variation in cortical architecture across brains, which may co-exist alongside sharp boundaries as a phenomenon complementary to arealization. In our study, curvature was regressed out of the cortical thickness maps before any analyses, given the covariance between cortical folding and area borders (Fischl et al., 2008). We acknowledge that cortical parcellation is traditionally characterized by discrete transitions between areas. However, our results suggest that gradients of cortical properties, particularly those shared across participants, may capture supra-areal organizing principles that reflect how distinct regions relate to one another within a broader cortical sheet.
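  Regressing curvature out of thickness can be sketched as a per-participant linear fit, after which the residuals are uncorrelated with curvature by construction. The sketch below assumes a single linear curvature term with an intercept and adds the participant's mean thickness back so that values stay in mm; both choices are assumptions, since the exact regression model is not specified here, and the function name is hypothetical.

```python
import numpy as np


def regress_out_curvature(thickness, curvature):
    """Remove the linear effect of curvature from one participant's thickness map.

    ASSUMPTION: thickness = b0 + b1 * curvature + residual, fitted by least squares."""
    y = np.asarray(thickness, dtype=float)
    c = np.asarray(curvature, dtype=float)
    design = np.column_stack([np.ones_like(c), c])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return residuals + y.mean()  # restore the original mean so units stay interpretable
```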
  
  Finally, we agree with the reviewer that the phrase “prototypical cortical sheet” was speculative and potentially misleading. We have removed this language from the manuscript and revised the corresponding discussion.
  
  We have revised the manuscript to incorporate these points in the following sections.
  
  Line 92-94: “Thickness and density maps showed a robust anti-correlation both at the coarse across-area level based on an independent parcellation and at the finer within-area level, except in primary regions (Figure S3a, b).”
  
  Line 350-353: “The convergence pattern, arising from the negative correlation between thickness and density, is consistent with previous findings and may support the balloon model, whereby cortical thinning is associated with tangential stretching due to myelination.”
  
  Line 188-189: “That is, does the arealization of cortex into distinct regions involve these regions becoming more distinct from a typical cortical sheet (i.e., gradient 1)?”
  
  (2) Instead of building on present, detailed knowledge of brain anatomy and in-vivo cortex parcellation of the visual system and its known relation to visual maps, the authors focus on two metrics of cortex architecture (mean T1/T2 over depth and cortical thickness), and conduct a PCA to explore their shared variance. It needs to be clarified whether the PCA was conducted correctly. There is no mention of standardizing the variables, which could bias the results. In addition, in a PCA, all possible features are categorized as vector components, and those are scanned through the samples, hence, one such analysis per vertex. But the authors write "in which participants are features and cortical vertices are samples" and "the thickness and tissue density maps were concatenated". This needs clarification. The architecture of the PCA should be visualized better.
  
  We thank the reviewer for pointing out the need to clarify the PCA methodology. In response, we have revised the Methods section to provide a clearer and more accurate description of our approach.
  
  We would also like to draw the reviewer’s attention to Figure 1a, which illustrates the PCA graphically. The reviewer’s confusion likely arises because we effectively “transposed” the typical PCA so that we obtain subject-wise contributions (PC loadings), one per participant; this is how we are able to relate inter-participant variability in loadings to behavior in Figure 3. This is also why we refer to a “typical” cortex/cortical sheet: the surface maps visualized for PC2 can be thought of as a map of the variance that deviates orthogonally from PC1, which captures the primary relationship between thickness and T1/T2. Because PC2 is orthogonal to PC1, it captures the spatial pattern in which participants deviate from the primary (i.e., typical) relationship.
  
  We have revised the manuscript in the following sections.
  
  Line 493-502: “For each hemisphere, individual cortical thickness and T1/T2-weighted ratio maps from all HCP-YA participants—each represented as an M × N matrix, … corresponding participant-wise contributions (i.e., PC loading or individual weights) in pairs.”
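To make the "transposed" layout concrete, here is a minimal NumPy sketch under stated assumptions: each modality is z-scored within participant (one possible answer to the standardization question), the two maps are concatenated along the vertex axis, vertices are treated as samples, and participants as features. The function names and the z-scoring choice are illustrative assumptions, not the authors' pipeline; the exact matrix layout in the quoted Methods text is elided.

```python
import numpy as np


def zscore_per_participant(maps):
    """Z-score each participant's map (one row per participant) across vertices."""
    maps = np.asarray(maps, dtype=float)
    return (maps - maps.mean(axis=1, keepdims=True)) / maps.std(axis=1, keepdims=True)


def transposed_pca(thickness, t1t2, n_components=2):
    """PCA with cortical vertices as samples and participants as features.

    thickness, t1t2: arrays of shape (n_participants, n_vertices).
    Returns vertex-wise scores for each modality, participant-wise loadings,
    and the fraction of variance explained by each returned component.
    """
    n_vertices = np.asarray(thickness).shape[1]
    # Assumption: standardize each modality within participant so that neither
    # metric dominates because of its units, then concatenate along vertices.
    stacked = np.concatenate(
        [zscore_per_participant(thickness), zscore_per_participant(t1t2)], axis=1
    )  # (participants, 2 * vertices)
    samples = stacked.T  # rows = vertex samples, columns = participant features
    samples = samples - samples.mean(axis=0)  # center each participant feature
    u, s, vt = np.linalg.svd(samples, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # vertex-wise PC maps
    loadings = vt[:n_components].T  # (participants, components)
    explained = (s**2 / np.sum(s**2))[:n_components]
    # Split the vertex-wise scores back into a thickness map and a T1/T2 map.
    return scores[:n_vertices], scores[n_vertices:], loadings, explained
```

Because the participant axis carries the features, each participant receives one loading per component, which is what allows loadings to be related to individual differences.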
  
  (3) Because the PCA only contains two features, PC1 is driven by the positive relationship between cortical thickness and mean T1/T2, whereas PC2 is driven by their negative relationship. Because in the early visual cortex, cortical thickness and mean T1/T2 correlate positively, it naturally follows that PC1 relates to pRF size (but mediated by the actual cortex parcellation). However, it is unclear why this insight is interesting. I also do not share the view that "these findings demonstrate that gradient 1 acts as a global gradient enveloping the entire visual cortex (...) while gradient 2 acts as a local gradient specific to individual visual streams". I think this relationship between cortical thickness and T1/T2 ratio does not have much to do with local and global gradients. But if so, stronger arguments as to why this should be the case should be presented. What the authors make of this result (particularly the discussion starting line 366) is not clear to me. I cannot follow the line of argumentation, which in my view is too far away from the data.
  
  We appreciate the reviewer’s thoughtful comments and agree that, in general, cortical thickness and T1w/T2w ratio tend to be negatively correlated, with early visual areas (i.e., V1 and V2) representing a notable exception—an observation we highlight and support with evidence in R2. Given this overall pattern of correlation, it may seem intuitive to interpret PC1 as capturing a convergent relationship across the two metrics, and PC2 as reflecting their divergence. Alternatively, one can think of PC2 as the orthogonal residuals from the linear relationship between thickness and myelin captured by PC1. In this framework, PC2 is not necessarily the inverse correlation, but instead what is left unexplained through a simple linear model. However, it is important to note that PCA is inherently agnostic to spatial structure, as our PCA operates solely on inter-subject variance. As such, the spatial patterns observed in the resulting component maps are not direct or trivial consequences of the input correlations.
  
  Upon examining the spatial properties of the PCA-derived maps (Fig. 1d), we found that PC1 manifests as a large-scale, low-frequency gradient spanning broad portions of the visual cortex, whereas PC2 exhibits a fine-scale, high-frequency pattern confined to subregions of the visual cortex (quantified in Fig. 1f, g). Our initial use of the terms “global” and “local” may have inadvertently implied functional interpretations beyond our intent. We have revised the manuscript to clarify that these descriptors were intended purely to convey differences in spatial scale based on the observed frequency content of the gradients.
  
  Motivated by the reviewer’s comment, we performed additional analyses to explicitly test whether the PCA components reflect consistent (i.e., global) or variable (i.e., local) relationships across visual ROIs. Specifically, we examined whether the direction and magnitude of PC1 and PC2 scores within each ROI align with the global relationships between cortical thickness and tissue density. As shown in the revised Supp. Fig. 3e, we found that in most ROIs, vertices with high PC1 scores consistently exhibit high cortical thickness and low T1w/T2w ratios, while those with low PC1 scores show the opposite pattern. This within-ROI consistency mirrors the large-scale cross-ROI correlation structure (see Supp. Fig. 3a), supporting the interpretation of PC1 as reflecting a large-scale, cortex-wide organizational principle. In contrast, PC2 shows more heterogeneous profiles across ROIs, with peaks and troughs that differ between the two metrics. This variability suggests that PC2 captures more localized, region-specific features.
  
  We have incorporated the results of these new analyses into the Results section to strengthen our argument regarding the spatial scale and cross-regional consistency of the PCA-derived gradients:
  
  Line 102-107: “Within-area analyses further confirmed that PC1/2 represent the consistent/deviating components … while PC2 represents the spatial divergence from this commonality.”
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  Through collaborative discussions among the reviewers, we first summarised the key recommendations for enhancing the significance and strengthening the evidence of the work - integrating public reviews and recommendations to authors by each reviewer individually. The individual reviewer recommendations can be found below this.
  
  (1) Modelling component 2
  
  The geodesic model for component 2 is interesting, but we can recommend ways to improve the evidence and interpretation (see Reviewer 1 comments). As the polar angle reversals are inconsistent and the boundaries ambiguous, the OTS maps do not meet the standard of evidence required for showing a new map. The 181 pRF maps available for these HCP data would provide an independent, more powerful test of the OTS map cluster. To further strengthen the evidence for the proposed correspondence of foveal confluences and gradient 2, why not define the geodesic model anchoring points based on retinotopic measures, e.g., using HCP pRF data? About the current anchoring points for the geodesic model, what were the criteria, and were they objective to avoid circularity?
  
  We appreciate the reviewer’s suggestion to incorporate the HCP 7T retinotopy dataset as an independent test of the proposed geodesic model and its relation to foveal confluences and gradient 2. We agree in principle that such data could provide a valuable validation resource. However, as detailed in the publication accompanying the HCP 7T retinotopy dataset (Benson et al., 2018), the authors recommend a threshold of 9.8% variance explained to distinguish reliable pRF estimates from noise. As illustrated in their Figure 4, this thresholded pRF data shows poor signal coverage in higher-order visual regions, particularly those along the occipitotemporal sulcus (OTS), where gradient 2 effects are most prominent in our data. This lack of reliable pRF signal in these regions limits the utility of the HCP retinotopy data for anchoring the geodesic model or validating the observed spatial gradients.
  
  To address this limitation, we relied on our in-house data collected using high-contrast, naturalistic images designed to robustly activate high-level visual areas. This approach allowed us to define more complete and consistent topographic patterns in the regions of interest. We have thus expanded the size of this in-house dataset to N=21. We also point the editor’s attention to the response to Reviewer 1’s first comment regarding the visual field maps for a more detailed response to this point. For convenience, we have pasted the Figure 2 e-i panels in which we conduct additional analyses showing that these anterior temporal pRF clusters tile contralateral visual space as one might expect (Fig 2h), and significantly differ across hemispheres in their laterality bias (Fig 2i). We have revised the manuscript accordingly.
  
  To mitigate the concern of circularity in defining the geodesic model’s anchor points, we conducted a split-half cross-validation. Anchors were defined on one half of the participants and used to predict the PC2 map in the other half. The PC2 maps across the two halves were highly similar (r = 1.00, p < 0.001), indicating strong reliability. Importantly, the cross-predicted geodesic model accounted for a significant portion of variance (r² = 0.23) in the held-out PC2 map, suggesting that the geodesic organization is not an artifact of overfitting or circular reasoning. We have revised the manuscript accordingly:
  
  Line 139-142: “A split-half cross-validation yielded similar results, … underlying the spatial organization of PC2.”
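The split-half logic can be sketched as follows. Every modelling choice here is a hypothetical stand-in, not taken from the manuscript: graph shortest paths (Dijkstra) approximate surface geodesics, anchors are simply the highest-valued vertices of the half-A map, and the "model" predicts values that decline linearly with geodesic distance from those anchors. Reliability is the correlation between half maps; predictive power is r² on the held-out half.

```python
import heapq
import random


def geodesic_distances(adjacency, sources):
    """Multi-source Dijkstra; graph shortest paths stand in for surface geodesics."""
    dist = {v: float("inf") for v in adjacency}
    heap = [(0.0, s) for s in sources]
    for s in sources:
        dist[s] = 0.0
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale queue entry
        for w, length in adjacency[v]:
            if d + length < dist[w]:
                dist[w] = d + length
                heapq.heappush(heap, (dist[w], w))
    return dist


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_map(maps):
    return [sum(values) / len(values) for values in zip(*maps)]


def split_half_geodesic(maps, adjacency, n_anchors=1, seed=0):
    """maps: one list of vertex values per participant (vertices 0..V-1)."""
    order = list(range(len(maps)))
    random.Random(seed).shuffle(order)
    half = len(order) // 2
    map_a = mean_map([maps[i] for i in order[:half]])
    map_b = mean_map([maps[i] for i in order[half:]])
    # Hypothetical anchor rule, applied to half A only.
    anchors = sorted(range(len(map_a)), key=map_a.__getitem__, reverse=True)[:n_anchors]
    dist = geodesic_distances(adjacency, anchors)
    # Hypothetical model: values fall off linearly with distance from the anchors.
    predicted = [-dist[v] for v in range(len(map_b))]
    return pearson(map_a, map_b), pearson(predicted, map_b) ** 2
```

Defining anchors on one half and scoring only on the other is what removes the circularity: the held-out map never influences where the anchors are placed.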
  
  (2) Speculation about prototypical cortical sheet
  
  You hypothesise that gradient 1 characterises a global "prototypical cortical sheet" characteristic, with gradient 2 reflecting that regions become more distinct from this prototype. There is an alternative simpler possibility: the data can be explained by the stronger relationship between cortical thickness and T1/T2 ratio in early compared to late sensory areas, as can for example be seen in Glasser et al. 2016 Nature, Figure 4. We recommend omitting or balancing the statement about a "prototypical" cortex, and integrating findings on cortex parcellation and the view that sharp boundaries characterize transitions between high and low T1/T2 and cortical thickness areas.
  
  Please see R2 for reviewer #2
  
  (3) Confounds
  
  We'd like to see more data to understand the contributions of data quality to these results. For the component 1 gradient specifically, could its features be influenced by spatial SNR inhomogeneities? Could the developmental effects for both gradients be explained by lower SNR and other data quality markers in younger and older participant data? We missed appropriate tests that gradients develop differently across age, controlling for such confounds (Reviewer 1 comments).
  
  Regarding the reviewer’s concern about the component 1 gradient, we believe it is unlikely to be merely a consequence of uneven spatial SNR. Our findings are consistent with previous histological studies demonstrating systematic variations in cortical architecture—specifically, thinner cortex (Wagstyl et al., 2020) and higher myelin content (Dinse et al., 2015) in occipital compared to ventral visual regions. This correspondence between in vivo MRI-derived measures and postmortem histology suggests that the large-scale organization captured by PC1 is grounded in biologically meaningful cortical architecture, and not an artifact of SNR variability.
  
  To statistically assess whether the two PCs show different developmental trajectories across age, we performed an ANOVA with age, LC, and their interaction as factors on LC’s similarity to PC (i.e., r ~ age + LC + age × LC). Significant age × LC interactions were observed in the developmental (HCPD: F(1,118) = 257.01, p < .001) and aging (HCPA: F(1,132) = 263.85, p < .001) cohorts, but not in the young adult cohort (HCPYA: F(1,202) = 0.02, p = 0.80). These findings indicate that the two gradients show distinct age-related changes during development and aging but remain stable in young adulthood. We have revised the manuscript accordingly:
  
  Line 313-327: “Examining the correlation between the young adult gradient and LC … F(1,132) = 263.85, p < 0.001).”
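The interaction test r ~ age + LC + age × LC amounts to comparing nested least-squares models. A minimal sketch, assuming age enters as a continuous covariate, LC is coded 0/1, and an ordinary F-test on the single interaction term is what is wanted; the p-value would come from an F(1, n − 4) distribution (e.g. `scipy.stats.f.sf`), which is left out to keep the example dependency-light. The function name is illustrative.

```python
import numpy as np


def interaction_f_test(r, age, lc):
    """F-test for the age x LC term in r ~ age + LC + age:LC (ordinary least squares)."""
    r, age, lc = (np.asarray(v, dtype=float) for v in (r, age, lc))
    full = np.column_stack([np.ones_like(age), age, lc, age * lc])
    reduced = full[:, :3]  # same model without the interaction column

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, r, rcond=None)
        resid = r - design @ beta
        return float(resid @ resid)

    rss_full, rss_reduced = rss(full), rss(reduced)
    df1, df2 = 1, len(r) - full.shape[1]
    f = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f, df1, df2
```

A large F means the age slope of r differs between the two LC levels, i.e. the two gradients change differently with age.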
  
  (4) Implementation of PCA
  
  The manuscript raises questions about the correct implementation of the PCA - please clarify that the variables were first standardised to enable fair weightings, and visualise the PCA matrix in more detail than in Figure 1a to ensure the samples and features are correctly defined (Reviewer 2).
  
  Please see R3 for reviewer #2
  
  References
  
  Abdollahi, R. O., Kolster, H., Glasser, M. F., Robinson, E. C., Coalson, T. S., Dierker, D., Jenkinson, M., Van Essen, D. C., & Orban, G. A. (2014). Correspondences between retinotopic areas and myelin maps in human visual cortex. NeuroImage, 99, 509–524. https://doi.org/10.1016/j.neuroimage.2014.06.042
  
  Benson, N. C., Jamison, K. W., Arcaro, M. J., Vu, A., Glasser, M. F., Coalson, T. S., Van Essen, D. C., Yacoub, E., Ugurbil, K., Winawer, J., & Kay, K. (2018). The HCP 7T Retinotopy Dataset: Description and pRF Analysis. https://doi.org/10.1101/308247
  
  Dinse, J., Härtwich, N., Waehnert, M. D., Tardif, C. L., Schäfer, A., Geyer, S., Preim, B., Turner, R., & Bazin, P.-L. (2015). A cytoarchitecture-driven myelin model reveals area-specific signatures in human primary and secondary areas using ultra-high resolution in-vivo brain MRI. NeuroImage, 114, 71–87. https://doi.org/10.1016/j.neuroimage.2015.04.023
  
  Fischl, B., Rajendran, N., Busa, E., Augustinack, J., Hinds, O., Yeo, B. T. T., Mohlberg, H., Amunts, K., & Zilles, K. (2008). Cortical Folding Patterns and Predicting Cytoarchitecture. Cerebral Cortex, 18(8), 1973–1980. https://doi.org/10.1093/cercor/bhm225
  
  Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub, E., Ugurbil, K., Andersson, J., Beckmann, C. F., Jenkinson, M., Smith, S. M., & Van Essen, D. C. (2016). A multimodal parcellation of human cerebral cortex. Nature, 536(7615), 171–178. https://doi.org/10.1038/nature18933
  
  Kay, K. N., Winawer, J., Mezer, A., & Wandell, B. A. (2013). Compressive spatial summation in human visual cortex. Journal of Neurophysiology, 110(2), 481–494. https://doi.org/10.1152/jn.00105.2013
  
  Mackey, W. E., Winawer, J., & Curtis, C. E. (2017). Visual field map clusters in human frontoparietal cortex. eLife, 6, e22974. https://doi.org/10.7554/eLife.22974
  
  Maingault, S., Pepe, A., Mazoyer, B., Tzourio-Mazoyer, N., & Crivello, F. (2021). Characterization of late structural maturation with a neuroanatomical marker that considers both cortical thickness and intracortical myelination. https://doi.org/10.1101/2021.02.24.432645
  
  Sereno, M. I., Lutti, A., Weiskopf, N., & Dick, F. (2013). Mapping the Human Cortical Surface by Combining Quantitative T1 with Retinotopy. Cerebral Cortex, 23(9), 2261–2268. https://doi.org/10.1093/cercor/bhs213
  
  Shafee, R., Buckner, R. L., & Fischl, B. (2015). Gray matter myelination of 1555 human brains using partial volume corrected MRI images. NeuroImage, 105, 473–485. https://doi.org/10.1016/j.neuroimage.2014.10.054
  
  Sheremata, S. L., & Silver, M. A. (2015). Hemisphere-Dependent Attentional Modulation of Human Parietal Visual Field Representations. The Journal of Neuroscience, 35(2), 508–517. https://doi.org/10.1523/JNEUROSCI.2378-14.2015
  
  Silson, E. H., Zeidman, P., Knapen, T., & Baker, C. I. (2021). Representation of Contralateral Visual Space in the Human Hippocampus. The Journal of Neuroscience, 41(11), 2382–2392. https://doi.org/10.1523/JNEUROSCI.1990-20.2020
  
  Wagstyl, K., Larocque, S., Cucurull, G., Lepage, C., Cohen, J. P., Bludau, S., Palomero-Gallagher, N., Lewis, L. B., Funck, T., Spitzer, H., Dickscheid, T., Fletcher, P. C., Romero, A., Zilles, K., Amunts, K., Bengio, Y., & Evans, A. C. (2020). BigBrain 3D atlas of cortical layers: Cortical and laminar thickness gradients diverge in sensory and motor cortices. PLOS Biology, 18(4), e3000678. https://doi.org/10.1371/journal.pbio.3000678
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.29.569190v3
www.biorxiv.org www.biorxiv.org

Developmental bias explains the evolutionary trend towards simple leaf shapes

1
1. Public_Reviews 30 Jun 2026
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors aim to understand, in the context of leaf shape, how the constraints imposed by development inform evolution. Leaf shape is a good place to study the influence of development on evolution because it is a trait that exhibits a lot of diversity, and the developmental mechanisms that give rise to leaf shapes are apparently rather conserved across angiosperms.
  
  As part of the motivation for their work, the authors cite a previous study (Geeta et al), which found that in angiosperm phylogenies, transitions from complex to simple leaf shapes occur through evolution more often than transitions in the opposite direction. Is this due to developmental constraints or adaptation?
  
  The authors undertake two parallel lines of work:
  
  (1) Extending the study of Geeta et al with more data, consisting of both phylogenies and a shape classification dataset. The conclusion from this line of inquiry is that transitions from lobed to unlobed leaves are more common than transitions away from unlobed leaves.
  
  (2) The authors conduct evolution simulations in a computational model of leaf development. Here, they look at neutral mutations and whether neutral evolution alone is sufficient to drive the observed trend.
  
  The conclusion of the second part of the work is that the driver of the evolution toward simple leaf shape is entropy: there are more ways to make unlobed leaves than to make lobed leaves (at least in terms of gene regulation parameters that will produce the two leaf types). The argument is that random gene regulatory networks are more likely to produce unlobed leaves than lobed leaves; therefore, neutral evolution drives this trend.
  
  Data Analysis
  
  Roughly 9,000 images of leaves were classified into 4 categories: unlobed, lobed, dissected, and compound. These labels were applied to the tips of 5 phylogenetic trees of angiosperms (3 resolved at the genus level and 2 at the species level). By fitting a continuous-time Markov chain to the labelled trees, the authors claim that there is a significantly higher rate of transition to the unlobed leaf shape compared to transitions to more complex shapes.
  
  Simulation
  
  First, the authors validate a computational model (Runions et al) for leaf growth on an experimental dataset. By changing parameters in the model, they can recapitulate the morphological changes in the shapes of Arabidopsis leaves engendered by expression of two particular genes.
  
  Then the authors run an evolutionary model (without selection, just random mutations) on top of the computational leaf development model. As the random walk in parameter space reaches a stationary distribution, they look at both the proportions of the leaf categories in the steady state as well as the transition rates between different categories. The result is that transitions to unlobed leaves are more common than from unlobed leaves.
  
  We thank the reviewer for the helpful and clear summary of our work.
  
  General Comments
  
  The authors use angiosperm phylogenies from other works as the basis for the data analysis part of their work. Given the centrality of these phylogenies for their conclusions, more information is needed about how these phylogenies were constructed and what they mean. What is the timescale that they span? What method is used to infer them? What regions of DNA were sequenced in order to build the phylogenies? Also, maybe some more discussion of angiosperm evolution (e.g., when was the most recent common ancestor of all angiosperms?) would help put the study in context.
  
  We also need a more in-depth discussion of the computational model. What are all the >100 parameters doing, and what informs the seemingly strange mutational model that changes parameters by 3 orders of magnitude?
  
  I am confused about how the rates of transitions were inferred from the phylogeny. Here, one has a phylogeny inferred by some method (which needs to be described in more detail), and just the leaves are labelled. It is stated in the methods that BayesTraits was used to infer the transition rates. I realize this method is probably documented elsewhere, but a bit of a summary of how it works and how to interpret its results would (1) make the paper more self-contained and (2), if the algorithm is credible, make the results firmer.
  
  We thank the referee for the suggestion to make the paper more accessible. The tool we use to infer transition rates from the phylogenies, BayesTraits, is standard in the field. However, the referee is right that for an interdisciplinary journal, it may be helpful to more fully flesh out how these methods work. To that end, we have added an additional section "Phylogenetic rate inference" in the supplementary information that includes a longer description of how BayesTraits works, and how we used it to infer transition rates from phylogenies.
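For readers unfamiliar with this class of model, the core computation is the likelihood of the observed tip states under a continuous-time Markov chain running along the tree, obtained with Felsenstein's pruning algorithm. The sketch below is not BayesTraits code: it assumes just two states (unlobed = 0, any other shape = 1), a hypothetical nested-list tree format, and equal prior weight on the two root states. A maximum-likelihood or MCMC routine would then search over the rates a and b.

```python
import math


def transition_matrix(a, b, t):
    """P(t) for a two-state CTMC: state 0 -> 1 at rate a, state 1 -> 0 at rate b."""
    s = a + b
    decay = math.exp(-s * t)
    pi0, pi1 = b / s, a / s  # stationary probabilities
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]


def conditional_likelihoods(node, a, b):
    """Felsenstein pruning. A tip is an int state (0 or 1); an internal node is a
    list of (child, branch_length) pairs. Returns P(tips below | node state)."""
    if isinstance(node, int):
        return [1.0 if node == state else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        child_l = conditional_likelihoods(child, a, b)
        p = transition_matrix(a, b, length)
        for i in (0, 1):
            result[i] *= p[i][0] * child_l[0] + p[i][1] * child_l[1]
    return result


def tree_likelihood(tree, a, b, root_prior=(0.5, 0.5)):
    """Likelihood of all tip states, averaging over the unknown root state."""
    l = conditional_likelihoods(tree, a, b)
    return root_prior[0] * l[0] + root_prior[1] * l[1]
```

For example, a tree whose tips are mostly unlobed is more likely under a slow 0 → 1 rate and a fast 1 → 0 rate than the reverse, which is the kind of asymmetry the fitted rates quantify.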
  
  All trees are shown in the supplementary information section "Phylogenetic trees" with scale-bars showing the amount of time or genetic change that the trees span. For a broader discussion of angiosperm evolution, there is supplementary information section "The adaptive significance of leaf shape review".
  
  Regarding the more in-depth discussion of the computational model, we have added supplementary information section S1 "Leaf model details" to give a more detailed description of the leaf model.
  
  I am a bit skeptical of the authors' interpretation of the biological trend (of complex to simple leaf shapes) as being driven by neutral evolution. Why does one expect that the mutations generated by the random walk models described in the work are in fact neutral mutations?
  
  A random walk is a well-established way of modelling the dynamics of neutral evolution in the monomorphic regime, where genotypic diversity in the population is low. In the polymorphic regime, at higher mutation rates, where the population harbours a wider diversity of genotypes, we expect that a random walk should still recapitulate the correct average transition rates. The purpose of the simulations is not to model every aspect of population genetics, but to ask whether developmental bias alone is sufficient to generate the observed directional asymmetry. By assigning equal fitness to all viable leaves, we isolate the contribution of development from that of selection. The agreement with the phylogenetic transition rates therefore demonstrates sufficiency rather than exclusivity: selection may also contribute, but it is not required to explain the observed bias. We discuss the evidence for the role of adaptation in leaf shape further in supplementary information section "The adaptive significance of leaf shape review".
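A toy version of such a neutral walk, under loudly hypothetical assumptions: genotypes are points in [0, 1]^d, the "complex" phenotype requires the first k parameters to exceed a threshold, every point in the box is equally fit, and mutations that leave the box are rejected. Nothing here is the leaf model; it only illustrates how a walk with equal fitness visits phenotypes in proportion to their volume and, because crossings in the two directions must alternate, leaves the rarer phenotype at a higher per-step rate.

```python
import random


def phenotype(theta, k=3, threshold=0.7):
    """Hypothetical map: 'complex' only if the first k parameters all exceed threshold."""
    return "complex" if all(t > threshold for t in theta[:k]) else "simple"


def neutral_walk(d=10, steps=200_000, step_size=0.05, seed=1):
    rng = random.Random(seed)
    theta = [rng.random() for _ in range(d)]
    current = phenotype(theta)
    occupancy = {"simple": 0, "complex": 0}
    crossings = {("simple", "complex"): 0, ("complex", "simple"): 0}
    for _ in range(steps):
        occupancy[current] += 1
        i = rng.randrange(d)  # mutate one parameter, symmetric step
        candidate = theta[i] + rng.uniform(-step_size, step_size)
        # Neutral model: all points in [0, 1]^d are equally fit; leaving the box is
        # treated as inviable, so that mutation is rejected and the walker stays put.
        if 0.0 <= candidate <= 1.0:
            theta[i] = candidate
        new = phenotype(theta)
        if new != current:
            crossings[(current, new)] += 1
            current = new
    # Per-step probability of leaving each phenotype for the other.
    rates = {pair: crossings[pair] / max(occupancy[pair[0]], 1) for pair in crossings}
    return occupancy, crossings, rates
```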
  
  If the entropy of simple leaf shapes is higher than that of complex leaf shapes, why did we have complex leaves at all? I suspect the authors might argue that this is due to selection. In that case, what allows these complex shapes to become simpler? Wouldn't they be losing the selective advantage that drove them to be more complex in the first place? Or maybe the idea is that the rates are inferred assuming some steady state that generates the phylogeny? I did not understand this point.
  
  The entropy language is a useful framing. Within that framework, one can view our study as showing that the entropy (defined here as the logarithm of the volume of parameter space mapping to a phenotype) of simple leaf shapes is higher than that of complex leaf shapes. If this entropy were ignored, all states would be equally likely in our simulations, which do not take fitness differences into account. What we show is that differences in entropy (that is, differences in the volumes of parameter space that map to different phenotypes) also affect the rates. The inferred transition rates, for both simulation and phylogeny, from unlobed to more complex shapes are lower than in the reverse direction, but not zero. Therefore, complex leaf shapes arise stochastically through mutation and, in this model, would eventually reach a steady-state proportion, even in the absence of selection.
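Under the same idealization (symmetric mutations, equal fitness for every viable parameter set), the link between entropy and rates can be written in a few lines. The notation is introduced here for illustration: φ is the developmental map, V_i the viable volume mapping to phenotype i (measured in the coordinates in which mutations are symmetric), p_i the stationary occupancy, and r_{i→j} the effective transition rate.

```latex
\begin{aligned}
S_i &= \ln V_i, \qquad V_i = \operatorname{vol}\{\theta \in \Theta_{\mathrm{viable}} : \phi(\theta) = i\},\\
p_i &= \frac{V_i}{\sum_k V_k} \qquad \text{(uniform stationary density over viable parameters)},\\
p_i\, r_{i\to j} &= p_j\, r_{j\to i}
  \;\Longrightarrow\; \frac{r_{i\to j}}{r_{j\to i}} = \frac{V_j}{V_i} = e^{S_j - S_i}.
\end{aligned}
```

Transitions into the higher-entropy class are therefore faster than the reverse, while the reverse rate stays finite, consistent with complex shapes persisting at a lower steady-state proportion.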
  
  Are the rates of transitions between leaf types inferred for the phylogeny assuming that the phylogeny is generated by the steady state of some Markov process? (I think the answer is no: in that case, how does one explain the initial condition?)
  
  The tool we use to infer transition rates from phylogenies—BayesTraits—allows the initial state at the root of the tree to vary during the numerical optimisation (Pagel, 1994). Therefore, it is not assumed that the initial state is generated by the steady state of the Markov process.
  
  If I take the mutation model (random walk) seriously, then shouldn't I expect that this steady state obeys detailed balance? In that case I should have $p_i r_{i\to j} = p_j r_{j\to i}$ for each of the occupancies $\{ p_i\}$ and transition rates $r_{i\to j}$ for the shape categories. How close are the rates inferred from the phylogenies to obeying detailed balance? Presumably, the Markov chain fitted to the simulation data obeys detailed balance because the mutation model itself does?
  
  BayesTraits allows the off-diagonal transition rates of the rate matrix to vary freely during numerical optimisation (Pagel, 1994). Therefore, detailed balance is not required to hold for the inferred rate matrix. In our simulations, mutations are symmetric at the parameter level, so at this level the process would be expected to obey detailed balance.
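How far an inferred rate matrix departs from detailed balance can be checked directly: compute its stationary distribution p from pQ = 0 and compare the fluxes p_i q_ij and p_j q_ji. A minimal NumPy sketch; the helper names are illustrative and the example matrices are invented.

```python
import numpy as np


def rate_matrix(off_diagonal):
    """Build a CTMC generator: keep off-diagonal rates, set diagonals so rows sum to zero."""
    q = np.array(off_diagonal, dtype=float)
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


def stationary_distribution(q):
    """Solve p Q = 0 subject to sum(p) = 1 (least squares on the stacked system)."""
    n = q.shape[0]
    system = np.vstack([q.T, np.ones(n)])
    target = np.zeros(n + 1)
    target[-1] = 1.0
    p, *_ = np.linalg.lstsq(system, target, rcond=None)
    return p


def detailed_balance_violation(q):
    """Largest |p_i q_ij - p_j q_ji|; zero (up to rounding) for a reversible chain."""
    p = stationary_distribution(q)
    flux = p[:, None] * q
    return float(np.max(np.abs(flux - flux.T)))
```

Applied to a fitted four-state matrix, a violation near zero would indicate rates compatible with a reversible (e.g. symmetric-mutation) process; a purely cyclic matrix such as 0 → 1 → 2 → 3 → 0 gives a clearly nonzero value.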
  
  I find it hard to take the discussion of development seriously without some consideration of mechanics. Presumably, the mechanics are hidden in the computational leaf development model, but this model is not discussed in enough detail for the reader to know. It seems to me that the interesting question is: what are the mechanical constraints on development that drive the apparent trend in evolution towards simpler leaf shapes? Maybe it is something about the type of differential growth needed to make complex leaf shapes less robust to mutation. But in this case, I would assume that selection plays a role in the complexity of shape. In any case, a better understanding (or explanation) of the computational model is needed to make this interpretation.
  
  We thank the referee for the suggestion to make the paper more accessible. We have added a more detailed and pedagogical description of the model from (Runions, Tsiantis and Prusinkiewicz, 2017) in the supplementary information section S1 "Leaf model details". We also note that Fig. 5 in the Methods gives an overview of how the model works, including some mechanical aspects of development and growth.
  
  More generally, mechanics is one component of the developmental map that determines which parameter combinations produce viable leaf morphologies. Our analysis concerns the geometry of this complete developmental map, irrespective of whether its constraints arise from gene regulation, tissue mechanics, or their interaction.
  
  On the interesting question of what is causal, perhaps the example in Figure 2 is helpful. We focus on two parameters: a morphogen repression strength and a duration of growth. A key physical process here is called webbing, where cellular growth fills in the gaps between branching veins. This process flattens the leaf structure and creates a continuous, solid leaf blade (lamina). Strong webbing, characterized by a significant resistance to stretching and bending, results in a smoother margin (Runions, Tsiantis and Prusinkiewicz, 2017). The morphogen repression strength affects the physical parameters that determine how strong the webbing is. The duration of growth determines how long the leaf has to grow. Varying these two parameters varies the physical processes that determine leaf shape. The mechanics of growth operate downstream of these parameters that we vary in our evolutionary simulations according to the details of the leaf developmental model.
  
  Some discussion of timescales is needed, especially when invoking neutral evolutionary arguments. If a neutral mutation occurs, its time to fix in a population of size $N$ is $\sim N$ generations. What are the relevant angiosperm population sizes and the number of mutations that separate branches on the tree? Are timescales remotely consistent with e.g., the age of angiosperms on Earth?
  
  Neutral processes have a well-established role in key aspects of angiosperm evolution, for example genome complexity (Lynch and Conery, 2003). This suggests that the relevant timescales and generation times do not preclude neutral processes from also playing a role in the evolution of angiosperm leaf shape. Effective population sizes (Ne) in plants are highly variable, but estimates span 10^3 to 10^6. Assuming diploidy (and therefore an average neutral fixation time of 4Ne generations) and generation times of 1 to 10 years, this gives fixation timescales of roughly 10^3 to 10^7 years. This is within the timescales of the trees we analyse, which span >150 million years.
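The arithmetic behind this range, as a short sketch: it multiplies the diploid neutral fixation time (4Ne generations) by the generation time at each end of the quoted ranges. Taken literally, the endpoints are about 4 × 10^3 and 4 × 10^7 years, the same orders of magnitude as quoted and still well below the >150 million year span of the trees.

```python
def neutral_fixation_years(effective_size, generation_time_years):
    """Mean time for a new neutral allele to fix in a diploid population
    (conditioned on fixation): about 4 * Ne generations, converted to years."""
    return 4 * effective_size * generation_time_years


shortest = neutral_fixation_years(1e3, 1)  # small Ne, 1-year generations: 4,000 years
longest = neutral_fixation_years(1e6, 10)  # large Ne, 10-year generations: 4e7 years
tree_span_years = 150e6  # lower bound on the span of the analysed trees
```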
  
  Reviewer #2 (Public review):
  
  Strengths:
  
  The paper's underlying question is interesting, extending the authors' prior work on RNA along similar conceptual lines. The paper combines both image analysis of leaves and a computational analysis of a simple model of leaf development.
  
  Weaknesses:
  
  The entire paper is based on the Runions model. More intuition about the Runions model would be useful for a broader readership that cares about the evolutionary aspect of this but may not know the developmental model in question. Obviously, this is prior well-established work, but 2–3 sentences highlighting the key structural aspects of such a model would be great. Currently, that intuition is found implicitly in a sentence on page 2 ("complex leaf shapes need more specificity in their GRNs than their simpler unlobed leaf shape"), but the reader is left wondering: is the Runions model a detailed mechanistic one with multiple interacting genes/proteins? If so, how many? Or is it just 2–3 genes, with the complexity lying entirely in how long each is expressed, when it is turned off, etc.?
  
  We thank the referee for the suggestion to make the paper more useful for a broader readership. To that end, we have added a more detailed description of the (Runions, Tsiantis and Prusinkiewicz, 2017) model in supplementary information section S1 "Leaf model details".
  
  The Runions model has nearly 100 free parameters. Random walks in 100-dimensional spaces have generic properties, like a tendency to move toward regions of larger volume, that have nothing to do with leaf biology. How do you disentangle the geometry of high-dimensional random walks from genuinely biological developmental bias? Would a toy model with 100 parameters and arbitrary phenotype categories also show "bias toward simplicity" if "simple" phenotypes occupy more volume?
  
  Our argument is largely independent of the number of parameters. While it is true that most of the volume is near the surface in a high-dimensional space, our argument is about the relative volumes of the sets of parameters that map to each of the four phenotypes, an entropic argument if you wish. The basic intuition is that a simple phenotype needs fewer parameters to be fine-tuned, and so a larger volume of parameter space will map to a simpler phenotype.
  
  The question about a toy model with arbitrary phenotypes is helpful, because it allows us to clarify that what we are illustrating here, with the biologically realistic example of leaf shapes, is a much more generic principle. We can say with confidence that if the toy model generates a many-to-one mapping from genotypes to outputs (phenotypes) through an algorithmic process whose description length does not grow faster than logarithmically with the size of the genotype space, then it should produce a bias towards simplicity regardless of the number of dimensions; see, for example, Johnston et al. (2022) and Dingle, Camargo and Louis (2018) for a longer discussion of this more general point, which is based on arguments from algorithmic information theory (AIT). We don’t use that framing in the current paper because the basic intuition for GRNs (that more complex phenotypes need more parameters to be fine-tuned, and so have relatively smaller volumes) is more straightforward to understand than the more abstract AIT arguments. Our general prediction that this principle should hold more widely for GRNs can be made either by the more formal AIT route or via the more heuristic fine-tuned-parameter route.
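The volume argument can be illustrated with exactly the kind of toy model the reviewer describes. In this hypothetical sketch, a "fine-tuned" phenotype requires k parameters to fall in a window of width w, so its share of parameter space is w^k whatever the total dimension d; the Monte Carlo estimate below makes the same point numerically. The window rule is an assumption for illustration, not the Runions model.

```python
import random


def tuned_fraction(d, k, width, samples=10_000, seed=0):
    """Monte Carlo share of uniform genotypes in [0, 1]^d whose first k parameters
    all lie within width/2 of 0.5 (a hypothetical 'fine-tuned' phenotype)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        theta = [rng.random() for _ in range(d)]  # full genotype, all d parameters
        if all(abs(t - 0.5) < width / 2 for t in theta[:k]):
            hits += 1
    return hits / samples
```

With w = 0.5, the estimates sit near 0.25 for k = 2 and near 0.0625 for k = 4 at both d = 10 and d = 100, so the entropy gap ln(0.25) − ln(0.0625) = 2 ln 2 depends on how many parameters must be tuned, not on d.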
  
  The discussion of Figure 4 (PCA of parameter space) uses "area" loosely when what's actually being measured is bin count in a 2D projection of a high-dimensional space. I would think that, in general, PCA projections can be misleading about volume in the full parameter space, but I can't tell if that's an issue in this case. Some comments/thoughts here would be useful.
  
  The quantitative estimate of phenotype frequencies is computed directly in the full parameter space and does not depend on PCA: we estimate that about 80% of the total volume of parameter space producing viable leaves maps to simple unlobed leaves. However, this space is extremely high-dimensional and therefore hard to visualise. PCA is used solely to provide an interpretable visualization of this otherwise high-dimensional structure; the PCA plots in Fig 4 and Fig S16 are illustrative, not quantitative. Because the volume differences are large, we do not think that projections onto the main PCA components would be misleading, at least with respect to the ordering of the sizes of the parameter-space regions that map to each leaf shape. We provide a similar analysis for other projections, PC1–PC6 (supplementary information section "PCA occupancy for higher dimensions"), and find the same trend. To make this point clearer, we have now changed the sentence in the Fig. 4 caption slightly: “This (reveals that --> illustrates how) unlobed leaves occupy a larger region of model parameter space than more complex shapes and that this larger space also contains the majority of more complex leaves.”
  
  The classifier validation section is in the Methods section, but it seems critical to the whole story. The < 80% agreement with manual classification could propagate to the rest of the estimates in the paper. Again, some comments/thoughts here would be useful.
  
  We have repeated the analysis of the agreement between by-eye and automatic morphometric classification. Generating a confusion matrix for the two classification methods shows that the agreement is high for unlobed, dissected and compound, with the main source of disagreement being leaves that were classified as lobed by-eye being classified as either unlobed or dissected by the automatic-morphometric method. The proportion of by-eye lobed leaves classified by the automatic morphometric method as either unlobed (27%) or dissected (23%) is relatively balanced, which we think will help cancel out some error as well. Moreover, we find that the agreement between the automatic-morphometric method and by-eye classification increases to 90.0% when using the categories unlobed and all other categories grouped into one. This is the most important classification for our finding that development and phylogeny are both biased towards unlobed.
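The agreement figures above come from comparing paired labels. A minimal sketch of that bookkeeping follows, using six hypothetical leaves rather than the paper's data: it builds the by-eye versus automatic confusion matrix, the four-category agreement, and the agreement after collapsing everything except unlobed into a single "other" class.

```python
from collections import Counter

CATEGORIES = ("unlobed", "lobed", "dissected", "compound")


def confusion_matrix(by_eye, automatic):
    """Nested dict: matrix[eye_label][auto_label] = number of leaves."""
    counts = Counter(zip(by_eye, automatic))
    return {e: {a: counts[(e, a)] for a in CATEGORIES} for e in CATEGORIES}


def agreement(by_eye, automatic):
    """Fraction of leaves given the same label by both methods."""
    return sum(e == a for e, a in zip(by_eye, automatic)) / len(by_eye)


def collapse_to_unlobed_vs_other(labels):
    """Merge lobed, dissected and compound into one 'other' category."""
    return ["unlobed" if x == "unlobed" else "other" for x in labels]


# Toy labels for six leaves (hypothetical, not the paper's data).
eye = ["unlobed", "unlobed", "lobed", "lobed", "dissected", "compound"]
auto = ["unlobed", "unlobed", "unlobed", "dissected", "dissected", "compound"]

matrix = confusion_matrix(eye, auto)
four_way = agreement(eye, auto)
two_way = agreement(collapse_to_unlobed_vs_other(eye),
                    collapse_to_unlobed_vs_other(auto))
```

In the toy data, one by-eye lobed leaf is relabelled unlobed and one dissected. Only the first of these counts as a disagreement once the classes are collapsed, which is why the two-way agreement is higher.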
  
  The authors should explain Mut2 and Mut5 in the main paper with a sentence or two, at least schematically, because how you mutate is obviously very relevant to interpreting a paper about biases in variation.
  
  In the results section we have added a sentence for more detail on the random walk.
  
  "[We mutated the initial sample using a random walk algorithm with two different mutational schemes, MUT2 (alg. 1) and MUT5 (alg. S2).] These algorithms work by iterating through model parameters one by one and perturbing the value by a small amount. We then [automatically classified the resulting shapes...]"
  
  Moreover, in methods section C there is already a more detailed description of both algorithms.
  
  “MUT2 (alg. 1) iterates through the parameters in a random order, and attempts to change each parameter by a value selected at random from an array of numbers randomly generated at 3 different orders of magnitude. MUT5 (alg. S2) is the same as MUT2, except that the value by which each parameter is changed is multiplied by 10% of the range of that parameter within the initial leaves (fig. S1). The aim here was to provide some way of accounting for the biologically relevant sampling range.”
  
  Moreover, the MUT2 algorithm is described in pseudocode in Algorithm 1 in the main text, and the pseudocode for MUT5 is in supplementary information section S1 C, as algorithm S2.
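For readers without the supplement to hand, the one-parameter-at-a-time scheme described above can be sketched as follows. This is a hypothetical reading of the prose, not Algorithm 1 or S2. The following are all assumptions: the step-size array (ten random magnitudes at each of the scales 0.001, 0.01 and 0.1, with random sign), the `is_viable` callback standing in for the leaf-model check, and the rule that a change is kept only if it is viable. Passing `ranges` applies the MUT5-style scaling by 10% of each parameter's range.

```python
import random


def make_step_sizes(rng, scales=(0.001, 0.01, 0.1), n_per_scale=10):
    """Hypothetical step array: random magnitudes at three orders of magnitude, random sign."""
    return [rng.choice((-1.0, 1.0)) * rng.uniform(0.1, 1.0) * s
            for s in scales for _ in range(n_per_scale)]


def mutation_sweep(params, is_viable, rng, ranges=None):
    """One sweep over all parameters in random order (MUT2-like).
    If `ranges` is given, each step is multiplied by 10% of that
    parameter's range (MUT5-like). A change is kept only if viable."""
    steps = make_step_sizes(rng)
    current = list(params)
    for i in rng.sample(range(len(current)), len(current)):
        delta = rng.choice(steps)
        if ranges is not None:
            delta *= 0.1 * ranges[i]
        trial = list(current)
        trial[i] += delta
        if is_viable(trial):
            current = trial
    return current


def always_viable(params):
    return True  # hypothetical stand-in for "the leaf model produces a valid leaf"


def never_viable(params):
    return False


rng = random.Random(42)
start = [0.5] * 8
mut2 = mutation_sweep(start, always_viable, rng)
mut5 = mutation_sweep(start, always_viable, rng, ranges=[2.0] * 8)
frozen = mutation_sweep(start, never_viable, rng)
```

A random walk, in this reading, is simply repeated sweeps starting from each initial leaf.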
  
  The two mutational schemes use additive perturbations to individual parameters. Real mutations presumably affect regulatory networks in more structured ways (e.g., changing binding affinities that affect multiple parameters simultaneously). How sensitive are the results to the assumption of independent single-parameter mutations?
  
  The referee raises an interesting and well-known issue concerning this widely studied class of GRN models. Without a detailed understanding of how individual genetic mutations map onto model parameters, it is difficult to determine with confidence whether a mutation would produce correlated changes in certain sets of parameters. Our main argument, however, is that the primary source of the observed bias is geometric: the volume of parameter space (or equivalently, the entropy) corresponding to simple leaf morphologies is substantially larger than that corresponding to complex morphologies. As long as mutations explore parameter space approximately symmetrically, even if they involve correlated changes in multiple parameters, larger phenotype regions will tend to be encountered more frequently and retained for longer than smaller regions. We therefore expect the observed bias to be robust to many alternative mutation models, although quantifying this robustness is an interesting direction for future work.
  
  The connectedness argument is made using a 2D PCA projection. Is there a way to check this statement in the full parameter space or perhaps in higher dimensional projections to test the robustness of this result? Connected components can merge/split under different projections.
  
  Constructing the nearest neighbour graph for the full-dimensional data results in the following numbers of connected components: unlobed, 146; lobed, 274; dissected, 255; compound, 315. This follows the same pattern identified for the PC1-PC2 projection: unlobed splits into fewer connected components than the other leaf shape categories.
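The component counts can be reproduced in outline with a k-nearest-neighbour graph and union-find, applied separately to the parameter vectors of each leaf-shape category. The following are assumptions, since the response does not state the exact graph construction: the value of k, the Euclidean metric, and treating kNN edges as undirected.

```python
import math


def knn_components(points, k=5):
    """Number of connected components in the (undirected) k-nearest-neighbour
    graph of `points`, a list of equal-length coordinate tuples."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for i in range(n):
        neighbours = sorted((math.dist(points[i], points[j]), j)
                            for j in range(n) if j != i)
        for _, j in neighbours[:k]:
            union(i, j)
    return len({find(i) for i in range(n)})


# Two well-separated toy clusters (hypothetical points).
two_clusters = [(0, 0), (0, 1), (0, 2), (100, 0), (100, 1), (100, 2)]
```

The brute-force neighbour search is quadratic in the number of points. For thousands of high-dimensional leaves, a spatial index would be the practical choice, but the component count is the same.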
  
  References:
  
  Dingle, K., Camargo, C.Q. and Louis, A.A. (2018) ‘Input–output maps are strongly biased towards simple outputs’, Nature Communications, 9(1), p. 761. Available at: https://doi.org/10.1038/s41467-018-03101-6.
  
  Johnston, I.G. et al. (2022) ‘Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution’, Proceedings of the National Academy of Sciences, 119(11), p. e2113883119. Available at: https://doi.org/10.1073/pnas.2113883119.
  
  Lynch, M. and Conery, J.S. (2003) ‘The Origins of Genome Complexity’, Science, 302(5649), pp. 1401–1404. Available at: https://doi.org/10.1126/science.1089370.
  
  Pagel, M. (1994) ‘Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters’, Proceedings of the Royal Society of London. Series B: Biological Sciences, 255(1342), pp. 37–45. Available at: https://doi.org/10.1098/rspb.1994.0006.
  
  Runions, A., Tsiantis, M. and Prusinkiewicz, P. (2017) ‘A common developmental program can produce diverse leaf shapes’, New Phytologist, 216(2), pp. 401–418. Available at: https://doi.org/10.1111/nph.14449.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.17.670617v2
www.biorxiv.org

Abundant Parent-of-origin Effect eQTL: The Framingham Heart Study

1. Public_Reviews 30 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer 1 (Public review):
  
  Summary:
  
  This study presents a systematic investigation of parent-of-origin effects on gene expression using trio-based data from the Framingham Heart Study, which is notable for its relatively large number of trios. By combining whole-genome and RNA sequencing data, the authors examined the extent to which gene expression is influenced by whether genetic variants are inherited maternally or paternally.
  
  The authors report that parent-of-origin eQTLs are widespread, identifying 15,893 eQTLs from 14,733 variants and 1,824 genes that were significant in paternal, maternal, or joint tests but not detected by traditional eQTL approaches. They further classified these associations based on the relative strength and direction of paternal and maternal effects, highlighting a subset with opposing directions. The study also highlighted eGenes linked to known imprinted genes as well as those with opposing parent-specific effects, and observed that paternal eGenes are enriched for drug targets. Finally, the work revisits previous findings in which eQTL studies were used to interpret disease-associated loci, emphasizing that conventional eQTL analyses that do not test for parent-of-origin effects may mislead gene prioritization efforts. The study recommends that future downstream analyses, such as Mendelian randomization, take into account the provided lists of SNPs and eGenes and exclude those with strong parent-of-origin effects when linking genetic regulation to disease risk.
  
  Strengths:
  
  The major strength of the study lies in the scale and quality of the dataset, the trio-based design, and the systematic application of statistical tests for parent-of-origin effects. The authors thoughtfully employed Bayes factors rather than p-values to provide stronger evidence of association, which adds rigor to their analyses. These design choices provide compelling evidence that parent-of-origin effects are widespread and that conventional eQTL analyses miss a substantial fraction of regulatory variation. The results are clearly presented and supported by robust analyses, including the identification of opposing parental effects and the enrichment of paternal eGenes for drug targets. Notably, the two examples demonstrating how these findings can reshape disease gene prioritization highlight the broader impact of the study and encourage further work in the community to incorporate parent-of-origin effects.
  
  Weaknesses:
  
  The main limitations of the study are threefold.
  
  First, there is a lack of replication in independent cohorts, which is understandable given the difficulty of identifying datasets with a comparable number of trios, but replication would help establish the generalizability of the findings.
  
  We fully agree with the reviewer that replication in an independent cohort is a crucial step for establishing generalizability. As the reviewer notes, the Framingham Heart Study, with its 1,477 trios possessing both WGS and RNA-seq data, represents a uniquely powerful and, to our knowledge, currently unmatched resource for this specific type of parent-of-origin eQTL analysis.
  
  In the absence of an external cohort of comparable size and data richness, we have taken several steps to ensure the internal validity and robustness of our findings within the current study, which we will clarify and expand upon in the revised manuscript:
  
  Positive Control Validation: We explicitly used well-established, bona fide imprinted genes (e.g., MEG3, NDN, SNURF, as listed in Table 1 and Figure 1) as positive controls. The fact that our analysis correctly identifies their known parent-of-origin expression patterns (e.g., maternal eQTL for MEG3, paternal eQTL for NDN) serves as a powerful internal validation of our phasing methodology, statistical models, and significance thresholds. This demonstrates that our approach has the power to detect true POE signals.
  
  Conservative Calling Criteria: As the reviewer suggests, we prioritized specificity. Our definition of eQTL sets (Section 4.6) uses stringent thresholds (e.g., log<sub>10</sub> BF > 4 for primary signals and θ = log<sub>10</sub> 2 for exclusivity). We explored different θ parameters (Supplementary Table S2) and chose the one that minimized the inclusion of false positives, ensuring that our core gene sets (e.g., G<sub>1</sub>, G<sub>0</sub>, G<sub>2</sub>) are high-confidence discoveries.
  
  Rigorous Analytical Pipeline: As we note in the revised text, our conclusions are supported by a robust analytical pipeline. This includes trio-based phasing validated by simulation (Supplementary Table S1), the use of linear mixed models to control for relatedness and population structure, and the application of Bayes factors which inherently penalize variants with low minor allele frequencies, thereby reducing spurious associations.
  
  We believe these internal consistency checks and methodological rigor provide strong confidence in our findings. To further facilitate external replication, we will make the full list of POE eQTLs and eGenes available as a comprehensive resource (as noted in the Discussion and Supplementary Materials), enabling other researchers to validate these findings as appropriate datasets become available.
  
  Second, while Bayes factors are thoughtfully used to assess evidence of association, the paper does not fully explore how the chosen thresholds translate to the expected rate of false positives. For example, a minor allele frequency cutoff of 1% was applied, which seems somewhat arbitrary, and without reporting the allele frequency distribution of the identified eQTLs, it is unclear whether rare variants disproportionately contribute to the signals, potentially affecting the reliability of discoveries.
  
  We thank the reviewer for raising this important point regarding the calibration of our significance thresholds and the potential role of rare variants. We address this by clarifying the relationship between Bayes factors, prior odds, and false discovery rates, and by providing a more detailed characterization of the variants we identified.
  
  Bayes Factors and False Discovery: The reviewer is correct that the connection between a Bayes factor threshold and a false positive rate is not direct, because it must take the prior odds into account. As we briefly noted, for a given prior odds of association (e.g., 1 in 100 or 1 in 1000 for a cis-eQTL), a log<sub>10</sub> BF = 4 corresponds to a posterior probability of association (PPA) of 0.99 or 0.91, respectively. Consequently, 1 − PPA can be interpreted as the local false discovery rate (lfdr), as we have now explicitly stated in Section 2.2 (citing Soloff et al., 2024). We therefore chose log<sub>10</sub> BF = 4 to ensure a very low or modest lfdr (depending on the prior odds) for our primary findings.
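The conversion from a Bayes factor to a posterior probability is just posterior odds = BF × prior odds. The small sketch below shows where the quoted figures come from. With log<sub>10</sub> BF = 4, prior odds of 1/100 give posterior odds of 100 (PPA ≈ 0.990), and prior odds of 1/1000 give posterior odds of 10 (PPA ≈ 0.909). The function name is illustrative only.

```python
def posterior_prob_association(log10_bf, prior_odds):
    """PPA from a Bayes factor (alternative vs null) and the prior odds of association."""
    posterior_odds = (10.0 ** log10_bf) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


for odds in (1 / 100, 1 / 1000):
    ppa = posterior_prob_association(4.0, odds)
    lfdr = 1.0 - ppa  # local false discovery rate
    print(f"prior odds {odds:g}: PPA = {ppa:.3f}, lfdr = {lfdr:.3f}")
```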
  
  Minor Allele Frequency Threshold: The 1% MAF cutoff was indeed a pre-analysis filtering step. It was chosen based on the power afforded by our sample size of 1,477 trios. For variants rarer than 1%, our study is underpowered to detect associations, and any signals would be highly unstable. Importantly, the reviewer’s concern about rare variants disproportionately contributing to signals is further mitigated by our use of Bayes factors. As we note in Section 2.2, the prior used in our Bayes factor computation (with σ = 0.5 in the prior for effect sizes, as described in Section 4.4) inherently penalizes variants with small minor allele frequencies. This is because for a given effect size, the evidence for association is weaker for a rare variant than a common one. Thus, the combination of a pre-analysis MAF filter and the Bayesian analysis itself guards against spurious findings driven by very rare alleles.
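The claim that, for a fixed effect size, a rare variant yields weaker evidence can be illustrated with Wakefield's approximate Bayes factor. This is used here as a stand-in and is not the authors' linear-mixed-model Bayes factor. The prior standard deviation of 0.5 mirrors the σ quoted above, and the standard-error formula assumes an additive genotype coding on a standardized trait with Hardy-Weinberg genotype variance. The effect size of 0.2 SD is hypothetical.

```python
import math


def approx_bf(beta_hat, se, prior_sd=0.5):
    """Wakefield-style approximate Bayes factor (alternative vs null) with a
    N(0, prior_sd^2) prior on the effect size."""
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))


def approx_se(maf, n):
    """Approximate SE of a per-allele effect on a standardized trait
    (additive coding, genotype variance 2p(1 - p))."""
    return 1.0 / math.sqrt(2.0 * n * maf * (1.0 - maf))


n_offspring = 1477
effect = 0.2  # hypothetical effect size, in SD units
bf_common = approx_bf(effect, approx_se(0.40, n_offspring))
bf_rare = approx_bf(effect, approx_se(0.01, n_offspring))
print(f"MAF 0.40: log10 BF = {math.log10(bf_common):.2f}")
print(f"MAF 0.01: log10 BF = {math.log10(bf_rare):.2f}")
```

With the same true effect, the rarer variant has a standard error roughly five times larger. Its z-score shrinks accordingly and its Bayes factor falls below 1, whereas the common variant clears log<sub>10</sub> BF = 4.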
  
  Allele Frequency Distribution: To directly address the reviewer’s request for transparency, in the revised manuscript we include a supplementary figure (e.g., Supplementary Figure S4) showing the distribution of minor allele frequencies (1000 Genomes, European-ancestry samples) for the SNPs identified in the paternal eQTL set S<sub>P</sub> and the maternal eQTL set S<sub>M</sub>. This empirically demonstrates that our findings are not disproportionately driven by low-frequency variants and provides a more complete picture of the genetic architecture underlying these POE signals. We also add a sentence to the Results section (Section 2.5) summarizing this distribution.
  
  Third, the ancestry background of the study samples is not reported, which could be a confounding factor in the genetic analyses.
  
  We thank the reviewer for highlighting this omission. In the revised manuscript, we explicitly report the ancestry background of the Framingham Heart Study participants analyzed. Consistent with previous reports on this cohort, the vast majority of samples are of European descent.
  
  Crucially, as the reviewer suggests, population stratification can be a confounder in genetic studies. To mitigate this, our analysis employed a linear mixed model (Section 4.4) that includes a random effect with a covariance structure defined by the genetic relatedness matrix (GRM). This approach is specifically designed to control for spurious associations due to both subtle population structure and known relatedness among individuals, ensuring that our findings are robust to these potential confounders.
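For readers less familiar with the GRM, one standard construction standardizes each SNP's 0/1/2 genotypes by its allele frequency p and averages the cross-products: K = ZZᵀ / m. The linear mixed model then includes a random effect g ~ N(0, σ<sub>g</sub><sup>2</sup>K). The sketch below uses this textbook estimator. Whether the authors used exactly this form (or, for example, a variant with a different diagonal) is not stated here, and using 2p(1 − p) as the variance assumes Hardy-Weinberg proportions.

```python
import math


def genetic_relatedness_matrix(genotypes):
    """GRM from a list of individuals, each a list of 0/1/2 allele counts.
    Genotypes are standardized per SNP with the sample allele frequency p,
    z = (x - 2p) / sqrt(2p(1 - p)); monomorphic SNPs are dropped."""
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    keep = [j for j in range(m) if 0.0 < freqs[j] < 1.0]
    z = [[(ind[j] - 2.0 * freqs[j]) / math.sqrt(2.0 * freqs[j] * (1.0 - freqs[j]))
          for j in keep] for ind in genotypes]
    m_used = len(keep)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / m_used for k in range(n)]
            for i in range(n)]


toy_genotypes = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],  # genetically identical to the first individual
    [2, 1, 0, 1],
]
K = genetic_relatedness_matrix(toy_genotypes)
```

Because each SNP is centred at its sample mean, every row of K sums to zero. Identical individuals get identical rows, and unrelated or dissimilar individuals get negative entries.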
  
  Reviewer 2 (Public review):
  
  Summary:
  
  The authors have used 1477 sequenced trios with available gene expression data in the offspring to discover eQTLs that act in a parent-of-origin specific manner. The classified associated SNPs are tested for enrichment for GWAS hits, drug target genes, etc.
  
  Strengths:
  
  The manuscript presents an impressive analysis of a very rich data set of parent-of-origin eQTLs. To my knowledge, it is one of the largest studies of its kind, most analyses are sound, and the results are of interest to many in the field and potentially beyond. The different ideas of follow-up analyses are useful and make sense.
  
  Weaknesses:
  
  While in general the analyses are well-conducted, I noticed a major issue with the POE eQTL classification, which puts into question most of the downstream analysis. In light of this problem, most of the analysis would need to be rerun, which represents a major revision of the paper, but is straightforward to repair.
  
  We appreciate the reviewer’s concern and take it seriously. However, we believe the issue stems from a misunderstanding of our classification framework. We clarify our reasoning below, and we are confident that no re-analysis is necessary. In fact, our Bayesian approach was specifically chosen to avoid the very problem the reviewer raises.
  
  The major problem with the classification of POEs is that simply having significant maternal, but insignificant paternal effect is not an indicator of POE, this happens widely for SNPs with no POE whatsoever (it can happen by chance even when both maternal and paternal effects are the same and non-zero - the authors can see it via simulations under the null [maternal=paternal effect]).
  
  The reviewer raises a valid statistical concern: under the null hypothesis of equal maternal and paternal effects (β<sub>0</sub> = β<sub>1</sub> ≠ 0), sampling variation could occasionally produce a scenario where one effect appears significant and the other does not. This is indeed a form of Type II error (failing to detect a true non-zero effect for one of the alleles).
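The reviewer's null scenario is easy to examine with a small simulation. Draw maternal and paternal effect estimates independently around the same true effect, then count how often exactly one of them is nominally significant. Treating the two estimates as independent normals with equal standard errors is a simplifying assumption. The sketch illustrates that this discordance is roughly 2 × power × (1 − power), and so peaks when per-parent power is near 50%.

```python
import math
import random


def two_sided_p(z):
    """Two-sided normal p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def discordance_rate(true_effect, se, alpha=0.05, n_sims=20000, seed=1):
    """Fraction of simulations in which exactly one of the maternal and
    paternal estimates is significant, although both share `true_effect`."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_sims):
        sig_maternal = two_sided_p(rng.gauss(true_effect, se) / se) < alpha
        sig_paternal = two_sided_p(rng.gauss(true_effect, se) / se) < alpha
        discordant += sig_maternal != sig_paternal
    return discordant / n_sims


no_effect = discordance_rate(0.0, 1.0)     # power = alpha, rate near 0.095
half_power = discordance_rate(1.96, 1.0)   # power near 0.5, rate near 0.5
```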
  
  However, this is precisely why we chose Bayes factors over p-values. A key advantage of Bayes factors is that they are not blind to power. P-values are calculated solely under the null hypothesis and do not incorporate any information about the alternative hypothesis or the study’s power to detect it. Consequently, when power is low (e.g., due to minor allele frequency differences between paternal and maternal alleles), p-values can be misleading.
  
  In contrast, Bayes factors are computed under both the null and alternative hypotheses. They inherently incorporate power through the prior specification. As we note in Section 2.2, “Bayes factors penalize genetic variants with small allele frequencies to reduce false positives.” This means that a SNP where, by chance, one allele appears significant and the other does not—but where power is low due to allele frequency imbalance—will not receive a high Bayes factor, because the evidence is appropriately discounted.
  
  In order to be able to talk about POE, first, a significant difference between maternal and paternal effects needs to be claimed. Therefore, none of the 4 sets of POE eQTLs are justified. To me, the only relevant criterion to pick POE SNPs is the P-value when comparing the maternal and paternal effects.
  
  We respectfully disagree with the reviewer’s assertion that our approach to POE eQTL classification is not justified. There are multiple biologically meaningful patterns of parent-of-origin effects, and our classification scheme was designed to capture this diversity:
  
  (1) Paternal-specific eQTL (β<sub>0</sub> = 0, β<sub>1</sub> ≠ 0)
  
  (2) Maternal-specific eQTL (β<sub>0</sub> ≠ 0, β<sub>1</sub> = 0)
  
  (3) Opposing eQTL (β<sub>0</sub> ≠ 0, β<sub>1</sub> ≠ 0, β<sub>0</sub> × β<sub>1</sub> < 0)
  
  (4) Genotype eQTL (β<sub>0</sub> = β<sub>1</sub> ≠ 0)
  
  The reviewer’s proposed test (H<sub>0</sub>: β<sub>0</sub> = β<sub>1</sub>) collapses these distinct biological scenarios into a single binary outcome. For example: A purely paternal-specific eQTL (β<sub>0</sub> = 0, β<sub>1</sub> ≠ 0) would indeed show a significant difference, and would be captured by the reviewer’s test. However, a gene like ZNF890P in Table 1, where both effects are significant and in the same direction but of different magnitudes, would also show a significant difference. In the reviewer’s framework, this would be classified as a POE eQTL, yet biologically it behaves more like a genotype eQTL with an allelic imbalance. Our framework correctly separates these cases.
  
  Moreover, the reviewer’s proposed test is a nested special case of our broader approach. Our paternal-specific test (H<sub>0</sub>: β<sub>0</sub> = β<sub>1</sub> = 0 vs H<sub>1</sub>: β<sub>0</sub> = 0, β<sub>1</sub> ≠ 0) is a more constrained hypothesis that yields a subset of the SNPs that would be identified by the reviewer’s difference test, were it to have sufficient power. Our approach is therefore more conservative for classifying paternal- or maternal-specific eQTLs, not less.
  
  The definitions of the 4 groups are based on somewhat ad hoc priors, BF thresholds, etc. Also, in Section 4.6, the value of theta is arbitrarily chosen (along with the threshold of 4 to declare POE). In my opinion, the clean treatment of the 4 groups would start with a significant P-value (beta-maternal vs beta-paternal). Within this set, you can then use the original criteria presented in the paper, but only among these associations where there is solid evidence of different parental effects.
  
  We take strong issue with the characterization of our prior specifications and thresholds as “ad hoc” or “arbitrary.” In Bayesian analysis, prior specification is a principled and transparent modeling choice, not an arbitrary one.
  
  (1) Choice of log<sub>10</sub> BF = 4 threshold: As stated in Section 2.2, this threshold was chosen based on explicit considerations of prior odds and posterior probability of association. For prior odds of 1:1000 (a reasonable guess for cis-eQTLs), this BF corresponds to a posterior probability of association of 0.91. If one prefers more optimistic prior odds of 1:100, the PPA becomes 0.99. The threshold is therefore grounded in decision theory, not whim.
  
  (2) Choice of θ in Section 4.6: We explicitly state that we explored multiple values of θ (0, log<sub>10</sub> 2, log<sub>10</sub> 3) and chose θ = log<sub>10</sub> 2 because it “produced minimum G<sub>1</sub> and G<sub>0</sub> that contain known imprinted genes.” This is a principled, data-driven calibration step using positive controls, not an arbitrary selection. The transparency of this process is a strength, not a weakness.
  
  (3) Comparison to p-value thresholds: The reviewer suggests that p-value thresholds are somehow less arbitrary. However, the conventional p-value threshold of 0.05 is itself a historical convention with no universal justification. Moreover, as we note, p-values do not account for power differences across SNPs. A p-value of 5 × 10<sup>−8</sup> from a SNP with 40% MAF is not comparable to the same p-value from a SNP with 1% MAF, because the power to detect the association differs dramatically. Bayes factors automatically adjust for this through the prior, making them more comparable across variants, not less.
  
  In the revision, we added a section to the Supplementary Materials reviewing the relationships among p-values, Bayes factors, and FDR.
  
  Recommendations for the authors:
  
  Reviewer 1 (Recommendations for the authors):
  
  Here are some suggestions to improve the study:
  
  (1) Provide information about the ancestry background of participants and consider including ancestry principal components in the eQTL models, as is commonly done, to account for population structure.
  
  We thank the reviewer for this suggestion. In the revised manuscript, we explicitly state that the participants in the Framingham Heart Study are predominantly of European descent, consistent with previous publications from this cohort. Regarding population structure, we respectfully note that our analysis already employs a linear mixed model (Section 4.4) that includes a random effect with a covariance structure defined by the genetic relatedness matrix (GRM). This approach is widely regarded as more robust than including a limited number of principal components, as it accounts for both fine-scale population stratification and known relatedness simultaneously.
  
  (2) Conduct sensitivity analyses using different Bayes factor cutoffs to assess the robustness of the findings.
  
  We appreciate the reviewer’s concern about threshold robustness. In fact, we already conducted a form of sensitivity analysis during the classification step. As described in Section 4.6 and shown in Supplementary Table S2, we explored multiple values of θ (0, log<sub>10</sub> 2, and log<sub>10</sub> 3) and observed how they affected the composition of our gene sets. The choice of log<sub>10</sub> BF = 4 for significance was similarly grounded in posterior probability calculations (Section 2.2). To further address the reviewer’s point, we add Supplementary Table S3, which reports counts of eQTLs and eGenes under different Bayes factor thresholds. This demonstrates that our most significant claim, the abundance of POE eQTLs, is not overly sensitive to the specific cutoff.
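A threshold-sensitivity table of this kind takes only a few lines to tabulate. In the sketch below, the (SNP, gene, log<sub>10</sub> BF) record format and the toy records are hypothetical. An eQTL is counted as a SNP-gene pair above the threshold, and an eGene as a gene with at least one such pair.

```python
def counts_by_threshold(results, thresholds=(3.0, 4.0, 5.0)):
    """Map each log10 BF threshold to (number of eQTLs, number of eGenes).
    `results` holds (snp_id, gene_id, log10_bf) tuples; an eQTL is a
    SNP-gene pair whose log10 BF exceeds the threshold."""
    table = {}
    for t in thresholds:
        hits = [(snp, gene) for snp, gene, bf in results if bf > t]
        table[t] = (len(hits), len({gene for _, gene in hits}))
    return table


toy_results = [
    ("rs1", "GENE_A", 6.1), ("rs2", "GENE_A", 4.4),
    ("rs3", "GENE_B", 3.5), ("rs4", "GENE_C", 5.2),
]
table = counts_by_threshold(toy_results)
```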
  
  (3) In the GWAS examples for KCNQ1 and CDKN1C, the assessment of whether the SNPs act as eQTLs for the two genes is based on a single BF threshold, which may be influenced by differences in gene expression levels. The authors could compare the corresponding effect sizes of these SNPs on both genes to provide a more nuanced investigation. While the limitation of missing data from other tissues is discussed in the paper, it remains possible that KCNQ1 plays a role in tissues more relevant to T2D.
  
  This is an excellent suggestion for a more nuanced investigation. We re-examined the effect sizes for the SNP rs2237892 in our published results. For the gene CDKN1C, the paternal log<sub>10</sub> BF<sub>1</sub> = −0.477 and the maternal log<sub>10</sub> BF<sub>0</sub> = 4.94, and the normalized maternal effect in the joint analysis is −4.86, versus −0.74 for the paternal effect. Unfortunately, the published results contain no eQTL for KCNQ1, which according to our selection criteria means that the maximum log<sub>10</sub> BF < 3 across all tests (genotype, paternal, maternal, joint). The concern that differences in gene expression levels may affect BFs is valid. We preempt this pitfall by quantile-normalizing gene expression levels after controlling for GC content (as documented in the Methods section). We agree with the reviewer that the lack of data from pancreatic tissue is a limitation. We added a sentence in the relevant section acknowledging that while whole blood is a valuable and accessible tissue, replication in T2D-relevant tissues (e.g., pancreas, adipose) would be an important future direction, and that our findings provide hypotheses for such targeted investigations.
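One common way to quantile-normalize each gene's expression to a standard normal is a rank-based inverse normal transform, sketched below. The following are assumptions, since the response does not say which variant was applied: the offset c = 0.5, the use of average ranks for ties, and the omission of the GC-content correction. Because every gene ends up on the same N(0, 1) scale, Bayes factors are less sensitive to differences in raw expression level.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.5):
    """Map values to standard-normal quantiles of their ranks,
    using (rank - c) / (n - 2c + 1) and average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


expression = [12.0, 850.0, 95.0, 95.0, 3.0]  # hypothetical raw levels for one gene
normalized = inverse_normal_transform(expression)
```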
  
  Reviewer 2 (Recommendations for the authors):
  
  Major comments:
  
  There are some literature elements missing:
  
  (1) Hofmeister has a newer and larger study [https://pubmed.ncbi.nlm.nih.gov/40770099/]. Please cite that too; it also has POE pQTLs, which is relevant.
  
  (2) POE in pigs has been explored [https://www.nature.com/articles/s41467-02562243-6], please cite it.
  
  (3) An insightful review covering the mechanisms of POE for gene expression (https://www.sciencedirect.com/science/article/pii/S2352154618300482) should be cited.
  
  (4) Further studies on POE in gene expression in social insects (https://royalsocietypublishing.org) and in mice (https://www.biorxiv.org/content/10.1101/2023.08.24.554674v1.full) are also relevant.
  
  We thank the reviewer for bringing these important references to our attention. We incorporated the suggested citations in the revision to provide a more comprehensive context for our work, including the newer POE pQTL study by Hofmeister et al., the findings in pigs, and the mechanistic review.
  
  While it’s OK to report and rank SNPs by BF, it is necessary to show association P-values as well. It is not explained in the text around the Table how the P-value is obtained in the Table. And it is important to show how their priors translate to FWER control. What is the FWER when picking SNPs at a certain BF value? 1-PPA and local FDR depend on the choice of the prior, but we need a prior-independent measure of FDR/FWER.
  
  We appreciate the opportunity to clarify. The p-value presented in Table 1 (column “P”) is indeed the frequentist p-value testing the null hypothesis of equal maternal and paternal effects (H<sub>0</sub> : β<sub>0</sub> = β<sub>1</sub>), as described in Section 4.5. We included this to provide a familiar metric for readers, but our discovery framework relies on Bayes factors for the reasons outlined in Section 2.2.
  
  Regarding error control, the reviewer is correct that 1-PPA is a local FDR that depends on the prior. We chose to control the local rate of false discoveries rather than the Family-Wise Error Rate (FWER) because FWER control (e.g., via Bonferroni) is often excessively conservative for exploratory analyses like eQTL mapping, especially given the correlation among tests due to LD.
  
  Our Bayesian approach provides a more nuanced measure of evidence at the level of each individual test, which is precisely what is needed for prioritizing SNPs with parent-of-origin effects.
  
  The demand for a prior-independent measure of FDR is conceptually problematic. Any probabilistic statement about a specific hypothesis being true or false necessarily requires a prior—this is a fundamental consequence of probability theory. Frequentist FDR, while prior-independent in one sense, does not provide a probability that a particular finding is false; it is a long-run error rate over many tests. Methods like q-values, often described as “prior-free,” still depend on implicit assumptions (e.g., the estimate of π<sub>0</sub>, independence of tests, and a mixture of effect sizes).
  
  In our specific context of cis-eQTL analysis, these assumptions are particularly questionable. LD induces correlation among nearby SNPs, violating the independence required for stable π<sub>0</sub> estimation. Moreover, effect sizes in a region are not randomly mixed—SNPs in high LD tend to have similar effect directions and magnitudes, which can bias the mixture model underlying q-value approaches. Our Bayesian approach, by modeling each SNP individually, avoids these cross-SNP assumptions.
  
  Importantly, while posterior probabilities depend on the choice of prior (π<sub>0</sub>), we have verified that our conclusions are robust across a wide range of plausible π<sub>0</sub> values (0.9, 0.99, 0.999). Given our extremely stringent Bayes factor threshold (BF<sub>j</sub> > 10<sup>4</sup>), the posterior probability for a maternal effect exceeds 0.90 for any π<sub>0</sub> < 0.999. Thus, the prior dependence is practically irrelevant for the SNPs we report.
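The robustness statement can be checked by inverting the posterior-odds relation. The minimal Bayes factor needed to reach a target posterior probability is the target posterior odds divided by the prior odds (1 − π<sub>0</sub>)/π<sub>0</sub>. With π<sub>0</sub> = 0.999 and a target of 0.9, the required log<sub>10</sub> BF is log<sub>10</sub>(9 × 999) ≈ 3.95, just under 4. The function name below is illustrative.

```python
import math


def min_log10_bf(pi0, target_ppa=0.9):
    """Smallest log10 Bayes factor giving posterior probability >= target_ppa
    when a fraction pi0 of tests is null (prior odds (1 - pi0) / pi0)."""
    prior_odds = (1.0 - pi0) / pi0
    target_odds = target_ppa / (1.0 - target_ppa)
    return math.log10(target_odds / prior_odds)


for pi0 in (0.9, 0.99, 0.999):
    print(f"pi0 = {pi0}: need log10 BF >= {min_log10_bf(pi0):.2f}")
```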
  
  In the revision, we added a section to the Supplementary Material describing the connections between p-values, Bayes factors, and FDR. We hope this clarifies that a (seemingly) prior-independent FDR carries a hidden assumption that cis-eQTL analysis is likely to violate.
  
  The major problem with the classification of POEs is that simply having significant maternal, but insignificant paternal effect is not an indicator of POE, this happens widely for SNPs with no POE whatsoever (it can happen by chance even when both maternal and paternal effects are the same and non-zero - the authors can see it via simulations under the null [maternal=paternal effect]). In order to be able to talk about POE, first, a significant difference between maternal and paternal effects needs to be claimed. Therefore, none of the 4 sets of POE eQTLs are justified. To me, the only relevant criterion to pick POE SNPs is the P-value when comparing the maternal and paternal effects. The definitions of the 4 groups are based on somewhat ad hoc priors, BF thresholds, etc. Also, in Section 4.6, the value of theta is arbitrarily chosen (along with the threshold of 4 to declare POE). In my opinion, the clean treatment of the 4 groups would start with a significant P-value (beta-maternal vs beta-paternal). Within this set, you can then use the original criteria presented in the paper, but only among these associations where there is solid evidence of different parental effects.
  
  We respectfully disagree with the reviewer’s assertion that a significant difference between maternal and paternal effects is the only valid criterion for defining POE, and we maintain that our classification is statistically sound and biologically meaningful.
  
  The Problem with the “Difference-Only” Approach: The reviewer’s proposed filter (a significant p-value for β<sub>0</sub> ≠ β<sub>1</sub>) is a single hypothesis test. Our goal was to classify eQTLs into multiple, distinct biological categories (paternal-specific, maternal-specific, opposing, etc.). The “difference-only” test collapses these categories. For example, a purely paternal-specific eQTL (β<sub>0</sub> = 0, β<sub>1</sub> ≠ 0) and a gene like ZNF890P (β<sub>0</sub> ≠ 0, β<sub>1</sub> ≠ 0, β<sub>0</sub> > β<sub>1</sub>) would both show a significant difference. In the reviewer’s framework, they would be lumped together, obscuring the fact that one is an imprinted gene and the other is a standard eQTL with allelic imbalance. Our framework correctly separates them.
  
  Bayes Factors are Not “Ad Hoc”: The choice of prior (σ = 0.5) follows established literature for linear model Bayes factors (Servin and Stephens, 2007). The threshold of log<sub>10</sub> BF = 4 was chosen based on its relationship to posterior probability (0.91-0.99 given reasonable prior odds), which is a transparent and principled decision rule. The selection of θ in Section 4.6 was calibrated using a positive control set of known imprinted genes, ensuring our definitions were conservative and accurate. This is the opposite of arbitrary.
  
  The Suggested Procedure Has Low Power: One can run the following simple R code to verify. We simulate maternal alleles xx and paternal alleles yy, then simulate a phenotype with β<sub>xx</sub> > 0 and β<sub>yy</sub> = 0 (maternal effect only). We fit the joint model and compute p-values for the null β<sub>xx</sub> = β<sub>yy</sub>, as suggested by the reviewer. From the same joint fit, we also extract p-values for the nulls β<sub>xx</sub> = 0 and β<sub>yy</sub> = 0, respectively. The simulation was repeated 1000 times and the p-values were stored in a matrix.
  
  We call positives using the suggested procedure and compare them with the number of positives called using marginal p-values, at two thresholds (1×10<sup>−5</sup> and 1×10<sup>−6</sup>) for declaring significance. We used a threshold of 0.01 to declare non-significance.
  
  The results demonstrate that the suggested procedure has much lower power than the procedure based on marginal statistics.
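  The R code referred to above is not reproduced in this response. The following is a minimal Python sketch of the same kind of simulation, written under stated assumptions rather than as the authors' script: each maternal (xx) and paternal (yy) allele is coded 0/1 with a hypothetical allele frequency of 0.3, n = 500 individuals, β<sub>xx</sub> = 0.5, unit-variance Gaussian noise, and p-values from a normal approximation to the t-test. The joint least-squares fit yields p-values for β<sub>xx</sub> = 0, β<sub>yy</sub> = 0 and β<sub>xx</sub> = β<sub>yy</sub>; positives are then counted under the difference criterion and under the marginal criterion (maternal p below the significance threshold, paternal p above 0.01).

```python
import math
import random


def two_sided_p(z):
    # Normal approximation to the two-sided p-value (adequate for n in the hundreds).
    return math.erfc(abs(z) / math.sqrt(2.0))


def invert(m):
    # Gauss-Jordan inversion with partial pivoting (small dense matrices only).
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_joint(xx, yy, y):
    """OLS of y on intercept + maternal allele xx + paternal allele yy.

    Returns p-values for beta_xx = 0, beta_yy = 0, and beta_xx = beta_yy.
    """
    X = [(1.0, a, b) for a, b in zip(xx, yy)]
    n, k = len(X), 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    var_xx, var_yy, cov_xy = s2 * inv[1][1], s2 * inv[2][2], s2 * inv[1][2]
    p_xx = two_sided_p(beta[1] / math.sqrt(var_xx))
    p_yy = two_sided_p(beta[2] / math.sqrt(var_yy))
    p_diff = two_sided_p((beta[1] - beta[2]) / math.sqrt(var_xx + var_yy - 2.0 * cov_xy))
    return p_xx, p_yy, p_diff


def simulate(n_rep=1000, n=500, maf=0.3, beta_xx=0.5, seed=1):
    """Maternal-only effect: beta_xx > 0, beta_yy = 0, unit-variance noise (all values hypothetical)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rep):
        xx = [1.0 if rng.random() < maf else 0.0 for _ in range(n)]
        yy = [1.0 if rng.random() < maf else 0.0 for _ in range(n)]
        y = [beta_xx * a + rng.gauss(0.0, 1.0) for a in xx]
        results.append(fit_joint(xx, yy, y))
    return results


def count_calls(results, sig, insig=0.01):
    # Reviewer's criterion: significant maternal-vs-paternal difference.
    by_difference = sum(1 for _, _, p_diff in results if p_diff < sig)
    # Marginal criterion: maternal significant and paternal clearly non-significant.
    by_marginal = sum(1 for p_xx, p_yy, _ in results if p_xx < sig and p_yy > insig)
    return by_difference, by_marginal


results = simulate(n_rep=200)
for sig in (1e-5, 1e-6):
    print(sig, count_calls(results, sig))
```

  Intuitively, when the two parental alleles are independent, the standard error of β<sub>xx</sub> − β<sub>yy</sub> is about √2 times that of β<sub>xx</sub> alone, so the difference test needs a larger effect to reach the same threshold.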
  
  For the above reasons, the follow-up enrichment analysis is somewhat questionable. Most enrichments are non-significant, and it is likely because the SP and SM groups are diluted with SG SNPs. The P1-P9 groups have nothing to do with POE, and although the observation of increased enrichment for GWAS SNPs with increased pleiotropy is interesting, it is irrelevant for POE.
  
  We address the dilution concern below. We agree that the P1-P9 groups are not directly related to POE. Nonetheless, this is an interesting observation, and because we found it to be missing from the literature, we ask to keep it in the paper.
  
  In the same way, section 2.7 is not supported; the claimed maternal and paternal POEs are heavily diluted by simple marginal associations. The same holds for sections 2.8-2.10. A striking example is Table 3: for clinical trial targets, paternal/maternal eQTLs behave just like simple marginal eQTLs (G<sub>G</sub>). A similar pattern emerges for combined target enrichment.
  
  The reviewer’s concern that our S<sub>P</sub> and S<sub>M</sub> sets are “diluted with S<sub>G</sub> SNPs” is precisely the issue our Bayes factor thresholds were designed to prevent. By requiring one effect to be significant and the other to be below a low threshold (θ), we explicitly excluded SNPs where both effects are significant and in the same direction (which defines S<sub>G</sub>).
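  The exclusion logic can be made concrete with a small decision function. The sketch below is hypothetical: it takes log<sub>10</sub> Bayes factors for the maternal and paternal effects, uses the log<sub>10</sub> BF = 4 significance threshold mentioned in this response, treats θ as a log<sub>10</sub> BF cut-off for "no effect" (an assumption; the calibrated value from Section 4.6 is not given here, so θ is a required argument), and uses effect signs only to separate same-direction (S<sub>G</sub>) from opposing effects. The category labels follow the text; the exact rules the authors apply may differ.

```python
SIG_LOG10_BF = 4.0  # significance threshold on log10 BF, as stated in the response


def classify_poe(log10_bf_mat, log10_bf_pat, beta_mat, beta_pat, theta):
    """Assign a SNP to a parent-of-origin category (hypothetical rule set).

    theta: upper log10 BF bound for treating an effect as absent (assumed scale).
    """
    mat_sig = log10_bf_mat > SIG_LOG10_BF
    pat_sig = log10_bf_pat > SIG_LOG10_BF
    if mat_sig and pat_sig:
        # Both effects supported: same sign -> shared genotype effect, else opposing.
        return "S_G" if beta_mat * beta_pat > 0 else "opposing"
    if mat_sig and log10_bf_pat < theta:
        return "S_M"  # maternal-specific: paternal evidence below theta
    if pat_sig and log10_bf_mat < theta:
        return "S_P"  # paternal-specific: maternal evidence below theta
    return "unclassified"


examples = [
    (6.0, 0.2, 0.4, 0.02),
    (0.1, 5.5, 0.01, -0.3),
    (5.0, 5.0, 0.4, 0.3),
    (5.0, 5.0, 0.4, -0.3),
    (5.0, 2.0, 0.4, 0.1),
]
for ex in examples:
    print(ex, classify_poe(*ex, theta=1.0))  # theta = 1.0 is illustrative only
```

  Because the S<sub>G</sub> branch is checked first, a SNP with strong same-direction maternal and paternal evidence can never be assigned to S<sub>M</sub> or S<sub>P</sub> under these rules.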
  
  Regarding Table 3, the reviewer’s interpretation differs from ours. The fact that paternal eQTLs (G<sub>P</sub>) show significant enrichment for drug targets, while genotype eQTLs (G<sub>G</sub>) also show enrichment, does not imply dilution. Rather, it suggests there is an overlap in the biological importance of these gene sets, which is expected. The key message of the finding is the asymmetry: G<sub>P</sub> is significantly more enriched than G<sub>G</sub> (p=0.035 for combined targets), a pattern that would be washed out if G<sub>P</sub> were merely a diluted version of G<sub>G</sub>. This asymmetry supports the interesting biological hypothesis (Moore and Haig, 1991) we discuss. The non-significance for G<sub>M</sub> further highlights this asymmetry.
  
  I’m not sure how MR would be biased by POE: MR is conducted only if there is a marginal association, i.e., the average maternal and paternal effects are significant. If the expression is causal for a trait, the POE effect is propagated to the outcome; hence, the SNP effect on the exposure will be equally biased as the SNP effect on the outcome, and these cancel out, and the causal effect remains unbiased. Can the authors propose a concrete example of maternal/paternal effects that demonstrates their claimed bias?
  
  We thank the reviewer for this insightful question, which allows us to clarify our point with a concrete example from our data.
  
  Consider a scenario where one wishes to use Mendelian Randomization (MR) to test whether the expression of gene NECAB3 causally influences a particular trait (e.g., obesity). The reviewer is correct that if the causal effect is homogeneous, the average effect might still be captured. However, the bias we caution against arises in stratified analyses or in the interpretation of the genetic instrument itself.
  
  Take the SNP rs4911348 and its effect on NECAB3 (Figure 2). The genotype model shows no marginal association. Therefore, if a researcher were conducting a standard MR study using this SNP as an instrument for NECAB3 expression, they would discard it as an invalid instrument due to the lack of a marginal association. They would miss the true underlying biology entirely. The causal effect of NECAB3 on the trait would be masked in the full population.
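  One way a SNP can carry strong parent-of-origin effects yet show no marginal association is when the maternal and paternal effects point in opposite directions. The simulation below is a hypothetical illustration of that mechanism, not a reanalysis of rs4911348 or NECAB3: maternal and paternal alleles are coded 0/1 with an assumed allele frequency of 0.3, expression depends on +0.5 × maternal allele − 0.5 × paternal allele plus Gaussian noise, and slopes are estimated by simple least squares.

```python
import random


def ols_slope(x, y):
    """Simple least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_opposing(n=20000, maf=0.3, effect=0.5, seed=7):
    """Hypothetical SNP with opposite maternal (+effect) and paternal (-effect) expression effects."""
    rng = random.Random(seed)
    maternal = [1.0 if rng.random() < maf else 0.0 for _ in range(n)]
    paternal = [1.0 if rng.random() < maf else 0.0 for _ in range(n)]
    expression = [effect * m - effect * p + rng.gauss(0.0, 1.0)
                  for m, p in zip(maternal, paternal)]
    genotype = [m + p for m, p in zip(maternal, paternal)]  # 0/1/2 allele count
    return {
        "genotype": ols_slope(genotype, expression),
        "maternal": ols_slope(maternal, expression),
        "paternal": ols_slope(paternal, expression),
    }


print(simulate_opposing())
```

  The genotype slope is close to zero, so an instrument-selection step based only on marginal association would discard such a SNP, while the parent-specific slopes are clearly non-zero.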
  
  More subtly, even if a SNP has a marginal association, using it as an instrument while ignoring POE can lead to incorrect effect estimates in population subgroups defined by parent of origin. This is analogous to ignoring effect modification. For instance, an exposure itself cannot have an effect that depends on the parent it was inherited from, but the genetic propensity for the exposure can; failing to account for this can bias the instrumental variable estimate if the instrument’s strength varies by an unmeasured factor (parental origin).
  
  Our advice to “check the list of POE SNPs” is a practical caution: if the instrument for an exposure exhibits strong POE, the standard MR assumptions about the homogeneity of the instrument’s effect may be violated, potentially leading to biased estimates or incorrect conclusions about causality.
  
  Minor comments:
  
  (1) In Table 1, the last column header should be -log10(P), not “P”.
  
  The column labelling is an editorial choice to prevent table overflow. This particular labelling is explained in the caption.
  
  (2) While BFg/0/1/j are explained in the text, these notations should be explained in the Table caption as well.
  
  Added explanation in caption.
  
  (3) It should also be mentioned in the Table 1 caption how these top 10 SNPs were chosen.
  
  These are the sentinel eQTLs for each gene. We believe the first paragraph of Section 2.3 explains this clearly.
  
  (4) “may ‘acquires’ a cis-eQTL through” → “may ‘acquire’ a cis-eQTL through”.
  
  Corrected. Thank you.
  
  (5) “which retained 16,969 genes out of total 58103”, I assume the 58103 are transcripts, not genes.
  
  You are absolutely correct. We added “transcripts” after 58103.
  
  (6) In Equation (1), Z is not defined. In this concrete setting, isn’t it simply the identity matrix?
  
  Yes. Z is the identity (loading) matrix for the human study. We added a sentence to clarify this in the revision.
  
biorxiv.org/content/10.1101/2024.06.05.597677v4
www.biorxiv.org

The zoo of the gene networks capable of pattern formation by extracellular signaling

1. Public_Reviews 30 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (on non-trivial pattern transformations):
  
  (3) All modelling is confined to one spatial dimension, and the very definition of a "non-trivial" transformation is framed in terms of peak positions along a line, which clearly must be reformulated for higher dimensions. It's well-known that diffusions in 1, 2, and 3 dimensions are also dramatically different, so the relevance of the three-class taxonomy to real multicellular tissues remains unclear, or at least should be explained in more detail.
  
  Reviewer #2 (on non-trivial pattern transformations):
  
  (5) The definition of non-trivial pattern formation is provided only in the Supplementary Information, despite its central importance for interpreting the main results. It would significantly improve clarity if this definition were included and explained in the main text. Additionally, it remains unclear how the definition is consistently applied across the different initial conditions. In particular, the authors should clarify how slope-based measures are determined for both the random noise and sharp peak/step function initial states. Furthermore, the authors do not specify how the sign function is evaluated at zero. If the standard mathematical definition sgn(0)=0 is used, then even a simple widening of a peak could fulfill the criterion for non-trivial pattern transformation.
  
  There was indeed a problem with how we defined non-trivial pattern transformations in the original version. This definition was not clear enough beyond 1D. We now provide a simple, clear definition in the main text that applies to all dimensions (“P1” and “P2” on the second page of the introduction).
  
  As we now explain throughout the main text, even though the solution of the heat/diffusion equation depends on the dimension of the system, our classification of gene networks (and the mathematical analyses we use) does not. However, some aspects of the specific pattern transformations possible from these networks do depend on the dimensionality of the system. In the current version of the article, every time we explain something about the resulting patterns in 1D, we also explain it for the resulting patterns in 2D and 3D. We have also added figures for the 2D cases (current Fig.1 and Fig.9). We now explicitly explain how the possible resulting patterns can depend on the boundaries and shape of the system (i.e. the distribution of cells in space) (see especially the 5th paragraph of the discussion).
  
  The criticism about “slope-based measures” raised by Reviewer 2 is now addressed in a paragraph at the end of the introduction, reproduced here:
  
  “It is worth noting that these three basic initial patterns correspond to spatially discontinuous functions: in homogeneous with noise initial patterns, white noise is discontinuous by definition; in spike and combined spike-homogeneous initial patterns, there is a concentration discontinuity between cells on the edge of the spike and nearby cells outside the spike. However, once extracellular signal diffusion begins, these sharp boundaries are smoothed into differentiable gradients, where critical points can be properly defined (e.g., at the center of the initial spike).”
  
  The main concern among these relates to the validity of our linearization of the model equations and the extension of the results obtained for the linear system to the fully nonlinear system. In this regard, the reviewers’ comments are:
  
  Reviewer #1 (on linearization):
  
  (2) A central step in the model formulation is the linearisation of the reaction term around a homogeneous steady state; higher-order kinetics, including ubiquitous bimolecular sinks such as A + B → AB, are simply collapsed into the Jacobian without any stated amplitude bound on the perturbations. Because the manuscript never analyses how far this assumption can be relaxed, the robustness of the three-class taxonomy under realistic nonlinear reactions or large spike amplitudes remains uncertain.
  
  Reviewer #2 (on linearization):
  
  (2) Most of the proofs presented in the Supplementary Information rely on linearized versions of the governing equations, and it remains unclear how these results extend to the fully nonlinear system. We are concerned that the generality of the conclusions drawn from the linear analysis may be overstated in the main text. For example, in Section S3, the authors introduce the concept of dynamic equivalence of transitive chains (Proposition S3.1) and intracellular transitive M-branching (Proposition S3.2), which pertains to the system's steady-state behavior. However, the proof is based solely on the linearized equations, without additional justification for why the result should hold in the presence of nonlinearities. Moreover, the linearized system is used to analyze the response to a "spike initial pattern of arbitrary height C" (SI Chapter S5.1), yet it is not clear how conclusions derived from the linear regime can be valid for large perturbations, where nonlinear effects are expected to play a significant role. We encourage the authors to clarify the assumptions under which the linearized analysis remains valid and to discuss the potential limitations of applying these results to the nonlinear regime.
  
  We used three linearizations in the original version of the manuscript. One was to analyze hierarchic networks (in the Hierarchic networks section). In the new version of the article we do not use any linearization to study the hierarchic networks, so this problem is solved.
  
  The second linearization was in section S3 on transitive chains. We realized that this section is not really necessary at all for the article so we deleted it.
  
  We keep the third linearization, but we now explain why it is useful and valid in a section called “Linear stability analysis”; the first two paragraphs of that section explicitly justify this choice.
  
  Regarding Reviewer 2 concerns about large perturbations, we acknowledge that the phrasing using “arbitrary height” may have been confusing. As we now explain in the linear stability analysis section, linear stability analysis assumes perturbations to be small.
  
  For the homogeneous-with-noise initial pattern, as we explain, these perturbations are assumed to be small because they are actually molecular noise.
  
  For the spike initial pattern and hierarchic networks, the perturbation is not necessarily small. However, by definition of the spike and combined homogeneous-spike initial patterns, all cells outside the spike start with the same concentration of the extracellular signals secreted from the spike (e.g. zero). Thus, even if the extracellular signal concentrations in the spike were unrealistically high, the amount of extracellular signal diffusing out of it can be considered small by considering a small enough time interval. Right outside the spike, the diffusion of extracellular signals from the spike can therefore be treated as a continuous small perturbation whose stability can be studied, as we do in the “Linear stability analysis” section. We now explain this at the end of the introduction and in the “Linear stability analysis” section, where we return to the initial patterns.
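  The small-time argument can be checked numerically. The sketch below uses an explicit finite-difference scheme for pure diffusion on a 1D row of cells with no-flux ends; reactions are omitted, and the grid size, D, Δx and Δt are illustrative assumptions, not values from the manuscript. After a single step, the cell adjacent to a spike of height C receives r·C with r = DΔt/Δx², so the perturbation just outside the spike shrinks in proportion to Δt however large C is; after many steps the sharp spike has become a smooth profile with a single maximum at its center.

```python
def diffuse(g, r, steps):
    """Explicit Euler steps of discrete diffusion with no-flux ends; requires r <= 0.5."""
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for r > 0.5")
    n = len(g)
    for _ in range(steps):
        new = g[:]
        for i in range(n):
            left = g[i - 1] if i > 0 else g[i]       # mirror value at the left boundary
            right = g[i + 1] if i < n - 1 else g[i]  # mirror value at the right boundary
            new[i] = g[i] + r * (left - 2.0 * g[i] + right)
        g = new
    return g


def spike(n_cells, height):
    """Spike initial pattern: zero everywhere except the central cell."""
    g = [0.0] * n_cells
    g[n_cells // 2] = height
    return g


D, DX, HEIGHT, N = 1.0, 1.0, 1000.0, 101
center = N // 2
for dt in (0.1, 0.01):
    r = D * dt / DX ** 2
    after_one = diffuse(spike(N, HEIGHT), r, steps=1)
    print(f"dt={dt}: neighbor concentration after one step = {after_one[center + 1]:.3f}")

smooth = diffuse(spike(N, HEIGHT), r=0.1, steps=200)
print("maximum at center:", max(range(N), key=lambda i: smooth[i]) == center)
```

  Here the neighbor of a spike of height 1000 receives 100 for Δt = 0.1 and 10 for Δt = 0.01, in line with r·C.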
  
  In the following, we respond to the remaining concerns raised by the reviewers:
  
  Reviewer #1 (Public review):
  
  (1) The Results section is difficult to follow. Key logical steps and network configurations are described only briefly in prose, which constantly requires the reader to consult either the SI or other parts of the text (see the numerous links to the requirements R1-R5 listed at the beginning of the paper) to gain a minimal understanding. As a result, a scientifically literate but non-specialist reader may struggle to grasp the argument within a reasonable amount of time.
  
  We acknowledge that the original version of the main text may not be as clear as we intended. Initially, we believed that placing the more technical mathematical passages in the Supplementary Information would make the main text more accessible to readers. We were wrong. We have now moved crucial parts of the supplementary to the main text and adapted the rest of the text accordingly. The most important of those is the new “Linear stability analysis” section and the associated dispersion relation (e.g. Fig.6).
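  For readers less familiar with the technique, a linear stability analysis of a reaction-diffusion system is usually summarized by its dispersion relation. The sketch below computes one for a generic two-species linear system ∂u/∂t = J u + D ∇²u: a Fourier mode e<sup>ikx</sup> grows at the rate given by the largest real part of the eigenvalues of J − k²D. The Jacobian J and diffusion coefficients D are hypothetical values chosen to produce a classic diffusion-driven (Turing) instability; they are not taken from the manuscript's gene networks, and the sketch does not implement the authors' distinction between RD-instabilities of the first and second kind.

```python
import cmath


def growth_rate(jac, diff, k):
    """Largest real part of the eigenvalues of J - k^2 D (2x2 case, D diagonal)."""
    a = jac[0][0] - k * k * diff[0]
    b = jac[0][1]
    c = jac[1][0]
    d = jac[1][1] - k * k * diff[1]
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)


J = [[1.0, -1.0],
     [2.0, -1.5]]   # hypothetical: trace < 0, det > 0, so stable without diffusion
D = [1.0, 20.0]     # hypothetical diffusion coefficients (second species diffuses faster)

ks = [0.05 * i for i in range(41)]  # wavenumbers 0.0 ... 2.0
relation = [(k, growth_rate(J, D, k)) for k in ks]
k_best, rate_best = max(relation, key=lambda kr: kr[1])
print(f"growth at k=0: {relation[0][1]:.3f}")
print(f"fastest-growing mode: k={k_best:.2f}, rate={rate_best:.3f}")
```

  A negative growth rate at k = 0 combined with a positive band at intermediate k is the signature of a homogeneous state that is stable without diffusion but unstable to spatially periodic perturbations.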
  
  Reviewer #2 (Public review):
  
  (1) We have serious concerns regarding the validity of the simulation results presented in the manuscript. Rather than simulating the full nonlinear system described by Equation (1), the authors base their results on a truncated expansion (Equation S.8.2) that captures only the time evolution of small deviations around a spatially homogeneous steady state. However, it remains unclear how this reduced system is derived from the full equations -specifically, which terms are retained or neglected and why- and how the expansion of the nonlinear function can be steady-state independent, as claimed. Additionally, in simulations involving the spike plus homogeneous initial condition, it is not evident -or, where equations are provided, it is not correct- that the assumed global homogeneous background actually corresponds to a steady state of the full dynamics. We elaborate on these concerns in the following:
  
  We are actually simulating the full nonlinear system described by Equation (1), and the current version states this more explicitly. As we describe in the introduction and now repeat at several points in the text (e.g. in the last paragraph of the model section and in the paragraph before the linear stability section), the aim of the article is to describe necessary requirements for non-trivial pattern transformations. We did not intend to describe all necessary requirements, nor sufficient requirements. These requirements are at the level of gene network topology, not at the level of f or its parameters. In other words, we only claim that gene networks with specific topological features can lead to some specific types of non-trivial pattern transformations but not to others. We do not say for which specific fs (or parameters) these pattern transformations are possible; we only say that they can happen for some f, as long as f fulfills our requirements. We do show, however, that without certain topological requirements some non-trivial pattern transformations are impossible, no matter the f (this is explicitly stated in the last paragraph of the model section and in the paragraph before the linear stability section). Thus, all the simulations shown in the figures are just examples, with specific fs, of the types of non-trivial pattern transformations possible from each type of gene network topology.
  
  In all simulations we used the f of the Maini-Miura model. We could have chosen others, but we happened to choose this f. The presentation of the Maini-Miura model has been revised to improve clarity (equation S6.1 in the SI). We simulate this model in full; we do not perform any linearization for the simulations. That may not have been explained clearly enough in the previous version of the article: we simply make a change of variable that may have been mistaken for a linearization. In the current version, the existence of a homogeneous steady state is parameterized by a tunable g<sup>*</sup>, that can be chosen as for spike initial patterns or g for noise-homogeneous and spike-homogeneous initial patterns. We have also included a proof that the model equations satisfy our conditions R1-R5. Indeed, the model is nonlinear as long as σ<sub>i</sub>≠0 for some gene product (as we explicitly assume).
  
  It is assumed that the homogeneous steady states are given by g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i, independently of the specific network structure. However, the basis for this assumption is unclear, especially since some of the functions do not satisfy this condition -for example, f5 as defined below Eq. S8.10.5. Moreover, if g_i=c_i does not correspond to a true steady state, then the time evolution of deviations from this state is not correctly described by Eq. S8.2, as the zeroth-order terms do not vanish in that case.
  
  In the revised manuscript, homogeneous steady states are parameterized by a tunable g<sup>*</sup>, which can be chosen as for spike initial patterns or g for noise-homogeneous and spike-homogeneous initial patterns. The function f(g) in (S6.1), as well as the specific nonlinear entries used in certain simulations, is constructed such that g<sup>*</sup> is indeed a steady state of the system and conditions R1-R5 are satisfied. We have also corrected some typos in section S6 (previously section S8) of the Supplementary Information that we believe may have caused the confusion indicated by this reviewer.
  
  Additionally, the equations used contain only linear terms and a cubic degradation term for each species g_i, while neglecting all quadratic terms and cubic terms involving cross-species interactions (i≠j). An explanation for this selective truncation is not provided, and without knowledge of the full equation (f), it is impossible to assess whether this expansion is mathematically justified. If, as suggested in the Supplementary Information, the linear and cubic terms are derived from f, then at the very least, the Jacobian matrix should depend on the background steady-state concentration. However, the equations for the small deviation around a steady state (including the Jacobian matrix) used in the simulations appear to be independent of the particular steady state concentration.
  
  As described above we just chose an example f to exemplify the non-trivial pattern transformations possible from each class of gene network topologies. There is no special reason to include, or exclude for that matter, cubic cross-species interactions since the point is just to exemplify the types of possible pattern transformations from each type of gene network topology.
  
  In addition, we believe that part of the reviewer’s concern may have arisen from a notational ambiguity in the previous version of the manuscript, which has now been corrected: the matrix appearing in f(g) has been renamed from J to W<sup>T</sup>. As stated in the main text, the Jacobian of the regulation function f(g) evaluated at the homogeneous steady state must coincide with the transpose of the network weight matrix. With the current equations (S6.1), we have , from which we easily get . Also, it is clear that the Jacobian of f(g) is not independent of g.
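  The property stated above can be checked numerically for any candidate f. The sketch below uses a hypothetical regulation function f(g) = W<sup>T</sup>(g − g<sup>*</sup>) − σ∘(g − g<sup>*</sup>)<sup>3</sup> (elementwise cube), loosely mirroring the "linear terms plus cubic self-degradation" structure discussed in this exchange; it is not equation S6.1. A central-difference Jacobian at g<sup>*</sup> reproduces W<sup>T</sup>, while away from g<sup>*</sup> the diagonal entries shift by −3σ<sub>i</sub>(g<sub>i</sub> − g<sub>i</sub><sup>*</sup>)<sup>2</sup>, so the Jacobian depends on g whenever σ<sub>i</sub> ≠ 0. W, σ and g<sup>*</sup> are illustrative values, and w[j][i] is taken as the weight of gene j on gene i.

```python
def make_f(w, g_star, sigma):
    """Hypothetical f(g) = W^T (g - g*) - sigma * (g - g*)^3, with w[j][i] = effect of gene j on gene i."""
    n = len(g_star)

    def f(g):
        dev = [gi - si for gi, si in zip(g, g_star)]
        return [sum(w[j][i] * dev[j] for j in range(n)) - sigma[i] * dev[i] ** 3
                for i in range(n)]

    return f


def numerical_jacobian(f, g, h=1e-5):
    """Central-difference Jacobian: jac[i][j] = d f_i / d g_j at g."""
    n = len(g)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        g_plus, g_minus = list(g), list(g)
        g_plus[j] += h
        g_minus[j] -= h
        f_plus, f_minus = f(g_plus), f(g_minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac


W = [[0.5, -1.0, 0.0],
     [0.8, -0.2, 1.0],
     [0.0, 0.3, -0.7]]
SIGMA = [1.0, 0.5, 2.0]
G_STAR = [1.0, 2.0, 0.5]
f = make_f(W, G_STAR, SIGMA)

print("f(g*) =", f(G_STAR))  # zero vector: g* is a homogeneous steady state of the reactions
jac_star = numerical_jacobian(f, G_STAR)
print("matches W^T at g*:", all(abs(jac_star[i][j] - W[j][i]) < 1e-6
                                 for i in range(3) for j in range(3)))
```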
  
  This is why we believe that the differences observed between the spike-only initial condition and the spike superimposed on a homogeneous background are not due to the initial conditions themselves, but rather result from a modified reaction scheme introduced through a questionable cutoff.
  
  "In simulations with spike initial patterns, the reference value g≡0 represents an actual concentration of 0 and therefore, we must add to (S8.2) a Heaviside function Φ acting of f (i.e., Φ(f(g))=f(g) if f(g)>0 , Φ(f(g))=0 if f(g)≤0) to prevent the existence of negative concentrations for any gene product (i.e., g_i<0 for some i)." (SI chapter S8).
  
  This cutoff alters the dynamics (no inhibition) and introduces a different reaction scheme between the two simulations. The need for this correction may itself reflect either a problem in the original equations (which should fulfill the necessary conditions and prevent negative concentrations (R4 in main text)) or the inappropriateness of using an expanded approximation which assumes independence on the steady state concentration. It is already questionable if the linearized equations with a cubic degradation term are valid for the spike initial conditions (with different background concentration values), as the amplitude of this perturbation seems rather large.
  
  The Heaviside function does not preclude inhibition; it prevents gene-product concentrations from becoming negative. In the current version of the article we no longer use the Heaviside function but a similar, continuous function. Such a function can indeed affect the dynamics, but it 1) does not violate our requirements on f, and 2) does not affect which non-trivial pattern transformations are possible from which gene network topology. Without this function, non-trivial pattern transformations are still possible from the spike initial pattern through hierarchic networks, in the way we describe in the article. The Heaviside function (and the one we now use) simply makes this happen more easily, i.e. for a larger range of parameter values: with it, strong inhibition does not lead to negative gene-product concentrations, whereas without it this can happen for some parameter combinations. None of the arguments or proofs in our article requires the Heaviside function or any similar function. Again, this is because our aim is to identify topological requirements that are necessary, but not sufficient, for non-trivial pattern transformation. An f that leads to negative gene-product concentrations for some parameter combinations, but to non-trivial pattern transformations for others, is therefore still a valid example of our points (although not the most interesting or realistic one).
  
  We distinguish between the spike and combined spike-homogeneous initial patterns simply because they are biologically quite different, i.e. in the former the gene product in the spike is only expressed in the spike and nowhere else. As we describe in the current version the pattern transformations possible from these two different initial patterns are very similar. In the same way, which gene network topologies can lead to which types of non-trivial pattern transformations is not affected by using the Heaviside functions or not (although this can affect the range of parameter values in which this happens).
  
  Lastly, we note that under the current simulation scheme, it is not possible to meaningfully assess criteria RH2a and RH2b, as they rely on nonlinear interactions that are absent from the implemented dynamics.
  
  The implementation of nonlinear entries in f(g), wherever they are needed, is now made explicit in the corresponding subsection of the main text and in section S6 of the Supplementary Information. These entries also satisfy conditions R1-R5 around the steady state given by g<sup>*</sup>. Again, we insist that the simulated fs are nonlinear (as now explicitly explained in the SI).
  
  (3) Several statements in the main text are presented without accompanying proof or sufficient explanation, which makes it difficult to assess their validity. In some cases, the lack of justification raises serious doubts about whether the claims are generally true. Examples are:
  
  "For the purpose of clarity we will explain our results as if these cells have a simple arrangement in space (e.g., a 1D line or a 2D square lattice) but, as we will discuss, our results shall apply with the same logic to any distribution of cells in space." (Main text l.145-l.148).
  
  The results on which gene network topologies can lead to pattern transformations are based on a linear stability analysis and some logical arguments. As we now explain throughout the text, none of them depends on the number of dimensions or on the shape of the arrangement of cells. The geometry of the domain can influence the specific form of the resulting patterns, but it does not alter the broader type of resulting patterns (e.g., periodic patterns, peaks emerging around a spike, etc.) that a given gene network topology can produce. We now explicitly discuss these dependencies in the 5th paragraph of the discussion.
  
  "For any non-trivial pattern transformation (as long as it is symmetric around the initial spike), there exists an H gene network capable of producing it from a spike initial pattern." (Main text l.366f).
  
  We now provide a more detailed justification of this statement and the limits of its applicability in the section “The ensemble of possible pattern transformations from spike initial patterns in H networks”. To make this section easier to understand, we have also made changes throughout the hierarchic network sections.
  
  "In 2D there are no peaks but concentric rings of high gene product concentration centered around the spike, while in 3D there are concentric spherical shells." (Main text l. 447ff).
  
  This result pertains specifically to pattern transformations arising from spike initial patterns. As defined in the text, spike initial patterns are radially symmetric (at least far away from the boundary). Since diffusion preserves radial symmetry, pattern transformations from spike initial patterns in two or three dimensions reduce to effectively one-dimensional transformations along each radial direction. In this framework, each pair of concentration peaks symmetric with respect to the spike in one dimension corresponds to a ridge (ring) surrounding the spike in two dimensions, and each such ridge becomes a spherical shell around the spike in three dimensions. In the current version we explain what happens in 1D and, in the same places, what happens in 2D and 3D (and we have added figures to visualize this in 2D, e.g. Fig.1 and Fig.9).
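  For a single diffusing species with a radially symmetric profile g<sub>i</sub>(r, t) in d dimensions, the standard radial form of the Laplacian makes this reduction explicit. The sketch below assumes isotropic diffusion with coefficient D<sub>i</sub> and a domain large enough that boundaries can be ignored, matching the "far away from the boundary" caveat above:

```latex
\frac{\partial g_i}{\partial t}
  = f_i(\mathbf{g})
  + D_i\left(\frac{\partial^2 g_i}{\partial r^2} + \frac{d-1}{r}\,\frac{\partial g_i}{\partial r}\right),
  \qquad d = 1, 2, 3.
```

  The equation involves only the radial coordinate r, so each peak of the 1D profile maps to a ring (d = 2) or a shell (d = 3). The dimension enters only through the (d − 1)/r term, which is where the dimension-dependence of diffusion raised by Reviewer 1 appears.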
  
  (4) The study identifies one-signal networks and examines how combinations of these structures can give rise to minimal pattern-forming subnetworks. However, the analysis of the combinations of these minimal pattern-forming subnetworks remains relatively brief, and the manuscript does not explore how the results might change if the subnetworks were combined in upstream and downstream configurations. In our view, it is not evident that all possible gene regulatory networks can be fully characterized by these categories, nor that the resulting patterns can be reliably predicted. Rather, the approach appears more suited to identifying which known subnetworks are present within a larger network, without necessarily capturing the full dynamics of more complex configurations.
  
  We acknowledge that our explanation regarding the combination of subnetworks may have been too brief. We now provide a more detailed description in the section “Gene networks combining different classes of subnetworks” and in its subsections. There we explore the different ways in which signal subnetworks can be combined (upstream, downstream, in series, in parallel, etc.). However, this section cannot be understood (and that may have been the problem in the original version of the manuscript) without the linear stability analysis section that is now in the main text, and the associated discussion of the dispersion relation and the results related to it. These are important because they apply to all gene networks and thus constrain the possible gene network topologies and the types of possible pattern transformations. In other words, however gene networks are combined, they will always be either RD-stable (i.e. no pattern transformation) or RD-unstable of the first kind (periodic resulting patterns) or of the second kind (the other patterns we discuss). In the current version, we combine this fact with other arguments to describe the types of pattern transformations possible in gene networks combining the different classes of subnetworks.
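  The classification into RD-stable and first- or second-kind RD-unstable networks can be sketched numerically from the dispersion relation of a linearized system, λ(k) = max Re eig(J − k²D), where J is the Jacobian at the homogeneous steady state and D the diagonal matrix of diffusion coefficients. This is a minimal sketch under a simplifying assumption of ours: an instability confined to a bounded band of wavenumbers is treated as first kind, and one that persists at the largest sampled wavenumber (as happens when a non-diffusing intracellular product has a positive self-loop) as second kind. The function names, the sampled range of k and the example matrices are hypothetical.

```python
import numpy as np


def dispersion(J, D, ks):
    """Largest real part of the eigenvalues of J - k^2 D for each wavenumber k."""
    return np.array([np.linalg.eigvals(J - k**2 * D).real.max() for k in ks])


def classify_dispersion(J, D, k_max=100.0, n=2001):
    """Coarse classification from a sampled dispersion relation (simplified criterion)."""
    ks = np.linspace(0.0, k_max, n)
    lam = dispersion(J, D, ks)
    if lam.max() <= 0:
        return "RD-stable"
    if lam[-1] > 0:
        # Growth persists at the largest sampled wavenumber: treated here as second kind.
        return "RD-unstable (second kind)"
    # Growth restricted to a bounded band of wavenumbers: treated here as first kind.
    return "RD-unstable (first kind)"


# Jacobian at the homogeneous steady state: product 1 activates itself and product 2;
# product 2 represses product 1 and decays.
J = np.array([[0.5, -1.0],
              [1.0, -1.0]])
print(classify_dispersion(J, np.diag([1.0, 20.0])))  # both products diffuse
print(classify_dispersion(J, np.diag([0.0, 1.0])))   # product 1 does not diffuse
```

  The first example is a Turing-like configuration (a slowly diffusing self-activating product and a fast-diffusing repressor); the second replaces the activator with a non-diffusing, self-activating product coupled to a diffusing signal that represses it.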
  
  (6) The manuscript lacks a clear and detailed explanation of the underlying model and its assumptions. In particular, it is not well-defined what constitutes a "cell" in the context of the model, nor is it justified why spatial features of cells, such as their size or boundaries, can be neglected. Furthermore, the concept of the extracellular space in the one-dimensional model remains ambiguous, making it unclear which gene products are assumed to diffuse.
  
  We now clarify all these points in the first three paragraphs of the “Methods: the model” section, and we have added a figure to illustrate them (Fig. 3).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  I suggest the following changes for each weakness I mentioned in the Public Review:
  
  (1) Presentation
  
  (R1.1) (a) Add a one-page "Key Requirements" table (e.g., immediately after the Model section) that lists every requirement code (R1-R5, I1-I2, RH1-RH2, etc.), its one-line statement, and the SI section where it is proved.
  
  In the new version of the article, each requirement has its own paragraph starting with the requirement label in bold (e.g. R1: …). We introduce each requirement where it is justified or proven; otherwise the reader might not know where it comes from. We have also hyperlinked all requirements and most equations so that the reader can easily go back to the explanation of each requirement and equation.
  
  (R.1.2) Provide more figures illustrating the general structure of networks when you describe them; the network sketches could be folded into a single summary figure, so the reader sees all motifs at once. For example, in lines 304-311, it took me a while to understand if the requirement means just A -> k - ... ⊣ j, or it additionally requires A->...->j (through another pathway). It seems that the full requirement is A → k ⊣ j together with an independent positive route A → j. A figure describing the network structure, or at least a schematic "inline" plot in the spirit of what I just wrote, could help. This is just one example, but the text consists of a constant flow of such "diagrams encrypted in prose".
  
  We have followed the reviewer’s suggestions. Not all fit in a single figure so we have constructed new figures 4 and 5 for that purpose.
  
  (R.1.3) (b) Also consider supporting the main text with some key formulas and arguments from SI. My overall suggestion here is that it would be great to make the main text less prosaic and more self-consistent, if the journal requirements allow it.
  
  After the suggestions by both reviewers, and for the sake of clarity, we have actually moved (and clarified) several key parts of the SI into the main text. These include the whole “Linear stability analysis” and “Positive regulatory loops determine the kind of RD-instability” sections. These parts, although quite mathematical, facilitate the understanding of our results.
  
  (2) Linearisation
  
  (R.1.5) It's clear that keeping non-linearity is complicated and maybe redundant, but please, discuss the assumption of linearity explicitly, especially in the scope of relevance for the real systems, and explain why it's not important, if so. I guess that relaxing this assumption may affect the argumentation in many places, for example, equation (3) of the main text could break (i.e., if the signaling molecule can be consumed in some reaction of A+B->AB kind).
  
  We agree that the original version was not explicit enough about the reasons for the linear approximation. The first and last paragraphs of the section “Linear stability analysis” are explicitly devoted to justifying this linearization. Moreover, the section on hierarchical networks is now written without using the linearization.
  
  We are not sure we understand what the problem is with the A+B→AB reaction. We are not assuming any specific function f, only the ensemble of functions that fulfill our requirements (R1 to R5); it is only for the simulations that we have to use a specific f. The reactions suggested by the reviewer could be represented by an f of the form d[AB]/dt = f_AB([A][B]) − m[AB]^n for AB, and d[A]/dt = −f_AB([AB]) and d[B]/dt = −f_AB([AB]), where f_A and f_B are functions that decrease with their arguments. We see no reason why there cannot be an f_AB that fulfills our requirements, for example f_AB = [A][B]/(K + [A][B]) − m[AB]. See also related comments in the public comments file.
  
  (R.1.6) Please, provide a separate section where you reformulate the definition of "non-trivial pattern transformation" for two- and three-dimensional domains, and summarize in this section why the analysis provided for 1D is relevant for higher-dimensional systems. By now, I'm not convinced.
  
  There was indeed a problem with the way we described non-triviality beyond 1D in the original version of the article. We have now refined the definition of pattern transformations so that it is understandable in 2D and 3D. This definition is presented in the introduction already (in P1 and P2). We have modified figure 1 accordingly.
  
  Reviewer #2 (Recommendations for the authors):
  
  Major Issues
  
  (1) Mathematical Proofs
  
  (R2.1) We strongly recommend that the authors revisit the mathematical derivations or provide a clear and rigorous justification for the assumptions made therein. These assumptions currently appear unjustified or overly simplistic, especially in light of the nonlinear dynamics the authors aim to describe. The authors should comment on why they expect their results to generalize to all complex network structures, as claimed, and not only apply to the simplified examples analyzed in the paper.
  
  The article has now been restructured to that end. The assumptions are now all explicitly described in the “Methods: the model” section, and the derivations are distributed throughout the results section. A major change in this respect has been moving parts of the SI into dedicated sections of the main text (and adapting the rest of the text accordingly). Some important steps of the derivation may have been buried in the old SI, yet they are crucial for understanding the overall argument of the article. In fact, a large part of the results section is a single long argument showing that there are essentially only three classes of gene network topologies that can lead to non-trivial pattern transformations. This argument is summarized in the last paragraph of the new section “Positive regulatory loops determine the kind of RD-instability” and in the first paragraph of the discussion. In brief:
  
  (1) Pattern transformation requires gene networks with extracellular signals
  
  (2) Applying previous mathematical results, we show (given our broad requirements on f) that pattern transformation is only possible in gene networks that contain positive regulatory loops.
  
  (3) Applying previous mathematical results we show that in the gene networks in which these loops are extracellular, the only possible non-trivial pattern transformations lead to periodic resulting patterns.
  
  (4) Applying previous mathematical results we show that in the gene networks in which these loops are INTRAcellular, the only possible non-trivial pattern transformations do not necessarily lead to periodic resulting patterns.
  
  (5) Using simple logical arguments we also show that no non-trivial pattern transformations are possible in gene networks without negative interactions.
  
  (6) All the above points combined show that there are only three classes of gene networks capable of non-trivial pattern transformations: 1) those with intracellular positive loops, extracellular signals that do not affect themselves, and some negative regulation by them (which we call hierarchic networks); 2) those with intracellular positive loops and extracellular signals that affect themselves negatively (which we now call over-Turing networks); and 3) those with extracellular positive loops and an extracellular negative loop (which, following previous work by others, are called Turing networks).
  
  (7) Building on previous research and various developmental arguments, we explore the types of pattern transformations that each of these three classes of gene networks can lead to. These types are characterized only in broad and potential terms: we say nothing about the parameter values for which any gene network leads to any specific pattern transformation. What we say is which types of pattern transformation may be possible (for some parameter combination) and which are impossible given the gene network topology alone (based on the types of loops, and so on).
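  The topological criteria in points (1)-(6) can be turned into a rough checklist. The sketch below enumerates the simple loops of a small signed network, labels each loop as intracellular (all members intracellular) or extracellular (at least one extracellular member), and applies the three class definitions above. This is our own simplified reading for illustration only: the function names, the edge encoding (a dictionary from (regulator, target) pairs to +1 or −1), the loop labelling and the handling of combined classes are assumptions, and the sketch ignores the requirements on f.

```python
def simple_cycles(nodes, edges):
    """Enumerate simple directed cycles; each is reported once, from its first node in `nodes`."""
    order = {node: i for i, node in enumerate(nodes)}
    succ = {node: [] for node in nodes}
    for regulator, target in edges:
        succ[regulator].append(target)
    cycles = []

    def dfs(start, node, path, visited):
        for nxt in succ[node]:
            if nxt == start:
                cycles.append(list(path))
            elif nxt not in visited and order[nxt] > order[start]:
                visited.add(nxt)
                path.append(nxt)
                dfs(start, nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    for start in nodes:
        dfs(start, start, [start], {start})
    return cycles


def loop_sign(cycle, edges):
    """Product of the edge signs around a cycle."""
    sign = 1
    for i, node in enumerate(cycle):
        sign *= edges[(node, cycle[(i + 1) % len(cycle)])]
    return sign


def classify_network(nodes, extracellular, edges):
    """Return the set of classes (simplified reading) that a signed network belongs to."""
    if not extracellular or all(s > 0 for s in edges.values()):
        return {"none"}  # points (1) and (5): no signals, or no negative interactions
    cycles = simple_cycles(nodes, edges)

    def has_signal(c):
        return any(node in extracellular for node in c)

    int_pos = any(loop_sign(c, edges) > 0 and not has_signal(c) for c in cycles)
    ext_pos = any(loop_sign(c, edges) > 0 and has_signal(c) for c in cycles)
    ext_neg = any(loop_sign(c, edges) < 0 and has_signal(c) for c in cycles)
    signals_in_loops = any(has_signal(c) for c in cycles)

    classes = set()
    if ext_pos and ext_neg:
        classes.add("Turing")
    if int_pos and ext_neg:
        classes.add("over-Turing")
    if int_pos and not signals_in_loops:
        classes.add("hierarchic")
    return classes or {"none"}


turing = classify_network(["A", "I"], {"A", "I"},
                          {("A", "A"): 1, ("A", "I"): 1, ("I", "A"): -1})
over_turing = classify_network(["k", "S"], {"S"},
                               {("k", "k"): 1, ("k", "S"): 1, ("S", "k"): -1})
hierarchic = classify_network(["A", "k", "j"], {"A"},
                              {("A", "k"): 1, ("k", "j"): -1, ("A", "j"): 1, ("j", "j"): 1})
print(turing, over_turing, hierarchic)
```

  The three toy networks are, respectively, an extracellular activator-inhibitor pair, a self-activating intracellular gene coupled to a signal that represses it, and the A → k ⊣ j plus A → j motif from R.1.2, treating A as the extracellular signal and adding a hypothetical positive self-loop on j.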
  
  (R.2.3) Additional to the examples provided in the Public Review, claims such as "despite the large amount of theoretically possible gene network topologies, all gene network topologies necessary for pattern formation fall into just three fundamental classes and their combinations" (l. 34ff)
  
  This statement was originally intended to introduce the text following it, but it now seems clear that this was not apparent enough. We have deleted the statement and convey a similar message later in the text, once its justification has been provided. The justification is the summary we described in the previous point (R.2.2); it is developed throughout the main text and summarized in the last paragraph of the section “Positive regulatory loops determine the kind of RD-instability”.
  
  (R.2.4) and "The same applies to the topologies we found not to be able to lead to non-trivial pattern transformation" (S7) are not or inadequately justified and should be either substantiated or significantly toned down.
  
  The same comments as above apply.
  
  (R.2.5) (a) We advise the authors to argue why it is enough to prove key results by considering linear dynamics (see S2-S7). While linearization is a common technique, the authors themselves emphasize the importance of nonlinearities in pattern formation throughout the paper.
  
  In the current version we provide an explicit justification for this in the section “Linear stability analysis”, especially in its first paragraph. Moreover, for the analysis of the hierarchical networks we no longer use any linearization.
  
  (R.2.6) (b) To make linear analysis meaningful, we suggest restricting the initial conditions to small fluctuations (e.g., small spikes or noise), which would justify using linearization to investigate the onset of non-trivial pattern formation. Alternatively, the authors should attempt to generalize the results to fully nonlinear dynamics, ideally for a broader class of functions f.
  
  As we now explain, the homogeneous-with-noise initial pattern already corresponds to small perturbations around the homogeneous steady state (due to molecular noise). In addition, for the spike and spike–homogeneous initial patterns we now explicitly consider spikes of small amplitude. We acknowledge that the use of larger spikes in the previous version could lead to misunderstandings regarding the validity of the linear approximation, even though it does not contradict the assumptions underlying the analysis. In these initial patterns, pattern formation arises because the signal secreted from the spike diffuses into the surrounding domain, so that cells outside the spike experience only small deviations from the equilibrium concentration.
  
  Larger spikes may induce stronger deviations in cells located very close to the spike; however, because the spike occupies a region that is very small relative to the total domain size, these local effects do not influence pattern formation in the bulk of the domain. A similar situation occurs with boundary effects in cells located near the domain limits, which likewise do not affect the pattern formation process away from the boundaries. We have clarified this point in the revised manuscript, both in the final sentences of the Introduction and in the description of the initial conditions in the fourth paragraph of the “Linear stability analysis” section, where we explicitly state that each initial pattern can be interpreted as a perturbation of an otherwise homogeneous pattern.
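  To make this concrete, the sketch below builds the three initial patterns in 1D as small perturbations of a homogeneous state. The number of cells, the amplitude `eps`, the spike width, the baseline value and, in particular, the zero baseline used for the pure spike pattern are illustrative assumptions, not the parameters used in the article.

```python
import numpy as np


def initial_patterns(n_cells=500, g_star=1.0, eps=1e-2, spike_width=5, seed=0):
    """Three 1D initial patterns, each a small perturbation of a homogeneous state.

    All parameter values, and the zero baseline used for the pure spike pattern,
    are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    centre = n_cells // 2
    half = spike_width // 2
    spike = np.zeros(n_cells)
    spike[centre - half: centre + half + 1] = eps  # a few central cells raised by eps
    return {
        "homogeneous-with-noise": g_star + eps * rng.standard_normal(n_cells),
        "spike": spike,                       # perturbation of a zero homogeneous state
        "spike-homogeneous": g_star + spike,  # the same spike on top of g_star
    }


patterns = initial_patterns()
for name, pattern in patterns.items():
    baseline = 0.0 if name == "spike" else 1.0  # matches the default g_star
    print(name, np.abs(pattern - baseline).max())
```

  In every case the deviation from the homogeneous baseline is of order `eps`, which is the regime in which the linearization is meant to describe the onset of pattern formation.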
  
  (R.2.7) (c) The assumptions required for the proofs should be explicitly stated and justified. At present, the logic behind the chosen constraints on f is unclear, and the flow of the argument suffers as a result.
  
  The actual justification for the requirements (i.e. constraints) on f is biological, and we now explain it more explicitly when we introduce these requirements. Most of the mathematical proofs do not rely on these requirements, except where we explicitly say so.
  
  (R.2.8) (d) The illustrative functions provided in some of the proofs in the SI (e.g. S5.2.1 "To see this, let us consider, for example, that they are both quadratic monomials of the form f_k(g_A)=B_k g_A^2 and f_j(g_A)=B_j g_A^2") do not satisfy the authors' own stated conditions (e.g., this function violates requirement R4 (l.197 f)). More suitable examples should be selected to ensure consistency between assumptions and illustrations.
  
  We have changed the whole section (based on the comment R.2.9 from the same reviewer). We now provide arguments in the main text that generally do not rely on specific fs.
  
  (R.2.9) (e) Currently, all mathematical results are confined to the appendix. We recommend including key insights from the proofs in the main text to improve readability and to allow the main claims to stand on their own. For example, the section on the requirements RH2a and RH2b (l. 320 - l. 335)) would benefit strongly from the insights from S5.2.1
  
  We agree. We have moved the linear stability analysis and the dispersion relation section to the main text. We have also moved what used to be S5.2.1.
  
  (2) Simulations
  
  The simulations raise, as mentioned in the Public Review, several concerns regarding their generality and validity.
  
  (R.2.10) (a) We recommend validating the simulation results by comparing them with simulations of the full nonlinear equations. The authors should at least provide the equations for the full dynamics and explain how the expansion is performed and why it is valid. This also includes verifying the assumed steady states (g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i).
  
  We are simulating the full non-linear equations. It is important to stress here, as we now do in the main text, that our results apply to any f, as long as it fulfills our requirements R1-R5. However, for the simulations in the figures we have to use a specific f (since infinitely many functions f fulfill our requirements). Again, the figures are just examples to visualize the types of resulting patterns and gene networks we discuss.
  
  In the original version we may not have been clear enough about the equations used for the simulations. The presentation of the Maini-Miura model has been revised to improve clarity (equation S6.1 in the SI). In particular, the homogeneous steady state is now parameterized by a tunable g*, which can be set to one value for spike initial patterns and to another for homogeneous-with-noise and spike-homogeneous initial patterns. We have also included a proof that the model equations satisfy our requirements R1-R5. Indeed, the model is non-linear as long as σ^i ≠ 0 for some gene product (as we explicitly assume).
  
  The derivation of this cubic model from a separate expansion of general reaction-diffusion dynamics can be found in the original paper (Miura & Maini, 2004), and subsequent applications to pattern formation support its validity (Marcon et al., 2016; Diego et al., 2018). Importantly, this expansion is independent of the linearization performed in the main text of our article to derive the dispersion relation. The reference to this separate expansion in the previous version was included solely for context; we have removed it in the revised manuscript to avoid potential confusion.
  
  (R.2.11) (b) The use of a Jacobian that is independent of the steady-state contradicts the assumption of nonlinearity (requirement R2 (l. 192f)) of f. We ask the authors to clarify this.
  
  We believe this concern arises from a notational ambiguity in the previous version of the manuscript, which has now been corrected: the matrix appearing in the regulatory term has been renamed from J to W^T. As stated in the main text, the Jacobian of the regulation function f(g) evaluated at the homogeneous steady state must coincide with the transpose of the network weight matrix. With the current equations (S6.1), we have , from which we easily get . It is also clear that the Jacobian of f(g) is not independent of g.
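  The distinction between a Jacobian evaluated at the steady state and one that is constant everywhere can be checked numerically. The sketch assumes a hypothetical cubic regulation term f(g) = W^T(g − g*) − σ ⊙ (g − g*)^3 (not necessarily the exact form of equation S6.1, and omitting diffusion and any other terms) and estimates its Jacobian by central differences: it equals W^T at g = g* and differs from it elsewhere whenever σ ≠ 0. The weight matrix, g* and σ values are illustrative.

```python
import numpy as np


def f(g, W, g_star, sigma):
    """Hypothetical cubic regulation term with a homogeneous steady state at g_star."""
    dg = g - g_star
    return W.T @ dg - sigma * dg**3


def numerical_jacobian(func, g, h=1e-5):
    """Central-difference estimate of d func_i / d g_j."""
    n = g.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        jac[:, j] = (func(g + step) - func(g - step)) / (2 * h)
    return jac


# W[i, j]: weight of the regulation of gene product j by gene product i (assumed convention).
W = np.array([[0.5, 1.0],
              [-1.0, -1.0]])
g_star = np.array([1.0, 1.0])
sigma = np.array([1.0, 1.0])


def regulation(g):
    return f(g, W, g_star, sigma)


jac_at_steady_state = numerical_jacobian(regulation, g_star)
jac_elsewhere = numerical_jacobian(regulation, g_star + 0.5)
print(jac_at_steady_state)  # approximately W.T
print(jac_elsewhere)        # W.T minus 0.75 on the diagonal, since 3 * sigma * 0.5**2 = 0.75
```

  Because the cubic term and its first derivative vanish at g*, only the linear part W^T survives in the Jacobian there, while away from g* the cubic term shifts the diagonal.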
  
  (R.2.12) (c) In Figure S3 and similar simulations, the implementation of the nonlinear terms is ambiguous. The function f shown does not correspond to the Jacobian, and it remains unclear how these components are ultimately implemented in the simulation code. Additionally, as mentioned, it does not fulfill the necessary conditions for the global steady state.
  
  The implementation of nonlinear entries in f(g) whenever they are needed is now made explicit in the corresponding subsection of section S6 in the SI. With the new notation it becomes clearer that the fs used can fulfill the necessary conditions for the global steady state.
  
  (R.2.13) (d) The given function f_8 in S8.10.2 cannot correspond to the mentioned network since the number of gene products does not match the Jacobian and the network.
  
  This was a typo that has now been corrected.
  
  (R.2.14) (e) The given parameters for the figures in the SI do not match the figures. Please check and ensure that the correct figure is referenced (e.g., S8.2 Figure 3)
  
  This was a typo in the numeration of the subsections in the SI that has now been corrected.
  
  (R.2.15) (f) It is unclear which units are used, and the units used for the non-dimensionalization should be provided so one can relate them to biological systems.
  
  The revised version now states explicitly that the model equations are formulated in arbitrary units, so that they can be related to the characteristic units of any particular biological system under consideration. We have not non-dimensionalized the model equations.
  
  (3) Conceptual and Structural Clarity
  
  The manuscript suffers from a lack of structural clarity, which affects both readability and scientific coherence.
  
  (R.2.16) (a) In one of the central figures (Figure 4) supporting their main claim, the naming of the network is not consistent with the main text. The network category referred to as "Over-Turing" is never mentioned in the main text. We suspect this should actually be labeled as the "noise-amplifying network."
  
  Indeed. This has now been corrected. We now use only the term “Over-Turing” in the article.
  
  (R.2.17) (b) The Supplementary Information includes an analysis of dispersion relations to classify pattern-forming networks, but this approach is not mentioned or referenced in the main text.
  
  This part of the SI has been moved to the main text and the dispersion relation has been fully and explicitly integrated in the overall argument of the article.
  
  (R.2.18) (c) In relation to Figure 6, we found that the concept of "diversity of possible final patterns" would benefit from a clearer definition and explanation. It is not immediately evident how this diversity is measured or what criteria are used to compare different networks. For instance, it is unclear why the Over-Turing network - which generates both periodic and noisy patterns - is considered to exhibit low diversity, whereas the Turing networks, which produce only periodic patterns, are described as having high diversity.
  
  This was a significant typo, and the figure has now been corrected. The reasons for these differences are now described in the last three paragraphs of the section “The ensemble of possible pattern transformations from H gene networks and spike initial conditions” for the hierarchical networks, in the last paragraph of the section “Pattern transformations in L- subnetworks from spike-homogeneous initial patterns” for the over-Turing networks, and in the seventh paragraph of the section “Pattern transformations in the combination of L+ and L- subnetworks” for the Turing networks.
  
  (R.2.19) (d) Additionally, the dependence of final patterns on initial conditions is not clearly described. It seems that this relationship is only analyzed for non-trivial pattern formations, but this is not explicitly stated. Clarifying these points in the caption of Figure 6 would greatly help readers understand the interpretation and significance of the results presented in this figure.
  
  Indeed, we do not analyze trivial pattern transformations, and we now state this explicitly from the introduction onward: this article is concerned only with non-trivial pattern transformations. For each type of gene network, we now provide a more detailed description of how the resulting pattern depends on the initial pattern (in the section devoted to that gene network).
  
  (R.2.20) (e) The significance statement is simply a verbatim repetition of parts of the abstract. This defeats its purpose, which is to articulate the broader implications of the work. We urge the authors to rewrite this section with a focus on significance rather than summary.
  
  We have now corrected this.
  
  (R.2.21) (f) We suggest including a dedicated figure to illustrate the biological model, depicting cells, intracellular and extracellular compartments, and the presence or absence of boundaries between adjacent cells. Such a figure would significantly enhance readers' understanding of the system being discussed.
  
  We have now done that. See new figure 3.
  
  (R.2.22) (g) We encourage the authors to strengthen the 2D and 3D results presented in the paper by adding supporting citations, sharing implementation details, or providing a more in-depth analysis of these systems. If such additions are not feasible, it may be best to remove references to the 2D and 3D systems to maintain clarity and focus.
  
  In the new version of the article we explain why our results on which gene networks can lead to pattern transformation do not depend on the dimensionality of the system. In fact, none of our proofs or arguments assumes or requires a specific number of dimensions: the networks are the same regardless of dimension. The types of possible patterns, however, manifest themselves differently depending on the number of dimensions. Whenever we now describe a resulting pattern, we also explain how it appears in 1, 2 and 3 dimensions and why, and we have added Figures 1 and 9 for that purpose. As we explain in the text, noisy resulting patterns are noisy regardless of the number of dimensions, and those based on a spike in the initial pattern necessarily have radial symmetry (in any number of dimensions). Similarly, periodic patterns remain periodic regardless of the number of dimensions (although some of their aspects change). In the 5th paragraph of the discussion we also discuss the effects of the shape of the system and of its boundary. There was a problem with the definition of pattern transformation we used, but this has now been corrected (see P1 and P2 in the introduction).
  
  (R.2.23) (h) The results section lacks a consistent structure. Section titles do not clearly indicate which phenomena or initial conditions are being analyzed, making it hard for readers to track the logical progression of the study.
  
  The results now begin with an introductory section:
  
  “Basic requirements on gene networks capable of pattern transformation”
  
  The rest of the results are split into the following clearly differentiated sections:
  
  “Gene network classification”
  
  “Linear stability analysis”
  
  “Positive regulatory loops determine the kind of RD-instability”
  
  “Hierarchical Networks”
  
  “Emergent networks”
  
  “Gene networks combining different classes of subnetworks”
  
  The last three sections have several sub-sections inside.
  
  We think the section titles are self-explanatory: hierarchical networks contain only H subnetworks, emergent networks contain L+ or L- subnetworks, and the last major section is about how all of these can be combined.
  
  Minor Issues
  
  (1) Notation and Terminology
  
  (R.2.24) (a) Variable naming is inconsistent throughout the paper. Terms like g_A(x) and A(x) (S5.2.1) are used for gene network concentrations without consistent usage. The naming of genes in networks also varies between the main text, SI, and figures. I.e., sometimes genes are labelled with small, sometimes with large letters, and sometimes with numbers.
  
  This has now been corrected.
  
  (R.2.25) (b) It would improve clarity to use distinct notations for intracellular vs. extracellular concentrations and gene expressions. Ensure networks and examples are consistent across all figures, captions, and supplementary materials. For example, RH2a and RH2b have different networks in the main text compared to the SI.
  
  As we now explain in the third paragraph of the “Methods: the model” section we consider, for simplicity, that gene products are either intracellular or extracellular. In that sense there is no possible ambiguity. As explained in that section, again for simplicity, we do not consider the receptor nor the signal transduction pathways of signals. This means that an extracellular gene product can “directly” regulate intracellular gene products. Because of that, we think that using different notations for extracellular and intracellular gene products would make things more confusing. We have corrected the misnaming between main text and figures.
  
  (R.2.26) (c) We suggest using distinct notation for the gene product itself and for its small deviation from a homogeneous steady state in the SI. This would help clarify whether specific statements apply only within the linearized regime or can be generalized to the full nonlinear dynamics.
  
  We do that in the new version of the article.
  
  (R.2.27) (d) Line 327 contains a mistake: g_k = g_j should be expressed as a proportional relationship. The division by g_A also seems unnecessary - please revise.
  
  This is now explained in a different way so this mistake does not apply.
  
  (2) Model Description
  
  (R.2.28) (a) Justify why boundary effects and spatial separation between cells can be neglected in the model.
  
  This is now discussed in the 5th paragraph of the model section. We do not claim that boundary effects are negligible. We claim instead that which gene networks can lead to pattern transformations does not depend on the boundaries. The same holds for the types of resulting patterns (in the coarse sense we use) that are possible from each gene network and initial pattern.
  
  As stated in the first two paragraphs of the model section, the spatial separation between cells can be ignored because we assume there are many cells in the system and these are evenly spaced and sized (at least roughly). That is usually the case in animal development, although not always (there are exceptions in the very early stages of many marine invertebrates), and we do not claim to know exactly what happens in those cases: as we stated in the first paragraph of the introduction we assume systems made of many small cells.
  
  (R.2.29) (b) State explicitly that only extracellular gene products are assumed to diffuse - this is currently only mentioned in the SI.
  
  This is now explicitly stated early on in the first three paragraphs of the model section and also after the introduction of the model equations (1)-(3).
  
  (R.2.30) (c) In the Supplementary Information, the authors state that both extracellular and intracellular gene products can exhibit non-zero diffusion, which appears inconsistent with the conceptual framework and probably is a typographical error.
  
  This was indeed a typographical error. It is now corrected.
  
  (3) Assumptions and Requirements on f
  
  (R.2.31) (a) The equation for requirement R5 is incorrect as written in the main text and should be reformulated more rigorously. The condition should be stated for all constant values of g_i (and g_j) to avoid misinterpretation; otherwise, one might assume all matrix elements must have the same sign.
  
  This has now been corrected.
  
  (R.2.31) (b) Clarify what restrictions on f prevent pathological nonlinearities like 1/(g_k + \epsilon), which would contradict the assumed behavior at high concentrations.
  
  We do not understand this criticism. 1/(g_k + \epsilon) fulfills our requirements on f, and we do not see how it is pathological. We are unsure what the reviewer means by the assumed behavior at high concentrations.
  
  (4) Figures and Captions
  
  (R.2.32) In Figure S3b, the diagram shows gene 5 being activated by gene 4, yet the caption states this is a negative regulation - please correct.
  
  This has now been corrected.
  
  (5) Readability and Formatting
  
  (R.2.33) (a) Improve navigation by hyperlinking references to equations, figures, and requirements throughout the document.
  
  In the new version we have inserted these hyperlinks.
  
  (R.2.34) (b) Adding hyperlinks to the requirements would additionally help the reader to keep track of them
  
  In the new version we have inserted these hyperlinks.
  
  (R.2.35) (c) Correct inconsistent or mismatched equation numbers and references. E.g. SI S5.1 is not referring to the correct equation (the equation it should be referring to would be Equation 3), and the reference to Figure 7 in part of the dispersion relation is wrong (as far as we see, this should be Figure 5).
  
  This has all been corrected now.
  
  (R.2.36) (d) Clarify ambiguous language in the introduction. For instance, the description of spike patterns (lines 136f) as a single cell spike contradicts the stated width (SI) and the visual representation involving 500 cells from the figures.
  
  This has now been corrected.
  
  (R.2.36) (e) The discussion of 2D and 3D simulations appears limited to the "noise amplifying" network. It's unclear whether a similar analysis was done for other network types.
  
  In Figures 1 and 9 and through the text we discuss all types of patterns in 2D and 3D.
  
  (6) Typos
  
  (R.2.37) Typos in the text (The following is just a small selection of the typos we came across. Since there are quite a few throughout the manuscript, we may not have caught all of them. We kindly recommend that the authors carefully proofread the full text to ensure consistency and clarity):
  
  We have corrected all the indicated typos and proofread the whole manuscript and SI.
  
  Reviewer #3 (Recommendations for the authors):
  
  Major concern:
  
  (R.3.1) That pattern formation can be induced by positional information and by reaction-diffusion (Turing) mechanisms is a foundational idea in the field. As the references cited in the manuscript show, these paradigms have already been clearly articulated and synthesized (e.g., Green & Sharpe's work (2015)). Moreover, the search for minimal network topologies that can generate Turing patterns has been extensively explored by Zheng et al. (2016). The novelty of the present work is unclear. It may offer a fresh perspective on an established problem, but it does not seem to present fundamentally new biological or mathematical advances.
  
  If the authors wish to strengthen the novelty and impact of the manuscript, they should consider explicitly acknowledging prior work and positioning their contribution as a formal extension or generalization rather than as a discovery. To enhance the practical relevance of their work, the authors could demonstrate how their framework can be used to predict or classify gene network behaviors in pattern formation that are not easily identifiable through experimental approaches alone. For example, they could show how their classification helps distinguish between Turing, hierarchical, and noise-amplifying dynamics in complex or ambiguous biological systems, thereby offering a guiding tool for experimental design or interpretation.
  
  Indeed, the gene networks we identify have been identified before. We were, and still are, explicit about this in the Discussion, and we cite the relevant work (including the work suggested by the reviewer). The novelty of our work lies not in identifying these gene networks, or minimal ones, but in showing that they are all the possible networks for pattern transformation, i.e., that no other type of network exists. This has not been done (or even attempted) before, and we state explicitly that this is our result (first paragraphs of the Discussion).
  
  Minor concern:
  
  The writing style and language usage can be improved for clarity. Some explanations in the results and discussion can benefit from tight editing to eliminate redundancy and improve readability.
  
  We have corrected all the indicated typos and proofread the whole manuscript and SI.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.05.06.652477v3
socialsci.libretexts.org socialsci.libretexts.org

1.3: Issues in Development

1
1. KruJuneBug 30 Jun 2026
  
  in Public
  
  more advanced skills that were already present in some form in the child
  
  I love this idea because it made me think more deeply. It reminds me of a seed. A seed already has the potential to become a particular kind of tree, but it needs many different factors such as water, sunlight, nutrients, and time to continue growing. Eventually, it becomes a strong tree that benefits the world by supporting the ecosystem and providing shade for others.
  
  I think human development is similar. We don't suddenly gain completely new abilities as adults. Instead, we build on skills that already exist in an early form during childhood. For example, when we are five years old, we learn to tie our shoelaces. Later, we learn how to solve problems in school, such as passing an English test. As adults, we may solve much more complex problems, like running a business or leading a team. The skill is still problem-solving; it has simply become more advanced over time. That is why I love the idea of continuous development.

Annotators

KruJuneBug

URL

socialsci.libretexts.org/Bookshelves/Early_Childhood_Education/Child_Growth_and_Development_(Paris_Ricardo_Rymond_and_Johnson)/01:_Introduction_to_Child_Development/1.03:_Issues_in_Development
www.biorxiv.org www.biorxiv.org

In-cell cryo-electron tomography reveals differential effects of type I and type II kinase inhibitors on LRRK2 filament formation and microtubule association

1
1. Public_Reviews 29 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  In this study, the authors set out to determine how two classes of kinase inhibitors, which stabilise a disease-relevant enzyme in either an active (Type I) or inactive state (Type II), influence its organisation and interactions with microtubule filaments in cells. Using state-of-the-art in-cell structural imaging approaches, they examine how these compounds affect the formation of protein filaments and their association with microtubules, and succeed in defining the underlying structural basis for these differences.
  
  A major strength of the work is the application of in-cell cryo-electron tomography combined with correlative imaging, which enables direct visualisation of protein organisation in a near-native cellular context. The data convincingly demonstrate that the Type I inhibitor compound stabilising the active state promotes extensive LRRK2 filament formation and microtubule bundling, whereas compounds stabilising the inactive state markedly reduce these interactions. The structural analysis further provides insight into how conformational states relate to filament organisation, including modelling of previously unresolved regions of the protein.
  
  These findings are internally consistent and align well with prior biochemical and structural studies, many of which were performed by the same team.
  
  There are, however, some limitations that should be noted. The experiments rely on overexpression of the I2020T mutant form of the LRRK2 protein, which is a rare variant, in a single cell type (293T cells), which may not fully reflect endogenous behaviour or wild-type LRRK2 in a physiological context. In addition, while the imaging data are compelling, the functional consequences of the observed filament formation and microtubule association remain unclear.
  
  The study therefore provides strong descriptive and structural insight, but more limited evidence linking these observations to cellular or disease-relevant outcomes.
  
  Overall, the authors largely achieve their aims, and the results support their central conclusion that different classes of kinase inhibitors have distinct effects on protein organisation in cells. The work represents an important advance in understanding how small molecules can reshape protein architecture in a cellular environment, with potential implications for therapeutic strategies. The methodological approach will also be of broad interest to the field, as it highlights the power of in-cell structural biology to study dynamic protein assemblies that are difficult to capture using traditional approaches.
  
  We thank the reviewer for their thoughtful and positive assessment of our work. We appreciate their recognition that in-cell cryo-electron tomography and correlative imaging provide a powerful approach for directly visualizing how small-molecule inhibitors reshape LRRK2 organization in a cellular environment.
  
  We agree that the use of overexpressed LRRK2<sup>I2020T</sup> in HEK293T cells represents an important limitation of the present study. This experimental system was selected because it enabled visualization and structural analysis of inhibitor-dependent LRRK2 assemblies in cells. However, the extent to which these observations apply to endogenous LRRK2, wild-type protein, other disease-associated variants, or physiologically relevant cell types remains to be established.
  
  We also agree that the functional consequences of inhibitor-dependent LRRK2 filament formation and microtubule association remain unresolved. The goal of the present study was to define how type I and type II kinase inhibitors alter the cellular organization and structural state of LRRK2. Our data demonstrate that these inhibitor classes have markedly different effects on LRRK2 filament formation and microtubule association in cells, and provide a structural framework for understanding these differences. Future studies will be required to determine how these assemblies influence LRRK2 signaling, microtubule-based processes, and disease-relevant cellular phenotypes.
  
  We thank the reviewer for highlighting both the methodological significance of this work and its potential implications for understanding how therapeutic molecules remodel protein architecture in cells.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Mutations in Leucine-Rich Repeat Kinase 2 (LRRK2) are a major cause of Parkinson's disease. LRRK2 PD-related mutations all result in increased kinase activity. Therefore, LRRK2 has been the focus of the development of kinase inhibitors. So far, two classes of kinase inhibitors have been identified: type 1 LRRK2-specific inhibitors that stabilize LRRK2 in a closed active-like conformation and broad-range type 2 inhibitors that stabilize LRRK2 in an open inactive-like conformation. Here, Basiashvili et al. used in-cell structural biology to study the effect of both type 1 and type 2 inhibitors on the localization and structural conformation of LRRK2-I2020T.
  
  Strengths:
  
  They showed that Type 1, but not Type 2, inhibitors induce LRRK2 filament formation on microtubules.
  
  Furthermore, they were able to build a structural map of full-length LRRK2 I2020T bound to a Type 1 inhibitor in a closed kinase conformation. Together, this work thus confirms the data of previous studies that showed that LRRK2 Type 1 and 2 inhibitors differently affect filament formation.
  
  Weaknesses:
  
  All conclusions are fully supported by the provided data. However, as the authors indicated themselves, the physiological relevance of LRRK2 microtubule binding is questionable. Furthermore, although the authors used a full-length LRRK2 protein, like in previously published structures, the resolution of the N-terminal domains is rather poor. Therefore, it also remains unclear what we learn from this structure compared to the previously published structures.
  
  We thank the reviewer for their positive evaluation of our study and for recognizing that our conclusions are supported by the data.
  
  We agree that the physiological relevance of LRRK2 filament formation and microtubule association remains an important open question. Our study was designed to determine how type I and type II inhibitors affect the cellular organization and structural conformation of LRRK2. We explicitly acknowledge that future studies using endogenous LRRK2, disease-relevant cellular systems, and functional assays will be necessary to determine the biological significance of inhibitor-induced microtubule association.
  
  We also appreciate the reviewer’s comment regarding the resolution of the N-terminal domains. Although the N-terminal density does not support detailed atomic interpretation, its visualization provides information about the global organization of full-length LRRK2 within an inhibitor-induced, microtubule-associated assembly in cells. Importantly, our study does not claim high-resolution structural determination of the N-terminal regions. Rather, the advance is the in-cell structural observation of full-length LRRK2<sup>I2020T</sup> in a type I inhibitor-stabilized, closed-kinase conformation, together with density indicating that the N-terminal repeat regions adopt an organization within the microtubule-associated lattice.
  
  We have revised the manuscript to clarify this point and to more carefully distinguish the structural information supported by the density from interpretations that would require higher-resolution data.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This paper describes new insights into the effects of type-I and type-II LRRK2 inhibitors on HEK293T cells that over-express GFP-labeled LRRK2-I2020T. Using correlative light microscopy and cryo-electron tomography, a type-I inhibitor leads to the extensive decoration of microtubules with LRRK2, which is not seen for a type-II inhibitor. Subtomogram averaging reveals that LRRK2 binds to the microtubules in a closed-kinase conformation, with density for the N-terminal arms.
  
  Strengths:
  
  The paper is well written; the CLEM and cryo-ET appear to be done to a high standard. Consequently, I have only minor comments.
  
  Weaknesses:
  
  The resolution of the subtomogram averages is somewhat limited, but the authors have adequately limited the number of degrees of freedom in the fitting of their atomic models by only allowing rigid-body transformations of separate parts of LRRK2.
  
  The authors should include FSC curves between the rigid-body fitted atomic models and the various sub-tomogram average maps.
  
  We thank the reviewer for their positive assessment of the manuscript and for recognizing the quality of the correlative imaging and in-cell cryo-electron tomography analyses.
  
  We also appreciate the reviewer’s recognition that our interpretation of the maps was appropriately constrained by fitting domains as rigid bodies, rather than attempting unsupported high-resolution model refinement.
  
  We thank the reviewer for highlighting this and apologize for the oversight. We have added all the missing FSC curve plots of subtomogram maps presented in this study in Extended Data Figure 8.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  I think the current study is OK as it is, and the authors have taken this as far as they can.
  
  In future work, for either the authors or others in the field, it will be important to determine whether endogenous LRRK2 can be recruited to microtubules in response to compounds that stabilise the active state, particularly in cell types that are more relevant to Parkinson's disease. Does this cause a roadblock that impacts microtubule-driven transport? Establishing whether such recruitment occurs under physiological expression levels will be critical for assessing the broader relevance of the findings.
  
  In addition, it would be valuable to evaluate whether these Type 1 compounds have detrimental cellular effects linked to altered endogenous LRRK2-driven microtubule association, and whether inhibitors that stabilise the inactive state offer a potential advantage by avoiding this phenotype.
  
  We thank the reviewer for insightful recommendations for future studies.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Figure 5: What is map C, and how is it different from the other maps? The authors indicate that the resolution of the N-terminal domains is moderate. How certain are the authors of the fit of these domains? Since map C is not provided in the supplemental, it is not possible to check this.
  
  We apologize for this oversight. We have updated the text to reflect how map C was calculated. Now the text reads:
  
  “Additionally, we performed subtomogram analysis in Dynamo on a larger LRRK2<sup>IT</sup>-decorated lattice that contained three layers of LRRK2<sup>IT</sup> density around the microtubule; we refer to this average as map C. Refinement was focused on the central four LRRK2<sup>IT</sup> subunits to better resolve additional protein densities within this larger lattice. In map C (Fig. 5A; Ext. Fig. 7).”
  
  In addition, we updated Figure 5D-F to demonstrate a clear fit of the N-terminal domains into the presented map. We also added Extended Data Figure 7 to the supplemental materials to highlight the fit of the model in the map and the areas that would correspond to the N-terminal domains of LRRK2. We hope these updates demonstrate a good fit and justify the observations highlighted in the paper.
  
  (2) The authors convincingly confirm that LRRK2 Type 1 and 2 inhibitors differently affect filament formation and that type 1 LRRK2-specific inhibitors stabilize LRRK2 in a closed active-like conformation. However, from the way the paper is written, it is unclear what we learn from this new structural data. How similar is the current structure compared to the previous structures? What is the novelty?
  
  We thank the reviewer for noting that this is unclear and giving us the opportunity to highlight it in the manuscript. We have added the following sentence in the discussion:
  
  “However, how the N-terminal repeats of LRRK2 are organized when the protein is in its closed-kinase conformation remained unresolved. Stabilization of LRRK2 in a closed-kinase conformation by MLi-2 treatment and microtubule association reduces conformational heterogeneity to permit structure determination of full-length LRRK2<sup>IT</sup> with the N-terminal repeats undocked from the catalytic core. Therefore, the key novelty of this structure is that it captures full-length LRRK2<sup>IT</sup> in a cellular, microtubule-associated closed-kinase state and shows that kinase closure is compatible with an undocked N-terminal architecture. This distinguishes the in situ closed-kinase state from previously described in vitro intermediate active states.”
  
  Minor comments:
  
  (1) "Its C-terminal catalytic region is composed of WD40, Roc GTPase, Kinase and COR (RCKW) domains."
  
  Suggest changing this to Roc GTPase, Cor, Kinase and WD40 (RCKW) domains for clarity/following of abbreviation.
  
  We have made this change.
  
  (2) "In the MLi-2 treated cells, LRRK2IT strands were organized around microtubules with a regularly spaced lattice, similar to the LRRK2IT strands in cells not treated without the inhibitor (Fig. 3A-E)"
  
  Phrasing, correct the underlined portion.
  
  We have made this change.
  
  (3) "While average pitch. rise, and handedness of the filaments of the rate GZD-824 treated LRRK2 filaments were similar..."
  
  Punctuation.
  
  We have made this change.
  
  (4) "Our results clarify the relationship between kinase conformation, repeat undocking, and microtubule association. Increased microtubule association observed for I2020T mutant favors repeat undocking, a prerequisite for kinase closure and filament assembly"
  
  Do the authors mean undocking by the N-terminal repeats or repeatedly undocking of these domains?
  
  We meant undocking of the domains, and have corrected the sentence to clarify this.
  
  (5) "Together, these findings provide a structural view of full-length LRRK2 in a closed kinaseconformation and capture a resolved snapshot along its conformational continuum"
  
  Needs a space.
  
  We have made this change, and thank the reviewer for pointing it out.
  
  (6) "Microtubule decoration by LRRK2IT has not been studied in cell types that endogenously express high levels of LRRK2, such as lung epithelial cells and brain-resident immune cells including microglia and macrophages44. Thus, it remains possible that aberrant LRRK2-microtubule interactions occur under physiological expression conditions, potentially disrupting homeostatic intracellular transport and being further exacerbated by type I LRRK2 inhibitors, as suggested by in vitro studies23,45."
  
  Many studies have examined the localization of endogenous LRRK2 but were not able to detect filament localization on microtubules. Moreover, to my knowledge, there is also no clear evidence that type 1 inhibitors disrupt microtubule transport in cells expressing endogenous levels of LRRK2.
  
  Therefore, I suggest rephrasing or removing this paragraph.
  
  We agree that the current evidence does not establish that this occurs broadly in cells. However, to our knowledge, cells or tissues with high endogenous LRRK2 expression have not yet been systematically examined in this context. We therefore present sparse decoration of hyperactive LRRK2 on microtubules as a possibility rather than a strong conclusion. We have also previously shown that type I inhibitors disrupt microtubule transport in vitro, but determining whether a similar effect occurs in cells is ongoing work and beyond the scope of the present manuscript.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) P4: The first section of the Results refers to LRRK2 localising to microtubules in the presence of the type-I compounds, and to the cytosol with the type-II inhibitor. Aren't microtubules in the cytosol also?
  
  We meant cytosolic LRRK2; we have revised the text to reflect this. It now reads:
  
  In cells treated with MLi-2, we observed LRRK2<sup>IT</sup> in extended filaments, puncta, and diffuse in the cytosol (Fig. 1D-E; Ext. Fig 1A-D). In contrast, when cells were treated with GZD-824, LRRK2<sup>IT</sup> was mostly localized to puncta and distributed throughout the cytosol, with reduced filament formation (Fig. 1F-G; Ext. Fig 1E-H), in agreement with our previous work [23,24,40].
  
  (2) P4: second column, halfway down. I don't understand how the 16 and 8 neighbours are derived from Figure 3J-K. Perhaps indicate this in the figure?
  
  Thank you for bringing this to our attention. We have added Extended Data Figure 5 to clarify this point. Extended Data Figure 5 highlights and annotates the immediate neighboring LRRK2 densities in the MLi-2- and GZD-824-treated lattices, making clear how the 16 and 8 nearest-neighbor values were assigned from the observed lattice organization.
  
  (3) P6: first column, halfway down: perhaps make it explicit that only rigid-body fitting was performed because of the limited resolution?
  
  We have incorporated this useful suggestion. The text now reads:
  
  “We split this model in three parts: the WD40 and C-lobe of the kinase, the N-lobe of the kinase with ROC and COR domains, and the LRR and ANK domains, aligned and fitted each of these three to our map A (Fig. 4D-F). Given the limited resolution of the map A, we fit the model as three rigid bodies without atomic refinement.”
  
  (4) P6: same column near the bottom: what is map C? and how was it calculated? Also, it is not clear to me from Figures 5D-F whether the statement "clearly correspond to the LRR-ANK-ARM domains" is justified by the map. From Figure 5D-F, I see a rather poor fit in a low-resolution map. This needs to be toned down or better illustrated.
  
  We apologize for the oversight. We have updated the text to clarify how map C was calculated. Now the text reads:
  
  “Additionally, we performed subtomogram analysis in Dynamo on a larger LRRK2<sup>IT</sup>-decorated lattice that contained three layers of LRRK2<sup>IT</sup> density around the microtubule; we refer to this average as map C. Refinement was focused on the central four LRRK2<sup>IT</sup> subunits to better resolve additional protein densities within this larger lattice. In map C (Fig. 5A; Ext. Fig. 7).”
  
  In addition, we updated Figure 5D-F to better demonstrate the fit of the N-terminal domains into the presented map. We also added Extended Data Figure 7 to the supplemental materials to further highlight the fit within the map and indicate the areas that correspond to the N-terminal domains of LRRK2. We hope these updates clarify how map C was calculated and better illustrate our interpretation of the additional densities.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2025.12.18.694444v3
www.biorxiv.org www.biorxiv.org

Tonic feedback motor commands predict visuomotor learning

1
1. Public_Reviews 29 Jun 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors investigate the relationship between feedback responses and trial-to-trial learning. In their paradigm, participants were constrained to a channel trial, and a cursor was visually perturbed. Using a channel-perturbation-channel structure, the authors obtain feedback responses to the perturbation and the learning response that ensues. In Experiment 1, the authors demonstrate that temporal dynamics of the learning response (LR) are poorly linked to temporal dynamics of the feedback response (FBR). The LR responses are yoked to the start of the movement, even in cases where the FBR is very delayed. Then, in Experiments 2 and 3, the authors dissect FBR and LR responses into two components: (1) a phasic component that has a peak point mid-movement and then declines, and (2) a tonic component that grows over the movement time course and remains stable during the holding period. The authors provide evidence that LR responses are better predicted from the tonic component of the FBR than the phasic component. The idea that tonic FBR components drive learning over phasic components departs from prior models of error-based learning and provides a new theory to understand sensorimotor adaptation.
  
  Strengths:
  
  (1) The paper is well-written, and the contribution is important and timely. The authors provide clear experiments that change the way we conceptualize how trial-to-trial learning is driven by feedback responses to error.
  
  (2) The paper provides solid evidence to demonstrate that feedback (FBR) and learning (LR) responses are not linked by a fixed delay, in contrast to prior models.
  
  (3) The paper also introduces the concept that both tonic and phasic components of the FBR differentially influence the learning response. The paper provides solid evidence that the tonic forces maintained during holding still have an impact on the learning that follows on the next trial. This has implications for models of sensorimotor adaptation and our understanding of the physiology of learning.
  
  Weaknesses:
  
  While some conclusions are strong, I feel that the conclusions regarding FBR and LR relationships need additional analysis. All these concerns are elaborated below. Broadly speaking, there is a concern that some conclusions reached by the authors are tied to the particular phasic/tonic model they use to parse FBR and LR responses. Other models are not considered and could lead to different results. Furthermore, it is assumed that LRs are scaled FBRs. This assumption excludes the possibility that LRs could be driven by FBRs and other mechanisms, which would alter the way the regression analyses are constructed. As described below, model-free analyses are warranted to corroborate the main findings. Further, the role that the phasic FBR plays in the adaptation process is understated in the Discussion despite evidence to the contrary in Figure 8. Much of the analysis is done on trial-averaged and participant-averaged responses, inflating R² values. More analysis should be done at the trial level to better examine model performance and accuracy. And while valuable, the authors' experimental approach differs from the standard force-field experiments that were initially used to test feedback error learning hypotheses. The paper could benefit from a Limitations section to discuss these limitations.
  
  Main Concern 1:
  
  The decomposition of FBR and LR into phasic/tonic components is based on a specific model (i.e., Equation (1)). The notion that tonic FBR predicts phasic/tonic LR is based on responses estimated from the model. Thus, it is unclear whether critical findings (e.g., LR responses are predicted by tonic FBR) are true of the "data" or true when the "data are analyzed in the context of their model". In other words, had the authors proposed a different model to decompose the LR/FBR into tonic/phasic components, would they obtain different results?
  
  There are many possible alternatives:
  
  (A) In Equation (1), the phasic and tonic components are assumed to add linearly at all times to obtain the force profile. But the phasic and tonic components could be applied at separate times. The tonic component could be invoked during holding, and the phasic component could be invoked during moving. This type of model will differ from the current version, especially in how the peak force during the moving period is assigned to the phasic/tonic components.
  
  (B) Another possibility is that the tonic and phasic components do indeed operate at the same time (like in Equation (1)), but they are separate, independent controllers. In the authors' model, the tonic component is dependent on the phasic component.
  
  (C) Another possibility is that the tonic and phasic components are linked, but not by an integral.
  
  (D) Another possibility is that the phasic component is not a Gaussian function of time.
  
  Concern 1-1:
  
  While it is not possible to explore the entire model space described above, the authors should consider whether other phasic/tonic model classes could lead to qualitatively different results. The authors could also consider other phasic/tonic models if appropriate, and demonstrate that Equation (1) is superior based on an information criterion like AIC or BIC.
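  
  As a rough illustration of the information-criterion comparison suggested in Concern 1-1, the sketch below computes AIC and BIC for least-squares fits. It assumes i.i.d. Gaussian residuals, which gives AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n) up to a constant shared by all models. The function name, the candidate labels, and the RSS and parameter counts are hypothetical placeholders, not values from the study. Only differences between models fitted to the same data are meaningful, and lower values are preferred.

```python
import math


def aic_bic(rss: float, n: int, k: int) -> tuple[float, float]:
    """AIC and BIC for a least-squares fit, assuming i.i.d. Gaussian residuals.

    rss: residual sum of squares of the fit
    n:   number of data points used in the fit (e.g., force samples)
    k:   number of free parameters in the model
    Constants shared by all models are dropped, so only differences
    between models fitted to the same data are meaningful.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    log_lik_term = n * math.log(rss / n)
    aic = log_lik_term + 2 * k
    bic = log_lik_term + k * math.log(n)
    return aic, bic


# Hypothetical comparison of two phasic/tonic decompositions fitted to the
# same 200-sample force profile (RSS values and parameter counts are made up).
candidates = {
    "model A (4 parameters)": (12.0, 4),
    "model B (5 parameters)": (11.5, 5),
}
for name, (rss, k) in candidates.items():
    aic, bic = aic_bic(rss, n=200, k=k)
    print(f"{name}: AIC={aic:.1f}, BIC={bic:.1f}")
```

  BIC penalizes extra parameters more heavily than AIC once n exceeds about 7. The two criteria can therefore disagree when a richer decomposition gives only a small drop in RSS.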
  
  Concern 1-2:
  
  I recommend that the authors pursue model-free, empirical analyses to support their findings. This would decrease the reliance on the "correctness" of a particular model. One logical choice would seem to be empirically estimating the phasic component as the peak force during the moving period and the tonic component as the average force during the holding period. In this model-free estimation of phasic and tonic commands, is it still the case that tonic FBR alone predicts LR components?
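  
  The model-free estimates proposed in Concern 1-2 are simple to compute per force profile. The sketch below assumes each profile is available as sampled (time, force) pairs and that a single time, the hypothetical `movement_end`, separates the moving period from the holding period. Treating the largest-magnitude sample during movement as the "peak" is an assumption made here so that the sign of the response is kept. The resulting per-trial estimates could then be entered into the same regressions as in Figures 4g and 4h.

```python
from statistics import mean


def model_free_components(times, forces, movement_end):
    """Return (phasic_estimate, tonic_estimate) for one force profile.

    times, forces: equal-length sequences of samples
    movement_end:  hypothetical boundary between moving and holding periods
    phasic_estimate: sample with the largest magnitude while moving
    tonic_estimate:  mean force during the holding period
    """
    if len(times) != len(forces):
        raise ValueError("times and forces must have the same length")
    moving = [f for t, f in zip(times, forces) if t <= movement_end]
    holding = [f for t, f in zip(times, forces) if t > movement_end]
    if not moving or not holding:
        raise ValueError("need samples in both the moving and holding periods")
    phasic = max(moving, key=abs)  # keeps the sign of the peak response
    tonic = mean(holding)
    return phasic, tonic


# Synthetic profile: a transient peak while moving, then a sustained plateau.
times = [i / 10 for i in range(9)]
forces = [0.0, 1.0, 3.0, 2.0, 1.5, 1.0, 1.0, 1.0, 1.0]
print(model_free_components(times, forces, movement_end=0.4))  # -> (3.0, 1.0)
```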
  
  Concern 1-3:
  
  Building on Concern 1-2, a clear case where this concern applies is the across-subject variability analysis in Figure 7. Here, LR and FBR are compared to one another only in the context of the tonic-phasic model in Experiment 1. The result is that only the tonic FBR predicts the tonic LR. However, Figures 7b and 7c suggest that the peak force applied during the FBR in the moving period (which should largely reflect the phasic component, as in Figure 4a) would predict the peak (or average) force applied during the LR. Thus, the conclusion that only the tonic FBR predicts the tonic LR may be driven by how the model estimates the tonic/phasic FBR/LR rather than by a true property of the data. A model-free analysis, as suggested in Concern 1-2, would help address this concern.
  
  Main Concern 2:
  
  Analyses in Figures 4g, 4h, 6c, and 6d are based on relating LR and FBR components with no intercept (y = ax), i.e., the LR component is a scaled FBR component. It is unclear whether the authors' conclusions would change had a different model been used. For example, suppose that LR on trial n is partly determined by the FBR and also by the sensory error (e) on trial n-1, where c1 and c2 are constants:
  
  LR(n) = c1 FBR(n-1) + c2 e(n-1)
  
  Another model could suppose that the LR on trial n is due to the FBR on trial n-1 and also to a non-specific adaptive component that is independent of both the FBR and the sensory error:
  
  LR(n) = c1 FBR(n-1) + c2
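  
  The first alternative model, LR(n) = c1 FBR(n-1) + c2 e(n-1), can be fitted by ordinary least squares with two regressors and no intercept. The sketch below solves the 2x2 normal equations directly and assumes aligned trial-level lists of FBR(n-1), e(n-1), and LR(n). Those inputs and the function name are hypothetical. An estimate of c2 clearly different from zero would indicate that the LR is not purely a scaled FBR.

```python
def fit_two_regressors(x1, x2, y):
    """Least-squares c1, c2 for y = c1*x1 + c2*x2 (no intercept).

    Solves the 2x2 normal equations
        [s11 s12] [c1]   [s1y]
        [s12 s22] [c2] = [s2y]
    by Cramer's rule.
    """
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are (nearly) collinear")
    c1 = (s1y * s22 - s2y * s12) / det
    c2 = (s2y * s11 - s1y * s12) / det
    return c1, c2


# Synthetic data generated with c1 = 0.5 and c2 = 2.0 (no noise).
fbr_prev = [1.0, 2.0, 3.0, 4.0]   # hypothetical FBR(n-1)
err_prev = [1.0, 0.0, 1.0, 0.0]   # hypothetical e(n-1)
lr_next = [0.5 * f + 2.0 * e for f, e in zip(fbr_prev, err_prev)]
c1, c2 = fit_two_regressors(fbr_prev, err_prev, lr_next)
print(f"c1={c1:.3f}, c2={c2:.3f}")  # -> c1=0.500, c2=2.000
```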
  
  Concern 2-1:
  
  For these alternate models, y = ax (i.e., zero intercept) is not an appropriate relationship between LR and FBR components. Had the authors allowed a non-zero intercept in Figs. 4g, 4h, 6c, and 6d, would they still observe that only the tonic FBR predicts LR components? In other words, would R² improve for phasic FBR relationships with a non-zero intercept?
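  
  The effect of the zero-intercept constraint raised in Concern 2-1 can be checked by fitting both forms to the same points. The sketch below is in pure Python, and x and y stand for hypothetical FBR and LR component magnitudes. One caveat matters: many statistical packages report an uncentered R² (total sum of squares about zero) for no-intercept fits, and that value is not comparable with the centered R² of a model with an intercept. Here both fits are scored with the centered definition, so the through-origin R² can even be negative.

```python
def fit_through_origin(x, y):
    """Least-squares slope a for y = a*x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def fit_with_intercept(x, y):
    """Ordinary least-squares slope a and intercept b for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def r_squared(y, y_hat):
    """Centered R^2 = 1 - RSS/TSS, with TSS taken about the mean of y."""
    my = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - my) ** 2 for yi in y)
    return 1 - rss / tss


# Synthetic data with a true offset: y = x + 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 5.0, 6.0]
a0 = fit_through_origin(x, y)
a, b = fit_with_intercept(x, y)
r0 = r_squared(y, [a0 * xi for xi in x])
r1 = r_squared(y, [a * xi + b for xi in x])
print(f"through origin: R^2={r0:.3f}, with intercept: R^2={r1:.3f}")
# -> through origin: R^2=0.467, with intercept: R^2=1.000
```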
  
  Concern 2-2:
  
  Why was a non-zero intercept allowed for the between-subject analyses in Figure 7, but not for similar analyses in Figures 4 and 6?
  
  Main Concern 3:
  
  The main results in Figures 4g, 4h, 6c, and 6d are based on an R² value that is calculated on a linear fit to the mean response averaged across participants and trials. This raises the concern that the R² value is being inflated, and it also misses the rich trial-to-trial variation and subject-to-subject variation that could be used to examine the model's accuracy. A couple of concerns here:
  
  Concern 3-1:
  
  As can be seen from the horizontal and vertical error bars in Figures 4g and 4h, there is considerable variability across participants. While not shown, it is almost certainly the case that there is considerable variability across trials within a participant (as alluded to in the Fig. 8 analyses). The authors should evaluate their model performance and report goodness-of-fit (or error) at the single-trial level. For example, the model could be fit to individual trial data, and the R2 values from the trial fits could be used for comparing the various relationships in Figures 4 and 6. Another idea would be to keep the alpha, beta, T and sigma estimates obtained from the average data, and then apply these parameters to individual trial responses and report the model error. Do phasic FBR commands similarly predict LR components at the trial level, or do trial-level analyses corroborate the current conclusions on tonic FBR superiority?
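One way to implement the second suggestion, sketched with hypothetical inputs: keep the model prediction obtained from the group-average fit fixed and score it against each single trial. Equation (1) itself is not reproduced here; `template` stands in for its output evaluated with the group-average alpha, beta, T, and sigma, and the function name is illustrative.

```python
import math


def per_trial_fit(template, trials):
    """Score a fixed model prediction (template) against each single-trial waveform.

    template: model prediction sampled on the same time grid as each trial
              (assumed precomputed from the group-average parameters).
    trials:   list of single-trial force waveforms (N).
    Returns a list of (rmse, r2) tuples, one per trial.
    """
    results = []
    for trial in trials:
        n = len(trial)
        mean_f = sum(trial) / n
        ss_res = sum((f - m) ** 2 for f, m in zip(trial, template))
        ss_tot = sum((f - mean_f) ** 2 for f in trial)
        rmse = math.sqrt(ss_res / n)
        # R2 is undefined for a flat trial (zero variance about its mean).
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
        results.append((rmse, r2))
    return results


# Hypothetical 4-sample waveforms, for illustration only.
template = [0.0, 1.0, 2.0, 1.0]
trials = [[0.0, 1.0, 2.0, 1.0], [0.0, 2.0, 2.0, 0.0]]
res = per_trial_fit(template, trials)
for rmse, r2 in res:
    print(f"RMSE={rmse:.3f} N, R2={r2:.3f}")
```

The distribution of per-trial errors could then be compared between the tonic and phasic predictors rather than relying on a single R2 from averaged data.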
  
  Concern 3-2:
  
  The authors report on Line 200 that the R2 values of 0.635 and 0.698 have modest predictive power. It would be helpful for the authors to statistically compare the R2 values between Figures 4g and 4h. One idea would be to obtain an R2 value for each individual participant. The distribution of R2 values across participants could then be compared between the different relationships in Figure 4g/4h (e.g., via a t-test). This would better support the idea that Figure 4h shows better model fits than Figure 4g. These analyses could also be conducted for the relevant parts of Figure 6 (Experiment 3). The authors should consider allowing a y-intercept in this process, as they do in Figure 7.
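A minimal sketch of the suggested participant-level comparison, assuming a paired t-test (each participant contributes one R2 for each relationship); the values below are hypothetical and the function name is illustrative.

```python
import math
import statistics


def paired_t(r2_a, r2_b):
    """Paired t statistic and degrees of freedom for per-participant R2 values.

    r2_a, r2_b: R2 of the two competing relationships (e.g. tonic FBR vs
    phasic FBR as the predictor), one value per participant, same order.
    """
    diffs = [a - b for a, b in zip(r2_a, r2_b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD uses n - 1
    return statistics.mean(diffs) / se, n - 1


# Hypothetical per-participant R2 values (3 participants, illustration only).
t, df = paired_t([0.7, 0.8, 0.9], [0.6, 0.6, 0.6])
print(f"t({df}) = {t:.3f}")
```

The two-sided p-value then follows from the t distribution with n - 1 degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(r2_a, r2_b)` returns the same statistic together with the p-value.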
  
  Main Concern 4:
  
  The authors compare tonic and phasic FBR predictive power in Figure 4. There are other places where the analyses in Figures 4g and 4h should be repeated:
  
  Concern 4-1:
  
  Tonic and phasic FBR responses appear to vary in Experiment 1 (Figure 2c), but the authors do not test whether they predict the LR component magnitudes in Figure 2d. The analyses in Figures 4e, 4f, 4g, and 4h should be added to the Experiment 1 analysis.
  
  Concern 4-2:
  
  While I understand the rationale behind computing differences in Figure 6 to isolate the second-shift effect on FBR/LR, the authors should still perform the primary investigation in Figures 4e, 4f, 4g, and 4h on the FBR and LR responses in Figures 5b-g (without subtracting the "Maintained" component). In other words, before analyzing the contributions of the second shift in Figure 6, the authors should repeat their Figure 4 analysis on the FBR and LR responses in Figure 5 (without subtracting the maintained response). How well do Equation (1) and y = ax capture the FBR and LR responses in Figures 5b-g?
  
  Main Concern 5:
  
  Given current practices in human sensorimotor adaptation research, the group sizes of n=10 (or n=12) appear small, raising concerns about statistical power.
  
  Concern 5-1:
  
  The authors should consider a power analysis or provide some other justification to support their chosen sample sizes.
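A rough sketch of such a calculation, assuming the key test is a within-participant (paired) comparison; the effect sizes d below are placeholders, and in practice they would come from pilot data or prior studies. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_paired_normal_approx(d, alpha=0.05, power=0.8):
    """Approximate number of participants for a two-sided paired test.

    d is the standardized effect size (mean difference / SD of differences).
    Uses the normal approximation n = ((z_{1-alpha/2} + z_{power}) / d)^2,
    which slightly underestimates the n from an exact t-based calculation.
    """
    z = NormalDist()  # standard normal distribution
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)


for d in (0.5, 0.8, 1.0):
    print(d, n_paired_normal_approx(d))
```

Because of the normal approximation, these numbers are lower bounds; a t-distribution-based calculation gives a slightly larger n, especially for large effects where n is small.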
  
  Concern 5-2:
  
  It is unclear why the cross-correlation analyses in Figures 2e, 3d, and 5h have error bars while no other FBR or LR time courses do. Error bars should be provided in Figures 2b, 2c, 2d, 3b, 3c, 5b, 5c, 5d, 5e, 5f, 5g, 6a, and 6b.
  
  Concern 5-3:
  
  The subject counts are reported as n=10 for Experiment 1, n=12 for Experiment 2, and n=12 for Experiment 3, but the subject-to-subject analysis in Figure 7 reports n=33.
  
  Main Concern 6:
  
  I agree that the authors' model suggests that LR responses are most strongly predicted by the tonic FBR component. But I feel the narrative and Discussion surrounding this point are too strong. They paint the picture that only tonic FBR is important in learning. In doing so, the role that phasic FBR plays is discounted, and mixed results concerning tonic FBR are overlooked. I feel that the Discussion should be broadened to acknowledge that the authors find evidence that both tonic and phasic FBR appear to influence the learning response, with tonic FBR making the stronger contribution in this task. Here are key areas that require attention:
  
  Concern 6-1:
  
  Importantly, the authors downplay their result in Fig. 8h, that the phasic FBR predicts phasic LR, in their Results on Line 350. This argues against the idea that only tonic FBR influences LR parameters. On Line 485, the authors state that "trial-by-trial variability in LR amplitude was explained by the tonic component of the FBR, but not by the phasic component (Fig. 8)." This is not correct. Both the tonic and phasic components of the FBR altered LR components in Figure 8.
  
  Concern 6-2:
  
  Again, it is stated on Line 502, that the phasic FBR component "had only a modest effect on the LR". This again seems to underplay the result. The authors should amend their Results and Discussion to better acknowledge that their data support a role for both tonic and phasic FBR contributions to LR, but the tonic component appears to make a larger contribution in their model.
  
  Concern 6-3:
  
  While the role of phasic FBR in determining LR amplitude appears to be understated, the role of tonic FBR is, on occasion, overstated. The Discussion should mention that there is mixed evidence for the role of tonic FBR in LR parameters. For example, in their between-subjects analysis in Figure 7f, the authors do not find that phasic LR can be predicted by tonic FBR. Thus, across subjects, no component of the FBR appears to predict phasic LR.
  
  Concern 6-4:
  
  To better investigate the roles that phasic and tonic FBR may play in adaptation, it would be advisable for the authors to consider the hypothesis that both components contribute. As it stands, tonic LR or phasic LR is regressed only onto tonic FBR or phasic FBR individually. In Figures 1 (Experiment 1), 3 (Experiment 2), and 5 (Experiment 3), the authors could regress tonic LR and phasic LR onto both phasic FBR and tonic FBR simultaneously. Models of the form LR = c1 phasic-FBR + c2 tonic-FBR could be compared against the univariate models LR = c phasic-FBR and LR = c tonic-FBR using AIC or BIC, to determine whether a mixed model that predicts LR from both phasic and tonic FBR is warranted.
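A sketch of the proposed comparison, assuming zero-intercept least-squares fits (matching the y = ax convention used in the manuscript) and Gaussian residuals; the amplitudes are synthetic, built so that both components contribute. Lower AIC/BIC favors a model after penalizing its extra coefficient. Function names are illustrative.

```python
import math


def rss_one(x, y):
    """Residual sum of squares of the zero-intercept fit y = c*x."""
    c = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return sum((b - c * a) ** 2 for a, b in zip(x, y))


def rss_two(x1, x2, y):
    """Residual sum of squares of y = c1*x1 + c2*x2 (normal equations, Cramer's rule)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    c1 = (s1y * s22 - s2y * s12) / det
    c2 = (s2y * s11 - s1y * s12) / det
    return sum((c - c1 * a - c2 * b) ** 2 for a, b, c in zip(x1, x2, y))


def aic_bic(rss, n, k):
    """Gaussian-likelihood AIC and BIC for k coefficients.

    Constants shared by all models (including the residual-variance
    parameter) are dropped, so only differences between models matter.
    """
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


# Synthetic amplitudes: LR built from both components plus small noise.
phasic = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tonic = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.02, -0.02]
lr = [p + 2 * t + e for p, t, e in zip(phasic, tonic, noise)]
n = len(lr)

models = {
    "phasic only": aic_bic(rss_one(phasic, lr), n, 1),
    "tonic only": aic_bic(rss_one(tonic, lr), n, 1),
    "phasic + tonic": aic_bic(rss_two(phasic, tonic, lr), n, 2),
}
for name, (aic, bic) in models.items():
    print(f"{name:15s} AIC={aic:7.2f} BIC={bic:7.2f}")
```

With real data the same comparison would be run separately for tonic LR and phasic LR as the dependent variable.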
  
  Irrespective of the result, the authors should be careful (Concerns 6-1 and 6-2) to state that when levels of tonic-FBR were controlled in Figure 8 (which is likely the cleanest way to look at the role phasic FBR plays in learning), phasic-FBR showed a clear influence on LR.
  
  Major Concern 7:
  
  On Line 577, it states the "hand was automatically returned to the starting position". Does this mean that the robot moved the hand back to the start location? If so, was the hand ever released from a force channel in between the perturbation trial and the following channel trial? A concern is that the holding forces from the perturbation trial could "bleed over" into the forces applied during the subsequent channel trial if the subject always remains in a channel trial in between the trials. Suppose we label the 3-trial structure as Channel 1 (C1) - Perturbation (P) - Channel 2 (C2). The authors should confirm that the holding forces on P are not correlated with baseline force (i.e., the channel force prior to movement onset) in C2. I do not expect there to be a strong correlation given that the learning responses in Figs. 2d, 3c, and 5e-g appear near-zero at t=-400ms, but this should still be verified.
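A sketch of the suggested check, assuming one value per C1-P-C2 triplet: the holding force on P and the mean C2 channel force before movement onset (t < 0 ms). The function names, time grid, and force values are hypothetical. A correlation near zero across triplets would support the view that holding forces do not bleed over into C2.

```python
import math


def baseline_force(force, times, t_end=0.0):
    """Mean channel force before movement onset (samples with time < t_end, ms)."""
    pre = [f for f, t in zip(force, times) if t < t_end]
    return sum(pre) / len(pre)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-triplet data: holding force on P and C2 force traces (N).
times = [-400.0, -200.0, 0.0, 200.0]
c2_traces = [
    [0.01, 0.02, 0.10, 0.30],
    [0.00, 0.01, 0.12, 0.28],
    [0.02, 0.01, 0.09, 0.31],
]
p_holding = [0.20, 0.15, 0.22]
c2_baseline = [baseline_force(tr, times) for tr in c2_traces]
r = pearson_r(p_holding, c2_baseline)
print(f"r = {r:.3f}")
```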
  
  Major Concern 8:
  
  In Supplementary Figure 1, there appears to be an error in the "Amplitude of phasic LR (N)". In Supplementary Figure 1f, the phasic LR magnitudes appear in line with Supplementary Figure 1d, but there is a mismatch in the magnitudes for the phasic LR in Supplementary Figures 1e & 1d (the phasic LR magnitudes appear to be too low in Supplementary Figure 1e, peaking at around 0.1N when they should peak at around 0.15N).
  
  Major Concern 9:
  
  The authors should provide a Limitations section, highlighting unanswered concerns listed above, mixed results, and differences from prior work. These are touched upon in the Discussion section (particularly in Perspectives for future studies) but should be expanded further. At a minimum, the authors should consider including a discussion of the following points:
  
  Differences from prior work:
  
  9-1: There are methodological differences between this work and past studies highlighted by the authors. It could be that there are multiple error-based learning mechanisms that drive the FBR. Here, the authors find that visually driven FBR responses do not drive LRs at a "common temporal shift". Instead, LRs are broadly expressed at the start of the movement (regardless of when the FBR was timed). However, tasks that have other components (e.g., a proprioceptive error) might invoke different learning mechanisms. For example, proprioceptively driven FBRs might invoke LRs that have different temporal properties than visually driven FBRs.
  
  9-2: As noted by the authors, Reference [10] studied FBR-driven learning in muscle commands, as opposed to forces. Muscle responses may have differing temporal and/or magnitude (for phasic/tonic) components that qualitatively differ from the force-based conclusions made here. Thus, the learning mechanisms at the muscle level may differ from those observed at the force level.
  
  9-3: While the tonic FBR is a strong predictor of the learning response in this experiment, most of the experimental conditions are done where the cursor remains deviated from the target throughout the trajectory and into the holding period. This differs from past work on feedback error learning, where feedback was veridical, and the cursor (and hand) ended on the target. This persistent displacement from the target during the prolonged holding period may influence the learning process and could enhance the tonic-FBR contribution to learning.
  
  9-4: The authors state in the present study that subjects were told not to use "explicit strategies" and move as straight as possible to the target. For past work, participants were able to use explicit strategies during feedback and learning responses. It could be that the lack of (or reduction in) explicit responses alters single-trial learning mechanisms relative to past work.
  
  Alternate models:
  
  9-5: No alternate models are considered here for the tonic-phasic relationship. Other models could relate these two processes differently, which could lead to different conclusions.
  
  9-6: It is assumed that both the tonic and phasic controllers are active at the same moment in time and sum linearly to generate the overall force output. Other models could have applied each "controller" to different phases of the reach in a differential manner (e.g., two separate controllers, a moving controller and a holding controller operating at different moments in time).
  
  9-7: It is assumed here that the LR should be a scaled FBR: y = ax. Conclusions made here could change if the LR is due to multiple processes, FBR-driven learning only being one of them. Other models where the LR is driven by both FBR and the sensory error were not considered here.
  
  Mixed results:
  
  9-8: While tonic FBR was a good predictor of phasic LR at the group level (e.g., Fig. 4g), it did not predict phasic LR between subjects (Fig. 7f) and in fact tended toward a negative relationship.

  9-9: Phasic FBR predicts phasic LR at the trial level (Figure 8h) but not as well at the subject level (Figure 7d).
  
  9-10: Overall, with the exception of Figure 8, most analyses look at the relationship between LR and tonic FBR or phasic FBR separately. In Figures 4c, 4d, 6c, 6d, and 7d-g, the authors look at the marginal effect of tonic or phasic FBR on learning, but do not control for variations in the other FBR component (e.g., they look at the effect of phasic FBR on tonic LR, but do not control for tonic FBR). The only analysis that controls for the other component is in Figure 8, and it suggests that both tonic and phasic FBR contribute to LR.
  
  Minor concerns
  
  (10) I'm not sure I follow the cross-correlation analysis in Figure 3. Overall, to me, the FBR in Figure 3b and the LR in Figure 3c look quite similar in their temporal profiles, irrespective of the shift magnitude. The authors state on Line 158 that their cross-correlation analysis "...revealed that the overall shape of the cross-correlation function changed systematically with error magnitude". However, in Figure 3d, the many curves look similar in shape to me.
  
  What is confusing to me here is including a phasic movement period and a tonic holding period inside the cross-correlation. The tonic "static" component during the holding period will likely greatly influence how well the cross-correlation is able to match the phasic peaks during the LR/FBR moving periods. In other words, the reach consists of a "movement" and a "holding" period. But the cross-correlation is blending the two together, and thus, I am not sure how reliable this measure will be for truly estimating the temporal shift between conditions. For example, if you look at the shaded gray area in Figure 3b, the "Movement period" looks almost identical in temporal properties. The "peaks" and "troughs" happen at nearly the same moment in time across all conditions. The onset of the FBR at approximately 200 ms is also identical across shift magnitudes. Thus, to me, the temporal properties of the FBR seem very similar during the moving period (where the FBR is responding to the error). But including the holding force (the tonic force after the 600ms period) seems to be causing the cross-correlation function to estimate differences at very high lags. If these differences are being driven solely by the holding forces, I am not sure this is meaningful.
  
  It seems that the authors might want to repeat this analysis, excluding the holding force period from the calculation of the cross-correlation coefficients.
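A sketch of the suggested re-analysis: the correlation is computed only over the movement window (0 to 600 ms, the shaded period), so the tonic holding force cannot dominate it. The waveforms are synthetic (the same bump, shifted by 50 ms), the function names are illustrative, and the lag convention (positive = LR later than FBR) is an assumption, not necessarily the manuscript's.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def windowed_xcorr(fbr, lr, times, t_start, t_end, max_lag):
    """Cross-correlation restricted to FBR samples with t_start <= t < t_end.

    For each lag (in samples), FBR samples inside the window are paired with
    LR samples shifted by that lag. Returns ({lag: r}, lag with the highest r).
    """
    window = [i for i, t in enumerate(times) if t_start <= t < t_end]
    curve = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(fbr[i], lr[i + lag]) for i in window if 0 <= i + lag < len(lr)]
        xs, ys = zip(*pairs)
        curve[lag] = pearson_r(xs, ys)
    best = max(curve, key=curve.get)
    return curve, best


dt = 10  # ms per sample
times = [dt * i for i in range(100)]  # 0 to 990 ms
fbr = [math.exp(-(((t - 300) / 50) ** 2)) for t in times]
lr = [math.exp(-(((t - 350) / 50) ** 2)) for t in times]  # same shape, 50 ms later
curve, best = windowed_xcorr(fbr, lr, times, 0, 600, max_lag=10)
print(f"peak at lag {best * dt} ms, r = {curve[best]:.3f}")
```

Running the same function with `t_end` extended into the holding period would show how much the tonic segment changes the estimated lag.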
  
  (11) The ANOVA in Fig. 3f appears to show a significant main effect (p=0.028), but no post-hoc tests are reported to indicate which group means differ.
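For illustration, one common way to control the family-wise error rate across pairwise post-hoc comparisons is the Holm step-down correction; the choice of correction is an assumption (Bonferroni would also be standard), and the p-values are hypothetical.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses still in play, and keep the
        # adjusted values monotone in the sorted order.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
adj = holm_adjust([0.01, 0.04, 0.03])
print(adj)
```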
  
  (12) When the FBR is plotted, a [0,600] ms period is shaded as the movement period. On Line 580, it says that feedback was provided on peak movement speed. Was any feedback provided on movement duration? If not, did participants complete the movement within the 600 ms window labeled as the movement period? Were movements during perturbation trials longer than during non-perturbed trials?
  
  (13) Over what time period is Equation (1) fit to the data? Is it the [-200,700]ms window shown in Figure 4a? A concern is that including too much of the "holding period" in the model fit will cause the model to be biased toward fitting the holding period well and not the moving period. This, in turn, might lead to better estimates for the beta parameter than the alpha parameter. In addition to clarifying the fitting process, the authors should also include R2 values for the moving and holding periods separately.
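A sketch of the suggested split, using a toy observed trace and model prediction (hypothetical numbers): the prediction matches the holding samples exactly but misses the movement samples, yet the pooled R2 still looks respectable. Window boundaries follow the 0 to 600 ms shading; the function name is illustrative.

```python
def r2_in_window(obs, pred, times, t_start, t_end):
    """Centered R2 using only samples with t_start <= t < t_end (ms)."""
    pairs = [(o, p) for o, p, t in zip(obs, pred, times) if t_start <= t < t_end]
    mean_o = sum(o for o, _ in pairs) / len(pairs)
    ss_res = sum((o - p) ** 2 for o, p in pairs)
    ss_tot = sum((o - mean_o) ** 2 for o, _ in pairs)
    return 1.0 - ss_res / ss_tot


times = [0, 100, 200, 300, 700, 800, 900]
obs = [0.0, 1.0, 2.0, 1.0, 3.0, 4.0, 5.0]
pred = [0.0, 0.0, 0.0, 0.0, 3.0, 4.0, 5.0]

r2_move = r2_in_window(obs, pred, times, 0, 600)
r2_hold = r2_in_window(obs, pred, times, 600, float("inf"))
r2_all = r2_in_window(obs, pred, times, float("-inf"), float("inf"))
print(f"moving R2={r2_move:.2f}, holding R2={r2_hold:.2f}, pooled R2={r2_all:.2f}")
```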
  
  (14) The procedure is clear from Figure 1e, but it would be helpful on Line 91 to explain that "collapsing" FBR and LR across rightward and leftward means that the FBR and LR were negated for one of the directions (prior to collapsing).
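The procedure can be stated in a few lines; a minimal sketch, assuming equal weighting of the two directions (the manuscript may instead pool trials before averaging) and hypothetical force values. The function name is illustrative.

```python
def collapse_directions(right, left):
    """Average rightward and sign-flipped leftward responses sample by sample.

    Assumes leftward shifts evoke forces of opposite sign, so negating the
    leftward trace aligns both directions before averaging.
    """
    return [(r - l) / 2 for r, l in zip(right, left)]


collapsed = collapse_directions([0.1, 0.3, 0.2], [-0.1, -0.25, -0.2])
print(collapsed)
```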
  
  (15) Are the "Amplitude of tonic LR (N)" supposed to be negative in Figures 6c and 6d?
  
  (16) Overall, the parameter distributions in Figures 4e and 4f are similar to those in Supplementary Figures 1c and 1d. The FBR amplitudes look nearly identical. Only the Phasic LR amplitudes in Supplementary Figure 1d appear to be larger than the Phasic LR amplitudes in Figure 4f. Can the authors provide an intuition for why the phasic LR contributions increase when T and sigma parameters are allowed to vary between participants?
  
  (17) There are two points where the authors should consider softening their language:
  
  17-1: The authors state at multiple points (e.g., Line 154) that "...the waveforms of LRs remained largely similar across conditions, while their amplitudes showed only modest modulation with cursor shift magnitude". However, in Figure 3c, the LR amplitude for the 0.4 cm shift is approximately 0.2 N, and the LR amplitude for the 3 cm shift is approximately 0.3 N - a 50% increase. The authors should consider softening the language here to appreciate the variations in LR amplitude.
  
  17-2: On Line 258, it is stated that the FBR during holding "diverged only slightly" for the 16 cm condition in Fig. 5b. This seems too strong a statement. The "Maintained" FBR holding force is about 0.2 N, and the reverse is about 0.1 N. Thus, the "Maintained" condition is doubled. While I agree that the LR diverges more than the FBR (i.e., 5b vs. 5e), I think the language choice here should be more careful.
  
  Review 1

Tags

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.30.702978v3
www.biorxiv.org

Stage-Specific Threats Reveal the Inadequacy of Adult-Centered Conservation

1
1. Public_Reviews 29 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This valuable study analyses correlations between traits of Chinese frog species and their Red List status, finding differences between adults and larvae and thus pointing to the importance of considering different life-cycle stages in this and possibly other animal groups when assessing species extinction risks. The current study is, however, incomplete because of unclear threat categories for tadpoles, the omission of other key species traits, and insufficient statistical analysis.
  
  Thank you very much. We have revised the manuscript according to the reviewers' comments. The parts highlighted in red in the manuscript are the revised portions.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  The manuscript shows that different traits of adults and larvae correlate with Red List status. The authors argue that this shows a big gap in the conservation of amphibians and that the traits of all life stages should be taken into account in amphibian conservation. Specifically, amphibian conservation should do more for the habitats where the larvae live.
  
  The manuscript is well written and easy to understand. The methods are sound.
  
  While the study will make an interesting contribution to conservation science, there are many things that I disagree with.
  
  (1) I don't think that amphibian larvae and their requirements are a "blind spot" as the title suggests. When reading the manuscript, I didn't learn how conservation practice should change in response to the results.
  
  Thank you very much for your suggestions. The description of the 'blind spot' was inappropriate, and we have revised it. Investigating the relationship between life history traits and threat status can help us understand which species are more vulnerable to extinction. Furthermore, it allows us to predict the potential threat severity of species that have not yet been assessed, which matters because we still lack knowledge about the biodiversity of many taxonomic groups. For example, as of early 2024, over 34% of Chinese anuran species had been described in the last ten years, and 100-200 new species are still being discovered globally each year. Under these circumstances, given the current investment in biodiversity conservation, it is nearly impossible to assess the threat status of every species and develop conservation strategies for each. Therefore, predicting the threat status of species is very important for biodiversity conservation, as it will support the subsequent formulation of specific conservation policies. Most described animal species have complex life cycles. Moreover, species face threats not only at the adult stage; those with certain traits at other life stages may also be vulnerable. For example, our study of amphibians shows that groups with larger body sizes at the tadpole stage may face more serious threats.
  
  (2) I wonder whether the relationship between species traits and extinction risk is of great importance for conservation. If a species is Data Deficient on the IUCN Red List, then species traits could be used to predict its Red List category. However, for other conservation projects, I don't see how this would work. How would traits be linked to captive breeding, conservation translocation, pond construction or habitat management in general? In some cases, I can envision a link between species traits and pond hydroperiod.
  
  Thank you very much for your suggestions. Understanding the relationship between traits and threat status is of great importance for conservation policy and the allocation of conservation resources, especially when those resources are insufficient. As mentioned earlier, current conservation resources are insufficient to survey and assess every Data Deficient (DD) species, not to mention the large number of new species discovered each year. By predicting threat status, we can identify which groups or species should be prioritized for research, such as population size and distribution range surveys, so that specific conservation strategies can subsequently be developed.
  
  (3) Species traits are body size and morphological traits. That makes sense. However, one of the species traits was microhabitat. I find it far-fetched to call habitat a species trait. This is standard habitat ecology. It is well known that habitats matter and that different habitat types face different threats, and consequently, the species that live in those habitats. Furthermore, habitat and morphology may be confounded. For example, tadpoles in lentic and lotic habitats have very different morphologies. So is it habitat or morphology?
  
  Thank you very much for your suggestions. The type of habitat in which a species lives affects the threats it faces. In many studies on the relationship between extinction risk and traits, microhabitat or habitat type is widely used as a predictive variable. For example, in studies on Squamata, whether a species is distributed on islands or peninsulas has also been included as a trait. Following your suggestion, we have revised the sentences to refer to 'morphological traits and microhabitat information'. Many morphological traits of species are related to habitat selection, but not all traits associated with habitat selection have been measured or have sufficient data. Therefore, it is necessary to include microhabitat type as an independent variable. Additionally, we calculated the Variance Inflation Factor (VIF) prior to the regression analysis to ensure that the analysis was not affected by multicollinearity.
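For readers unfamiliar with the check mentioned here, a minimal sketch of the variance inflation factor: each predictor is regressed on an intercept plus all other predictors, and VIF = 1 / (1 - R2). The trait columns are hypothetical, and the response does not state which VIF threshold was used (5 and 10 are common rules of thumb). In Python, statsmodels provides `variance_inflation_factor(exog, exog_idx)` in `statsmodels.stats.outliers_influence`, which computes the same quantity when `exog` includes a constant column; the stdlib version below is for illustration only.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def vif(columns):
    """VIF of each predictor: 1 / (1 - R2_j), where R2_j comes from regressing
    predictor j on an intercept plus all other predictors (least squares)."""
    n = len(columns[0])
    out = []
    for j, y in enumerate(columns):
        design = [[1.0] + [col[i] for k, col in enumerate(columns) if k != j]
                  for i in range(n)]
        p = len(design[0])
        xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
        xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
        beta = solve(xtx, xty)
        fitted = [sum(bk * xk for bk, xk in zip(beta, row)) for row in design]
        mean_y = sum(y) / n
        ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return out


# Hypothetical trait columns; x3 is almost exactly x1 + x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
x3 = [a + b + e for a, b, e in zip(x1, x2, [0.01, -0.01, 0.02, -0.02, 0.0])]
print(vif([x1, x2, x3]))
```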
  
  (4) I don't know how the threat status of Chinese amphibians is determined. IUCN has multiple reasons why a species can be Red Listed. One reason is range size, and another reason is population decline. Personally, I don't think they should be pooled in an analysis because they are fundamentally different reasons why a species has a high extinction risk. A reduction in population size of greater than 30% in 10 years or 3 generations is not the same thing as a small distribution range. Another issue is that IUCN developed the Green Status of species. The Green Status shows that even a species which is LC on the Red List may be significantly depleted.
  
  Thank you very much for your valuable suggestions. The assessment method of the China Biodiversity Red List is the same as that of the IUCN Red List, both of which are based on population size and area of distribution. We fully agree with your point that analyses should be conducted according to specific threat types. Unfortunately, the full report of the latest version of the China Biodiversity Red List, released in 2023, has still not been published. Therefore, we were unable to perform the relevant analyses.
  
  (5) The species traits in Table 1 are mostly functional/morphological and body size related (and microhabitat). While there may be correlations between traits and Red List status, it is unknown whether this is correlation or causation. In addition, it is difficult to know which conservation interventions may be necessary now that we know that relative head length and Red List status are correlated.
  
  Thank you for pointing out the important distinction between correlation and causation. Your comment is very insightful, and we have revised our manuscript to further clarify the scope and limitations of our study. The aim of our study is to identify which traits show statistical associations with extinction risk, thereby providing testable hypotheses for future research. We acknowledge that the mechanisms underlying the associations between certain morphological traits (e.g., head length, tympanum diameter) and extinction risk remain unclear, and these findings cannot yet be directly translated into well-established management measures. Nevertheless, the value of our study lies precisely in generating hypotheses about traits that warrant prioritized investigation of their causal mechanisms, as well as offering clues for the initial allocation of conservation resources. Following your suggestion, we have discussed the limitations of the study in the Discussion section of the manuscript.
  
  (6) In the discussion, the authors explain why body size and other traits may affect extinction risk and whether there is a causal relationship. I agree that body size may have a direct effect because larger species are harvested more frequently (it was interesting to learn that tadpoles are harvested as well). However, as macroecological studies show, smaller species often have larger populations than larger species. Abundance may matter.
  
  Thank you very much for your suggestion. Following your advice, we have revised the discussion section regarding body size.
  
  (7) I found it much harder to understand why relative head length and tympanum size correlated with Red List status. I wasn't convinced by the arguments in the discussion. Tympanum size may be related to hearing and anthropogenic noise. Several studies are cited which show that frogs alter their calling behaviour in response to noise. Crucially, however, they describe changes in behaviour or properties of the advertisement call, yet none show that noise has effects on population viability. If some anthropogenic stressor affects individuals, then this does not mean that it will cause a population decline. When IUCN published the second global amphibian assessment, did they list noise as a major threat to amphibians?
  
  We appreciate your insightful comments and fully agree with your assessment. Indeed, the hypothesis that noise threatens anuran amphibians lacks direct evidence. While relevant studies indicate that anthropogenic noise causes auditory masking in anurans and reduces individual reproductive success, the IUCN has not listed noise as a primary threat to amphibians. Although acoustic communication is vital for amphibian reproduction and is susceptible to noise interference, there is currently no definitive evidence that noise extensively impacts amphibian survival. Therefore, in the revised manuscript, we retained it as a hypothesis to be tested and explicitly clarified that current evidence is limited to behavioral changes. Regarding the correlation with relative head length, we acknowledge that the underlying mechanism remains unclear; it may stem from phylogenetic signal residuals or unidentified ecological factors (such as diet or locomotor ability). In the Discussion, we revised this part to present it as a correlation requiring further investigation.
  
  (8) There are statements that the tadpole stage is the most important stage: "a critical period for amphibian survival" (line 78-79). While there is high mortality in the tadpole stage, tadpole survival is rather unlikely to affect population survival. Many population models show this. See, for example, Biek et al. 2002 in Conservation Biology. Other papers have argued that the postmetamorphic juvenile stage is most important (Petrovan and Schmidt 2009 Biological Conservation).
  
  We greatly appreciate your comment. We agree that the original statement was overly absolute. The most critical life stage for population persistence can differ across species, and many studies have shown that other stages may be more important. Accordingly, we have revised this sentence as you suggested.
  
  (9) The authors repeatedly make the statement that amphibian conservation should focus more on the tadpole stage. I don't understand why this statement is made. For example, a major activity in amphibian conservation is the restoration and de novo construction of ponds (see Calhoun et al. 2014 PNAS, Moor et al. 2022 PNAS). Ponds are habitats for tadpoles. Others removed fish from amphibian breeding sites because fish prey on tadpoles (and adults; see Vredenburg 2004 PNAS). Semlitsch (2002 in Conservation Biology) argued that the management of pond hydroperiod is a critical element of amphibian recovery plans. Ponds should be temporary because this effectively removes predators that consume tadpoles. Clearly, the tadpole stage is not a neglected stage in amphibian conservation.
  
  Thank you for pointing this out. The literature you cited (Calhoun et al., 2014; Moor et al., 2022; Vredenburg, 2004; Semlitsch, 2002) convincingly demonstrates that the tadpole stage has received a certain degree of attention in amphibian conservation practice. Our original statement was indeed problematic. What we intended to convey is that information on the tadpole stage needs to be integrated into conservation assessment frameworks and conservation planning. For example, many studies on the relationship between functional traits and threat extent have not included tadpole-related information. Compared with our knowledge of adult amphibians, we know far less about tadpoles, and for many species, information on the tadpole stage is entirely lacking. Therefore, we call for tadpoles to receive greater attention in future research relative to the current situation.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  Conceptual problems:
  
  (1) Many conservation measures for amphibians target larvae; thus, globally, this is not a blind spot. If this is different in China, it would be important to point this out.
  
  We thank the reviewer for the thoughtful comment. We recognize that the tadpole stage has indeed received attention in amphibian conservation practice, and our original statement was therefore imprecise. Our intended argument was that tadpole-stage information should be integrated into conservation assessment frameworks and conservation planning. For instance, many studies examining the relationships between functional traits and threat extent have failed to include data on tadpoles. Our understanding of tadpoles remains far more limited than that of adult amphibians, and for a large number of species, no information on the tadpole stage is available. Consequently, we advocate for substantially greater research attention to tadpoles than they currently receive. We have revised the text accordingly.
  
  (2) While traits may be used to predict Red-List status, it is not clear how they could inform conservation measures. This should be discussed.
  
  Thank you for your comment. The aim of our study is to identify which traits show statistical associations with extinction risk, thereby providing testable hypotheses for future research. We acknowledge that the mechanisms underlying the associations between certain morphological traits (e.g., head length, tympanum diameter) and extinction risk remain unclear, and these findings cannot yet be directly translated into well-established management measures. Nevertheless, the value of our study lies precisely in generating hypotheses about traits that warrant prioritized investigation of their causal mechanisms, as well as offering clues for the initial allocation of conservation resources. Following your suggestion, we have discussed the limitations of the study in the conclusion section of the manuscript.
  
  (3) The Red-List categories may not be appropriate to link traits to extinction risk. It would be important to explain how these are defined for China and how this may affect the analysis (e.g. linking larval traits to larval extinction risks would be difficult if Red-List criteria do not consider larvae).
  
  Thank you very much for your suggestions. The assessment method of the China Biodiversity Red List is the same as that of the IUCN Red List, both of which are based on population size and area of distribution. The assessment process is independent of species' morphological traits. Consequently, analyzing correlations between traits and Red List categories does not constitute circular reasoning or contain any inherent logical contradiction. On the contrary, it is precisely because the two are independent that statistically significant associations between traits and extinction risk can have predictive value and inform conservation actions. In the revised manuscript, we have clarified the independence of Red List assessments and rephrased any potentially misleading wording (e.g., changing "threat category of tadpoles" to "threat category of the species (assessed based on adults)").
  
  Methodological problems:
  
  (4) Choice of traits. Are morphological traits sufficient (add e.g. fecundity)? Justify the use of habitat traits (also, if additional ones would be included: geographic and altitudinal ranges, habitat specificity).
  
  Thank you for your suggestion. We fully agree that traits such as geographic range, elevational range, fecundity, and habitat specificity have important effects on extinction risk. The core objective of this study is to compare the stage-specific differences in the associations between extinction risk and morphological and microhabitat traits of adults versus tadpoles. Moreover, spatial traits such as geographic range are inherently highly correlated with the threat status of species, and including them might mask life-stage-specific signals. We will acknowledge this limitation in the discussion and identify the above-mentioned traits as important directions for future research.
  
  (5) Model choice: models have high uncertainty, thus better use model averaging and AICc instead of AIC. Overall, the statistical analysis and model selection procedure are poorly described; only summary results are presented.
  
  We greatly appreciate the reviewer's suggestion. Accordingly, we re-analyzed the data following your advice. In addition, the description of the methods has been supplemented.
  
  (6) Caveats: the data only allow for correlational analysis; causation cannot be derived from observational data. Furthermore, with a limited number of species, the number of predictors should not be too large.
  
  Thank you for your suggestion. Studying the relationship between traits and species threat status is important in conservation biology. Although such studies can only reveal statistical associations between traits and extinction risk rather than infer causality, they can generate hypotheses to facilitate future research. Additionally, this type of study can help predict the threat severity of unevaluated species, which is highly valuable for developing biodiversity conservation plans. In this study, 299 species were included in the analysis, and nine predictor variables (eight morphological traits plus one microhabitat type) were used. The ratio of sample size to number of variables was approximately 33:1, and variance inflation factor (VIF) tests indicated that multicollinearity was within an acceptable range (VIF < 5). Therefore, the risk of model overfitting is low. We will add this clarification in the revised manuscript.
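  The sample-size-to-predictor ratio and the VIF screen described in this response can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes continuous predictors and ordinary least-squares auxiliary regressions, ignores phylogeny, and omits the categorical microhabitat variable (which would normally call for a generalized VIF). The function name `variance_inflation_factors` is hypothetical and the data are simulated.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (n samples x k continuous predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 of an ordinary
    least-squares regression of predictor j on the other predictors
    plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = np.sum((target - design @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


n_species, n_predictors = 299, 9  # eight morphological traits + one microhabitat type
print(round(n_species / n_predictors, 1))  # 33.2

# Eight simulated continuous traits (NOT the study's data); the categorical
# microhabitat is left out. Column 7 is built to be nearly collinear with column 0.
rng = np.random.default_rng(0)
traits = rng.normal(size=(n_species, 8))
traits[:, 7] = traits[:, 0] + 0.1 * rng.normal(size=n_species)
vifs = variance_inflation_factors(traits)
print(vifs.round(2))
print("max VIF < 5:", bool(vifs.max() < 5))
```

  The deliberately collinear pair shows what a VIF above the threshold of 5 looks like; with independent predictors the VIFs stay close to 1.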
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) My first major concern is the species threat categories for tadpoles. The authors obtained the extinction risk data from the China Biodiversity Red List or IUCN. However, the assessment of threat categories, whether by the China Biodiversity Red List or IUCN, is based solely on adults. That means that the threat categories for both adults and tadpoles are the same, which can be seen in Figure 1. Since there is no specific assessment of threat categories for tadpoles, I have concerns about whether it is reasonable to relate species traits of tadpoles to the extinction risk for adults. I think it is one of the reasons why there is no study examining the association between functional traits and extinction risk in tadpole stages.
  
  We thank the reviewer for raising this important point, as it addresses a key prerequisite issue. The Red List assessment evaluates species, not individual life stages. The threat categories of both the IUCN and China Biodiversity Red Lists are determined based on criteria such as population size and geographic range of the species. The assessment process is independent of species' morphological traits. Consequently, analyzing correlations between traits and Red List categories does not constitute circular reasoning or contain any inherent logical contradiction. On the contrary, statistically significant associations between traits and extinction risk can have predictive value and inform conservation actions. In the revised manuscript, we will explicitly clarify the independence of Red List assessments and rephrase any potentially misleading wording (e.g., changing "threat category of tadpoles" to "threat category of the species (assessed based on adults)").
  
  (2) My second major concern is about the Data Analysis. The authors built and compared three types of models, i.e., PGLS_BM, PGLS_OU, and GLS_no_phylogeny. They claim that the OU-based PGLS model provided the best fit for both adult and tadpole datasets. Although the result seems reasonable, it is not clear how the OU-based PGLS model was obtained and what it exactly means. It seems to be a full model including all the predictor variables. However, since eight morphological traits and one microhabitat variable of both adults and tadpoles were collected, there should be 2^9 - 1 = 511 candidate models. Unless the best model has an Akaike weight (w_i) > 0.90 in all the OU-based PGLS models, it has substantial model selection uncertainty. If this is the case, model averaging should be used, and weighted estimates of regression coefficients and unconditional standard errors that incorporate model selection uncertainty are better statistical methods (Burnham & Anderson, 2002).
  
  Thank you very much for your suggestion. Species' traits are related to evolutionary relationships, with more closely related species tending to be more similar. In the original manuscript, the three models we compared (PGLS_BM, PGLS_OU, GLS_no_phylogeny) were intended to select the optimal evolutionary covariance structure. Since we were more interested in the differences between adults and tadpoles, after selecting the OU structure, we actually used a single full model that included all traits to estimate the regression coefficients for each factor. Following your advice, we have added a model averaging analysis and revised the manuscript accordingly.
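  The all-subsets model averaging recommended by the reviewer (Burnham & Anderson, 2002) can be sketched as follows. This is a hypothetical illustration, not the authors' re-analysis: it substitutes ordinary least squares for the OU-based PGLS fits, ranks every non-empty subset of nine predictors (2^9 - 1 = 511 models) by AICc, converts AICc differences into Akaike weights, and averages each coefficient over the models that contain it (one of two common conventions; the other treats an absent coefficient as zero). The unconditional standard error adds the between-model spread of estimates to each model's own sampling variance. The names `ols_aicc` and `model_average` and the simulated data are assumptions.

```python
import itertools

import numpy as np


def ols_aicc(y, X):
    """OLS fit with intercept. Returns (slopes, slope standard errors, AICc).

    K counts the intercept, the slopes and the residual variance.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)  # Gaussian, ML variance
    K = p + 1
    aicc = -2 * loglik + 2 * K + 2 * K * (K + 1) / (n - K - 1)
    cov = rss / (n - p) * np.linalg.inv(design.T @ design)
    return coef[1:], np.sqrt(np.diag(cov))[1:], aicc


def model_average(y, X):
    """All-subsets AICc ranking, Akaike weights and model-averaged slopes."""
    n, k = X.shape
    subsets = [s for r in range(1, k + 1) for s in itertools.combinations(range(k), r)]
    fits = [(s, *ols_aicc(y, X[:, list(s)])) for s in subsets]
    aicc = np.array([f[3] for f in fits])
    weights = np.exp(-0.5 * (aicc - aicc.min()))  # relative likelihoods
    weights /= weights.sum()                      # Akaike weights

    coef, se, importance = np.zeros(k), np.zeros(k), np.zeros(k)
    for j in range(k):
        idx = [i for i, f in enumerate(fits) if j in f[0]]
        w = weights[idx]
        importance[j] = w.sum()  # summed weight of models containing predictor j
        w = w / w.sum()          # renormalise over those models
        b = np.array([fits[i][1][fits[i][0].index(j)] for i in idx])
        s = np.array([fits[i][2][fits[i][0].index(j)] for i in idx])
        coef[j] = w @ b
        # unconditional SE: within-model variance plus between-model spread
        se[j] = w @ np.sqrt(s ** 2 + (b - coef[j]) ** 2)
    return {"n_models": len(fits), "best_weight": weights.max(),
            "weights": weights, "coef": coef, "se": se, "importance": importance}


rng = np.random.default_rng(1)
X = rng.normal(size=(299, 9))             # simulated predictors, NOT the study's data
y = 2.0 * X[:, 0] + rng.normal(size=299)  # only predictor 0 has a real effect
result = model_average(y, X)
print(result["n_models"])                 # 511
print(result["coef"].round(2), result["importance"].round(2))
```

  Printing `result["best_weight"]` shows whether a single model dominates, which is the reviewer's w_i > 0.90 criterion for skipping model averaging.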
  
  (3) In addition, the second-order information criterion AICc, rather than AIC, should be used for model selection. You have at least 9 variables (eight morphological traits and one microhabitat variable) or 11/13 variables for the parameter estimates (Table 1). However, you have only 299 species included in the analysis (n = 299), which is relatively small compared to the number of variables (n/k < 40). Therefore, the AIC corrected for small sample size (AICc) should be used.
  
  We greatly appreciate the reviewer's suggestion. Accordingly, we re-analyzed the data following your advice.
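  For reference, the small-sample correction the reviewer asks for adds a penalty that depends on the number of estimated parameters K and the sample size n. Taking n = 299 species and, as an illustrative assumption, K = 11 (nine predictor slopes, an intercept and a residual variance; an OU-based PGLS fit or a dummy-coded microhabitat variable would add further parameters), the correction works out as:

  $$
  \mathrm{AICc} = \mathrm{AIC} + \frac{2K(K+1)}{n - K - 1},
  \qquad
  \frac{2 \cdot 11 \cdot 12}{299 - 11 - 1} = \frac{264}{287} \approx 0.92
  $$

  Under this assumption n/K = 299/11 ≈ 27, below the threshold of 40 cited by the reviewer, and the correction term grows as K increases.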
  
  (4) Previous studies found that amphibian species with large body size, restricted geographic and elevational ranges, low fecundity or high habitat specificity are frequently predicted to have higher extinction risk (Cooper et al., 2008; Sodhi et al., 2008; Botts et al., 2013; Lips et al., 2003; Murray & Hose, 2005). The authors only included morphological traits and one microhabitat data point in the analyses. I wonder whether they can collect more trait data associated with extinction risk, such as geographic and elevational ranges, fecundity traits, or diet/habitat specificity, so as to gain more insight into the study.
  
  Thank you for your suggestion. We fully agree that traits such as geographic range, elevational range, fecundity, and habitat specificity have important effects on extinction risk. The objective of this study is to compare the stage-specific differences in the associations between extinction risk and morphological and microhabitat traits of adults versus tadpoles. Moreover, spatial traits such as geographic range are inherently highly correlated with the threat status of species, and including them might mask life-stage-specific signals. In the Methods, we acknowledge this limitation and identify the above-mentioned traits as important directions for future research.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.17.712346v3
socialsci.libretexts.org

2.2: Sources of Social Knowledge

1
1. kay1a444 28 Jun 2026
  
  in Public
  
  We may dislike people from certain racial or ethnic groups because we frequently see them portrayed in the media as associated with violence, drug use, or terrorism. And we may avoid people with certain physical characteristics simply because they remind us of other people we do not like.
  
  As with associational learning through unjustified racial prejudices, I feel television also has an impact, because some movies or shows cast a character of color in a stereotypical role to make a point to viewers, which leads people of other races to assume the stereotype is true for that race.

Annotators

kay1a444

URL

socialsci.libretexts.org/Bookshelves/Psychology/Social_Psychology_and_Personality/Principles_of_Social_Psychology/02:_Social_Learning_and_Social_Cognition/2.02:_Sources_of_Social_Knowledge
socialsci.libretexts.org

4.2: Indigenous Ways of Knowing

1
1. BoosieHilt 27 Jun 2026
  
  in Public
  
  Dr. Cutcha Risling Baldy (Hupa, Yurok and Karuk and an enrolled member of the Hoopa Valley Tribe in Northern California) Dr. Cutcha Risling Baldy (Native American Studies, Humboldt State University) researches Indigenous feminisms, California Indians and decolonization. In her blog post titled “Give It Back: Publishing and Native Sovereignty,” Cutcha writes: I’ve become obsessed with the idea of finding out what would happen if I started mourning loss of land, loss of lives, loss of fish - if my grief was on display. As an academic I’ve internalized the message that somehow the work isn’t supposed to be deeply personal. Like I don’t carry the blood of my ancestors in my veins, blood that has run rivers red as we held on to the bodies of slaughtered children and wailed into the night sky asking ourselves “why” or “what are we supposed to do now?” Like we didn’t sing or dance for all those we lost. Like that song doesn’t come from me now. Like I don’t close my eyes and hug my daughter just a little bit tighter at night because there was a time when they would have ripped her from my arms and sold her. And I would never stop looking for her. I would do anything to find her again. Like my ancestors didn’t search until they couldn’t search any longer. Like we don’t continue to search, or grieve even now. And we live here in this space that they stole from us. This place where we buried our beloved. Where we sing and dance and laugh and love. This place where we cried tears of joy and sadness and from laughing so hard our stomachs hurt and from hurting so hard we thought we’d never laugh again (2020). In just 3 short paragraphs Dr. Cutcha Risling Baldy describes what it is like to be a California Indian woman today. She bids us to think about land theft, loss and destruction. She makes us think about the significance of intergenerational trauma and how violence doesn’t just hurt the victim. 
Cutcha calls us to think about missing and murdered indigenous women and places that California Native people deem sacred. Her life is place based. She speaks and writes from an internal place that is spiritually, emotionally, and intrinsically connected to where her ancestors and she were raised, and where their creation happened. Colonization may have pushed many of us from our homelands, but we return. See more on Dr. Risling Baldy's work on the Hupa Coming-of-Age Dance as decolonizing praxis under Chapter 8, section 8.6: Transformational Liberation through Love.
  
  This part also stood out to me because Dr. Cutcha Risling Baldy explains how the effects of colonization are still in our midst today. She connects the loss of land, family, and culture to her own life, and this made history feel extremely personal instead of something that belongs only to the past. It's very noticeable how she talks about intergenerational trauma and how the hurt and pain from past events continue to affect families generation after generation. In this section I see that even after all that Native communities battled, they continued to protect their culture, traditions, and connections to the land they came from, the land that belongs to them.

Annotators

BoosieHilt

URL

socialsci.libretexts.org/Bookshelves/Ethnic_Studies/Introduction_to_Ethnic_Studies_(Fischer_et_al.)/04:_American_Indian_Native_American_Studies/4.02:_Indigenous_Ways_of_Knowing
www.biorxiv.org

Long-Range Coupling of Posterior Cell Addition and Anterior Vacuolation Provides Robustness in Notochord Elongation

1
1. EMBOpress 26 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This work focuses on zebrafish notochord morphogenesis during axial elongation. In particular it dissects the role of YAP signalling on regulating the balance between caudal cell addition with the cell enlargement occurring rostrally through vacuolation.
  
  The article is timely to the field and includes several important experiments. The overall presentation and written style are good, citations are adequate and there is a clear effort to integrate experiments and mathematical modelling from the outset. The logic behind experiments is sound and the conclusion coherent (even if not totally unexpected given the literature): YAP affects progenitor addition which in turn changes packing, vacuolation and axis length. I just have a few points that could make the article clearer and more persuasive.
  
  We thank the reviewer for these positive comments about our manuscript. We would like to reiterate the two main unexpected findings based on our results:
  
  While YAP mutants display a defective notochord (Kimelman et al., 2017; eLife), it has not been clear what specific role YAP signalling plays during notochord development. Therefore, the finding that Yap signalling plays a role in controlling the rate of notochord progenitor addition represents a novel discovery.
  
  The observation that the notochord can buffer its elongation rate against an increased influx of progenitors is novel and counterintuitive. Our current understanding of tissue elongation depends on the central idea that the addition of progenitors directly impacts elongation rate. Here, using the notochord as an example, we show for the first time that this has minimal impact at the tissue level.
  
  Major points
  
  - The last section of the results is difficult to follow. After analysing the vgll4b loss-of-function line, which effectively over-activates YAP, the focus shifts to YAP inhibition using Verteporfin.
  
  o Concerns on Verteporfin: the molecule has been widely used to modulate YAP, but there are also plenty of studies suggesting it is non-specific (it also degrades YAP, has a 14-3-3σ dependency and induces stress). I would consider an alternative: truncated TEAD, LATS over-expression or gain-of-function phosphomimetic versions of YAP.
  
  o Presentation: regardless of point above, Verteporfin's role on YAP should be verified in the system. As such it is crucial to include: images of 4xGTIIC, noto and YAP stains after treatment. Only then inspect the effects on vacuolation and different treatments.
  
  As suggested by the reviewer, we have added a supplementary figure validating the verteporfin treatment, including quantification of GFP reduction across the three tissues and quantification of notochord staining. We did not include Yap1 immunostaining data because the signal quality was insufficient for reliable analysis.
  
  A simple over-expression experiment will not allow the spatial and temporal control required to test our hypothesis. Yap has a known function in gastrulation, so we need experiments that allow us to perturb Yap activity only at posterior body elongation stages. This has been achieved with the vgll4b experiments shown in the manuscript, as this gene is specifically expressed in the tailbud at these stages. In addition to the full verification of verteporfin's impact on YAP activity, we feel this is sufficient evidence to support our conclusions.
  
  - In Fig 3F, noto HCR staining is taken as evidence for progenitor exhaustion/ faster depletion. Other scenarios would be possible without more direct demonstration. Evidence (either experimental or literature) that YAP is not involved in self-renewal or induction of these progenitors at these stages should be discussed.
  
  We have concluded that the smaller volume of noto expressing cells is consistent with the faster depletion of the progenitor pool based on the direct observation of increased progenitor addition rate from photo-labelling experiments (Figure 3A,B). As suggested by the reviewer, we have now quantified cell divisions within the midline progenitor population and found no significant differences between mutant and control embryos. These data have now been included in Supplementary figure 3.
  
  - Individual datapoints in Fig 3C and 4D should be shown.
  
  These data have now been added to the figures.
  
  - Additional justification is needed as to why the spinal cord is the best tissue against which to benchmark displacement. Additionally, looking at this with respect to mesoderm migration could capture another set of progenitors and their behaviour and displacements.
  
  Photolabels within the pre-somitic mesoderm are difficult to interpret, as the high amount of cell rearrangement in this tissue spreads out the labelled clone, making it difficult to assess tissue displacement (see Figure 2D,E; Thomson et al., (2021) Cells and Development). In contrast, a previous paper has shown that notochord-spinal cord displacements can be mapped reliably across the anterior-posterior axis, which motivated our choice here (McLaren and Steventon (2021) Development).
  
  - Plotting vacuole area in Fig.4I vs A-P position (similar to plots 1H, 2F-H) could further strengthen the point of gradual (linear) vacuolation.
  
  As suggested by the reviewer, we have plotted vacuole area as a function of position for the verteporfin treatment experiments, and these data have now been included in Figure 5.
  
  Minor points:
  
  - Scheme of Fig1A could benefit from having the info of zebrafish timeline (hpf)
  
  The scheme has been modified to indicate the zebrafish developmental timeline.
  
  - Figure 3B, what was time 0?
  
  Timepoints have now been included in the text and figure legend.
  
  - The authors should address whether Verteporfin-treated mutants are rescued or whether the compound overwhelms the genetic effect.
  
  Given that verteporfin will impact Yap signalling in a global manner, whereas vgll4b mutants have a localised over-activation of Yap signalling, we think this experiment would be difficult to interpret and would likely be non-informative.
  
  - Cell density is an elegant measure but quite abstract. A plot of cells detected at each AP position would be quite valuable to reinforce more cells are being added to a relatively constant area.
  
  As suggested by the reviewer, we have now plotted these data for mutant and controls and also for verteporfin treatments. These data have now been included in supplementary figures 3 and 7.
  
  Reviewer #1 (Significance (Required)):
  
  Significance included above.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Summary
  
  Camacho-Macorra et al. investigate the mechanisms of axis extension in zebrafish embryos, focusing on the notochord and its two key elongation processes: progenitor addition (occurring early and posteriorly) and vacuolization (occurring later and in an anterior to posterior sequence). The authors first develop a mathematical model to predict notochord elongation dynamics by integrating these processes. They demonstrate that the YAP signaling pathway is active in both the notochord and its progenitors during axial extension. Their analysis reveals that vgll4b, an inhibitor of YAP, is expressed in the same regions. Knockdown of vgll4b results in YAP hyperactivation in the notochord and posterior progenitor regions, leading to increased progenitor recruitment into the notochord and a reduction in the progenitor pool. The effects of this mutation on extension are most pronounced during the late phase, which is dominated by vacuolization. The authors observe smaller vacuoles in mutants during this phase. However, early (but not late) YAP inhibition decreases notochord cell density and increases vacuole size, suggesting that YAP primarily regulates notochord progenitor uptake, which indirectly affects vacuolization.
  
  Major Comments
  
  The authors propose that YAP activity mediates a long-range feedback mechanism linking posterior progenitor addition to anterior vacuolization. Two lines of evidence are presented to support this idea. First, there appears to be compensation for tissue length during Phase 2, when both progenitor addition and vacuolization occur. Second, temporal YAP inhibition experiments show that early, but not late, YAP inactivation affects both cell addition and vacuolization. While these observations are intriguing, they do not conclusively demonstrate spatial long-range coordination. Instead, the global decrease of vacuole size could be a simple delayed consequence of cell density increase or cell disorganization at the posterior end without involving a long-range feedback along AP axis. Claiming that such long-range feedback is taking place would require a more precise characterization and/or the identification of its nature (chemical, mechanical).
  
  We would like to thank this reviewer for this point, which we feel requires further clarification. As they suggest, the increased addition rate of posterior progenitors leads to a later impact on vacuolation once these cells have reached more anterior parts of the body axis, creating an effective long-range feedback mechanism that links the two processes. However, this is not a direct propagation of a signal (mechanical or otherwise) across the length of the notochord, as the previous framing of our conclusions may have suggested. We have modified the title of our manuscript to place less emphasis on the 'long-range feedback', and included an additional discussion paragraph to make this point clearer.
  
  Furthermore, there are several caveats with the interpretations of the claims cited above. The authors do not show quantification of vacuole area using notochord cell segmentation as described in Fig 1C in vgll4b mutants at stages when progenitor addition is increased.
  
  This is an important point highlighted by the reviewer. We have now included analysis at 24 hpf, where we do see a significant reduction in vacuole area within the anterior part of the notochord during the buffering phase in vgll4b mutants, consistent with our model that reduced anterior vacuolation compensates for increased progenitor addition rate during this phase of notochord elongation (Figure 4E).
  
  The slope of internuclear distances in Supplementary Figure 4A at 27 hours post-fertilization suggests that vacuolization is initially normal (similar to wt context in Fig 1H), arguing against an early defect in vacuolation dynamics along the Anterior to Posterior axis that could compensate for extra addition of progenitors.
  
  We have revised Supplementary Figure 4 to present a direct comparison between mutant and control embryos at each time point analyzed. This analysis shows that within the mid-trunk region of the notochord, differences in cell size first emerge at the developmental stage when vacuolation becomes the primary driver of axis elongation. In addition, we observe a progressive decoupling of the scaling relationship in mutant embryos over time. As mentioned above, there is a significant difference in vacuole size within more anterior regions at 22.5 hpf that is consistent with the model that this buffers against increased posterior addition.
  
  Finally, the timing of the analysis of the effect of Verteporfin treatments is unclear. According to the legend of Figure 4F, analyses for Treatment A (16-27 hpf) and Treatment B (27-38 hpf) were done at 24 hpf and 30 hpf, respectively. If this is the case, the 3-hour window for Treatment B may not allow sufficient time to reveal effects on vacuolization.
  
  We agree that the information regarding the verteporfin experiments was not clearly presented in the original figure, and we have therefore revised the schematic accordingly.
  
  To strengthen the claim of long-range coupling, the authors could:
  
  Provide direct measurements of vacuolization A-P dynamics/area during Phase 2, before the effect on notochord length in the mutant, to see if there is indeed a compensatory effect on notochord length for the additional accretion of notochord progenitors in the Vgll4b mutant.
  
  As suggested by the reviewer, we have added an earlier time point to the A-P area dynamics plot in phase 2, corresponding to a stage at which the effect on notochord length in the mutant is not yet detectable. At this stage, we observed no difference in vacuole area between mutants and controls. We have also included an earlier time point analysis in the anterior region of the axis, which shows a similar cell size difference to that observed later in a more posterior region (Figure 4F; see above response).
  
  Clarify the analysis timing of Treatment B to confirm that YAP inhibition during the vacuolization phase truly has no effect.
  
  This has now been clarified.
  
  Additionally, as a non-specialist, I found the distinction between the two modeling hypotheses difficult to follow. Specifically, it is unclear why the first hypothesis assumes YAP affects vacuolation rate, while the second assumes it affects vacuolation front speed. It is also not intuitive how front speed can be independent of vacuolation rate, as one would expect that if cells form vacuoles more slowly, the front should progress more slowly as well. Therefore, it could be good to clarify these aspects of the modeling part.
  
  We thank the reviewer for this comment and apologise for the lack of clarity in our description of the model. In our framework, the cell size profile along the AP axis of the notochord is governed by two distinct processes: (i) the addition of progenitors at the posterior tip, and (ii) vacuolation, which increases cell size and proceeds from anterior to posterior. We model the latter as a propagating wave with velocity v_f, such that cells begin to vacuolate when the wave front reaches their position.
  
  Importantly, in the model these two aspects of vacuolation are decoupled: the front velocity v_f determines when a given cell starts vacuolating, whereas the vacuolation rate J determines how fast the cell increases in size once the process has started. Biologically, this corresponds to distinguishing between the propagation of a trigger or competence signal along the tissue, and the execution of vacuole growth within each cell. Our reasoning was that they need not be strictly proportional: a signalling wave could propagate at a given speed even if the downstream cellular response is slower or faster.
  
  This is why we considered two alternative hypotheses: either YAP modulates the propagation of the vacuolation front (affecting v_f), or it modulates the growth dynamics within each cell (affecting J). Our quantitative comparison with the experimental data supports the former scenario. This has now been clarified in the main text.
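  To make the distinction between the front velocity v_f and the vacuolation rate J concrete, here is a hypothetical toy that is not the authors' model. It assumes the front leaves the anterior end at t = 0, that a cell at position x starts vacuolating at t = x / v_f, and that the cell then grows linearly at rate J up to a cap s_max. The function name `cell_sizes`, the linear growth law and all parameter values are illustrative assumptions.

```python
import numpy as np


def cell_sizes(positions, t, v_f, J, s0=1.0, s_max=5.0):
    """Toy profile: size of each notochord cell at time t (arbitrary units).

    Hypothetical assumptions: the vacuolation front leaves the anterior end
    (position 0) at t = 0 and moves posteriorly at speed v_f, so a cell at
    position x starts vacuolating at t_on = x / v_f; afterwards it grows
    linearly at rate J from s0 until it reaches s_max.
    """
    positions = np.asarray(positions, dtype=float)
    t_on = positions / v_f                            # when the front reaches each cell
    time_vacuolating = np.clip(t - t_on, 0.0, None)   # 0 for cells not yet reached
    return np.minimum(s0 + J * time_vacuolating, s_max)


x = np.arange(10.0)  # ten cells, anterior (0) to posterior (9)
profile_a = cell_sizes(x, t=4.0, v_f=1.0, J=1.0)  # slower front, faster growth
profile_b = cell_sizes(x, t=4.0, v_f=2.0, J=0.5)  # faster front, slower growth
print(profile_a)
print(profile_b)
print((profile_a > 1.0).sum(), (profile_b > 1.0).sum())  # 4 8
print(profile_a.sum(), profile_b.sum())                  # 20.0 19.0
```

  In this toy, doubling v_f while halving J gives a similar total length (sum of cell sizes) but a different size profile along the axis: a steep gradient over four cells versus a shallow one over eight. A difference of this kind in the cell-size profile is what allows the two hypotheses to be told apart.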
  
  Minor Comments
  
  While the study is technically sound, a few areas could benefit from improved clarity or additional data.
  
  An intriguing but puzzling finding is the reduction in the noto-expressing progenitor domain in vgll4b mutants, despite elevated YAP activity in progenitors. Intuitively, if YAP promotes progenitor maintenance or expansion, one might expect the noto+ domain to increase, not shrink. This paradox suggests that YAP may not only simply maintain progenitors but instead accelerates their differentiation or migration into the notochord (as stated in the manuscript and graphical abstract). Alternatively, YAP could only deplete the noto+ pool by driving premature entry into the notochord, though the lack of clear YAP upregulation in this domain would imply a non-cell autonomous role of YAP for this interpretation. The authors should discuss these possibilities more explicitly in the Discussion section and could consider including additional markers, such as proliferation assays or apoptosis markers, to clarify whether YAP affects progenitor proliferation, differentiation, or migration.
  
  As also suggested by the reviewer, we have included a cell proliferation analysis in Supplementary Figure 3 and have revised the Discussion section accordingly.
  
  In Figure 2B, the YAP activity reporter signal in the posterior floor plate is not immediately obvious. The authors should consider providing higher-magnification insets.
  
  As suggested by the reviewer, we have included higher-magnification insets in Figure 2.
  
  In Figure 2C, the differences in tail shape between wild-type and mutant embryos are visually striking. If these differences have not been quantified or discussed, a brief comment in the text would be helpful.
  
  We did not see a consistent impact on the morphology of the posterior body; this has now been clarified in the main text.
  
  Supplementary Figure 6 describes embryo length differences in mutants but does not include a representative image. Adding one would strengthen the phenotypic description.
  
  As suggested by the reviewer, we have modified Supplementary Figure 6.
  
  Figure 1C is not cited in the text, as it is not associated with a result; it only describes the approach that is used later in Fig 4I.
  
  We have modified the text to include the appropriate figure reference.
  
  Finally, the authors might consider citing Michaud & Pourquié (2025) when presenting the role of hydrostatic pressure in axis elongation in the Introduction.
  
  We have now modified the text to include this citation which we agree is relevant to this work.
  
  Reviewer #2 (Significance (Required)):
  
  This study by Camacho-Macorra et al. presents a fascinating exploration of how YAP signaling and its inhibition by vgll4b coordinate progenitor addition and vacuolization during zebrafish notochord elongation. The work is well executed, with clear results and integration of mathematical modeling and experimental data. The findings shed new light on the molecular and mechanical regulation of axis extension, a fundamental process in vertebrate development. However, while the study is innovative and rigorously conducted, the central claim of "long-range coupling" between progenitor addition and vacuolization requires further substantiation. Addressing the points discussed below will make the study more convincing and accessible to developmental biologists and mechanobiologists alike.
  
  reviewer expertise: developmental biologist specialised in morphogenesis
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  In the studies conducted by Camacho-Macorra et al., the authors examine the extension of the body axis in zebrafish, focusing on the notochord. They specifically compare timepoints at which progenitor addition to the notochord and vacuolization are important for driving axis extension. They generate a simple mathematical model of notochord extension and show that it recapitulates in vivo observations in which progenitor addition and vacuolation drive tissue elongation. They further perturb the system, showing that YAP activity is localized to the midline progenitors of the notochord and that perturbing vgll4b, a competitive inhibitor of YAP, increases YAP signaling and results in increased progenitor addition to the notochord. They further describe a possible indirect feedback mechanism linking YAP-driven progenitor addition to the notochord with anterior vacuolation which, when perturbed (i.e., increased YAP), results in reduced notochord elongation.
  
  Major Comments:
  
  NA
  
  Minor comments:
  
  1. Figure 1B - please put the model equation in the figure, or at least indicate which variables of the equation correspond to each part of the schematic.
  
  As suggested by the reviewer, we have modified the scheme in Figure 1.
  
  2. Figure 1F - the smoothed line is misleading; please include individual embryo measurement points. This comment applies to several figures.
  
  We agree with the reviewer that the graphs in the original manuscript could be improved, and we have therefore modified all figures to better represent data dispersion within each group.
  
  3. Figure 2C/D - To make this manuscript more accessible to readers who are not familiar with the anatomy of the zebrafish tail, please include zoomed-in panels of the region of interest where arrows point to increased YAP signaling in the floor plate and hypochord.
  
  As suggested by the reviewer, we have included higher-magnification insets in Figure 2.
  
  4. In the Discussion - "In vgll4b mutants, increased progenitor incorporation initially does not alter overall notochord length due to a buffering mechanism for natural variation in progenitor addition" - this buffering of variation is not directly tested and is an assumption. Please either cite a paper or reword.
  
  This point has been clarified in the revised discussion.
  
  Reviewer #3 (Significance (Required)):
  
  Overall, the logic and experiments conducted in these studies are well defined. However, the significance of the work is minimal and makes only a small contribution to the advancement of the field of developmental biology. Regardless, the studies are well done and worth publication.
  
  Strengths:
  
  -The study does a good job of incorporating and testing a computational model in a way that proves/disproves their hypothesis
  
  -The manuscript is well written and follows a logical order, making it easy for readers to understand the main findings
  
  -The study uses multiple routes of YAP inhibition (genetic and drug) to show effect on progenitor addition to the notochord and shortened body axis
  
  -The discussion does a very good job of giving the context of the study's results.
  
  Weaknesses:
  
  -The study is minimal and fails to illuminate the mechanism that connects progenitor addition to vacuolization, claiming only an indirect relationship with YAP signaling. However, this is acknowledged by the authors and not overstated.
  
  The study provides a minimal advancement to the field by investigating an unexplored area of zebrafish notochord extension. It provides a small step toward connecting mechanical/morphogenic mechanisms with signalling in zebrafish body axis extension.
  
  The audience of this work is a specialized basic research group of developmental biology scientists. The research is of particular relevance to individuals studying zebrafish or axis elongation. While the authors make comparisons to other systems, due to the unique nature of the zebrafish body extension, this generates a narrow field of focus for the manuscript.
  
  We have previously discussed the uniqueness of zebrafish posterior body elongation in light of critical differences in the degree to which posterior growth from self-renewing tailbud progenitor populations contributes to the mechanisms of axis elongation (Sambasivan and Steventon (2021) Frontiers in Cell and Developmental Biology; Steventon and Martinez Arias (2017) Developmental Biology). Here too, we think zebrafish provide an important system to explore differences in the mechanisms that drive notochord elongation, and we envisage that this study will provoke a similar cross-species comparison that takes into account differences in the relative timing of progenitor addition and anterior notochord expansion (which occurs much later in amniotes, for example). It is only by considering these species-specific differences across experimental organisms that we can arrive at the fundamental principles that drive developmental processes, and understand how evolution has acted upon these to drive change in adult body plans. We therefore respectfully disagree with the reviewer about the scope and importance of this work for these reasons.
  
  In addition, we feel that the principles by which dynamic processes are coupled across an organ are broadly applicable and will illuminate further research into understanding organ growth control.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.02.17.706348
www.biorxiv.org

The urban tree of life: synthesizing relationships between body size and urban affinity

2
1. Public_Reviews 26 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  I have completed a thorough review of this paper, which seeks to use the large datasets of species occurrences available through GBIF to estimate variation in how large numbers of plant and animal species are associated with urbanization throughout the world, describing what they call the "species urbanness distribution" or SUD. They explore how these SUDs differ between regions and different taxonomic levels. They then calculate a measure of urban tolerance and seek to explore whether organism size predicts variation in tolerance among species and across regions.
  
  The study is impressive in many respects. Over the course of several papers, Callaghan and coauthors have been leaders in using "big [biodiversity] data" to create metrics of how species' occurrence data are associated with urban environments, and in describing variation in urban tolerance among taxa and regions. This work has been creative, novel, and it has pushed the boundaries of understanding how urbanization affects a wide diversity of taxa. The current paper takes this to a new level by performing analyses on over 94,000 observations from >30,000 species of plants and animals, across more than 370 plant and animal taxonomic families. All of these analyses were focused on answering two main questions:
  
  (1) What is the shape of species' urban tolerance distributions within regional communities?
  
  (2) Does body size consistently correlate with species' urban tolerance across taxonomic groups and biogeographic contexts?
  
  Overall, I think the questions are interesting and important, the size and scope of the data and analyses are impressive, and this paper has a potentially large contribution to make in pushing forward urban macroecology specifically and urban ecology and evolution more generally.
  
  Despite my enthusiasm for this paper and its potential impact, there are aspects that could be improved, and I believe the paper requires major revision.
  
  Some of these revisions ideally involve being clearer about the methodology or arguments being made. In other cases, I think their metrics of urban tolerance are flawed and need to be rethought and recalculated, and some of the conclusions are inaccurate. I hope the authors will address these comments carefully and thoroughly. I recognize that there is no obligation for authors to make revisions. However, revising the paper along the lines of the comments made below would increase the impact of the paper and its clarity to a broad readership.
  
  Major Comments:
  
  (1) Subrealms
  
  Where does the concept of "subrealms" come from? No citation is given, and it could be said that this sounds like an idea straight out of Middle Earth. How do subrealms relate to known bioclimatic designations like Köppen climate classifications, which would arguably be more appropriate? Or are subrealms more socio-ecologically oriented? From what I can tell, each subrealm lumps together climatically diverse areas. It might be better and more tractable to break things down by continent, as the rationale for subrealms is unclear, and it makes the analyses and results more confusing. The authors rationalized the use of subrealms to account for potential intraspecific differences in species' response to urbanization, but that is never a core part of the questions or interpretation in the paper, and averaging across subrealms also accounts for intraspecific variation. Another issue with using the subrealm approach is that the authors only included a species if it had at least 100 observations in a given subrealm, leading to a focus on only the most common species, which may be biased in their SUD distribution. How many more species would be included if they did their analysis at the continental or global scale, and would this change the shape of SUDs?
  
  (2) Methods - urban score
  
  The authors describe their "urban score" as being calculated as "the mean of the distribution of VIIRS values as a relative species-specific measure of a response to urban land cover."
  
  I don't understand how this is a "relative species-specific measure". What is it relative to? Figures S4 and S5 show the mean distribution of VIIRS for various taxa, and this mean looks to be an absolute measure. Mean VIIRS for a given species would be fine and appropriate as an "urban score", but the authors then state in the next sentence: "this urban score represents the relative ranking of that species to other species in response to urban land cover".
  
  That doesn't follow from the description of how this is calculated. Something is missing here. Please clarify and add an explicit equation for how the urban score is calculated because the text is unclear and confusing.
  
  (3) Methods - urban tolerance
  
  How the authors are defining and calculating tolerance is unclear, confusing, and flawed in my opinion.
  
  Tolerance is a common concept in ecology, evolution, and physiology, typically defined as the ability for an organism to maintain some measure of performance (e.g., fitness, growth, physiological homeostasis) in the presence versus absence of some stressor. As one example, in the herbivory literature, tolerance is often measured as the absolute or relative difference in fitness of plants that are damaged versus undamaged (e.g., https://academic.oup.com/evolut/article/62/9/2429/6853425?login=true).
  
  On line 309, after describing the calculation of urban scores across subrealms, they write: "Therefore, a species could be represented across multiple subrealms with differing measures of urban tolerance (Fig. S4). Importantly, this continuous metric of urban tolerance is a relative measure of a species' preference, or affinity, to urban areas: it should be interpreted only within each subrealm".
  
  This is problematic on several fronts. First, the authors never define what they mean by the term "tolerance". Second, they refer to urban tolerance throughout the paper, but don't describe the calculation until lines 315-319, where they write (text in [ ] is from the reviewer):
  
  "Within each subrealm, we further accounted for the potential of different levels of urbanization by scaling each species' urban score by subtracting the mean VIIRS of all observations in the subrealm (this value is hereafter referred to as urban tolerance). This 'urban tolerance' (Fig. S5) value can be negative, when species under-occupy urban areas [relative to the average across all species] suggesting they actively avoid them, or positive, when species over-occupy urban areas [relative to the average across all species] suggesting they prefer them (i.e., ranging from urban avoiders to urban exploiters, respectively)."
  
  They are taking a relativized urban score and then subtracting the mean VIIRS of all observations across species in a subrealm. How exactly one interprets the magnitude isn't clear, and they admit this metric is "not interpretative across subrealms".
  
  This is not a true measure of tolerance, at least not in the conventional sense of how tolerance is typically defined. The problem is that a species distribution isn't being compared to some metric of urbanness, but instead it is relative to other species' urban scores, where species may, on average, be highly urban or highly nonurban in their distribution, and this may vary from subrealm to subrealm. A measure of urban tolerance should be independent of how other species are responding, and should be interpretable across subrealms, continents, and the globe.
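To make this concern concrete, the urban score and urban tolerance as described in the quoted Methods can be sketched in Python. This is a minimal illustration under assumptions, not the authors' code: records are assumed to be (species, subrealm, VIIRS) tuples, the urban score is the species' mean VIIRS within a subrealm, the subrealm baseline is the mean VIIRS of every supplied record in that subrealm, and species with fewer than 100 records in a subrealm are dropped (the threshold described in the first major comment). The function name `urban_tolerance` and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def urban_tolerance(records, min_obs=100):
    """Return {(species, subrealm): urban tolerance} for well-sampled species.

    `records` is an iterable of (species, subrealm, viirs) tuples; this
    layout is an assumption made for illustration.
    """
    by_species = defaultdict(list)   # (species, subrealm) -> that species' VIIRS values
    by_subrealm = defaultdict(list)  # subrealm -> VIIRS values of every record
    for species, subrealm, viirs in records:
        by_species[(species, subrealm)].append(viirs)
        by_subrealm[subrealm].append(viirs)

    # Subrealm baseline: mean VIIRS over all supplied records (assumption:
    # records of species later dropped by the threshold still count).
    baseline = {r: mean(vals) for r, vals in by_subrealm.items()}

    tolerance = {}
    for (species, subrealm), vals in by_species.items():
        if len(vals) < min_obs:
            continue  # too few records in this subrealm
        urban_score = mean(vals)  # species' mean VIIRS within the subrealm
        tolerance[(species, subrealm)] = urban_score - baseline[subrealm]
    return tolerance


records = [("A", "R1", 10.0)] * 100 + [("B", "R1", 2.0)] * 100
print(urban_tolerance(records))  # {('A', 'R1'): 4.0, ('B', 'R1'): -4.0}
```

With species A (mean VIIRS 10.0) and B (mean 2.0) alone in subrealm R1, the baseline is 6.0, giving tolerances of +4.0 and -4.0. Adding 50 records of a third species C at VIIRS 30.0 leaves C below the threshold, yet under this sketch's baseline assumption it raises the baseline to 10.8 and turns A's tolerance negative (about -0.8) although none of A's own records changed. This is the dependence on co-occurring species that the reviewer objects to.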
  
  I propose the authors use one of two metrics of urban tolerance:
  
  (i) Absolute Urban Tolerance = Mean VIIRS of species_i - Mean VIIRS of city centers
  
  Here, the mean VIIRS of city centers could be taken from the center of multiple cities throughout a subrealm, across a continent, or across the world. Here, the units are in the original VIIRS units where 0 would correspond to species being centered on the most extreme urban habitats, and the most extreme negative values would correspond to species that occupy the most non-urban habitats (i.e., no artificial light at night). In essence, this measure of tolerance would quantify how far a species' distribution is shifted relative to the most highly urbanized habitat available.
  
  (ii) % Urban Tolerance = (Mean VIIRS of species_i - Mean VIIRS of city centers) / Mean VIIRS of city centers * 100%
  
  This metric provides a % change in species mean VIIRS distribution relative to the most urban habitats. This value could theoretically be negative or positive, but will typically be negative, with -100% being completely non-urban, and 0% being completely urban tolerant.
  
  Both of these metrics can be compared across the world, as they would provide either absolute (equation 1) or relative (equation 2) metrics of urban tolerance that are comparable and easily interpretable in any region.
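The reviewer's two proposed metrics can be written as small functions. This is a sketch under assumptions: the VIIRS values at a species' records and at city centers are passed in as lists, and the choice of city centers (within a subrealm, a continent, or globally) is left to the caller, as in the proposal. The function names and example values are hypothetical.

```python
from statistics import mean


def absolute_urban_tolerance(species_viirs, city_center_viirs):
    """Metric (i): species mean VIIRS minus mean VIIRS of city centers (VIIRS units)."""
    return mean(species_viirs) - mean(city_center_viirs)


def percent_urban_tolerance(species_viirs, city_center_viirs):
    """Metric (ii): metric (i) expressed as a percentage of the city-center mean."""
    center = mean(city_center_viirs)
    if center <= 0:
        raise ValueError("city-center mean VIIRS must be positive")
    return (mean(species_viirs) - center) / center * 100.0


species = [10.0, 20.0]  # hypothetical VIIRS values at a species' records
centers = [50.0, 70.0]  # hypothetical VIIRS values at city centers
print(absolute_urban_tolerance(species, centers))  # -45.0
print(percent_urban_tolerance(species, centers))   # -75.0
```

For a species with mean VIIRS 15.0 and a city-center mean of 60.0, metric (i) gives -45.0 VIIRS units and metric (ii) gives -75%; a species recorded only where VIIRS is 0 gives -100%. Unlike the subrealm-baseline version, neither value changes when other species are added to or removed from the region.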
  
  In summary, the definition of tolerance should be clear, the metric should be a true measure of tolerance that is comparable across regions, and an equation should be given.
  
  (4) Figure 1: The figure does not stand alone. For example, what is the hypothesis for thermophily or the temperature-size rule? The authors should expand the legend slightly to make the hypotheses being illustrated clearer.
  
  (5) SUDs: I don't agree with the conclusion given on line 83 ("pattern was consistent across subrealms and several taxonomic levels") or in the legend of Figure 2 ("there were consistent patterns for kingdoms, classes, and orders, as shown by generally similar density histograms shapes for each of these").
  
  The shapes of the curves are quite different, especially for the two Kingdoms and the different classes. I agree they are relatively consistent for the different taxonomic Orders of insects.
  
  Comments on revised version:
  
  I believe their response is thorough and thoughtful. I still disagree with them on some fundamental points of their methodology. However, I would prefer to let my review and their response stand as is. This will allow engaged readers to see both sides of the arguments and judge for themselves whether they believe the revisions are sufficient and if my concerns are valid.
  
  Review 1
2. Public_Reviews 26 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This study provides an important assessment of how body size influences the occurrence of macro-organisms in urban areas across the globe. Size in most plants, but only some animal families, was positively associated with urban tolerance. The data set is impressive, but the evidence for broad-scale conclusions is incomplete due to methodological issues that need to be resolved.
  
  We have substantially revised the manuscript to resolve the methodological issues raised, including clarifying the definition, calculation, and interpretation of urban affinity (formerly named urban tolerance), and tightening the scope of our conclusions to align directly with the evidence presented.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors integrate multiple large databases to test whether body sizes were positively associated with which species tolerate urban areas. In general, many plant families showed a positive association between body size and urban tolerance, whereas a smaller, though still non-trivial, percentage of animal families showed the same pattern. Notably, the authors are careful in the interpretation of their findings and provide helpful context for the ways that this analysis can be generative in shaping new hypotheses and theory around how urbanization influences biodiversity at large. They are careful to discuss how body size is an important trait, but the absence of a relationship between body size and urban tolerance in many families suggests a variety of other traits undergird urban success.
  
  We appreciate this thoughtful and balanced assessment of our work and fully agree with the reviewer’s interpretation. In particular, we share the view that the heterogeneous and often weak association between body size and urban affinity across many families is an important result in its own right, underscoring that no single trait is likely to explain urban success across the tree of life. As the reviewer notes, our intention was not to present body size as a universal predictor, but rather as a widely available, integrative trait that can help reveal where general patterns do and do not emerge. We view the lack of a consistent relationship in many families as strong motivation for future work that explicitly integrates additional functional traits and ecological contexts, and we have clarified this perspective in the revised manuscript.
  
  Strengths:
  
  The authors aggregated a large dataset, but they also applied robust filters to ensure they had an adequate and representative number of detections for a given species, family, geography, etc. The authors also applied their analysis at multiple taxonomic scales (family and order), which allowed for a better interpretation of the patterns in the data and at what taxonomic scale body size might be important.
  
  We thank the reviewer for highlighting these strengths of the study. Considerable effort went into assembling, harmonizing, and filtering these data across taxa, regions, and taxonomic resolutions, and we were deliberate in applying conservative thresholds to ensure that species-level urban affinity estimates were based on adequate and comparable sampling. We hope that, beyond the specific results presented here, the compiled dataset and analytical framework will serve as a valuable resource for future studies aiming to explore additional traits, taxa, or mechanisms underlying species’ responses to urbanization.
  
  Weaknesses:
  
  My main concern is that it is not fully clear how the measure of body size might influence the result. The authors were unable to obtain consistent measures of body size (mean, median, maximum, or sex variation). This, of course, could be very consequential as means and medians can differ quite a bit, and they certainly will differ substantially from a maximum. And of course, sex differences can be marked in multiple directions or absent altogether. The authors do note that they selected the measure that was most common in a family, but it was not clear whether species in that family that did not have that measure were removed or not. This could potentially shape the variability in the dataset and obscure true patterns. This may require additional clarity from the authors and is also a real constraint in compiling large data from disparate sources.
  
  We appreciate this important point and agree that heterogeneity in how body size is measured (e.g., mean vs. maximum values, sex-specific measures) is a real but unavoidable challenge when compiling organismal trait data across such a broad taxonomic scope. We would like to clarify that our analytical approach was explicitly designed to minimize the influence of this heterogeneity rather than ignore it. Specifically, for each family we retained all species for which at least one body size estimate was available, rather than removing species that lacked a particular measurement type. When multiple body size measures existed for a species, we selected the measurement type that was most commonly available within that family in order to maximize comparability among species while retaining sample size. Importantly, differences among body size measurement types (including units, measurement detail, and whether values reflected means, maxima, or sex-specific estimates) were further accounted for by (i) log-transforming all body size values and (ii) centering and scaling body size values within each measurement type, which was included as a random effect in the hierarchical models. This approach reduces the influence of systematic differences among measurement types on estimated relationships with urban affinity. We have added a sentence to the methods clarifying that species with a single measurement type were not removed from analyses:
  
  “Importantly, this procedure did not result in the exclusion of species lacking a particular body size measurement type; rather, all species with at least one available body size estimate were retained, with measurement heterogeneity explicitly accounted for through hierarchical modeling.”
  
  We agree that variation in body size definitions may still contribute residual noise and potentially obscure weak relationships, and we now emphasize this more clearly as a limitation of large-scale trait syntheses. However, because our primary inference focuses on the presence, absence, and direction of size–urban affinity relationships across families, rather than precise effect sizes, we believe our approach provides a robust and conservative test of whether body size consistently predicts urban affinity across taxa. We highlight this point in the limitations section of our manuscript:
  
  “One important limitation of our synthesis is the heterogeneity in how body size is measured across taxa, including differences among mean, maximum, and sex-specific estimates. While our analytical framework explicitly accounts for this variation through transformation, scaling, and hierarchical modeling with random intercepts (see Methods), residual measurement noise may still obscure weak size–urban affinity relationships. This challenge is inherent to large-scale trait syntheses that integrate data from disparate sources, and highlights the need for continued efforts to standardize trait databases and expand the availability of harmonized organismal trait data across the tree of life.”
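The body-size harmonization described in this response (log-transforming sizes, then centering and scaling within each measurement type) can be sketched as follows. Assumptions, not stated in the response: one row per species carrying the measurement type already selected for its family, strictly positive sizes, the sample standard deviation for scaling, and a score of 0.0 for types with a single value or no spread. The hierarchical model with measurement type as a random intercept is not shown, and `standardize_body_size` is a hypothetical name.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def standardize_body_size(rows):
    """Return {species: z-score of log body size within its measurement type}.

    `rows` holds (species, measurement_type, size) tuples, one per species,
    with size > 0 (assumed layout).
    """
    logs_by_type = defaultdict(list)
    for _, mtype, size in rows:
        logs_by_type[mtype].append(math.log(size))

    # Center and scale parameters per measurement type; a type with a single
    # value has no spread, so its sd is recorded as 0.0.
    params = {
        mtype: (mean(logs), stdev(logs) if len(logs) > 1 else 0.0)
        for mtype, logs in logs_by_type.items()
    }

    scores = {}
    for species, mtype, size in rows:
        center, sd = params[mtype]
        centered = math.log(size) - center
        scores[species] = centered / sd if sd > 0 else 0.0
    return scores


rows_g = [("a", "mass_g", 10.0), ("b", "mass_g", 1000.0), ("c", "length_mm", 5.0)]
rows_kg = [("a", "mass_kg", 0.01), ("b", "mass_kg", 1.0), ("c", "length_mm", 5.0)]
print(standardize_body_size(rows_g))
print(standardize_body_size(rows_kg))
```

Both calls give a ≈ -0.707, b ≈ 0.707 and c = 0.0. Because the log turns a change of units into an additive constant that centering then removes, the same masses in grams or kilograms receive the same scores, which illustrates how transforming and then scaling within a measurement type reduces systematic differences among types.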
  
  Reviewer #2 (Public review):
  
  I have completed a thorough review of this paper, which seeks to use the large datasets of species occurrences available through GBIF to estimate variation in how large numbers of plant and animal species are associated with urbanization throughout the world, describing what they call the "species urbanness distribution" or SUD. They explore how these SUDs differ between regions and different taxonomic levels. They then calculate a measure of urban tolerance and seek to explore whether organism size predicts variation in tolerance among species and across regions.
  
  The study is impressive in many respects. Over the course of several papers, Callaghan and coauthors have been leaders in using "big [biodiversity] data" to create metrics of how species' occurrence data are associated with urban environments, and in describing variation in urban tolerance among taxa and regions. This work has been creative, novel, and it has pushed the boundaries of understanding how urbanization affects a wide diversity of taxa. The current paper takes this to a new level by performing analyses on over 94,000 observations from >30,000 species of plants and animals, across more than 370 plant and animal taxonomic families. All of these analyses were focused on answering two main questions:
  
  (1) What is the shape of species' urban tolerance distributions within regional communities?
  
  (2) Does body size consistently correlate with species' urban tolerance across taxonomic groups and biogeographic contexts?
  
  We thank the reviewer for their careful reading of the manuscript and for this generous and accurate summary of the study’s aims, scope, and contributions. We appreciate the recognition of our group’s broader body of work using large biodiversity databases to quantify species’ associations with urban environments, and we are grateful for the reviewer’s acknowledgement that this study extends those efforts to an unprecedented taxonomic and geographic scale. We agree with the reviewer’s articulation of the two core questions motivating the paper, and we have revised the manuscript to ensure that these questions are stated clearly and addressed consistently throughout.
  
  Overall, I think the questions are interesting and important, the size and scope of the data and analyses are impressive, and this paper has a potentially large contribution to make in pushing forward urban macroecology specifically and urban ecology and evolution more generally.
  
  Thanks! We see this work as an effort to move beyond species-by-species descriptions of urban responses toward a community- and distribution-level perspective, where the shape of species’ urban associations themselves becomes an object of study. By framing species’ distributions along an urbanization gradient as a collective property of regional species pools, our approach opens a complementary way of thinking about how urbanization filters biodiversity.
  
  Despite my enthusiasm for this paper and its potential impact, there are aspects that could be improved, and I believe the paper requires major revision.
  
  Some of these revisions ideally involve being clearer about the methodology or arguments being made. In other cases, I think their metrics of urban tolerance are flawed and need to be rethought and recalculated, and some of the conclusions are inaccurate. I hope the authors will address these comments carefully and thoroughly. I recognize that there is no obligation for authors to make revisions. However, revising the paper along the lines of the comments made below would increase the impact of the paper and its clarity to a broad readership.
  
  We appreciate the detailed comments provided and have addressed each point in turn - see detailed responses below. We took these concerns seriously and undertook a substantial revision of the manuscript. In summary, we clarified the conceptual framing of “urban tolerance” (now referred to as “urban affinity”), explicitly defined the metric and its interpretation, added equations and a step-by-step methodological roadmap, and expanded justification for our regional stratification. Where appropriate, we refined language in the Results and Discussion to ensure conclusions are tightly aligned with what the metric can and cannot support. We agree that these revisions materially improve the clarity, rigor, and interpretability of the study, and we appreciate the reviewer’s perspective on how doing so strengthens the paper’s contribution and accessibility to a broad readership.
  
  Major Comments:
  
  (1) Subrealms
  
  Where does the concept of "subrealms" come from? No citation is given, and it could be said that this sounds like an idea straight out of Middle Earth. How do subrealms relate to known bioclimatic designations like Köppen climate classifications, which would arguably be more appropriate? Or are subrealms more socio-ecologically oriented? From what I can tell, each subrealm lumps together climatically diverse areas. It might be better and more tractable to break things down by continent, as the rationale for subrealms is unclear, and it makes the analyses and results more confusing. The authors rationalized the use of subrealms to account for potential intraspecific differences in species' response to urbanization, but that is never a core part of the questions or interpretation in the paper, and averaging across subrealms also accounts for intraspecific variation. Another issue with using the subrealm approach is that the authors only included a species if it had at least 100 observations in a given subrealm, leading to a focus on only the most common species, which may be biased in their SUD distribution. How many more species would be included if they did their analysis at the continental or global scale, and would this change the shape of SUDs?
  
  We thank the reviewer for raising this point and agree that the rationale for using subrealms required clearer explanation. In addition to allowing for potential intraspecific differences in urban affinity across regions, our subrealm-based approach provides a practical way to partition global biodiversity into ecologically meaningful regional assemblages while maintaining sufficient sample sizes for analysis. Urban affinity is likely to vary geographically within species due to differences in climate, habitat availability, urban form, and evolutionary history. By calculating urban affinity within subrealms rather than globally, our approach allows species to exhibit region-specific urban affinities while ensuring that comparisons are made among species co-occurring within the same regional ecological context. We have substantially revised the Methods to explicitly define subrealms, cite their origin, and clarify why this spatial stratification is appropriate for our study:
  
  “Accounting for geographic context through subrealm stratification
  
  To account for geographic heterogeneity in both species’ distributions and the baseline levels of urbanization, we stratified our analyses by global biogeographic subrealms (N=52; Fig. S1). Subrealms represent an intermediate hierarchical level within the One Earth [82] (https://www.oneearth.org/bioregions/) bioregionalization framework, grouping the 185 terrestrial bioregions into broader units that reflect shared species pools and ecological contexts while maintaining meaningful regional structure. This scale represents a practical compromise between analyzing data at the finer bioregion level (which would result in many regions with insufficient observations for robust analysis) and broader classifications such as continents or the 14 biogeographic realms, which aggregate ecologically distinct regions and species pools. This regionalization has been widely used in macroecological and biogeographic research to contextualize species–environment relationships because subrealms capture meaningful gradients in biotic assemblages that are not accounted for by climatic classifications alone [83,84].
  
  This stratification allows species’ associations with urban environments to be interpreted relative to the environments available within the regions they occupy. This is important, as previous work has shown that species’ responses to urbanization are constrained by biogeographic context, because regional species pools reflect shared evolutionary, ecological, and historical filters [23]. Previous work has also shown that urban associations among species are context-dependent, and interpreting species’ responses without accounting for regional baselines conflates availability of urban environments with species’ affinity to them. This distinction is critical because identical levels of urbanization (e.g., VIIRS radiance) can have different ecological meanings across regions with different species pools and land-use histories. It avoids conflating species’ urban affinity with global differences in urban availability.”
  
  We chose subrealms rather than Köppen climate classifications or continental units because our objective was not to partition species by climatic similarity per se, but to evaluate species’ associations with urban environments relative to the ecological and biogeographic contexts in which they occur. Climatic classifications such as Köppen are highly effective for addressing climate–species relationships, but they do not explicitly capture differences in species pools, evolutionary history, or land-use legacies that strongly shape how species interact with urbanization. Likewise, continents often aggregate ecologically disparate regions and species pools, potentially obscuring meaningful variation in baseline urbanization and species’ realized distributions.
  
  Importantly, urban affinity in our framework is a relative, context-dependent metric, explicitly interpreted within regions. Identical levels of urbanization (e.g., VIIRS radiance values) can have different ecological meanings across regions with distinct species pools, land-use histories, and settlement patterns. Stratifying analyses by subrealm therefore avoids conflating species’ affinity to urban environments with global or continental differences in the availability and intensity of urban land cover. We have clarified this distinction and motivation in the revised Methods (see responses below).
  
  Regarding the concern that requiring ≥100 observations per species per subrealm biases analyses toward common species: we agree that this threshold focuses the analysis on well-sampled species. This choice was intentional and follows previous work showing that such cutoffs are necessary to robustly characterize species’ responses to urbanization using occurrence data. While a global or continental analysis would indeed include additional, rarer species, it would also substantially increase uncertainty and conflate species’ responses across ecologically distinct contexts. Our study is therefore best interpreted as a macroecological synthesis of common species, which are also the taxa that disproportionately structure urban communities and drive the shape of Species Urbanness Distributions (SUDs). We now clarify this scope and limitation more explicitly in the introduction:
  
“Our aim is to identify broad, cross-taxonomic patterns in species’ urban affinity at a global scale, rather than to resolve the specific causal mechanisms driving urban success or failure within individual taxa or cities.”
  
We also clarify this in the Discussion:
  
  “Our synthesis complements taxon-specific, presence–absence trait studies by identifying broad, cross-taxonomic patterns that can motivate and contextualize more mechanistic analyses [17,23].”
  
Finally, while alternative spatial stratifications are possible, the central patterns we report, particularly the skewed shape of SUDs, are robust to the use of regional context rather than absolute global metrics. Exploring how SUDs change under different spatial frameworks (e.g., continents, climate zones) is an interesting avenue for future work, but we feel it is beyond the scope of the present study.
  
  (2) Methods - urban score
  
  The authors describe their "urban score" as being calculated as "the mean of the distribution of VIIRS values as a relative species specific measure of a response to urban land cover."
  
  I don't understand how this is a "relative species-specific measure". What is it relative to? Figures S4 and S5 show the mean distribution of VIIRS for various taxa, and this mean looks to be an absolute measure. Mean VIIRS for a given species would be fine and appropriate as an "urban score", but the authors then state in the next sentence: "this urban score represents the relative ranking of that species to other species in response to urban land cover".
  
  We agree that the wording in the original manuscript was unclear and conflated two distinct steps in the workflow. We have now revised the Methods to clearly distinguish between (i) the urban score, which is an absolute, descriptive summary of the mean VIIRS radiance associated with a species’ occurrence locations, and (ii) urban affinity, which is the relative, region-specific metric derived from the urban score. Specifically, we rewrote the methods to have distinct steps as subheadings, as follows: (1) urban score; (2) subrealms and why; (3) urban affinity. In the revised Methods, we explicitly define the urban score:
  
  “an absolute descriptive summary of the urbanization levels associated with a species’ occurrence locations within a given subrealm”.
  
  We no longer describe the urban score itself as “relative” or as a ranking among species. Relative comparisons among species arise only in the subsequent step, where species-specific urban scores are expressed relative to the regional background level of urbanization within each subrealm to derive urban affinity.
  
  We refer the Reviewer to the revised version which we feel is much clearer (lines 428-479)!
  
  That doesn't follow from the description of how this is calculated. Something is missing here. Please clarify and add an explicit equation for how the urban score is calculated because the text is unclear and confusing.
  
The previous response, where we discuss the description, hopefully clarifies this. Further, we have revised the Methods to clearly define the urban score and to include an explicit equation. In the revised manuscript, the urban score for species $s$ is calculated as the mean VIIRS radiance across all occurrence locations of that species:

$$U_s = \frac{1}{n_s}\sum_{i=1}^{n_s} L_i$$

where $n_s$ is the number of GBIF occurrence records for species $s$, and $L_i$ is the VIIRS nighttime lights radiance value extracted at the location of occurrence $i$. We also clarify in the Methods that this urban score is an absolute summary statistic of observed urbanization at species occurrence locations.
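To make the urban score definition concrete, here is a minimal Python sketch; it is not the authors' code. It assumes a hypothetical record layout in which each cleaned GBIF occurrence is already a `(species, subrealm, viirs_radiance)` tuple, with the VIIRS value extracted at the occurrence coordinates beforehand, and it applies the ≥100 observations per species per subrealm threshold described in this response before averaging. The names `Record`, `MIN_RECORDS`, and `urban_scores` are illustrative only.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical record layout (not the authors' schema): one tuple per cleaned
# GBIF occurrence, holding the species name, the subrealm it falls in, and the
# VIIRS radiance value already extracted at its coordinates.
Record = tuple[str, str, float]

MIN_RECORDS = 100  # >=100 observations per species per subrealm


def urban_scores(
    records: list[Record], min_records: int = MIN_RECORDS
) -> dict[tuple[str, str], float]:
    """Mean VIIRS radiance (urban score) for each (species, subrealm) pair
    with at least `min_records` occurrences; sparser pairs are dropped."""
    radiance_by_pair: dict[tuple[str, str], list[float]] = defaultdict(list)
    for species, subrealm, radiance in records:
        radiance_by_pair[(species, subrealm)].append(radiance)
    return {
        # (1 / n_s) * sum of L_i over this pair's occurrence records
        pair: fmean(values)
        for pair, values in radiance_by_pair.items()
        if len(values) >= min_records
    }
```

For example, 100 records of one species at radiance 2.0 yield an urban score of 2.0, while a species with only 40 records in the same subrealm is dropped by the threshold. The sketch groups by subrealm because the analyses are run within subrealms; pooling all of a species' records instead would give a single overall mean.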
  
  (3) Methods - urban tolerance
  
  How the authors are defining and calculating tolerance is unclear, confusing, and flawed in my opinion.
  
Tolerance is a common concept in ecology, evolution, and physiology, typically defined as the ability for an organism to maintain some measure of performance (e.g., fitness, growth, physiological homeostasis) in the presence versus absence of some stressor. As one example, in the herbivory literature, tolerance is often measured as the absolute or relative difference in fitness of plants that are damaged versus undamaged (e.g., https://academic.oup.com/evolut/article/62/9/2429/6853425?login=true).
  
On line 309, after describing the calculation of urban scores across subrealms, they write: "Therefore, a species could be represented across multiple subrealms with differing measures of urban tolerance (Fig. S4). Importantly, this continuous metric of urban tolerance is a relative measure of a species' preference, or affinity, to urban areas: it should be interpreted only within each subrealm". This is problematic on several fronts. First, the authors never define what they mean by the term "tolerance". Second, they refer to urban tolerance throughout the paper, but don't describe the calculation until later, where they write (text in [ ] is from the reviewer): "Within each subrealm, we further accounted for the potential of different levels of urbanization by scaling each species' urban score by subtracting the mean VIIRS of all observations in the subrealm (this value is hereafter referred to as urban tolerance). This 'urban tolerance' (Fig. S5) value can be negative, when species under-occupy urban areas [relative to the average across all species], suggesting they actively avoid them, or positive, when species over-occupy urban areas [relative to the average across all species], suggesting they prefer them (i.e., ranging from urban avoiders to urban exploiters, respectively)." They are taking a relativized urban score and then subtracting the mean VIIRS of all observations across species in a subrealm. How exactly one interprets the magnitude isn't clear, and they admit this metric is "not interpretative across subrealms".
  
  This is not a true measure of tolerance, at least not in the conventional sense of how tolerance is typically defined. The problem is that a species distribution isn't being compared to some metric of urbanness, but instead it is relative to other species' urban scores, where species may, on average, be highly urban or highly nonurban in their distribution, and this may vary from subrealm to subrealm. A measure of urban tolerance should be independent of how other species are responding, and should be interpretable across subrealms, continents, and the globe.
  
We thank the reviewer for this careful and important critique. We agree that the term “tolerance” is commonly used to describe the ability of an organism to maintain performance (e.g., fitness, growth, physiological homeostasis) in the presence of a stressor, and that our metric does not measure tolerance in this mechanistic or fitness-based sense. To address this directly and unambiguously, we have revised the manuscript to replace “urban tolerance” with an explicitly defined term, “urban affinity”.
  
In the revised Methods, we also reorganized and clarified the calculation of urban affinity, introduced explicit notation, and provided a formal equation. Specifically, we now define urban affinity for species $s$ in subrealm $r$ as:

$$\text{urban affinity}_{s,r} = U_{s,r} - \bar{U}_r$$

where $U_{s,r}$ is the mean VIIRS radiance across all occurrence locations of species $s$ within subrealm $r$, and $\bar{U}_r$ is the mean VIIRS radiance across all occurrence records of all species in that subrealm. This transformation centers species’ urban scores on the regional background level of urbanization, yielding a relative measure of spatial association with urban environments.
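A minimal Python sketch of the urban affinity calculation (urban affinity of species s in subrealm r = U_{s,r} - Ū_r) is shown below as an illustration under stated assumptions, not as the authors' implementation. It assumes the same hypothetical `(species, subrealm, viirs_radiance)` record layout, and it reads Ū_r as the mean over all occurrence records in the subrealm pooled across species, as the definition states, so species with more records contribute more to the baseline. Whether the ≥100-record filter is applied before computing Ū_r is not specified here; the sketch simply uses whatever records it is given.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical record layout: (species, subrealm, viirs_radiance) per occurrence.
Record = tuple[str, str, float]


def urban_affinity(records: list[Record]) -> dict[tuple[str, str], float]:
    """Urban affinity_{s,r} = U_{s,r} - U_bar_r.

    U_{s,r} is the mean radiance of species s's records in subrealm r, and
    U_bar_r is the mean radiance of all records (all species pooled) in r.
    """
    by_pair: dict[tuple[str, str], list[float]] = defaultdict(list)
    by_subrealm: dict[str, list[float]] = defaultdict(list)
    for species, subrealm, radiance in records:
        by_pair[(species, subrealm)].append(radiance)
        by_subrealm[subrealm].append(radiance)

    # Regional baseline, pooled over records rather than averaged over species.
    baseline = {r: fmean(values) for r, values in by_subrealm.items()}

    # Positive: the species occurs in more urbanized places than is typical for
    # the subrealm; negative: less urbanized. Compare values only within a subrealm.
    return {(s, r): fmean(values) - baseline[r] for (s, r), values in by_pair.items()}
```

With toy data in which `sp_a` has 100 records at radiance 4.0 and `sp_b` has 100 records at 0.0 in subrealm R1, the baseline is 2.0 and the affinities are +2.0 and -2.0. If `sp_a` also has 100 records at 10.0 in subrealm R2 alongside 300 records of `sp_c` at 2.0, the R2 baseline is 4.0 and `sp_a` scores +6.0 there: the same species receives different values in different subrealms, which is why the metric is interpreted only within each subrealm.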
  
  We agree with the reviewer that this metric is not interpretable as an absolute measure of affinity, and we now state this explicitly. Urban affinity values are, by construction, relative measures, interpretable only within subrealms, and they quantify whether a species tends to occur in more or less urbanized environments than is typical for that region. The magnitude of the metric therefore reflects deviation from the regional baseline, not a universal or global scale of urbanization, and is not intended to be compared directly across subrealms.
  
  We respectfully disagree, however, that this makes the metric flawed. Rather, it reflects a deliberate analytical choice aligned with our research questions. Our goal was not to estimate absolute urban exposure or physiological performance, but to compare species’ realized spatial associations with urban environments within shared biogeographic contexts. Because baseline urbanization levels, settlement history, and species pools vary strongly across regions, a globally absolute metric would conflate species’ affinities with regional availability of urban environments. By contrast, a relative, region-centered metric allows meaningful comparisons among species that coexist within the same ecological and biogeographic setting. This approach follows a growing body of macroecological work that infers species’ environmental affinities from spatial distributions rather than direct performance measures (e.g., Callaghan et al. 2020; 2021; 2023), and we now cite these studies explicitly.
  
  I propose the authors use one of two metrics of urban tolerance:
  
(i) Absolute Urban Tolerance = Mean VIIRS of species_i - Mean VIIRS of city centers. Here, the mean VIIRS of city centers could be taken from the center of multiple cities throughout a subrealm, across a continent, or across the world. The units are the original VIIRS units, where 0 would correspond to species being centered on the most extreme urban habitats, and the most extreme negative values would correspond to species that occupy the most non-urban habitats (i.e., no artificial light at night). In essence, this measure of tolerance would quantify how far a species' distribution is shifted relative to the most highly urbanized habitat available.

(ii) % Urban Tolerance = (Mean VIIRS of species_i - Mean VIIRS of city centers) / Mean VIIRS of city centers * 100%
  
  This metric provides a % change in species mean VIIRS distribution relative to the most urban habitats. This value could theoretically be negative or positive, but will typically be negative, with -100% being completely non-urban, and 0% being completely urban tolerant.
  
Both of these metrics can be compared across the world, as they would provide either absolute (equation 1) or relative (equation 2) metrics of urban tolerance that are comparable and easily interpretable in any region.
  
  In summary, the definition of tolerance should be clear, the metric should be a true measure of tolerance that is comparable across regions, and an equation should be given.
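For comparison with urban affinity, the two metrics proposed by the reviewer can be written as small Python functions. This is only a sketch: the reviewer leaves open how the city-center mean is obtained (within a subrealm, a continent, or globally), so here it is simply an input, and the example values below are invented for illustration.

```python
def absolute_urban_tolerance(species_mean: float, city_center_mean: float) -> float:
    """Reviewer's metric (i): species mean VIIRS minus city-center mean VIIRS.

    0 means the species is centered on the most urban habitat; increasingly
    negative values mean its distribution sits further from it (VIIRS units).
    """
    return species_mean - city_center_mean


def percent_urban_tolerance(species_mean: float, city_center_mean: float) -> float:
    """Reviewer's metric (ii): metric (i) as a percentage of the city-center mean.

    -100% corresponds to a species mean of 0 (no artificial light at night),
    0% to a species centered on city-center radiance.
    """
    if city_center_mean <= 0:
        raise ValueError("city-center mean radiance must be positive")
    return (species_mean - city_center_mean) / city_center_mean * 100.0
```

For example, a species with mean radiance 10 against a city-center mean of 50 gives -40 (VIIRS units) under metric (i) and -80% under metric (ii); a species mean of 0 gives -100%, and a species mean equal to the city-center mean gives 0 under both. Both values are anchored to a fixed urban reference rather than to the pooled mean of all records in a region.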
  
  We thank the reviewer for this thoughtful and constructive suggestion, which raises an important conceptual issue regarding how “urban tolerance” should be defined and quantified. We agree that any such metric must be clearly defined, interpretable, and accompanied by an explicit equation, and we have revised the manuscript accordingly to clarify both our definition and its intended interpretation.
  
The alternative metrics proposed by the reviewer, which anchor species’ distributions to city centers or to the most highly urbanized habitats, represent a valid and intuitive absolute framing of urban tolerance. Indeed, a closely related approach was explored and evaluated in Callaghan et al. (2020; https://doi.org/10.1016/j.ecolind.2020.106905), where species’ occurrence-based urbanness scores derived from VIIRS night-time lights were compared against abundance-based estimates of urban tolerance using explicit urban–non-urban contrasts. That study further demonstrated that urbanness scores depend on the choice of spatial baseline (e.g., regional buffers around cities versus continental extents), and showed that different baselines capture complementary, but not identical, aspects of species–urban associations.
  
  In the present study, we deliberately adopt a relative, regionally contextualized metric (now referred to as urban affinity), expressing each species’ mean VIIRS association relative to the background urbanization of the biogeographic subrealm in which it occurs. This choice reflects our goal of comparing species’ relative affinities to urban environments within shared ecological and biogeographic contexts. Importantly, identical VIIRS values can correspond to very different ecological conditions across regions, and anchoring all species to city centers or global urban maxima risks conflating species’ affinities with regional differences in urban availability and infrastructure.
  
We now make this distinction explicit throughout the manuscript, including by (i) defining urban affinity as a relative, occurrence-based measure of species’ association with urban environments (rather than physiological or fitness-based tolerance), (ii) providing an explicit equation for its calculation, and (iii) clarifying that these values are interpretable within, but not across, biogeographic subrealms. We view absolute, city-center–anchored metrics and relative, regionally normalized metrics as complementary approaches, each suited to different questions; the latter is most appropriate for the macroecological, comparative analyses pursued here.
  
  (4) Figure 1: The figure does not stand alone. For example, what is the hypothesis for thermophily or the temperature-size rule? The authors should expand the legend slightly to make the hypotheses being illustrated clearer.
  
We have expanded the legend so that the figure and the hypotheses it presents can be understood from the figure and its legend alone, explaining each illustrated hypothesis as requested by the Reviewer. The figure legend now reads as follows:
  
“Fig. 1: Conceptual framework illustrating hypothesized mechanisms linking urban affinity to interspecific body-size shifts. These include dispersal and mobility constraints under habitat fragmentation [44,45], thermophily and the temperature–size rule driven by the urban heat island effect [15,30], size-biased competition and survival [94,95], and size-biased human preferences [64]. Urban fragmentation of habitat resources can select for increased mobility (e.g., larger butterflies) or reduced mobility (e.g., larger seeds) depending on isolation severity. Elevated urban temperatures favor thermophily, which often negatively correlates with size as it affects the heat balance via thermal inertia. Similarly, these higher temperatures generally favor smaller-bodied adult ectotherms because they accelerate development and reduce time available for growth (i.e., temperature–size rule). In plants, the increased CO₂ and nutrient availability associated with anthropogenic environments due to heating- and traffic-related CO₂ emissions and eutrophication provides a competitive advantage to larger plant species, and human preferences too may favor larger species (e.g., tree-lined streets), whereas smaller species may be advantaged in colonizing built infrastructure.”
  
  (5) SUDs: I don't agree with the conclusion given on line 83 ("pattern was consistent across subrealms and several taxonomic levels") or in the legend of Figure 2 ("there were consistent patterns for kingdoms, classes, and orders, as shown by generally similar density histograms shapes for each of these").
  
  The shapes of the curves are quite different, especially for the two Kingdoms and the different classes. I agree they are relatively consistent for the different taxonomic Orders of insects.
  
  We agree that our original wording overstated the similarity of distributions across taxa and regions. We have revised the text to clarify that the consistency we refer to pertains primarily to central tendencies rather than identical distributional shapes. To address this directly, we conducted additional analyses comparing urban affinity distributions across subrealms for taxonomic groups with the largest sample sizes. These results, now presented in new Supplementary Figures (Fig. S2-S4), show that while distributional shapes vary among higher taxonomic groups, median values and overall spread are broadly similar within comparable taxonomic levels. We have updated the Results text and the Figure 2 legend accordingly to reflect this more precise interpretation.
  
  “These patterns in central tendency were broadly consistent across subrealms and taxonomic levels, although distributional shapes varied among higher taxonomic groups (Fig. 2).”
  
  “To evaluate this more formally, we compared distributions across subrealms for groups with the largest sample sizes and found that while distributional shapes varied among higher taxa, median values and overall spread were broadly similar within comparable taxonomic levels (Fig. S2–S4).”
  
  Figure 2 caption: “There were consistent patterns for kingdoms, classes, and orders (B) as shown by similar central tendencies despite variation in distributional shape.”
  
We refer the Reviewer to the revised manuscript and supplementary material; the kingdom-level comparison is shown in Fig. S2.
  
  More broadly, our goal in introducing Species Urbanness Distributions (SUDs) is not to argue that their exact shapes are invariant, but rather to provide a generalizable framework for describing how assemblages are structured along an urbanization gradient. In this respect, SUDs are conceptually analogous to Species Abundance Distributions (SADs), where the precise functional form has long been debated, yet the framework itself has proven extremely valuable for ecology. We therefore emphasize the utility of SUDs as a descriptive and comparative tool for quantifying community-level responses to urbanization, rather than as a claim about strict uniformity in distributional shape across taxa or regions.
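The comparison of central tendency and spread described above could be summarized per group along the lines of the Python sketch below. It is an illustration built on assumptions, not the authors' analysis: the response does not say which spread statistic was used, so the interquartile range (IQR) stands in here, and groups are hypothetical `(subrealm, taxon)` keys mapped to species-level urban affinity values. Each group's list of values is, in effect, one SUD.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical grouping key, e.g. ("subrealm_1", "Aves"); each value paired
# with a key is the urban affinity of one species in that group.
GroupKey = tuple[str, str]


def sud_summary(
    affinities: list[tuple[GroupKey, float]],
) -> dict[GroupKey, dict[str, float]]:
    """Summarize each group's Species Urbanness Distribution by the number of
    species, the median, and the interquartile range of urban affinity."""
    values_by_group: dict[GroupKey, list[float]] = defaultdict(list)
    for group, affinity in affinities:
        values_by_group[group].append(affinity)

    summary: dict[GroupKey, dict[str, float]] = {}
    for group, values in values_by_group.items():
        if len(values) < 2:
            continue  # quartiles are not meaningful for a single species
        q1, _, q3 = quantiles(values, n=4)  # default "exclusive" method
        summary[group] = {
            "n_species": len(values),
            "median": median(values),
            "iqr": q3 - q1,
        }
    return summary
```

Comparing `median` and `iqr` for the same taxon across subrealms mirrors the check reported in the revision, while the full list of values per group (e.g., plotted as a histogram) shows the distributional shape, which can differ among higher taxa even when medians are similar.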
  
  Reviewer #3 (Public review):
  
  Summary:
  
This paper reports on an association between body size and the occurrence of species in cities, which is quantified using an 'urban score' that can be visualized as a 'Species Urbanness Distribution' for particular taxa. The authors use species records from the Global Biodiversity Information Facility (GBIF) and link the occurrence data to nighttime lighting quantified using satellite data (Visible Infrared Imaging Radiometer Suite, VIIRS). They link the urban score to body size data to find a 'heterogeneous relationship between body size and urban tolerance across the tree'. The results are then discussed with reference to potential mechanisms that could possibly produce the observed effects (cf. Figure 1).
  
  We thank the reviewer for this clear and accurate summary of the study. We agree that the primary contribution of this work lies in the scale and taxonomic breadth of the analysis, and in introducing a framework (Species Urbanness Distributions) for quantifying species’ relative affinities to urban environments using globally available data. We have revised the manuscript to further clarify the scope of inference and the distinction between descriptive macroecological patterns and mechanistic explanations.
  
  Strengths:
  
  The novelty of this study lies in the huge number of species analyzed and the comparison of results among animal taxa, rather than in a thorough analysis of what traits allow species to persist under urban conditions. Such analyses have been done using a much more thorough approach that employs presence-absence data as well as a suite of traits by other studies, for example, in (Hahs et al. 2023, Neate-Clegg et al. 2023). The dataset that the authors produced would also be very valuable if these raw data were published, both the cleaned species records as well as the body sizes. The paper could strongly add to our understanding of what species occur in cities when the open questions are addressed.
  
We appreciate the reviewer highlighting the novelty of the taxonomic breadth and scale of our analysis. We agree that our approach is complementary to more detailed, taxon-specific trait studies based on presence–absence data. In response, we have further emphasized this distinction in the Discussion:
  
“Our synthesis complements taxon-specific, presence–absence trait studies by identifying broad, cross-taxonomic patterns that can motivate and contextualize more mechanistic analyses [17,23].”
  
We also agree that the cleaned occurrence data and body size information represent a valuable resource, and all data will be made available, except for some body size datasets that we are unable to share.
  
  Weaknesses:
  
  I value the approach of the authors, but I think the paper needs to be revised.
  
  In my view, the authors could more carefully validate their approach. Currently, any weakness or biases in the approach are quickly explained away rather than carefully explored. This concerns particularly the use of presence-only data, but also the calculation of the urban score.
  
  The vast majority of data in GBIF is presence-only data. This produces a strong bias in the analysis presented in the paper. For some taxa, it is likely that occurrences within the city are overrepresented, and for other taxa, the opposite is true (cf. Sweet et al. 2022). I think the authors should try to address this problem.
  
  We thank the reviewer for raising this important point. We fully agree that GBIF occurrence data are subject to well-known sampling biases, including uneven geographic coverage, observer effort, and taxonomic focus. These limitations are now more explicitly acknowledged in the revised manuscript. At the same time, GBIF currently represents the only global biodiversity database that allows the scope of analysis undertaken here, spanning thousands of species across multiple taxonomic groups and regions. Systematic monitoring datasets that provide presence–absence data are typically restricted to particular taxa (often vertebrates or plants) and are geographically concentrated in the Global North, which would substantially limit the taxonomic and geographic breadth of our analysis.
  
  Importantly, our objective was not to estimate absolute species-specific responses to urbanization, but rather to examine relative patterns of urban affinity across species and families within comparable regional contexts. To address this, we structured our analyses at the subrealm level, which aggregates observations across large spatial extents and reduces sensitivity to fine-scale sampling biases associated with individual cities or urban–rural gradients. In addition, we restricted analyses to species with ≥100 observations per subrealm to focus on well-sampled taxa and reduce the influence of extremely sparse occurrence records. While these steps cannot fully eliminate sampling biases inherent to occurrence data, they substantially mitigate their influence when examining broad comparative patterns.
  
  Recent work has also evaluated the performance of GBIF data in urban biodiversity contexts. For example, Sweet et al. (2022) compared GBIF-derived species richness patterns with independent state-level biodiversity databases across cities and surrounding regions, finding that GBIF provided comparable or broader coverage across taxa and spatial extents. Their analysis showed that species richness was consistently higher in the surrounding region than in the city itself, suggesting that GBIF data capture broad urban–regional biodiversity gradients rather than systematically overrepresenting urban occurrences. Although our analysis differs in design, these results support the use of GBIF as a valuable resource for examining large-scale biodiversity patterns.
  
More broadly, occurrence databases such as GBIF have become widely used for analyzing species–environment relationships at macroecological scales. While they may be insufficient for estimating precise species-specific environmental tolerances, they are informative for identifying broad patterns across taxa and regions. Our goal here is therefore to identify large-scale comparative patterns in urban affinity and generate hypotheses about trait–urbanization relationships, which can subsequently be tested with more structured monitoring datasets where available.
  
Another important consideration is that our analyses focus on comparative differences among species within shared taxonomic and geographic contexts, rather than absolute estimates of urban affinity. Sampling biases in occurrence databases are often structured by observer behaviour (e.g., detectability, accessibility, or taxonomic interest), meaning that species recorded by similar observer communities are likely subject to similar sampling biases. Under these conditions, relative differences among species are expected to be preserved even when absolute occurrence frequencies are biased. This logic is consistent with the widely used target-group background approach in presence-only species distribution modelling, where species recorded by similar observer groups (often within the same taxonomic group) are used to control for shared sampling bias. Previous work by Callaghan et al. (2021; https://doi.org/10.1111/gcb.15670) included a validation analysis comparing our distribution-based urban affinity metric with estimates derived from occupancy modelling of well-sampled European butterflies (see Fig. S5 of Callaghan et al. 2021). The strong positive relationship between these approaches suggests that the broad patterns identified here are unlikely to arise solely from sampling artifacts.
  
  Finally, in the revised manuscript we now include additional comparisons among well-sampled taxonomic groups (see responses to other comments throughout our response document for details), which show substantial variation in urban affinity even among taxa with extensive sampling. These results suggest that the patterns reported here are unlikely to arise solely from sampling artifacts, but instead reflect meaningful ecological variation in how species interact with urban environments.
  
  The authors should compare their results to studies focusing on particular taxa where extensive trait-based analyses have already been performed, i.e., plants and birds. In fact, I strongly suggest that the authors should compare their results to previous studies on the relationship between traits, including body size and occurrences along a gradient of urbanisation, to draw conclusions about the validity of the approach used in the current study, which has a number of weaknesses.
  
  We agree that explicitly situating our findings within the existing trait-based urban ecology literature strengthens both interpretation and validation of our approach. We had already referenced several relevant studies (e.g., Hahs et al. 2023 and others) in the Introduction and Discussion, but we recognize that these comparisons were not sufficiently explicit. We have now added text to the Discussion directly comparing our results with previous trait-based studies across taxa:
  
“Our results are broadly consistent with prior taxon-specific trait-based studies (e.g., Hahs et al. [17]), but also highlight that relationships between body size and urbanization vary across taxa and analytical frameworks. For example, global syntheses and regional studies have reported positive, negative, or null size–urbanization relationships depending on clade and spatial scale. A recent global analysis that compiled empirical occurrence data for multiple terrestrial faunal taxa across cities worldwide reported broadly similar body-size responses to urbanization [17]. For four of the five groups that overlap with our analysis (amphibians, bats, bees, and birds), the direction of the body-size relationship with urbanization was consistent between studies. The only exception was carabid beetles, which tended to be smaller-bodied in highly urbanized environments in that analysis, whereas we detected no significant size effect for this family. Studies on birds, for example, have found mixed results, including positive associations with urbanization in some regional assemblages [45], no global relationship in some analyses [46], an overall negative relationship globally in others [23], and negative relationships in particular clades such as raptors [40]. Such discrepancies likely arise because different studies quantify urbanization differently, focus on different spatial grains, or analyze different components of species responses (e.g., presence–absence, abundance, or occurrence distributions). Additionally, a study on multiple taxa including butterflies and moths found a positive relationship between butterfly and moth community-weighted mean body size and urbanization level, similar to our findings [31]. Researchers have also found that smaller-bodied dung-associated beetles potentially benefit from urban environments, which is similar to the negative association we found between urbanization and body size in beetles [47].
Our approach complements these studies by estimating occurrence-based urban associations across thousands of taxa simultaneously, allowing comparison of how consistently body size predicts urban affinity across taxonomic groupings rather than within a single lineage. In this sense, variation among published results does not contradict our findings but instead reinforces the conclusion that body size is a context-dependent filter whose direction and strength depend on ecological setting, taxonomic scope, and the urbanization metric used.”
  
  These additions highlight that published relationships between body size and urbanization vary widely across taxa, spatial scales, and analytical approaches. For example, prior studies have reported positive, negative, or null size–urbanization relationships depending on clade, geographic extent, and how urbanization or occurrence is quantified. Even within birds alone, the literature spans positive regional relationships, null global relationships, and negative relationships in particular clades such as raptors. We now explicitly discuss these contrasts and clarify that such discrepancies are expected because different studies measure different components of species’ responses (e.g., presence–absence vs. abundance vs. occurrence distributions), use different spatial grains, or focus on different taxonomic subsets.
  
  We emphasize that our analysis is not intended to replace taxon-specific trait studies, but rather to complement them by providing a macroecological synthesis across thousands of species simultaneously. Importantly, the heterogeneity we observe among families is itself a key biological result, indicating that body size is not a universal predictor of urban affinity but instead a context-dependent filter whose direction and strength vary across ecological and phylogenetic settings. We now state this interpretation more clearly in the revised manuscript.
  
  They should be more careful in coming up with post-hoc explanations of why the pattern found in this study makes sense or suggests a particular mechanism. This reviewer considers that there is no way in which the current study can disentangle the different possible mechanisms without further analyses and data, so I would suggest pointing out carefully how the mechanisms could be studied.
  
  We agree that our study cannot disentangle the causal mechanisms underlying species’ responses to urbanization. Our intent in discussing potential mechanisms was not to claim definitive explanations, but rather to situate our findings within existing ecological theory and to highlight plausible, non-exclusive pathways that may generate the observed patterns. To make this clearer, we have revised the Discussion to explicitly frame these interpretations as hypotheses rather than conclusions, and to emphasize that testing the underlying mechanisms will require additional data and approaches, such as targeted trait datasets, experimental manipulations, and longitudinal or within-city studies:
  
  “Because our synthesis is correlative and macroecological in nature, the mechanisms discussed above are best viewed as hypotheses that can be evaluated through future work combining experimental, trait-based, and longitudinal data.”
  
  Additionally, we modified our overall goal to make it clear that this is not inherently a mechanistic study per se:
  
  “Our aim is to identify broad, cross-taxonomic patterns in species’ urban affinity at a global scale, rather than to resolve the specific causal mechanisms driving urban success or failure within individual taxa or cities.”
  
  More details should be given about the methodology. The readers should be able to understand the methods without having to read a number of other papers.
  
  We have substantially revised and expanded the Methods section to ensure that all analytical steps can be understood directly from the manuscript without requiring consultation of prior publications. In particular, we now (i) provide a clear conceptual roadmap of the workflow at the start of the Methods, (ii) define all key metrics explicitly, including equations for both the urban score and urban affinity, and (iii) clarify the interpretation, assumptions, and limitations of each step. We also added text explaining the rationale for subrealm stratification and the intended interpretation of relative values. Together, these revisions make the methodological framework fully transparent and self-contained (see revised Methods and related responses above and below).
  
  References:
  
  Hahs, A. K., B. Fournier, M. F. Aronson, C. H. Nilon, A. Herrera-Montes, A. B. Salisbury, C. G. Threlfall, C. C. Rega-Brodsky, C. A. Lepczyk, and F. A. La Sorte. 2023. Urbanisation generates multiple trait syndromes for terrestrial animal taxa worldwide. Nature Communications 14:4751.
  
  Neate-Clegg, M. H. C., B. A. Tonelli, C. Youngflesh, J. X. Wu, G. A. Montgomery, Ç. H. Şekercioğlu, and M. W. Tingley. 2023. Traits shaping urban tolerance in birds differ around the world. Current Biology 33:1677-1688.
  
  Sweet, F. S. T., B. Apfelbeck, M. Hanusch, C. Garland Monteagudo, and W. W. Weisser. 2022. Data from public and governmental databases show that a large proportion of the regional animal species pool occur in cities in Germany. Journal of Urban Ecology 8:juac002.
  
  We have incorporated these (and additional new references) into our revised manuscript.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  As you see from the general comments above and the specific recommendations below, the reviewers are impressed by your comprehensive data set and the analytic approach. However, they ask you to clarify your measures of organism size, occurrence data (vs. presence/absence and corresponding sample-bias caveats), urbanness (lighting differences between cities and regions?), urban tolerance (measure should not be relative to other species and particular regions), and region ("subrealm" vs. more commonly used definitions of world regions such as continents). They also encourage you to compare your general results with more detailed local studies to better justify using size as the only, easily available trait.
  
  We thank the Editor for this clear synthesis of the key priorities for revision. We have carefully addressed each point and substantially revised the manuscript to improve clarity, methodological transparency, and interpretability. In particular:
  
  We clarified how body size data were compiled, harmonized, and modeled, including explicit description of how different measurement types (mean, maximum, sex-specific) were retained and statistically accounted for through scaling and hierarchical modeling. We now state these procedures explicitly in the Methods.
  
  We expanded the Methods and Discussion to clarify that our analyses rely on occurrence data rather than presence–absence or abundance data, and we now explicitly discuss the implications and limitations of presence-only datasets, including potential sampling biases and how these may influence inference.
  
  We strengthened justification for using VIIRS night-time lights as a continuous proxy for urbanization, added supporting citations, and clarified that spatial heterogeneity in lighting primarily introduces additional variance rather than systematic bias. We also explicitly describe how urbanization values were calculated and interpreted.
  
  We substantially revised the manuscript to clearly define urban affinity at the outset (including in the Abstract), distinguish it from physiological definitions of tolerance, and provide explicit equations and step-by-step descriptions of how both urban score and urban affinity are calculated and interpreted. We now emphasize that the metric is a relative, region-contextualized measure of occurrence-based urban affinity.
  
  We added full justification, citations, and methodological explanation for the use of biogeographic subrealms, clarified how they differ from continents or climate zones, and explained why this stratification is appropriate for the ecological questions addressed. We also clarified the scope of inference and limitations of this approach.
  
  We expanded the Discussion to explicitly compare our results with prior trait-based urban ecology studies across taxa (including birds and other groups), highlighting where results converge, diverge, and why such variation is expected across spatial scales, taxa, and analytical frameworks.
  
  Reviewer #1 (Recommendations for authors):
  
  (1) Abstract
  
  (a) Please define how tolerance is being used here
  
  We now use “urban affinity” throughout and define it at the outset, including in the Abstract (see responses to other comments here).
  
  (b) The abstract should clarify at what taxonomic scale body size is assessed. It is unclear from the abstract whether the reader should expect intraspecific or interspecific measures, and at what resolution.
  
  We have revised the abstract by adding one sentence explicitly stating the scale body size was assessed:
  
  “We then assessed whether body size, an integrative ecological trait fundamental to space use, mobility, metabolism, and environmental sensitivity, showed consistent associations with urban affinity among species and across 371 taxonomic families. Analyses were conducted at the interspecific level and focused primarily on variation among taxonomic families (an accompanying application for viewing the results is provided with this paper).”
  
  (2) Results/Discussion
  
  (a) The species urbanness distribution and comparison with the species abundance distribution is an interesting and conceptually useful contribution to urban ecology and underscores how urbanization acts on biodiversity at scale.
  
  We thank the reviewer for this positive assessment and are encouraged that they view the Species Urbanness Distribution (SUD) as a conceptually useful contribution to urban ecology. We see SUDs as a flexible framework that can be extended in several important directions, including comparisons across additional traits, cities of differing size and configuration, and temporal analyses that track how urbanness distributions shift with ongoing urban expansion or restoration. More broadly, we hope that SUDs can provide a framework for developing a macroecological understanding of how urbanization filters biodiversity.
  
  (b) In our Lambert et al. (2023) study that you reference, we suggest that 'exaptation' may be valuable to explore in urban areas. Although body size wasn't the trait we were considering at that time, it may be worth putting your discussion around pre-adaptation in this context.
  
  We agree that exaptation provides a valuable conceptual lens for interpreting species’ responses to urban environments. We have revised the Discussion to explicitly frame species’ urban success in this context:
  
  “Such traits “pre-adapted” to urban conditions allow some species to not only persist but thrive in urban environments where most species cannot. Framing these patterns through the lens of exaptation may be particularly useful, as traits that evolved under non-urban selective pressures may incidentally confer advantages in urban environments without having arisen in response to urbanization per se (sensu Lambert et al.[4]). We therefore speculate that the skewed shape of SUDs may reflect the uneven distribution of exaptive traits across species pools, rather than widespread adaptive evolution to urban conditions.
  
  Consistent with this interpretation, if exaptive traits that facilitate urban persistence are unevenly distributed across species pools, most species would be expected to exhibit avoidance of, rather than affinity for, urban environments. Indeed, we found that the median urban affinity is most often below one, indicating widespread avoidance among species.”
  
  (c) Given the family-scale effect, it would be helpful to discuss how often species within a family co-occur in a given geographic region, how much other traits covary with size, etc. Do we have an a priori reason to expect family to be the taxonomic resolution at which body size seems to be most varied?
  
  Our exploratory and preliminary analyses revealed that variation in the body size–urban affinity relationship was strongest at the family level, which prompted us to focus our main analyses at this taxonomic resolution (we also present results at the order level). Families represent a biologically meaningful intermediate scale in taxonomy: species within families typically share broad morphological, ecological, and life-history characteristics, yet still exhibit substantial variation in body size and ecological strategies. Indeed, body size is well known to covary with multiple traits, including dispersal ability, metabolism, and space use, making it an integrative trait that captures several ecological dimensions simultaneously within and among families. These correlated traits likely contribute to the heterogeneous responses to urbanization observed among families.
  
  Using the family level also provides a practical balance between biological relevance and statistical robustness. Many families contain sufficient numbers of species to allow independent model estimation while avoiding the strong data imbalance that would arise at higher taxonomic levels. In addition, family is a commonly used unit in macroecological trait analyses (e.g., Roy et al. 2009; Smith et al. 2004), and it often reflects major morphological and ecological similarities among species, as reflected in taxonomic identification frameworks.
  
  Regarding co-occurrence, our analytical framework already accounts for geographic context by estimating urban affinity within subrealms. This ensures that species are compared within the same regional species pools and environmental contexts, rather than across globally disparate assemblages. Consequently, family-level effects emerge from comparisons among species that co-occur within shared biogeographic settings rather than from global taxonomic aggregation.
  
  We have added a short clarification in the manuscript to emphasize that body size functions as an integrative trait that covaries with multiple ecological attributes, and that family-level analyses represent a balance between ecological interpretability and data availability:
  
  “Because body size covaries with multiple ecological traits (e.g., dispersal ability and metabolic rate), we focused on family-level analyses to capture shared ecological strategies while still allowing sufficient variation among species to detect trait–environment relationships [39].”
  
  (d) The result that body size shows a stronger effect in plants perhaps could suggest that plant records in GBIF are more sensitive to potential collection bias, perhaps due to detectability differences or preferences for where botanists and citizen scientists collect plant data? You mention ornamental plants late, but it may be worth discussing this here, too.
  
  We agree that this is a possible mechanism, which likely conflates detectability and ecological signal. We have expanded this point in the Discussion to better address this:
  
  “These human-driven preferences may also influence detectability and recording effort, as larger and more conspicuous plant species are more likely to be planted, maintained, and documented in urban environments, and thus be available in GBIF for our analyses. However, we suggest that this is not purely a sampling artifact; such processes likely interact with ecological filtering to shape the realized size structure of urban plant communities.”
  
  (e) I appreciate the additional taxonomic layering to the discussion. Seeing patterns at the family and order levels is helpful for generating new theory and predictions about how urbanization structures biodiversity at different taxonomic scales.
  
  We agree that examining patterns across multiple taxonomic scales is particularly valuable for generating testable hypotheses about how urbanization structures biodiversity, as different mechanisms may emerge or break down depending on the resolution of analysis. We hope this multi-scale perspective helps stimulate new theory and predictions about the ecological processes shaping urban biodiversity across the tree of life.
  
  (3) Methods
  
  (a) The methodology provides a scalable, consistent, and reasonable measure of both urbanness and species-level urban tolerance. The urban tolerance measure will, of course, not be useful for certain types of research (e.g., animal behavior), but it is appropriate for the resolution of this study.
  
  We agree that the urban affinity metric presented here is intended for broad-scale, comparative analyses and is not designed to capture fine-scale processes such as individual behavior or short-term demographic responses. Our goal was to develop a scalable and consistent measure that enables cross-taxon and cross-region comparisons at a global extent, which we believe is appropriate for addressing the questions posed in this study. We have sought to be explicit about this scope throughout the manuscript (e.g., to better address Reviewer #1's concerns) and emphasize that the framework is complementary to, rather than a replacement for, more mechanistic or organism-focused approaches.
  
  (b) I'm concerned that the authors were not able to constrain their dataset to mean, median, or maximum size, nor to account for potential sex differences in size. Later in the methods, the authors state that they selected the measure of size that was most common within a family. Does this mean that species within a given family that didn't have that measure of body size were removed from the analysis?
  
  We appreciate this important point and agree that heterogeneity in how body size is measured (e.g., mean, maximum, or sex-specific estimates) is a real and unavoidable challenge in large-scale trait syntheses. Our analytical approach was explicitly designed to minimize the influence of this heterogeneity while retaining as many species as possible, rather than excluding species based on inconsistent trait metadata.
  
  Specifically, species within a family were not removed based on the availability of a particular body size definition. All species with at least one body size estimate were retained. When multiple measures existed for a species, we selected the measurement type that was most commonly available within each family to maximize comparability while preserving sample size. Remaining heterogeneity among measurement types (including units, measurement detail, and whether values reflected means, maxima, or sex-specific estimates) was explicitly accounted for through log-transformation and metadata-aware centering and scaling, with measurement metadata included as random intercepts in the hierarchical models. We have clarified this point in the Methods:
  
  “Importantly, this procedure did not result in the exclusion of species lacking a particular body size definition; rather, all species with at least one available body size estimate were retained, with measurement heterogeneity explicitly accounted for through metadata-aware scaling and hierarchical modeling.”
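The retention rule described above (prefer the family's most common measurement type, keep every species with any estimate, then log-transform and scale) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the measurement-type labels (`mean_mass_g`, `max_mass_g`), counting records to find a family's most common type, the fallback for species lacking that type (their first listed measure), and grouping by measurement type alone for centering and scaling are all hypothetical. The random intercepts for measurement metadata belong to the downstream hierarchical model and are not shown.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def choose_measurements(records):
    """Pick one body-size value per species, preferring the family's most common type.

    records: iterable of (species, family, measurement_type, value) tuples (hypothetical layout).
    No species is dropped: if the preferred type is missing, the species keeps
    its first listed measurement (an illustrative fallback, not the authors' rule).
    """
    type_counts = defaultdict(Counter)
    by_species = defaultdict(list)
    for species, family, mtype, value in records:
        type_counts[family][mtype] += 1
        by_species[(species, family)].append((mtype, value))
    chosen = {}
    for (species, family), options in by_species.items():
        preferred = type_counts[family].most_common(1)[0][0]
        matches = [v for t, v in options if t == preferred]
        if matches:
            # Average repeated values of the preferred type for the same species.
            chosen[species] = (preferred, mean(matches))
        else:
            chosen[species] = options[0]
    return chosen


def log_scale_by_type(chosen):
    """Log-transform, then center and scale within each measurement-type group."""
    groups = defaultdict(list)
    for species, (mtype, value) in chosen.items():
        groups[mtype].append((species, math.log(value)))
    scaled = {}
    for items in groups.values():
        logs = [x for _, x in items]
        mu, sd = mean(logs), pstdev(logs)
        for species, x in items:
            # A group with a single value (sd == 0) is centered to zero.
            scaled[species] = (x - mu) / sd if sd > 0 else 0.0
    return scaled


# Hypothetical example: Fam1 mostly reports mean mass, so sp_C falls back to max mass.
records = [
    ("sp_A", "Fam1", "mean_mass_g", 10.0),
    ("sp_A", "Fam1", "max_mass_g", 14.0),
    ("sp_B", "Fam1", "mean_mass_g", 100.0),
    ("sp_C", "Fam1", "max_mass_g", 50.0),
    ("sp_D", "Fam1", "mean_mass_g", 1000.0),
]
chosen = choose_measurements(records)
scaled = log_scale_by_type(chosen)
```

Here sp_C is retained with its maximum-mass estimate rather than excluded, which mirrors the point made in the quoted Methods text.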
  
  In addition, our taxonomic modeling strategy was intentionally hierarchical. Species belonging to families that did not meet the minimum threshold for family-level modeling (≥10 species) were not discarded; rather, they were included in higher-level taxonomic analyses (e.g., order- or class-level models), ensuring that available information was retained wherever statistically appropriate. This approach reflects our broader goal of maximizing data inclusion while matching inference to the resolution supported by the data.
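The fallback from family-level to higher-level models can be sketched as a simple cascade. The ≥10-species threshold comes from the text above; the rule that a species from a small family goes to its order when the order meets the same threshold, and otherwise to its class, is an assumption made for illustration, as is the input layout. The function only reports the finest level at which each species would enter a model.

```python
from collections import Counter

MIN_SPECIES = 10  # minimum family size for a family-level model (from the text)


def finest_model_level(species_taxa, min_species=MIN_SPECIES):
    """Return species -> (level, taxon) for the finest level meeting the threshold.

    species_taxa maps a species name to a (family, order, class) tuple.
    The order -> class fallback is an illustrative assumption.
    """
    family_n = Counter(fam for fam, _, _ in species_taxa.values())
    order_n = Counter(order for _, order, _ in species_taxa.values())
    levels = {}
    for sp, (fam, order, cls) in species_taxa.items():
        if family_n[fam] >= min_species:
            levels[sp] = ("family", fam)
        elif order_n[order] >= min_species:
            levels[sp] = ("order", order)
        else:
            levels[sp] = ("class", cls)
    return levels


# Hypothetical example: 12 species in FamA, 3 in FamB (same order), 2 in FamC.
taxa = {f"a{i}": ("FamA", "Order1", "Class1") for i in range(12)}
taxa.update({f"b{i}": ("FamB", "Order1", "Class1") for i in range(3)})
taxa.update({f"c{i}": ("FamC", "Order2", "Class1") for i in range(2)})
levels = finest_model_level(taxa)
```

No species is discarded in this cascade; small-family species simply enter a coarser model, which is the retention principle the paragraph describes.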
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Overlap between VIIRS and GBIF data: While it would have been nice for the GBIF records and VIIRS timescales to match, the degree of mismatch isn't overly large (2010-2021 vs 2015-2021), and any bias or inaccuracies should be minimal. I am mainly making this comment as a potential counterpoint to a possible criticism from other reviewers.
  
  We thank the reviewer for this helpful observation and agree with their assessment. While the temporal coverage of GBIF occurrence records (2010–2021) and VIIRS night-time lights data (2015–2021) does not perfectly overlap, the mismatch is relatively small and unlikely to introduce substantial bias, particularly given our focus on broad, global patterns of urban affinity rather than fine-scale temporal dynamics. We appreciate the reviewer highlighting this point as a potential counterargument to concerns about temporal alignment.
  
  (2) Line 87: "only a select few species seem to possess traits that enable them to thrive in urban...".
  
  This seems like an odd statement, given how many of these species have positive urban tolerance measures.
  
  Agreed that this was oddly worded. We have revised for clarity, focusing on the magnitude of urban affinity:
  
  “Similarly, much like the skewed distributions observed in SADs [24,26], the skewed shape of SUDs indicates that while many species exhibit some degree of urban affinity, a relatively small subset of species attain high levels of urban affinity and dominate urban environments.”
  
  (3) Line 81: "skewed shape of SUDs suggests that traits enabling species to tolerate urban environments are both rare and specific".
  
  Again, based on the shape of some of these curves, I'm not convinced that it is rare, and there is nothing about these curves that suggests it is something "specific". Indeed, urban tolerance could be very multivariate, and the authors' own results suggest this is indeed the case.
  
  We have revised the sentence to retain a focus on traits while avoiding overinterpretation of adaptation from the distributional patterns alone. The revised wording emphasizes the uneven expression of high urban affinity across species without implying rarity or trait specificity:
  
  “The skewed shape of SUDs suggests that traits enabling species to tolerate urban environments are unevenly expressed, given that only a handful of species show extreme urban affinity values, although our results suggest that this pattern is geographically widespread across taxa.”
  
  We also agree that urban affinity is likely multivariate, and we now return to this point more strongly in the conclusion:
  
  “Although body size emerged as a predictor of urban affinity, we found not only substantial heterogeneity across families and orders, but also that body size filtering alone is unlikely to explain the consistently skewed SUD shape. Taken together, these patterns suggest that urban affinity likely emerges from multiple trait combinations rather than a single, universally advantageous trait, and that strong affinity to urban environments is not uniformly expressed across taxa, despite occurring broadly across regions.”
  
  (4) Line 100: "UHI", avoid abbreviations unless absolutely necessary.
  
  We have removed this abbreviation throughout.
  
  (5) Body size: focusing on one trait seems like a shot in the dark, and so it isn't too surprising that this didn't reveal a strong or consistent pattern. However, I also recognize that collecting consistent trait data across so many taxa is challenging, and size is a low-hanging fruit that correlates with multiple traits. Perhaps discuss more the range of traits you think are most likely to predict urban tolerance.
  
  Body size is indeed the ‘easiest’ trait to collect, and it correlates with multiple other traits, but we acknowledge that other traits could also be important. We revised our Discussion to cover some of these additional traits more comprehensively and to be explicit about the shortcomings of body size:
  
  “Ultimately, the heterogeneous and sometimes weak relationships between body size and urban affinity suggest that body size alone cannot explain the emergence of extreme urban exploiters and the skewed shape of SUDs. Focusing on body size as a focal trait necessarily represents a simplification of the multidimensional processes underlying species’ responses to urbanization, driven in part by data availability when conducting a taxonomically broad synthesis. Instead, urban affinity likely depends on multivariate trait combinations [17,58] that vary among taxa [59] and ecological contexts [60]. Traits that are likely to correlate with urban affinity include dispersal capacity, behavioral flexibility, diet breadth, reproductive strategy, thermoregulatory ability, and, in plants, life-history traits such as growth form, clonality, phenology, and seed size. The diversity of trait pathways through which species may persist or thrive in urban environments is consistent with the pronounced taxonomic heterogeneity we observe and helps explain why body size alone does not yield a universal pattern.”
  
  (6) Figure S2: This figure and analysis appear to 'come out of nowhere'. I think this is distracting and tangential, and it should be removed. I have the same thoughts about Figure S3. While I do think a discussion of other traits to measure is well warranted and needed, the inclusion of "preliminary' results that aren't motivated by clear questions, appropriate context, and rigorous analysis should be discouraged.
  
  We have removed Figure S2 and Figure S3 in response to this comment.
  
  I hope the authors find my constructive comments useful in their revision process.
  
  This was a very thorough and thoughtful review. We are greatly appreciative of the opportunity and guidance to improve our work!
  
  Reviewer #3 (Recommendations for the authors):
  
  Here is a list of a number of further points that the authors may want to address:
  
  (1) Figure 1 somehow misses the fact that humans simply do not want very large animals in the city. We kill large predators if they come too close to cities, and the same for large herbivores such as wild boar or deer.
  
  We agree that direct human persecution and management of large-bodied species can influence which species occur in urban environments, particularly for large predators and herbivores. Such processes are important mechanisms shaping urban species assemblages and represent an entire field of socio-ecological dynamics. We have now clarified this point in the Discussion by noting that human–wildlife conflict, management, and persecution could contribute to observed size–urbanization relationships for some taxa, and that disentangling these mechanisms is an important direction for future research:
  
  “Similarly, human–wildlife conflict and active management of large-bodied animals in cities may influence which species persist in urban environments, potentially constraining the upper end of the body size distribution. Taken together, these examples illustrate the importance of considering the socio-ecological context of urban species assemblages [65].”
  
  (2) Line 270. So you removed all data from the grid-based survey?
  
  We did not remove all data originating from grid-based surveys or gridded products. Rather, we retained GBIF point-occurrence records and applied a standard spatial filtering step, removing only those individual observations with reported coordinate uncertainty greater than 1 km. This was done to ensure reliable alignment between species occurrence points and remotely sensed environmental layers. We have clarified this distinction in the Methods to avoid confusion:
  
  “Due to uncertainty in matching observations with remotely sensed products, any GBIF observation with a coordinate uncertainty > 1 km was removed. This filtering step removed individual observations with high spatial uncertainty, rather than excluding entire datasets or survey types.”
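The record-level filter can be expressed in a few lines of Python. The sketch assumes occurrences are dictionaries keyed by Darwin Core term names, as in GBIF downloads, and reads the `coordinateUncertaintyInMeters` field. How records that report no uncertainty were treated is not stated, so `keep_missing` is a hypothetical switch that keeps them by default.

```python
MAX_UNCERTAINTY_M = 1000.0  # records with uncertainty > 1 km are dropped


def filter_by_uncertainty(records, max_m=MAX_UNCERTAINTY_M, keep_missing=True):
    """Drop individual records whose reported coordinate uncertainty exceeds max_m.

    Filtering is per record, so no dataset or survey type is removed wholesale.
    keep_missing is a hypothetical switch for records with no reported uncertainty.
    """
    kept = []
    for rec in records:
        u = rec.get("coordinateUncertaintyInMeters")
        if u is None or u == "":
            if keep_missing:
                kept.append(rec)
        elif float(u) <= max_m:
            kept.append(rec)
    return kept


# Hypothetical records; the uncertainty may arrive as a number or a string.
occurrences = [
    {"gbifID": 1, "coordinateUncertaintyInMeters": 30},
    {"gbifID": 2, "coordinateUncertaintyInMeters": "1000"},
    {"gbifID": 3, "coordinateUncertaintyInMeters": 2500},
    {"gbifID": 4},
]
kept_ids = [r["gbifID"] for r in filter_by_uncertainty(occurrences)]
```

Only record 3 is removed here; records from the same source that meet the threshold are unaffected.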
  
  (3) Line 278. Human population density?
  
  Yes, we have added ‘human’ here (and elsewhere in this section) to make this clearer to the reader.
  
  (4) Line 284. What is a pixel?
  
  We have modified the text to make this clearer:
  
  “VIIRS Stray Light Corrected Nighttime Day/Night Band Composites product, which provides monthly composites (Google Earth Engine dataset NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG), with a native resolution of ~500 m. We took the median of all monthly composites for each pixel (i.e., a single grid cell of the night-time lights raster representing a fixed ground area) to calculate a pixel-level urbanization value, measured in average radiance, using imagery from January 2015 to January 2021.”
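The per-pixel median described in the quoted Methods text can be illustrated on small in-memory grids. The authors worked in Google Earth Engine; this pure-Python sketch shows only the reduction itself. Treating the window as inclusive of January 2021, skipping missing (`None`) values, and representing each monthly composite as a list of rows are assumptions.

```python
from datetime import date
from statistics import median

# Assumed inclusive window for "January 2015 to January 2021".
START, END = date(2015, 1, 1), date(2021, 1, 1)


def median_composite(monthly):
    """Per-pixel median radiance across monthly composites within the window.

    monthly maps the first day of each month to a grid (list of rows) of
    average radiance values; None marks a missing observation.
    """
    grids = [g for d, g in sorted(monthly.items()) if START <= d <= END]
    if not grids:
        raise ValueError("no composites in the requested window")
    rows, cols = len(grids[0]), len(grids[0][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [g[r][c] for g in grids if g[r][c] is not None]
            out[r][c] = median(values) if values else None
    return out


# Hypothetical 1 x 2 grids; the December 2014 composite falls outside the window.
monthly = {
    date(2014, 12, 1): [[99.0, 99.0]],
    date(2015, 1, 1): [[1.0, 2.0]],
    date(2015, 2, 1): [[3.0, None]],
    date(2015, 3, 1): [[2.0, 4.0]],
}
result = median_composite(monthly)
```

The median rather than the mean makes each pixel's value robust to occasional anomalous months, such as residual stray light or transient fires.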
  
  (5) Line 292. It seems to me that lighting is different in different types of cities with the same level of impervious surface, depending on local customs of how many lights are installed, left switched on, etc. I guess that petrol stations and strongly lit industrial areas both produce high levels of light, while for the industrial areas, there could be lawn or other vegetation?
  
  We thank the reviewer for this thoughtful observation and agree that night-time lighting can vary across cities with similar levels of impervious surface due to differences in land use, infrastructure, and cultural lighting practices. We do not interpret VIIRS night-time lights as a direct measure of any single urban feature, but rather as a continuous, integrative proxy for urbanization that captures the combined footprint of human activity, infrastructure intensity, and energy use. VIIRS radiance has been repeatedly shown to correlate strongly with human population density, built infrastructure, and urban extent, while being negatively correlated with vegetation cover (e.g., EVI), and it is widely used as a proxy for urbanization in the remote sensing and urban sustainability literature, for example:
  
  Panić et al. used night-time lights to map spatial and temporal patterns of artificial lighting as a proxy for human population distribution and activity, distinguishing areas of urban and rural occupancy. (https://www.ceeol.com/search/article-detail?id=1035395)
  
  Zhou et al. used night-time light observations to develop a globally consistent time series of annual urban extent, delineating urban clusters and quantifying global urban growth over decades. (https://doi.org/10.1016/j.rse.2018.10.015)
  
  Chakraborty & Stokes used night-time light time series with machine learning to detect and quantify urban change processes, identifying deviations from expected radiance trends to monitor diverse urban transitions. (https://doi.org/10.1016/j.rse.2023.113818)
  
  Zhao et al. reviewed the broad capacity of night-time light remote sensing to quantify human activities and socioeconomic dynamics (such as urbanization, economic change, and environmental impacts) across scales. (https://doi.org/10.3390/rs11171971)
  
  Zheng et al. used VIIRS night-time lights across 30 global megacities to produce a classification scheme that disentangles urban land changes into five categories, and to assess global urbanization processes. (https://doi.org/10.1016/j.isprsjprs.2021.01.002)
  
  Zhao et al. argue that night-time lights provide a consistent dataset for modeling and interpreting urbanization dynamics, and use them to track urban dynamics in Southeast Asia. (https://doi.org/10.1016/j.rse.2020.111980)
  
  While localized mismatches may occur (e.g., brightly lit industrial areas with surrounding vegetation), such heterogeneity is expected to introduce additional variance rather than systematic bias in the measure of urbanization, making our inference conservative. We have clarified this interpretation and added additional supporting references in the Methods:
  
  “Previous work has shown that VIIRS night-time lights is negatively correlated with greenness measured through the Enhanced Vegetation Index (EVI) and positively correlated with human population density [69,71]. Although night-time light intensity can vary among cities with similar impervious surface due to differences in land use, infrastructure, and cultural lighting practices, at broad spatial scales it functions as an integrative proxy of urbanization [75,76,77,78,79,80], with localized heterogeneity contributing primarily to additional variance rather than systematic bias.”
  
  (6) Line 295. How did you reconcile the spatial uncertainty of >1km with an urbanization pixel of 150m2? For how many species did you have a higher uncertainty than pixel size? In my experience, your ca. 39m accuracy is a strong assumption for GBIF data.
  
  We would like to clarify that we do not assume species occurrence accuracy at the scale of the geohash blocks (i.e., tens of meters), and we do not interpret GBIF records as having ca. 39 m positional accuracy. The use of geohash7 (~150 m blocks) reflects a computational indexing choice, not an assumption about biological or observational precision. All GBIF observations with reported coordinate uncertainty greater than 1 km were removed prior to analysis, ensuring that retained occurrences were compatible with the effective spatial resolution of the remotely sensed urbanization data. Importantly, the effective spatial resolution of our urbanization metric remains that of the VIIRS night-time lights product (~500 m). Geohash encoding at a finer resolution was used solely to efficiently associate point occurrences with the appropriate VIIRS pixel while avoiding redundant extraction or averaging across adjacent pixels. This approach does not increase the effective spatial precision of the analysis, nor does it imply sub-pixel inference. We have clarified this in the Methods:
  
  “The VIIRS night-time lights data, with a native resolution of ~500 m, were then matched to these blocks by assigning each geohash7 block the average radiance of the VIIRS pixels that intersect it. We do not assume positional accuracy at the scale of the geohash blocks; geohash encoding was used solely for computational indexing, and the effective spatial resolution of the urbanization metric is that of the VIIRS data (~500 m). This approach allows us to avoid unnecessary redundancy in the data while maintaining the original VIIRS resolution.”
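  The geohash7 indexing described above can be illustrated with a minimal Python sketch of the standard, publicly documented geohash encoding. It is not the authors' pipeline; the function names `geohash_encode` and `cell_size_m` and the spherical approximation of ~111,320 m per degree are illustrative assumptions. The sketch shows that a precision-7 cell spans roughly 150 m at the equator, several times finer than a ~500 m VIIRS pixel, which is consistent with geohash serving as an indexing device rather than increasing analytical resolution.

  ```python
  import math

  # Standard geohash base32 alphabet (omits a, i, l, o).
  BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


  def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
      """Encode a coordinate as a geohash by interleaved bisection.

      Bits alternate between longitude (first, third, ...) and latitude
      (second, fourth, ...); every 5 bits map to one base32 character.
      """
      lat_lo, lat_hi = -90.0, 90.0
      lon_lo, lon_hi = -180.0, 180.0
      chars = []
      value, n_bits, use_lon = 0, 0, True
      while len(chars) < precision:
          if use_lon:
              mid = (lon_lo + lon_hi) / 2
              bit = int(lon >= mid)
              if bit:
                  lon_lo = mid
              else:
                  lon_hi = mid
          else:
              mid = (lat_lo + lat_hi) / 2
              bit = int(lat >= mid)
              if bit:
                  lat_lo = mid
              else:
                  lat_hi = mid
          value = (value << 1) | bit
          n_bits += 1
          use_lon = not use_lon
          if n_bits == 5:  # one full base32 character collected
              chars.append(BASE32[value])
              value, n_bits = 0, 0
      return "".join(chars)


  def cell_size_m(precision: int, lat: float = 0.0) -> tuple[float, float]:
      """Approximate (width, height) in metres of a geohash cell.

      Uses a spherical approximation of ~111,320 m per degree for both
      axes (an assumption); width shrinks with cos(latitude).
      """
      total_bits = 5 * precision
      lon_bits = (total_bits + 1) // 2  # longitude gets the extra odd bit
      lat_bits = total_bits // 2
      deg_w = 360.0 / 2 ** lon_bits
      deg_h = 180.0 / 2 ** lat_bits
      m_per_deg = 111_320.0
      return deg_w * m_per_deg * math.cos(math.radians(lat)), deg_h * m_per_deg


  if __name__ == "__main__":
      print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
      w, h = cell_size_m(7)
      print(f"geohash7 cell at equator: ~{w:.0f} m x {h:.0f} m")
      print(f"cells per 500 m VIIRS pixel edge: ~{500 / w:.1f}")
  ```

  Under these assumptions, several geohash7 cells fall within a single VIIRS pixel, so each cell simply inherits the radiance of the pixel(s) it overlaps.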
  
  (7) Line 296. Why this high resolution in the species data when your light data is 500m2?
  
  The apparent mismatch in resolution reflects a distinction between data handling resolution and analytical resolution. Species occurrence records were retained at their native point-level precision to avoid premature spatial aggregation and to ensure that each observation could be accurately matched to the appropriate VIIRS night-time lights pixel. The finer-resolution geohash encoding does not imply that species data were analyzed at that scale, nor does it increase the effective spatial resolution of the analysis. We note, however, that the reported spatial uncertainty of some GBIF records may approach or exceed the resolution of the VIIRS data. Retaining such records represents a deliberate trade-off between spatial precision and data coverage, and is necessary to maximize taxonomic and geographic representation in a global analysis of this scope. Importantly, any residual spatial uncertainty is expected to introduce additional noise rather than systematic bias, making our estimates of species–urban affinity relationships conservative.
  
  (8) If you could show how your results match the results of Hahs et al and others with respect to occurrence and traits, this would strengthen your approach.
  
  We agree that explicitly comparing our findings with prior trait-based studies strengthens the interpretability of our approach. We have now added text to the Discussion that directly compares our results with published analyses, including Hahs et al. (2023) and other taxon-specific studies. In particular, we highlight where our occurrence-based estimates recover similar body size–urbanization relationships (four of five taxa in Hahs et al.) and where they differ (e.g., carabids), and we discuss how such differences likely arise from variation in spatial grain, response variables, and definitions of urbanization. These additions clarify how our framework aligns with, complements, and extends existing trait-based work rather than replacing it.
  
  (9) I wonder whether you could run your analysis with simplified data. In the end, you do not talk much about how high the urban score is, so you may also aggregate values to "highly lighted", "lighted", "some light" and "dark" and re-do the analysis, after checking how these scores correlate with e.g. impervious surface in a slightly larger area than what you used (maybe 50x50m).
  
  Our analytical framework—and the concept of Species Urbanness Distributions (SUDs) in particular—relies on retaining the continuous nature of the underlying urbanization metric. Discretizing night-time light values would necessarily introduce arbitrary thresholds, reduce information content, and obscure subtle but ecologically meaningful variation in species’ relative affinities to urban environments. Because we focus on relative affinity patterns rather than absolute urbanization classes, maintaining a continuous metric is central to both our methodological approach and conceptual contribution. That said, we agree that exploring how continuous urban affinity scores relate to categorical urban classes or alternative urbanization proxies (e.g., impervious surface at different spatial grains) represents a valuable direction for future work. Such analyses could be particularly informative for translating continuous affinity metrics into applied conservation or urban planning contexts.
  

Tags

Review 1

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.15.676216v2
www.biorxiv.org

Targeted lysosomal activation in bladder epithelium enhances clearance of intracellular uropathogenic Escherichia coli

2
1. EMBOpress 26 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  Reviewer #1
  
  Evidence, reproducibility and clarity:
  
  In this paper, Tomasek and colleagues describe a series of experiments illuminating the effects of OM-89, a bacterial lysate taken orally for prevention of recurrent UTI, on intracellular dynamics of UPEC, using cell culture and organoid models. Suggestions for improvement and for clarification of the authors' conclusions and relevance to human UTI (and OM-89 use) are offered below.
  
  Major points:
  
  The data indicate that OM-89 exposure in the organoids enhances lysosomal degradation pathways and (in mBOs) autophagic flux, and the authors conclude this is a mechanism by which UPEC regrowth after antibiotic treatment (modeling rUTI) is inhibited by OM-89. They also show enhanced cellular uptake of fluorescently labeled antibiotics (ampicillin) in organoids - this leads them to conclude (and state in the paper's title) that increased intracellular antibiotic concentration effects increased killing of UPEC and decreased regrowth. These are two separate proposed mechanisms, and especially with regard to the antibiotics, they have not shown that increased intracellular antibiotic concentration actually kills intracellular UPEC in their model - only that regrowth as measured microscopically is less. In total, a mechanistic connection between the observed lysosomal effect and the intracellular antibiotic uptake, and which one is more important for UPEC control in this model, is incomplete. The precise wording of the paper's title should be reconsidered accordingly.
  
  We agree with the reviewer that our study does not establish a direct mechanistic connection between OM-89-induced lysosomal remodeling and enhanced intracellular antibiotic accumulation, nor does it definitively determine the relative contribution of each process to intracellular UPEC control. Further studies dissecting the molecular pathways underlying these phenotypes will be required to determine whether they are mechanistically linked or represent parallel epithelial defense responses induced by OM-89.
  
  Importantly, additional CFU experiments performed during revision (as suggested in point number 4) revealed that OM-89 already reduces intracellular bacterial burden following a classical gentamicin protection assay, prior to prolonged ampicillin exposure. These findings suggest that enhanced intracellular bacterial control cannot be explained solely by increased intracellular antibiotic accumulation and support a direct contribution of epithelial antimicrobial mechanisms, including lysosomal activation, to the observed phenotype. Nevertheless, the relative contribution of lysosomal remodeling and enhanced antibiotic uptake to bacterial clearance remains unresolved and will require further investigation.
  
  Accordingly, we changed the title to "Targeted lysosomal activation in bladder epithelium enhances clearance of intracellular uropathogenic Escherichia coli." This revised title avoids implying a direct causal link between increased intracellular antibiotic accumulation and bacterial clearance while reflecting the central biological process identified in our study.
  
  OM-89 is taken orally for rUTI prevention, and some "components" reach the urinary tract (line 81). But it isn't explained how applying OM-89 directly to organoids models how its components may reach the bladder epithelium (from the basolateral side, if the OM-89 is applied outside the organoids) in the whole animal or human. At the least, this limitation should be stated in the Discussion.
  
  We thank the reviewer for pointing out this limitation. Although advanced in vitro models help to better mimic the in vivo situation, they still do not fully recapitulate all aspects of drug exposure and delivery observed in vivo. We included the following statement of limitation now in the discussion in lines 493-503: “One limitation of our study is that OM-89 was applied directly to epithelial cultures and organoids, whereas in clinical use it is administered orally. Although pharmacokinetic studies have demonstrated systemic distribution and urinary accumulation of OM-89-derived components following oral administration (van Dijk, 1982), our experimental setup does not recapitulate the exact route, kinetics or concentration profiles encountered in vivo. Rather, our models were designed to determine whether bladder epithelial cells are capable of responding directly to OM-89-mediated signals and to identify the intracellular pathways involved. Given the documented systemic exposure following oral administration, direct effects on the urothelium are biologically plausible. However, future studies will be required to determine how the epithelial responses identified here integrate with the complex systemic and immune-mediated effects of OM-89 under physiological administration conditions.”
  
  In the lysosome studies starting on line 319, the cultured cells are all infected (and either treated with OM-89 or not). What observations regarding number and size of vesicles, etc (all the measures in Fig 6) are evident when cells are treated with OM-89 only? These data should be presented (at least as a supplemental figure) to enable optimal interpretation of the OM-89+UPEC data in Fig 6. As the authors themselves indicate, OM-89 may be having a generalized effect on endocytic and/or autophagic flux by bladder epithelial cells, independent of infection.
  
  We thank the reviewer for this helpful suggestion and agree that assessing OM-89 treatment in the absence of infection provides important context for interpreting the infection-associated phenotypes as shown in Figure 6.
  
  Accordingly, we have included additional supplementary data examining the effects of OM-89 alone in both murine and human bladder epithelial cells. Specifically, we added analyses of Lamp1-positive lysosomal vesicles, lysosomal acidification (LysoSensor), and Cathepsin L activity under uninfected conditions (Supplementary Figures 4A, 4G and 7D-F). We comment on these additional findings in the Results section in lines 242-246 and lines 366-370, and in the Discussion section in lines 469-483.
  
  These experiments, together with the transcriptional data in SI Figure 3D, demonstrate that key features of lysosome-centered remodeling and activation are already induced by OM-89 in the absence of infection, indicating that OM-89 directly modulates epithelial lysosomal pathways rather than merely amplifying infection-driven responses. Inclusion of these data provides additional context for interpreting the infection-associated phenotypes shown in the main figures and further supports the concept of OM-89 as a direct modulator of epithelial antimicrobial function.
  
  With the organoids, beyond the microscopic quantification of UPEC, can CFUs be measured?
  
  We appreciate the reviewer’s interest in obtaining orthogonal measurements of bacterial burden. Performing CFU quantification directly from microinjected organoids is technically challenging, as it requires highly reproducible injections into identical numbers of organoids while avoiding bacterial leakage into the surrounding extracellular matrix. Even minor variations or accidental release of bacteria into the Matrigel can substantially affect CFU recovery and compromise interpretation.
  
  To address the reviewer’s underlying question while avoiding these limitations, we performed intracellular CFU assays using differentiated mouse bladder epithelial monolayers. Following a classical gentamicin protection assay for 1 hour, OM-89-treated cells displayed significantly reduced intracellular bacterial burden compared with PBS controls (new Figure 2C). Addition of ampicillin for 3 hours after the gentamicin protection phase resulted in a similar trend but did not further significantly reduce the bacterial burden (new Figure 2D). We commented on these findings in the Results section in lines 169-182, and in the Discussion section in lines 463-469 and lines 474-483. We also updated the Methods section in lines 637-652 with the intracellular bacterial burden assay description.
  
  These experiments provide an orthogonal readout of intracellular bacterial burden and are consistent with enhanced epithelial control of intracellular UPEC. In addition, we would like to clarify that the higher-throughput microscopy approach used throughout the organoid experiments does not allow strict discrimination between luminal, intracellular and tissue-associated bacteria. We therefore revised the terminology throughout the manuscript and now consistently refer to the measured signal as “intra-organoid bacterial burden”. To clarify this point, we added the following statement to the Results section (line 115): “Hence, the microscopy data represent the total “intra-organoid” bacterial burden at each experimental stage, without distinguishing the exact localization of the bacteria – which can be luminal, intracellular or tissue-associated.”. Consistent with this clarification, we have replaced the term “antibiotic-mediated killing” throughout the manuscript with the more cautious wording “antibiotic-mediated clearance” or “reduced bacterial burden”, where appropriate.
  
  Minor points:
  
  In Fig 1A, the "co-application" horizontal line is under the 7-10 hour window, but the text suggests that the application of antibiotics and OM-89 in this experiment is between 4-7 hours.
  
  We thank the reviewer for pointing this out. Indeed, in the co-application regime, OM-89 is added at the same timepoint as the antibiotic – meaning straight after monitoring the growth phase at 4h post-infection (pi). We now adapted the horizontal line for the “co-application” treatment in Figure 1A accordingly to represent the time-point of OM-89 addition better. Additionally, we added a line for the antibiotic-treatment in order to further facilitate readability.
  
  How are antibiotics and OM-89 "removed" at the 7-hour mark? This was not detailed in the Methods.
  
  Although we had specified this in the methods section (now line 682: “For every media exchange (e.g. antibiotic treatment or withdrawal), each well was washed with 9 ml of the respective media before leaving 1 ml in the well.”), we realized the positioning was not optimal as we had mentioned this part under the point “Bacterial injection” in “Injection experiments”. We therefore now separated this part, together with the lid preparation, from the “Bacterial injection” part and created the new subsection “Lid preparation for media changes” (line 668 onwards).
  
  What time point was used for the transcriptomic profiling of organoids? This is not clear from the relevant Methods or Results sections.
  
  As stated in the methods section, RNA for transcriptomic profiling from mBOs was extracted at 4h post-infection (pi) (now line 892).
  
  In showing that OM-89 "attenuated" the magnitude of inflammatory responses (Fig 2C and S3B), it would be helpful to add a panel showing the comparison of OM89+UPEC to PBS alone - this would be expected to convey activity (red) in the infection-related pathways, but to a lower magnitude than seen in UPEC vs PBS.
  
  Please see our combined response at point 5.
  
  Similarly, in the results outlined starting on line 196, it would be helpful to add a panel showing OM89+UPEC vs OM89 alone.
  
  We thank the reviewer for these suggestions. We performed the requested additional analyses and generated Gene Ontology Biological Process (GOBP) enrichment plots comparing (i) PBS+UPEC versus PBS, (ii) OM-89+UPEC versus PBS and (iii) OM-89+UPEC versus OM-89.
  
  As anticipated by the reviewer, these analyses show that infection-associated pathways remain induced in OM-89-treated infected organoids but with a reduced magnitude compared with infected PBS controls. Specifically, pathways that are strongly enriched in the PBS+UPEC versus PBS comparison display lower enrichment significance and effect size in the OM-89+UPEC versus PBS comparison. Furthermore, many of these pathways are no longer significantly enriched in the direct OM-89+UPEC versus OM-89 comparison, indicating that OM-89 attenuates the transcriptional inflammatory response induced by UPEC infection. These observations are consistent with our original interpretation, drawn from Figure 3C, that OM-89 dampens excessive infection-associated inflammatory signaling while preserving epithelial antimicrobial activity.
  
  Importantly, we found that the direct comparison between PBS+UPEC and OM-89+UPEC, presented in the original Figure 3C, remains the most informative representation of the OM-89 effect because it controls for infection status while specifically highlighting the transcriptional changes induced by OM-89. By contrast, comparisons against PBS or OM-89 alone involve simultaneous changes in both infection and treatment status, making biological interpretation less straightforward.
  
  Nevertheless, because the additional analyses directly address the reviewer's request and provide complementary context for interpreting Figure 3C, we have included them in Supplementary Figure 3B.
  
  In line 236, what is meant by lysosomal "activation"? A more specific term should be chosen here.
  
  We thank the reviewer for this question and aim to increase the readability of this section. By lysosomal "activation" in the first sentence of the mentioned paragraph, we referred to the upregulation of lysosomal pathways and the enhanced lysosomal function (measured by alterations in lysosomal vesicles) described in the previous paragraph. However, to strengthen the connection to the previous paragraph, and in light of Reviewer #2's second comment, we rewrote the entire first paragraph of this section. The first sentence of this paragraph (line 252 onwards) now reads: “To test whether the observed effects on lysosomal pathways could mechanistically, at least in part, explain OM-89-mediated protection, we first used Genebridge analysis (Li et al, 2019) to examine how the lysosomal gene signature identified in our RNA-seq data relates to host defense programs in the human bladder.”
  
  In the Abstract (line 25), the phrase "Using bladder organoids..." is a dangling modifier.
  
  We thank the reviewer for pointing this out and changed the sentence accordingly to “OM-89 promotes lysosomal acidification and increases lysosomal protease activity in bladder organoids and differentiated epithelial monolayers, thereby directing intracellular UPEC toward degradative compartments.” (now line 24)
  
  Typographical and copyediting:
  
  We thank the reviewer for identifying typographical errors and have corrected them throughout the manuscript.
  
  Line 74 should read "For instance..."
  
  Line 76 should read "when combined with antibiotic therapy..."
  
  As this sentence is intended to emphasize the already observed protective effects of OM-89, and the two studies mentioned were performed either without or in combination with antibiotics, we changed the sentence for better readability to “For instance, rodent infection studies have demonstrated protective effects of OM-89 alone (Bosch et al, 1988; Lee et al, 2006) and in combination with antibiotic therapy (Canton et al, 2025; Bessler et al, 2010), although this observed in vivo protection could not be linked to any major quantitative changes in bladder immune cell infiltration (Canton et al, 2025), leaving the underlying molecular mechanism not fully resolved.” (now line 71)
  
  Line 122 should read "...regrowth following antibiotic treatment" or "regrowth post-antibiotic treatment"
  
  Line 138 should use "regimen" not "regime"
  
  Line 196 delete comma after "Although"
  
  Line 244 fully hyphenate "OM-89-mediated"
  
  Line 374 should read "...significantly enhance antibiotic-mediated killing"
  
  Significance:
  
  The paper is very well written and though a lot of data are included, the presentation is excellent and helps the reader to follow the story. The paper makes a strong contribution to the UTI pathogenesis field, and the use of mouse and human bladder organoids is innovative in studying intracellular UPEC. My scientific expertise as a reviewer is in UPEC pathogenesis, directly relevant to the content of this paper.
  
  Reviewer #2
  
  Evidence, reproducibility and clarity:
  
  This study examined the effect of OM-89 on UPEC infection, antibiotic clearance, and resurgence in mouse and human organoid models. The goal of the study was to understand the molecular mechanisms by which OM-89 is effective at preventing rUTI in patients.
  
  Major comments:
  
  The manuscript is well-written and the figures are well presented. Adequate background information is provided to give the study context and sufficient experimental details are provided to allow replication by other groups. Experiments contain appropriate controls and sufficient replicates to allow appropriate statistical analyses. The authors are careful to acknowledge the differences they observed between the mouse and human system and provide satisfactory potential explanations for these differences. The conclusions they draw are well supported by their data and none of their claims from their data are overstatements. Below are some points which, if addressed, I believe could improve the paper.
  
  I think the authors overstate the novelty of the concept that the urothelium is an active targetable determinant of infection and treatment outcomes. This is not an entirely new concept since previous studies have examined antimicrobial peptides and other factors from the urothelium.
  
  We thank the reviewer for this important point and agree that the urothelium has long been recognized as an active participant in host defense through mechanisms such as antimicrobial peptide production, pathogen sensing and regulation of inflammatory responses. We have therefore revised the manuscript to avoid implying that urothelial involvement in infection outcome is itself a novel concept. Instead, we now emphasize the specific advance of our study: the identification of lysosome-centered epithelial activation as a therapeutically targetable mechanism that enhances intracellular bacterial clearance and potentiates antibiotic efficacy.
  
  In the abstract we changed: “Our findings position the bladder epithelium from a passive barrier to an active, targetable determinant of treatment outcome and suggest host-directed modulation of epithelial antimicrobial pathways as a promising strategy to enhance intracellular bacterial clearance.” to “Our findings demonstrate that bladder epithelial antimicrobial pathways can be pharmacologically reinforced to influence treatment outcomes by enhancing intracellular bacterial clearance.” in line 29.
  
  In the introduction we changed: “Together with increased intracellular accumulation of antibiotics across different classes, this leads to improved intracellular killing and reduced bacterial regrowth across diverse UPEC strains.” to “Together with increased intracellular accumulation of antibiotics across different classes, these changes are associated with improved intracellular clearance and reduced bacterial regrowth across diverse UPEC strains.” in line 90 and “Together, these findings reveal a previously unrecognized epithelial lysosome-centered mechanism by which OM-89 enhances intracellular antibiotic performance and repositions the bladder epithelium from a passive reservoir of infection reactivation to an actively transformable antimicrobial compartment influencing treatment outcomes.” to “Together, these findings reveal a previously unrecognized lysosome-centered epithelial mechanism by which OM-89 strengthens bladder epithelial antimicrobial defenses and enhances intracellular bacterial clearance, identifying enhanced lysosomal function as a therapeutically targetable component of host defense.” in line 95.
  
  In the discussion we changed: “Together, these findings provide a mechanistic framework for the long-observed clinical efficacy of OM-89. Our findings reveal that the urothelium itself can be therapeutically targeted to reduce pathogen regrowth by transforming the epithelial barrier from a passive refuge for UPEC into an active defense site.” to “Together, these findings provide a mechanistic framework for the long-observed clinical efficacy of OM-89 and identify epithelial lysosomal pathways as a therapeutically targetable component of host defense that can be used to improve intracellular bacterial clearance.” in line 421 and “In the face of rising antimicrobial resistance (2024), strengthening epithelial antimicrobial function offers a complementary route to shift the bladder mucosa from a passive niche of bacterial survival and infection reactivation toward an active site of accelerated pathogen clearance.” to “In the face of rising antimicrobial resistance (2024), our findings provide a mechanistic rationale for the clinical use of OM-89 and support epithelial lysosomal pathways as a promising target for host-directed therapeutic strategies that enhance intracellular bacterial clearance and improve the efficacy of existing antibiotics.” in line 513.
  
  Depending on the target audience, the Module-Module association analysis could need more introduction. I am not a computational biologist and it was not obviously apparent how Figure 4A was generated and what it is actually showing. How specifically does this analysis demonstrate a functional link between lysosomal activity and immune defense pathways? Without further explanation, it is my opinion that this figure panel is an unnecessary distraction that is not required for any of the conclusions that the group can already draw from the rest of their data.
  
  We thank the reviewer for this constructive critique. We agree that the rationale and interpretation of this analysis were not sufficiently explained in the original manuscript. We have therefore expanded the description of the MMAS approach and clarified how these data support the translational relevance of the lysosomal pathways identified in our experimental models.
  
  Specifically, we now explain that the Module-Module Association Score (MMAS) analysis evaluates transcriptional correlations between the lysosomal gene network and functional biological pathways across eight independent human bladder transcriptomic datasets comprising more than 1,400 clinical samples. We further highlight the strong positive associations observed with host defense modules, including “response to molecule of bacterial origin”, “cell activation involved in immune response”, and “innate immune response”. These additions clarify both the methodology and the rationale for including Figure 5A as a translational bridge between our experimental findings and human bladder biology.
  
  The revised text (starting at line 251) now reads: “To test whether the observed effects on lysosomal pathways could mechanistically, at least in part, explain OM-89-mediated protection, we first used Genebridge analysis (Li et al, 2019) to examine how the lysosomal gene signature identified in our RNA-seq data relates to host defense programs in the human bladder. To evaluate the translational relevance of our experimental findings, we used a computational Module-Module Association Score (MMAS) analysis across eight independent human bladder transcriptomic datasets comprising over 1,400 clinical samples. This network-based approach evaluates the transcriptional correlation between the lysosomal gene network and functional biological pathways across diverse human cohorts. Module-Module association analysis performed on these human bladder datasets indicated that the lysosome module has strong positive associations with specific host defense modules, including "response to molecule of bacterial origin", "cell activation involved in immune response", and "innate immune response" (Figure 5A), highlighting a conserved functional link between lysosomal activity and immune defense pathways in the bladder epithelium. Altogether, these positive correlations suggest that enhanced lysosomal function represents a conserved pathway integrated within mucosal immunity across species, rather than an isolated cellular response unique to our experimental models.”
  
  Significance:
  
  General assessment: Solid experimental design with appropriate controls. Appropriate statistical rigor. Conclusions justified by the data. Limitations acknowledged. Differences in results between mice and humans acknowledged.
  
  Advance: Moderate technical advance building on prior organoid models. Significant mechanistic advance because OM-89 has been widely used for a long time without detailed understanding of why it works. Moderate conceptual advance that urothelial cells are a targetable determinant of treatment outcomes.
  
  Audience: I am a basic science researcher in the field of female urogenital tract microbiome and infections. Other researchers studying UTI will certainly be interested in this study. It also may be of interest to people studying other bladder conditions that involve the urothelium (bladder cancer).
  
2. EMBOpress 10 Jun 2026
  
  in Review Commons
  
  Reply to the reviewers
  
  Reviewer #1
  
  Evidence, reproducibility and clarity:
  
  In this paper, Tomasek and colleagues describe a series of experiments illuminating the effects of OM-89, a bacterial lysate taken orally for prevention of recurrent UTI, on intracellular dynamics of UPEC, using cell culture and organoid models. Suggestions for improvement and for clarification of the authors' conclusions and relevance to human UTI (and OM-89 use) are offered below.
  
  Major points:
  
  The data indicate that OM-89 exposure in the organoids enhances lysosomal degradation pathways and (in mBOs) autophagic flux, and the authors conclude this is a mechanism by which UPEC regrowth after antibiotic treatment (modeling rUTI) is inhibited by OM-89. They also show enhanced cellular uptake of fluorescently labeled antibiotics (ampicillin) in organoids - this leads them to conclude (and state in the paper's title) that increased intracellular antibiotic concentration effects increased killing of UPEC and decreased regrowth. These are two separate proposed mechanisms, and especially with regard to the antibiotics, they have not shown that increased intracellular antibiotic concentration actually kills intracellular UPEC in their model - only that regrowth as measured microscopically is less. In total, a mechanistic connection between the observed lysosomal effect and the intracellular antibiotic uptake, and which one is more important for UPEC control in this model, is incomplete. The precise wording of the paper's title should be reconsidered accordingly.
  
  We agree with the reviewer that we did not show a mechanistic connection between the observed lysosomal effect and the intracellular antibiotic uptake. Further experiments dissecting the exact mechanistic pathways driving both, either in conjunction or separately, would improve our understanding of how OM-89 exerts its positive effects. In future studies we will focus on dissecting the underlying pathways and determining whether a mechanistic connection between lysosomal degradation and enhanced intracellular antibiotic accumulation explains the observed positive effects of OM-89.
  
  Accordingly, we changed the title to "Targeted lysosomal activation in bladder epithelium enhances clearance of intracellular uropathogenic Escherichia coli". This revised title avoids implying a direct causal link between increased intracellular antibiotic accumulation and bacterial clearance, while still reflecting the central biological process identified in our study.
  
  Additionally, we incorporated changes in the introduction, as highlighted in our reply to point number one raised by reviewer number two.
  
  OM-89 is taken orally for rUTI prevention, and some "components" reach the urinary tract (line 81). But it isn't explained how applying OM-89 directly to organoids models how its components may reach the bladder epithelium (from the basolateral side, if the OM-89 is applied outside the organoids) in the whole animal or human. At the least, this limitation should be stated in the Discussion.
  
  We thank the reviewer for pointing out this limitation. Although advanced in vitro models help to better mimic the in vivo situation, they still do not fully recapitulate all aspects of drug exposure and delivery observed in vivo. We have now included the following statement of limitation in the discussion (lines 449-459): "One limitation of our study is that OM-89 was applied directly to epithelial cultures and organoids, whereas in clinical use it is administered orally. Although pharmacokinetic studies have demonstrated systemic distribution and urinary accumulation of OM-89-derived components following oral administration (van Dijk, 1982), our experimental setup does not recapitulate the exact route, kinetics or concentration profiles encountered in vivo. Rather, our models were designed to determine whether bladder epithelial cells are capable of responding directly to OM-89-mediated signals and to identify the intracellular pathways involved. Given the documented systemic exposure following oral administration, direct effects on the urothelium are biologically plausible. However, future studies will be required to determine how the epithelial responses identified here integrate with the complex systemic and immune-mediated effects of OM-89 under physiological administration conditions."
  
  In the lysosome studies starting on line 319, the cultured cells are all infected (and either treated with OM-89 or not). What observations regarding number and size of vesicles, etc (all the measures in Fig 6) are evident when cells are treated with OM-89 only? These data should be presented (at least as a supplemental figure) to enable optimal interpretation of the OM-89+UPEC data in Fig 6. As the authors themselves indicate, OM-89 may be having a generalized effect on endocytic and/or autophagic flux by bladder epithelial cells, independent of infection.
  
  We thank the reviewer for this suggestion and agree that evaluating OM-89 treatment in the absence of infection provides important context for interpreting the infection-associated phenotypes shown in Figure 6. Our original intention was to focus the main manuscript on the effects of OM-89 during UPEC infection, and we therefore did not include the corresponding uninfected conditions.
  
  As part of the planned revision, we will include additional supplementary data examining the effects of OM-89 alone in both murine and human bladder epithelial cells. Specifically, we will present analyses of Lamp1-positive lysosomal vesicles, lysosomal acidification (LysoSensor), and Cathepsin L activity under uninfected conditions. These experiments will allow readers to assess the extent to which OM-89 activates epithelial lysosomal pathways independently of infection and will provide important context for interpreting the infection-associated responses presented in the main figures.
  
  We agree with the reviewer that OM-89 may exert broader effects on epithelial lysosomal pathways beyond the setting of infection, and inclusion of these data will strengthen the interpretation of OM-89 as a direct modulator of epithelial antimicrobial function.
  
  With the organoids, beyond the microscopic quantification of UPEC, can CFUs be measured?
  
  We understand the reviewer's wish to see CFU measurements performed on organoids. However, such measurements face strong technical limitations, mainly because of the tedious and technically challenging microinjections. The same number of organoids would need to be infected by microinjection in both conditions (OM-89 and control), and injections would need to be performed extremely precisely, with no bacteria spreading into the surrounding extracellular matrix. Frequently, the microneedle penetrates the organoid completely, so that bacteria are not injected into the lumen but into the wall of the organoid, or are even released on the other side of the organoid. Bacteria that escape into the extracellular matrix would also be collected when the organoids are recovered from the extracellular matrix domes, strongly affecting the CFU measurements.
  
  However, using differentiated monolayers of mouse bladder epithelial cells and performing a classic gentamicin protection assay would add an additional layer of information on the purely intracellular bacterial population, whilst overcoming the previously mentioned technical challenges. Therefore, we aim to perform CFU measurements on monolayers with and without OM-89 treatment to support our microscopic quantification and specifically be able to make a statement on reduced intracellular bacterial burden with OM-89 treatment. The CFUs will therefore provide an orthogonal measure of intracellular bacterial burden and complement the microscopy-based quantification during the infection and antibiotic-treatment phases.
  
  Adding to this point of the reviewer, we want to clarify that with the higher-throughput microscopic quantification used in our approach (Thunder widefield microscope at 25x magnification), we cannot distinguish between strictly intracellular and tissue-associated bacteria; hence we used the wording "intra-organoid" in our methods section. We have now also added this information to the results section for clarification (line 116): "Hence, the microscopy data represent the total "intra-organoid" bacterial burden at each experimental stage, without distinguishing the exact localization of the bacteria - which can be luminal, intracellular or tissue-associated.". To further reflect this, we moved away from referring to antibiotic-mediated "killing" and instead refer to antibiotic-mediated "clearance" or to reduced bacterial burden throughout the manuscript.
  
  Minor points:
  
  In Fig 1A, the "co-application" horizontal line is under the 7-10 hour window, but the text suggests that the application of antibiotics and OM-89 in this experiment is between 4-7 hours.
  
  We thank the reviewer for pointing this out. Indeed, in the co-application regime, OM-89 is added at the same timepoint as the antibiotic - meaning straight after monitoring the growth phase at 4h post-infection (pi). We now adapted the horizontal line for the "co-application" treatment in Figure 1A accordingly to represent the time-point of OM-89 addition better. Additionally, we added a line for the antibiotic-treatment in order to further facilitate readability.
  
  How are antibiotics and OM-89 "removed" at the 7-hour mark? This was not detailed in the Methods.
  
  Although we had specified this in the methods section at line 603 "For every media exchange (e.g. antibiotic treatment or withdrawal), each well was washed with 9 ml of the respective media before leaving 1 ml in the well.", we realized the positioning was not optimal as we had mentioned this part under the point "Bacterial injection" in "Injection experiments". We therefore now separated this part, together with the lid preparation, from the "Bacterial injection" part and created the new subsection "Lid preparation for media changes" (line 613 onwards).
  
  What time point was used for the transcriptomic profiling of organoids? This is not clear from the relevant Methods or Results sections.
  
  As stated in the methods section, RNA for transcriptomic profiling from mBOs was extracted at 4h post-infection (pi) (line 842).
  
  In showing that OM-89 "attenuated" the magnitude of inflammatory responses (Fig 2C and S3B), it would be helpful to add a panel showing the comparison of OM89+UPEC to PBS alone - this would be expected to convey activity (red) in the infection-related pathways, but to a lower magnitude than seen in UPEC vs PBS.
  
  We thank the reviewer for this suggestion, as well as comment number 5 below. We comment more on both suggestions below.
  
  Similarly, in the results outlined starting on line 196, it would be helpful to add a panel showing OM89+UPEC vs OM89 alone.
  
  We performed the requested combined GOBP analyses, and they confirm that infection-associated pathways remain strongly activated in OM89-treated infected organoids relative to baseline (PBS) controls and relative to OM89-treated uninfected organoids. These results confirm the reviewer's hypotheses and further support the results presented in Figure 2C. In fact, genes involved in the detrimental effects of UPEC infection are induced to a lower extent when organoids are exposed to OM-89 only.
  
  However, because the direct comparison between OM89+UPEC and PBS+UPEC already highlights the effect of OM-89 while controlling for the infection status, we believe our original analysis presented in Figure 2C remains the most informative representation of attenuation. Therefore, we will include the new comparison in the supplementary section of the manuscript.
  
  In line 236, what is meant by lysosomal "activation"? A more specific term should be chosen here.
  
  We thank the reviewer for this question and aim to increase readability of this section. With lysosomal activation in the first sentence of the mentioned paragraph, we referred to the observed effect of upregulated lysosomal pathways and altered lysosomal vesicles in the previous paragraph. However, to make the connection to the previous paragraph better, and given the comment number two of reviewer number two, we changed the whole first paragraph of this section. Therefore, the first sentence of this paragraph (line 235 onwards) reads now: "To test whether the observed effects on lysosomal pathways could mechanistically, at least in parts, explain OM-89-mediated protection, we first used Genebridge analysis (Li et al, 2019) to examine how the lysosomal gene signature identified in our RNA-seq data relates to host defense programs in the human bladder."
  
  In the Abstract (line 25), the phrase "Using bladder organoids..." is a dangling modifier.
  
  We thank the reviewer for pointing this out and changed the sentence accordingly to "In bladder organoids and differentiated epithelial monolayers, OM-89 promotes lysosomal acidification and increases lysosomal protease activity, driving intracellular UPEC toward degradative compartments."
  
  Typographical and copyediting:
  
  We thank the reviewer for pointing out the typographical errors below and we corrected them all.
  
  Line 74 should read "For instance..."
  
  Line 76 should read "when combined with antibiotic therapy..."
  
  As this sentence is to emphasize the already observed protective effects of OM-89, and the two studies mentioned were either performed without or in combination with antibiotics, we changed the sentence to "For instance, rodent infection studies have demonstrated protective effects of OM-89 alone (Bosch et al, 1988; Lee et al, 2006) and in combination with antibiotic therapy (Canton et al, 2025; Bessler et al, 2010), although this observed in vivo protection could not be linked to any major quantitative changes in bladder immune cell infiltration (Canton et al, 2025), leaving the underlying molecular mechanism not fully resolved." for better readability.
  
  Line 122 should read "...regrowth following antibiotic treatment" or "regrowth post-antibiotic treatment"
  
  Line 138 should use "regimen" not "regime"
  
  Line 196 delete comma after "Although"
  
  Line 244 fully hyphenate "OM-89-mediated"
  
  Line 374 should read "...significantly enhance antibiotic-mediated killing"
  
  
  Significance:
  
  The paper is very well written and though a lot of data are included, the presentation is excellent and helps the reader to follow the story. The paper makes a strong contribution to the UTI pathogenesis field, and the use of mouse and human bladder organoids is innovative in studying intracellular UPEC. My scientific expertise as a reviewer is in UPEC pathogenesis, directly relevant to the content of this paper.
  
  Reviewer #2
  
  Evidence, reproducibility and clarity:
  
  This study examined the effect of OM-89 on UPEC infection, antibiotic clearance, and resurgence in mouse and human organoid models. The goal of the study was to understand the molecular mechanisms by which OM-89 is effective at preventing rUTI in patients.
  
  Major comments:
  
  The manuscript is well-written and the figures are well presented. Adequate background information is provided to give the study context and sufficient experimental details are provided to allow replication by other groups. Experiments contain appropriate controls and sufficient replicates to allow appropriate statistical analyses. The authors are careful to acknowledge the differences they observed between the mouse and human system and provide satisfactory potential explanations for these differences. The conclusions they draw are well supported by their data and none of their claims from their data are overstatements. Below are some points which, if addressed, I believe could improve the paper.
  
  I think the authors overstate the novelty of the concept that the urothelium is an active targetable determinant of infection and treatment outcomes. This is not an entirely new concept since previous studies have examined antimicrobial peptides and other factors from the urothelium.
  
  We thank the reviewer for this important point and agree that the urothelium has long been recognized as an active participant in host defense through mechanisms such as antimicrobial peptide production, pathogen sensing and regulation of inflammatory responses. We have therefore revised the manuscript to avoid implying that urothelial involvement in infection outcome is itself a novel concept. Instead, we now emphasize the specific advance of our study: the identification of lysosome-centered epithelial activation as a therapeutically targetable mechanism that enhances intracellular bacterial clearance and potentiates antibiotic efficacy.
  
  In the abstract we changed: "Our findings position the bladder epithelium from a passive barrier to an active, targetable determinant of treatment outcome and suggest host-directed modulation of epithelial antimicrobial pathways as a promising strategy to enhance intracellular bacterial clearance." to "Our findings demonstrate that bladder epithelial antimicrobial pathways can be pharmacologically reinforced to influence treatment outcomes by enhancing intracellular bacterial clearance." in line 30.
  
  In the introduction we changed: "Together with increased intracellular accumulation of antibiotics across different classes, this leads to improved intracellular killing and reduced bacterial regrowth across diverse UPEC strains." to "Together with increased intracellular accumulation of antibiotics across different classes, this leads to improved intracellular clearance and reduced bacterial regrowth across diverse UPEC strains." in line 91 and "Together, these findings reveal a previously unrecognized epithelial lysosome-centered mechanism by which OM-89 enhances intracellular antibiotic performance and repositions the bladder epithelium from a passive reservoir of infection reactivation to an actively transformable antimicrobial compartment influencing treatment outcomes." to "Together, these findings reveal a previously unrecognized epithelial-centered mechanism by which OM-89 enhances intracellular antibiotic performance and establishes lysosomal activation as a therapeutically targetable component of epithelial host defense against intracellular UPEC." in line 96.
  
  In the discussion we changed: "Together, these findings provide a mechanistic framework for the long-observed clinical efficacy of OM-89. Our findings reveal that the urothelium itself can be therapeutically targeted to reduce pathogen regrowth by transforming the epithelial barrier from a passive refuge for UPEC into an active defense site." to "Together, these findings provide a mechanistic framework for the long-observed clinical efficacy of OM-89 and identify epithelial lysosomal pathways as a therapeutically targetable component of host defense that can be used to improve intracellular bacterial clearance." in line 398 and "In the face of rising antimicrobial resistance (2024), strengthening epithelial antimicrobial function offers a complementary route to shift the bladder mucosa from a passive niche of bacterial survival and infection reactivation toward an active site of accelerated pathogen clearance." to "In the face of rising antimicrobial resistance (2024), our findings provide a mechanistic rationale for the clinical use of OM-89 and support epithelial lysosomal pathways as a promising target for host-directed therapeutic strategies that enhance intracellular bacterial clearance and improve the efficacy of existing antibiotics." in line 469.
  
  Depending on the target audience, the Module-Module association analysis could need more introduction. I am not a computational biologist and it was not obviously apparent how Figure 4A is generated and what it actually showing. How specifically does this analysis demonstrate a functional link between lysosomal activity and immune defense pathways? Without further explanation, it is my opinion that this figure panel is an unnecessary distraction that is not required for any of the conclusions that the group can already draw from the rest of their data.
  
  We thank the reviewer for this constructive critique. We agree that the rationale and interpretation of this analysis were not sufficiently explained in the original manuscript. We have therefore expanded the description of the MMAS approach and clarified how these data support the translational relevance of the lysosomal pathways identified in our experimental models. We also agree that for a broader biological audience, the computational framework and the strategic necessity of Figure 4A required a clearer introduction and stronger justification.
  
  To address the reviewer's concerns, we have thoroughly revised the text (lines 235-250) to clarify the methodology and emphasize the essential translational value this analysis adds to our study:
  
  How the figure is generated and what it shows: We have added explicit language clarifying that we used a computational Module-Module Association Score (MMAS) to evaluate the transcriptional correlation between the lysosomal gene network and functional biological pathways. Rather than relying on a single experimental dataset, this analysis compiles data across eight independent human bladder transcriptomic datasets encompassing over 1,400 clinical samples.
  
  Demonstrating the link to immune pathways: We have explicitly named the specific host defense modules highlighted in Figure 4A, namely "Response to molecule of bacterial origin", "cell activation involved in immune response", and "innate immune response" to guide the reader directly to the strong positive correlations shown in the panel.
  
  Justifying its inclusion (mouse-to-human translational bridge): While the rest of our data characterizes the cellular mechanics of OM-89 in murine organoids and cell culture, Figure 4A demonstrates that the link between lysosomal activity and bacterial defense is a conserved feature of bladder tissue biology across species. This cross-species alignment (our mouse-data at this stage of the manuscript compared to human-derived data) provides critical clinical justification for targeting epithelial lysosomal pathways as a therapeutic strategy in human patients. The new paragraph reads as follows: "To test whether the observed effects on lysosomal pathways could mechanistically, at least in parts, explain OM-89-mediated protection, we first used Genebridge analysis (Li et al., 2019) to examine how the lysosomal gene signature identified in our RNA-seq data relates to host defense programs in the human bladder. To evaluate the translational relevance of our experimental findings, we used a computational Module-Module Association Score (MMAS) analysis across eight independent human bladder transcriptomic datasets comprising over 1,400 clinical samples. This network-based approach evaluates the transcriptional correlation between the lysosomal gene network and functional biological pathways across diverse human cohorts. Module-Module association analysis performed on these human bladder datasets indicated that the lysosome module has strong positive associations with specific host defense modules, including "response to molecule of bacterial origin", "cell activation involved in immune response", and "innate immune response" (Figure 4A), highlighting a conserved functional link between lysosomal activity and immune defense pathways in the bladder epithelium. 
Altogether, these positive correlations suggest that lysosomal activation represents a conserved pathway integrated within mucosal immunity across species, rather than an isolated cellular response unique to our experimental models."
  
  Significance:
  
  General assessment: Solid experimental design with appropriate controls. Appropriate statistical rigor. Conclusions justified by the data. Limitations acknowledged. Differences in results between mice and humans acknowledged.
  
  Advance: Moderate technical advance building on prior organoid models. Significant mechanistic advance because OM-89 has been widely used for a long time without detailed understanding of why it works. Moderate conceptual advance that urothelial cells are a targetable determinant of treatment outcomes.
  
  Audience: I am a basic science researcher in the field of female urogenital tract microbiome and infections. Other researchers studying UTI will certainly be interested in this study. It also may be of interest to people studying other bladder conditions that involve the urothelium (bladder cancer).
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.10.22.683857
www.biorxiv.org

The serine protease homolog Skanda modulates Toll-Phenoloxidase-mediated immunity in Drosophila

2
1. EMBOpress 26 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Point-by-point response to the reviewers (blue)
  
  Dear Editor,
  
  Thank you for taking care of our manuscript. We are pleased to see that the reviewers are positive about our manuscript. We have amended our manuscript to address nearly all of the reviewers’ comments. Our point-by-point answers are given below. Although we cannot fully establish the exact function of the serine protease homolog Skanda in the Drosophila immune response, our study, which combines biochemistry and genetics, provides important insight into the Toll-PO cascade and its complexity.
  
  With best regards,
  
  Bruno Lemaitre on behalf of the authors
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  In the manuscript entitled "The serine protease homolog Skanda modulates Toll-phenoloxidase-mediated immunity in Drosophila," Vasanth et al characterize in detail a previously unstudied component of the insect immune response using first biochemical and then in vivo methods. Using proteins overexpressed and purified from insect cells, the authors provide evidence that Skanda could be a negative regulator of the SP cascade, impacting cleavage of proHayan and proPsh, and consequently Toll pathway and PPO1 activation. This work reaches further by transposing these findings into the D. melanogaster in vivo model. Here, however, the picture becomes more confusing as Skanda at native levels does not appear to regulate either the Toll pathway or the melanization cascade. Only one strong phenotype was identified in that decreased expression of Skanda increased susceptibility to S. aureus infection while increased expression decreased susceptibility. The mechanism for this remains unclear. To their credit, the authors carry out an in-depth analysis to rule out all the obvious possibilities. In the discussion, the authors explore the basis of discrepancies between their biochemical and genetic findings. We would suggest that an additional one to consider is differing roles or behaviors of Skanda in the microenvironments of the local site of injury (where S. aureus may be contained when it is tolerated) and the hemolymph. In summary, this is a valuable analysis of the innate immune component Skanda whose role has become somewhat clearer through these studies, but still remains obscure.
  
  We thank the reviewer for this general assessment of our article. We agree with the reviewer's idea that discrepancies between the biochemical and genetic findings arise from differing roles or behaviors of Skanda in the microenvironments of the local site of injury and the hemolymph. We added the following sentence to the discussion: ‘The presence of Skanda in the hemolymph (Rommelaere et al. 2025) suggests a role in the systemic immune response; however, we cannot exclude that it may be particularly important within the local microenvironments at sites of injury’.
  
  Major Comments: - To assess the bimodal distribution of bacterial loads within single flies in Fig 6E, the authors should either increase the sample size to allow proper statistical assessment of the different distributions among genotypes, specifically between w1118 and skanda_d107, or provide a modelling framework for statistical testing. Otherwise, the present results seem insufficient to conclude that Skanda plays a role in resistance to S. aureus.
  
  We agree with the reviewer that our bacterial count analysis was not sufficiently developed. In the revised version, we added a new Figure 6E with two time points, 13h and 16h, which were chosen before flies start to die from S. aureus. At 13h we observe a significantly higher bacterial count in the Skanda mutants; this is not the case at 16h, although a higher proportion of wild-type flies have cleared the bacteria. These observations suggest a role for Skanda in resistance to, but also in tolerance of, S. aureus. The fast killing induced by systemic injury with a low dose of S. aureus made it difficult to find a condition that would reveal a clear difference in bacterial load. We have therefore amended our text to highlight that Skanda could also play a role in tolerance.
  
  Another way to assess a role for tolerance in the Skanda mutant would be to measure BLUDs (https://doi.org/10.7554/eLife.28298 ) and/or transcription of CrebA.
  
  We agree with the reviewer, but measuring BLUDs with S. aureus is rather challenging, as flies die quickly from this bacterium. As mentioned above, and in light of the revised Figure 6E, we discuss in the revised version that Skanda could be involved in both resistance and tolerance.
  
  The error bars on qRT-PCR datasets are large, and the data points are not shown, so we do not know how many replicates were included in the graphs (Fig 5 B and C, Fig 6C, Fig 7 A and B, and Fig 8B). Bar plots are not the most faithful representation of biological datasets, as they can obscure important information about the distribution and variation of data points (Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm | PLOS Biology). We advise that, particularly in the case of datasets such as qRT-PCR, the final fold-change values be represented as individual dots, with the mean value clearly indicated, with or without an additional bar graph. Furthermore, no statistical tests were applied to determine significance. Data points should be shown and appropriate statistical tests should be applied. The number of biological replicates should be included in the analysis and the statistical test applied should be noted in the figure legends.
  
  We have changed the figures related to qRT-PCR to show the individual points and we have added statistics in the revised version.
  
  Although there are claims of Skanda conferring resistance to S. aureus infection, only Drs levels are tested. These conclusions could be strengthened by assessing expression levels of additional AMPs.
  
  In the revised manuscript, we report the expression of BomS1 in wild-type, skanda, and spz mutants following S. aureus infection. As previously observed for Drosomycin, Skanda does not markedly affect BomS1 expression (new Supplementary Figure S3E).
  
  Minor Comments: - Parag. 1: (data not shown) should be removed and, if possible, an AlphaFold prediction of Skanda conformation added. Alternatively, remove the sentence.
  
  We have removed (data not shown) and indicated that the information was derived from AlphaFold.
  
  Parag. 3: 1000 mL? Why not 1 L?
  
  Corrected.
  
  Parag. 5: the "," in the last sentence should be a ".".
  
  Corrected.
  
  Parag. 6: "a role at the same position..." does not convey the correct message.
  
  We have improved the sentence to ‘Our results indicate that Grass processes Skanda in the Toll–PO SP cascade, consistent with Skanda acting at the same level of the proteolytic cascade as Hayan and Psh’.
  
  Figure axes (5D, 5E, 6D, etc...) of melanization assays are wrongly named "% melanisation", with "s"
  
  We have corrected this to “Melanization”.
  
  Parag. 21: compound mutants (if correctly interpreted as dataset presented in Fig. 8B) were tested at 6h, 24h and 48h, and not 32h, as written in the text
  
  Indeed, in figure 8B, we monitored expression at 6, 24 and 32h and not 48h. This has been corrected.
  
  Results section "skanda is not mandatory for the activation of the Toll pathway" adopts a literal translation which would probably be better phrased as "is not essential"
  
  We have corrected accordingly.
  
  Discussion parag. 2: "Skanda exhibits..."
  
  Discussion last parag: "..., but also underlies..."
  
  It has been evidenced that
  
  This has been corrected.
  
  Additional comments: - The sentence on page 2 beginning with "Upon binding, these PRRs..." is very long and difficult to follow. This should be rewritten.
  
  We have split this sentence in two shorter ones for clarity.
  
  In many places in the manuscript bacterial "dose" is used in place of bacterial burden. The dose is the amount of a substance or bacterium given to the animal.
  
  We have changed ‘bacterial dose’ to ‘bacterial burden’ where relevant, and we have kept the term “dose” when referring to the OD used to infect flies.
  
  Page 11: Skanda is described as a placeholder when I think a (competitive) inhibitor would be more appropriate.
  
  We agree that Skanda functionally resembles a competitive inhibitor, but several key differences set it apart from classical small-molecule inhibitors. First, Skanda is comparable in size and structure to Persephone and Hayan, natural substrates of Grass. Second, Skanda-like SPHs, which have close SP paralogs (e.g., Psh), are common in insects (Cao and Jiang, 2019), indicating that they may constitute a distinct class of negative regulators that warrants its own terminology. Moreover, because amplification in protease cascades typically occurs at the terminal step, negative regulation by Skanda at an intermediate step could be more stoichiometric than the freely reversible inhibition expected for a typical competitive inhibitor. As Skanda’s mechanism remains unclear, the neutral term “placeholder” seems more appropriate than “competitive inhibitor”.
  
  **Referee cross-commenting**
  
  I agree with the comments of the other reviewers.
  
  Reviewer #1 (Significance (Required)):
  
Strengths: The authors take a multi-disciplinary biochemical and in vivo approach to understand the molecular interactions among SPs and SPHs and thereby uncover the role of the protein Skanda that might otherwise not have been appreciated. They have made extensive use of novel transgenic fly lines, generated in the context of this study, and have thoroughly tested their specificity and cis-acting potential. These will provide a resource to the field. In addition to the new description of Skanda, these findings strengthen previous knowledge regarding systemic infections with different bacteria (M. luteus, S. aureus) and reproduce the known redundancies of Psh and Hayan modes of action. Moreover, this research is relevant for the expansion of basic knowledge on innate immunity, particularly in the field of insect-pathogen interactions, making use of S. frugiperda cell lines and D. melanogaster adults and larvae. Although not the focus of this work, the evolutionarily conserved nature of these aspects of innate immunity across these two distant species enhances the importance of these findings.
  
  Weaknesses: Some assays do not include enough biological replicates and others do not have enough information on how many biological replicates were performed. Therefore, the conclusions drawn are difficult to assess. Lack of statistical analysis on the qPCR experiments complicates the interpretation of results.
  
We thank the reviewer for this assessment. We have added the number of replicates in the revised version and now display the variability of our data.
  
__Reviewer #2 (Evidence, reproducibility and clarity (Required)):__
  
Summary: In this work the authors identify the SPH skanda as an important player in Drosophila resistance to S. aureus infections independent of Toll and classical melanization. The authors conducted rigorous in vitro assays using recombinant proteins of various SPs in the Drosophila Toll-PO cascade to show that skanda negatively regulates activation cleavage of SPs at the level of and downstream of Psh and hayan, two key SPs that converge on Toll pathway activation, with the latter playing a central role in cuticular melanization. In parallel, genetic analysis using mutant flies showed that skanda does not negatively regulate the Toll pathway or melanization. Only skanda overexpression in vivo led to a reduction in S. aureus melanization, which, in my opinion, is most likely due to the artificial increase in the in vivo concentration of the protein rather than an indication of a true function. Altogether, this is an interesting work, as it shows the discrepancies between the biochemical and genetic approaches when it comes to dissecting the insect SP cascades regulating melanization and Toll, as highlighted by the authors themselves in the discussion section. All experimental work is well controlled, the methodology is robust and the results are adequately discussed. I have some comments concerning a few experiments and interpretations that in my opinion warrant further discussion.
  
We thank the reviewer for the analysis and agree that the result showing that Skanda negatively regulates melanization could be due to overexpression.
  
__Major comments:__ 1- It seems that SP48 and Grass can redundantly cleave Skanda, although the latter cleaves more strongly (Fig 3B). Can other downstream SPs cleave skanda? Can ModSp alone cleave skanda? (The ModSP + skanda lane is absent from Fig 3B.) It is important to test these possibilities, as the in vitro system may be quite relaxed as to the specificity of these cleavage events and may not reflect what happens in vivo. In fact, it has been shown in Anopheles gambiae that SPHs can be redundantly cleaved by multiple SPs in the protease cascade. Although these cascades have a certain hierarchy, information can still flow in more than one direction along their different branches.
  
We tested whether ModSP could cleave pro-Skanda and found that it did not (data not shown). This result is consistent with our expectations, as ModSP has a chymoelastase-like specificity and preferentially cleaves after Leu. In contrast, Skanda is cleaved by Grass and cSP48, both of which are trypsin-like proteases.
  
  At present, there is no straightforward way to assess whether downstream SPs activate pro-Skanda. Obtaining an active downstream SP would require sequential activation of all its upstream enzymes, and it is nearly impossible to completely remove these activating proteases afterward. As a result, it is difficult to distinguish the activity of a downstream SP from that of cSP48 and Grass. We are currently developing a new approach to overcome this limitation.
  
  2- In Fig 4B and 4C the bands of active forms should be quantified from at least 3 immunoblots for robust results especially in Fig 4C where the differences are minimal.
  
  As suggested by the reviewer, we quantified the band intensities from four independent blots and presented the data in Fig. 4B and 4C (lower panels).
  
3- It is not clear to me why skanda should have a specific role in resisting S. aureus infections, given that S. aureus is not a natural pathogen of Drosophila. Have other Gram-positive and Gram-negative bacteria been tested?
  
It is true that S. aureus is unlikely to be a natural pathogen of Drosophila. However, this bacterium has been used in several studies (notably Dudzic 2019) to uncover a specific activity associated with melanization modules that is distinct from cuticular blackening. For this reason, we believe that S. aureus provides a sensitive assay to monitor this particular immune mechanism. We further hypothesize that other bacteria related to S. aureus (possibly members of the genus Staphylococcus) may infect Drosophila and could be controlled by Skanda. We chose not to elaborate on this point to avoid overextending the scope of the article.
  
  4- In Fig 6E more points should be collected for statistical power. It is also better to show these data that are not normally distributed in violin charts or boxes and whiskers which give a better indication as to which quartile the bulk of the data belongs.
  
  We have addressed this point (see answer to Reviewer 1).
  
Minor comments: 5- In Figures 3 and 4, it would be easier to follow the cleavage events if a schematic drawing were provided showing the sequence of activation cleavage events of the tested SPs.
  
Because the order of the two cleavage events is unclear, we felt it was simpler to include the putative cleavage sites in Fig. 2B and refer interested readers to Fig. S1, Table S1, and the Fig. 3 legend.
  
  6- The fact that PPO1/PPO2 depleted flies exhibit increased Drs expression could be due to increased bacterial proliferation in this mutant background that trigger increased Toll stimulation, rather than a negative feedback mechanism. This increased proliferation is shown in Fig 6E.
  
This is a good point. The higher expression of Drs in PPO1/PPO2-depleted flies could be associated with a higher bacterial load in the mutant, or with negative feedback of the melanization reaction. This higher Toll pathway activation has been further characterized in Liu et al. (PLoS Pathogens, 2025), where it was suggested that it relates to a negative feedback loop between the Toll and the melanization cascades.
  
  7- In Fig 6E more points should be collected for statistical power. It is also better to show these data that are not normally distributed in violin charts or boxes and whiskers which give a better indication as to which quartile the bulk of the data belongs.
  
  We have addressed this point. See answer to reviewer 1 for discussion.
  
  8- A phenotype for skanda in melanization was observed only in over-expression assays which may artificially alter molecular interactions in the cascade.
  
  We agree with this statement and we have added a comment in the discussion of the revised manuscript about the potential artifactual results due to over-expression.
  
  9- Page 10 last paragraph "peak expression at 32 hrs or 48 hrs as shown on the figure?"
  
  This is 32h and has been corrected.
  
  10- The differences in Drs expression levels in Hayan-pshDef and psh-skandaDef double mutant flies infected with M. luteus and S. aureus is surprising. I wonder whether the observed differences are due to biochemical differences in the microbial surfaces to which these cascades are recruited.
  
  Drs expression is markedly higher following systemic infection with M. luteus than with S. aureus, consistent with the different bacterial doses used. We deliberately employed a low dose of S. aureus because this condition reveals a pronounced susceptibility in skanda flies. Consequently, direct comparison between these two infection regimes remains challenging.
  
  11- There are several typos in the manuscript
  
  We have carefully re-read the manuscript and corrected several typos.
  
  Reviewer #2 (Significance (Required)):
  
The main strength of this work is that it combines biochemistry and genetics in a strong genetic model to characterize the biochemical interactions between SPHs and SPs in clip cascades and to relate the relevant interactions observed in vitro to potential in vivo functions. This is the first time that such a rigorous combined approach has been applied to the study of these cascades. The results obtained also show the advantages and limitations of each approach. As such, I believe this study will be of interest to a broad audience in the field of insect immunity.
  
__Reviewer #3 (Evidence, reproducibility and clarity (Required)):__
  
__Summary:__
  
Serine protease cascades are central for activation of immune responses in insects. In Drosophila melanogaster, the Toll signaling pathway has been quite extensively studied, and several serine proteases, serpins and serine protease homologs (SPHs) with functions in Toll activation have been identified. In this work, the authors characterize a new component of this system, an SPH which they name Skanda. Skanda seems to have multiple roles/points of action: on one hand, it participates in the regulation of Toll together with Psh, an established serine protease in Toll activation; on the other hand, it controls the response to a systemic S. aureus infection via a not yet fully specified mechanism.
  
__Major comments:__
  
Key conclusions made in this work are convincing and backed up by the data presented. The data and methods are presented in a way that allows reproduction of the experiments. The number of individuals used, especially in the infection experiments (20 male flies per replicate), is on the lower side, but the experiments are adequately replicated and the effects seen are clear.
  
While this work contributes to our understanding of the regulatory mechanisms governing Toll signaling, at times the authors' reasoning is difficult to follow. I recognize that this is a complex topic, with multiple upstream branches activating Toll signaling, and the authors do consider various mechanisms that could explain their findings. However, the manuscript would benefit from additional clarification, perhaps through a schematic model illustrating the proposed effects of Skanda, to help readers position Skanda within the broader context of Toll signaling.

We have done our best to explain the Toll serine protease cascade and have added a figure at the beginning of the manuscript. Since we cannot yet position Skanda in the Toll-PO cascade, we prefer to avoid drawing a model. We believe that this study highlights our ignorance of the complexity of the serine protease cascades acting upstream of Spätzle and melanization.
  
  Statistical analyses for the Drs expression experiments are lacking.
  
  The statistical analysis for Drs expression has been added in the revised version.
  
__Minor comments:__
  
  The authors could explain what type of cells the sf9 cells are and why they decided to use them.
  
Sf9 cells are an insect ovarian cell line derived from Spodoptera frugiperda and are widely used for baculovirus-mediated expression of eukaryotic proteins. They support proper protein folding, disulfide bond formation, and post-translational processing. This information is now mentioned in the Results section in addition to the Methods.
  
  Band intensities could be measured and plotted for the immunoblots. The immunoblot methods should be fully described in the Materials and methods section.
  
  Thanks for the suggestion. We have done this accordingly and included the results in Fig. 4B and Fig. 4C (lower panels). Brief descriptions of densitometric analyses have been added to the figure legends.
  
  Protein levels of Skanda in the Skanda mutant could be shown as the mRNA levels remain relatively high (Sup. Fig 3B). If this is not possible, could the authors comment on the remaining expression of Skanda in the Skanda mutants?
  
We have added a comment on this point: the skanda mutation is a frameshift mutation that affects the coding sequence. Transcripts are still produced, although they are not functional. The decreased expression of Skanda in SkandaD107 is probably due to nonsense-mediated mRNA decay caused by the frameshift.
  
  Under the heading "Loss of skanda does not further enhance the cuticular melanization defects caused by the loss of Hayan or psh" the text should refer to figure 5D not 5B.
  
  We have corrected this mistake in the revised version.
  
Figure 6C shows that Drs expression is higher in the Skanda mutant than in controls at 32 h post S. aureus infection (although this has not been statistically tested). The authors don't mention this result in the manuscript, but to me it fits with the idea of Skanda acting as a negative regulator (the effect of which accumulates and is seen only late after infection). Could the authors comment on this?

We do not think that the higher expression of Drs in the Skanda mutant upon systemic S. aureus infection is due to negative regulation of the Toll pathway, but rather to a higher S. aureus burden. We conclude this because Drs expression is not higher than in the wild type upon injection of M. luteus or proteases. At this stage, we cannot exclude that there are differences between M. luteus and S. aureus.
  
Under the heading "Psh and skanda redundantly regulate Toll signaling", the comparison should likely be between Figures 7A-7B and 5B-C (rather than 5A). When examining the effects of single versus double mutants on Drs expression, the Psh-Skanda double mutant clearly reduces Drs more than the Psh single mutant. However, in the context of microbial proteases, the pattern appears different: there is virtually no difference at 6 hours, while at 48 hours there may be a slight decrease in Drs expression in the double mutant compared to the Psh single mutant, although this difference would likely not reach statistical significance if tested. I don't know what this could mean, but I'd like to hear the authors' take on this.

The reviewer is correct, and we have revised our manuscript to refer to the appropriate figures (Figures 7A-7B and 5B-C).
  
The reviewer raised a good point. We believe that the additional effect of Skanda in the absence of Psh is less marked upon challenge with microbial proteases because Psh by itself already has a strong effect in sensing proteases. In contrast, there is greater redundancy between Psh and Hayan upon M. luteus challenge, and consequently the psh, skanda double mutant has a stronger effect.
  
**Referee cross-commenting**
  
  I also agree with the comments and points raised by the other reviewers.
  
Reviewer #3 (Significance (Required)):
  
  Research on the Drosophila immune response has significantly advanced our understanding of (innate) immune responses, both generally and in an evolutionary context. Despite over three decades of study, this work demonstrates that there are aspects of Toll signaling that remain unresolved. The authors identify a novel regulator of the Toll pathway and begin to elucidate its functions. Equally important, their findings underscore the complexity and context-dependency of the regulatory events that shape immune responses.
  
  We fully agree with the assessment of the reviewer. Our study highlights the complexity (and our ignorance) of this important facet of Drosophila immunity, as mentioned in the last sentence of the discussion.
  
  My fields of expertise are Drosophila melanogaster, innate immunity, cell-mediated immunity.
  
__Reviewer #4 (Evidence, reproducibility and clarity (Required)):__
  
__Summary__
  
In this study, the authors investigate the function of Skanda, a serine protease homolog (SPH) in Drosophila innate immunity, using both biochemical and genetic approaches. The reason to focus on this SPH is that it lies at the same locus as two key proteases of Drosophila immune defenses, Hayan and Persephone, all of which are induced by an immune challenge. After having modeled this SPH and shown that the three amino acids of the serine protease catalytic triad are either mutated or poorly oriented, they report that Skanda may limit the cleavage of proteases downstream of Grass, a key event for their biochemical activation. The study of an isogenized, putatively null, mutant line failed to reveal any impact of skanda on Toll pathway activation or on melanization, although a strong but not a moderate overexpression somewhat inhibits the formation of a melanization scab, only after "clean" but not septic injury. These results are not in keeping with the biochemical analysis: the mutant would have been expected to display an enhanced immune response. Unexpectedly, skanda mutants are as highly susceptible to a low-dose Staphylococcus aureus injection as flies deleted for the adult-expressed phenoloxidases PPO1 and PPO2, melanization playing a key role in host defense in this infection paradigm. No strong impact on the bacterial load was detected at the sole time point investigated, 24h. Because the analysis of the single skanda mutant did not unambiguously reveal its role in host defense, the authors then studied double or triple mutants of the three protease genes and found a redundant role for Skanda with Persephone in Toll pathway activation after a challenge with a nonpathogenic Gram-positive bacterium or a bacterial protease. In the case of S. aureus infection, a strong induction of the Drosomycin gene is observed at 48h of infection in the compound mutants, which was not observed with the nonpathogenic challenges.
  
__Major comments__
  
The authors state that "These results are consistent with a role of Skanda in resistance to S. aureus". This conclusion rests on a very fragile experiment that measured the bacterial burden 24h after challenge with a low dose of S. aureus: whereas wild-type control flies exhibit a dual low and high distribution of bacterial loads, skanda flies exhibit only the higher values. However, the bacterial load in skanda flies appears to be as high as in persephone mutant flies, which are much less sensitive to S. aureus than skanda flies. This makes it highly unlikely that the high susceptibility of skanda to S. aureus is due solely to a defect in resistance. The problem is compounded by the poor description of the experiment: it is not stated anywhere how many times the experiment was performed, whether pooled data are shown, or what each data point represents (pooled or single flies). A fine-grained time course with more biological samples would definitely be needed to convince the reader of a (limited) role in resistance. The authors do not consider the alternative, but not mutually exclusive, possibility that skanda also plays a role in disease tolerance. The determination of the bacterial load upon death of single flies may provide some clues about this alternative function (Duneau et al., eLife, 2017). Another approach might be to determine whether the bacterial supernatant is toxic and whether skanda might protect from this toxicity. Bomanins play a role in host defense against S. aureus (this study, but see also Hanson et al., eLife 2019, in which the 55C deficiency susceptibility phenotype was stronger) and in both resistance and disease tolerance during Gram-positive bacterial or fungal infections (e.g., Clemmons et al. PLoS Pathogens 2015, Lindsay et al., J. Innate Immun, 2018, Xu et al., EMBO Reports 2023, Lou et al., BioRxiv, 2025), and BomS1 has an optimal Dorsal-related Immune Factor binding site (Busse et al. EMBO J., 2007). It may therefore be useful to monitor the expression of several Bom genes in addition to that of Drosomycin, especially after S. aureus challenge. Furthermore, BomT1 is the only peptide that appears to play a role in resistance against Gram-positive bacteria, namely against E. faecalis. This series of qPCR experiments would be quick to perform, provided the authors have kept the cDNAs of their samples.
  
  To address the reviewer’s comment, we extended the bacterial load analysis of S. aureus in skanda mutants (new figure 6E). Our results support a role for Skanda in both resistance and disease tolerance. This point is now briefly discussed in the Results section, and we have added references highlighting a role of the Toll pathway in disease tolerance. We did not elaborate further, as accurately monitoring S. aureus burden following low-dose infection remains technically challenging given the high pathogenicity of this bacterium.
  
In the Discussion, the authors speculate "that Skanda acts at the level of Persephone-Hayan to allow Hayan to activate the Toll pathway. Skanda would skew the activity of the Persephone-Hayan platform to induce Toll signaling and resistance to S. aureus rather than cuticular melanization". This model does not fit with the fact that SPE mutant flies are only moderately susceptible to S. aureus (Dudzic et al., 2019) and that spätzle mutant flies are either not sensitive at all (Dudzic et al., 2019) or only moderately sensitive (Hanson et al., eLife, 2019) (see also below). Whether it may apply to host defense against other pathogens remains to be determined. To better understand the function of skanda, considering only S. aureus may be limiting, as this bacterium is fundamentally not susceptible to the canonical Toll intracellular signaling cascade (e.g., Bischoff et al, Nat Immunol, 2004, Dudzic et al, Cell Reports, 2019) or to the final part of the Toll-activation proteolytic cascade, as discussed above for SPE and Spätzle. The authors appear to have chosen not to display the results they obtained with Enterococcus faecalis (but forgot to remove its mention in two places in the Materials and Methods): it would definitely be interesting to know what the outcome of these experiments was, and also to investigate the susceptibility and microbial burden of skanda mutants to representative yeast and filamentous fungal pathogens, Aspergillus fumigatus being of special interest since its proliferation is limited through melanization whereas the Toll pathway protects against secreted virulence factors (Xu et al., EMBO Reports, 2023). This series of experiments would likely take some three months and might give additional insights into Skanda function(s).
  
We agree with the reviewer that examining the role of Skanda in response to additional microbial species could further help elucidate its function. However, the most robust phenotype we identified is a strong acute susceptibility to S. aureus, which is dependent on the Psh–Hayan–Skanda axis but independent of the SPE–Spätzle pathway. Because the microbial strains suggested by the reviewer are primarily controlled by the SPE–Spätzle–Toll pathway, we did not pursue this direction further. Nevertheless, in the revised version we have added survival analyses of skanda mutants infected with Candida albicans and Enterococcus faecalis (new Supplementary Figure 3F and G). Notably, we observed an intermediate susceptibility to both C. albicans and E. faecalis (see below). This indicates that Skanda is not a classical regulator of the Toll-PO cascade such as Grass, ModSP, SPE or Hayan/SPE.
  
In general, figure legends are not highly informative and fail to provide key information such as the number of independent experiments, whether the data are representative or pooled, and which statistical test was used, e.g., for qPCR experiments (descriptions are available for the analysis of survival and melanization experiments at the end of the Materials and Methods section). As noted above, critical information needed to understand the microbial load graphs is lacking. It is also difficult to check statements such as ", while psh[sk1] flies showed a reduced Toll pathway response". Indeed, no statistical analysis has been performed on any of the RT-qPCR data. Given the low number of experimental data points, each data point ought to be displayed rather than bar graphs, for which, in addition, the error bars are not defined. The Materials and Methods section is incomplete: it does not include a description of all the in vitro synthesized proteins used in this study, nor does it indicate the different tags. The primary and secondary antibodies used for Western blot analysis, e.g., those that detect cleaved spätzle, are not reported. This information should be included in the Table at the beginning of this section.
  
  In the revised version, we have addressed these points by adding statistical tests to the RT–qPCR analyses, displaying all data points, and improving the microbial load measurement. As discussed in the Material and Methods section, Table S2 provides information for all in vitro synthesized proteins used in this study, including affinity tags and the primary and secondary antibodies. On a more personal note, we first identified the striking susceptibility of Skanda/CG15046 flies more than 10 years ago, and the skanda project subsequently experienced a long period of discontinuation before we decided to reassemble and consolidate the most important findings. Unfortunately, this study did not result in a straightforward narrative with a “happy ending.” Nevertheless, we still consider this work an important step toward a better characterization of this aspect of fly immunity.
  
__Minor points__ Introduction: 1. The authors may want to cite Stein, Cho & Stevens, Fly, 2013 when referring to the proteolytic cascade regulating the establishment of dorso-ventral patterning.
  
This reference has been added.
  
  The statement "The Toll-PO SP cascade can be DIRECTLY activated at the level of Psh-Hayan, through direct cleavage of the Psh protease bait region by microbial proteases" may be slightly misleading as only subtilisin is able to do this, the other tested proteases producing an inactive cleaved Psh that needed to be secondarily activated by a couple of specific cathepsins (Issa et al., Molecular Cell, 2018).
  
Good point. This statement has been corrected and the Issa et al. reference added.
  
  Results 3. The reasoning of the second paragraph is difficult to follow as the reader does not understand how the cleavage sites can be computed. It would be important to state that the recombinant proteins are tagged. It would actually be very helpful to provide a scheme of the various recombinant proteins used in the study as had been done in the Shan et al., Science Advances article.
  
  We followed the reviewer’s good suggestions, modified the text accordingly, and added Table S2.
  
  With respect to Western blots, many of the bands are faint, e.g., SPE after the addition of Skanda cannot be detected on a printed version of the figure. It is also difficult to determine whether the reduction in band amount is reproducible as no indications are given in this respect. It is important that the images be quantified in several independent blots so that the observed reduction can be statistically assessed. With respect to PPO1 cleavage, it would be important to also check its cleavage in vivo, which would yield higher confidence on the relevance of in vitro study to the in vivo situation.
  
In response to the reviewer’s suggestions, we repeated SDS-PAGE and immunoblot analysis, quantified band intensities, and performed statistical analyses for the samples shown in Fig. 3B and 3C (lower panels). Three to four blots in total were analyzed for each representative panel. For practical reasons, we are unable to assess PPO1 cleavage in vivo.
  
  First sentence of the paragraph "skanda mutants are highly susceptible": the authors might also want to cite Hanson et al, eLife 2019.
  
We have added the Hanson reference as well as Ryckebusch et al., 2025, which is more appropriate.
  
In Dudzic et al., Cell Reports, 2019, the authors did not observe any susceptibility to S. aureus with Hayan[sk3], whereas here they find an intermediate sensitivity phenotype with Hayan[sk6]. Was the former not a null allele of Hayan? With respect to the 55C Bomanin deficiency, Hanson et al., 2019 had reported a stronger phenotype than that shown in Fig. 8A, with some 75% of flies dead within three days. Which study should we trust, or does this reflect variations between experiments (hence the question about the representation of survival data: are these pooled data from three independent experiments, and how much variation was there between independent experiments?)?
  
Both Hayan mutant flies were null. We have observed differences over the years between experimenters, although the main results stand. We also tend to observe a stronger impact of psh in response to M. luteus than initially reported (Figure 5B), although this is consistent with its role in the PRR-Grass-SPE pathway. Considering all the parameters that influence survival experiments (temperature, humidity, time to form the bacterial pellet and sometimes bacterial strains) and possible cryptic infections (Nora virus infection), we consider these variations to be expected.
  
  It would be interesting to measure the S. aureus bacterial load upon skanda overexpression to confirm a putative role in resistance.
  
  This is an interesting suggestion but we did not do it because of the technical challenge that monitoring S. aureus burden represents. We have preferred to focus our attention on monitoring S. aureus in Skanda loss-of-function mutants.
  
  UAS-skanda: besides Fig. 6B, the authors should also refer the reader to Fig. S4A.
  
  The link to Fig S4A has been added.
  
  Genetic dissection of the skanda-psh-hayan gene cluster: the last sentence of the paragraph does not reflect what Fig. S7B is showing: one of the double mutants and the triple mutant displayed a significant intermediate susceptibility to S. aureus.
  
This sentence in fact referred to Ecc15. The reviewer is correct: the triple mutant and the hayan, psh double mutant have increased susceptibility to Ecc15.
  
Paragraph "Compound mutants are EXTREMELY susceptible to S. aureus": the wording is likely too... extreme. They do not seem to die much faster than skanda single mutants, which were HIGHLY susceptible to S. aureus, like PPO1-PPO2 double mutants.
  
The reviewer is correct, and we no longer use the term ‘extremely’ in the revised version (replaced by ‘highly’ or removed).
  
  Last paragraph: psh mutants should be compared side-by-side with psh-skanda double mutants in the same RTqPCR experiment: it is difficult to judge whether the statement of equivalent Drosomycin expression after S. aureus challenge is true given the low resolution of the figures (Fig. 6C vs. Fig. 7B). Last sentence: it would be more appropriate to mention "host defense" rather than "resistance" since the authors did not check the bacterial burdens of the compound mutants.
  
Experiments on single and double/triple mutants were performed simultaneously, but they represent a time course with four time points in 10 different genetic backgrounds. We preferred to separate the data to simplify reading, and we believe that the reader can compare the data despite their display in two different panels. Throughout the manuscript, we have replaced "resistance" with "host defense" because, based on bacterial counts, we suspect that Skanda may contribute to both resistance and disease tolerance.
  
  Fig. 1: the scheme is not up to date and oversimplified. It should take into account the complexity revealed in the Shan et al. Science Advances article.
  
  We disagree on this point. This scheme reflects inferences drawn from genetics. An up-to-date figure is shown in Westlake, Hanson and Lemaitre (Handbook), but it would require a broad introduction. In the revised version, we have highlighted that this is a simplified model based on genetics.
  
  Fig. S1: numbering the amino-acids in the sequence would help follow the text from Document S1. What are the residues written in light blue? It may be worth highlighting residue E194. Of note, there is a difference between the sequence for peptide 4 as found in the sequence displayed on Fig. S1: KTDRD YV and the sequence of peptide 4 in Table S1: KTDRE YV; the presence of a potential SNP should be indicated, even though it is not making a major change in terms of charge of the peptide.
  
  We included an asterisk at every tenth position and a numerical indicator near the end of each line to facilitate counting. Residues highlighted in cyan may represent cleavage sites of cSP48, Grass, or a trypsin-like protease released by Sf9 cells. The peptide (E194… R212) appears to undergo cleavage to generate P204LNLPLQPR212, which is detected in the secondary MS. The reviewer is correct regarding peptide 4, and we attribute the difference to a potential SNP. This is now indicated in the legend of Figure S1.
  
  Document S1: trypsin digestion (just before second call to Fig. S1); should it not be purified proteases instead? The text should be somewhat reworded as it is currently slightly misleading.
  
  "In lane 8, peptide-1 through -19 were nearly undetectable". Table S1 shows that, even though peptides 1, 2, 6, 7, and 11 are not expressed at a level high enough to be displayed in Fig. S1 lane 8 given the chosen scale, peptides 1, 2, 6, and 7 are expressed in the same range in slices 8B and 8C, whereas peptide 1 shows just a two-fold difference between slices 8A and 8C.
  
  Points taken. To better illustrate the differences in band intensities in the top right panel of Fig. S1, we kept the same scale for bands A and B in lane 8 (as well as for bands A-C in the top left and middle panels) and used the second y-axis for band C.
  
  Fig. S2: the effect of skanda on SP7 cleavage is not detectable when Hayan isoforms are co-incubated. The main text should be modified to take this into account. How do the authors explain that pro-MP1 levels are not different upon co-incubation with Psh or Hayan-PB with or without adding Skanda, even though the active MP1 form is detected only in the absence of Skanda? In contrast, the pro-MP1 band can be detected upon co-incubation with Skanda and Hayan-PA.
  
  Thanks for the comments. We repeated the experiments and obtained four independent blots for each. After scanning, integrated band densities for all paired bands (i.e., with and without Skanda) were quantified using ImageJ (Fig. S2 and data not shown). In the representative blots, Skanda had little effect on SP7 activation by Hayan-PA (507/527; 96%) or Hayan-PB (15,763/15,828; ~100%), in contrast to Psh (937/7,917; 12%). However, when ratios from all blots were considered, the mean reductions were 56 ± 14% for Psh, 49 ± 19% for Hayan-PA, and 65 ± 18% for Hayan-PB. For MP1, comparison of precursor bands is less reliable because small decreases in precursor intensity are difficult to quantify; therefore, we focused on the MP1 product. MP1 levels were reduced to 58 ± 8% (Psh), 44 ± 3% (Hayan-PA), and 90 ± 30% (Hayan-PB). SPE intensity was reduced to 38 ± 12% (Psh), 43 ± 5% (Hayan-PA), and 23 ± 4% (Hayan-PB). Ser7 intensity was reduced to 9 ± 4% (Psh), 35 ± 1% (Hayan-PA), and 27 ± 13% (Hayan-PB). In general, Skanda suppressed the activation of SP7, SPE, MP1, and Ser7 by Psh, Hayan-PA, or Hayan-PB. We have included this information in the Fig. S2 legend.
  
  Fig. S3B, S7A: the three genes of the locus are inducible upon immune challenge. Have any NF-kappaB binding sites been detected at the locus? It might be relevant to repeat the experiment shown in S3B, and especially S7A, after a challenge with M. luteus. These experiments are definitely not essential.
  
  We did not look for NF-kB sites in their promoters, but these genes have been shown to be induced and regulated by the Toll pathway (De Gregorio 2002). We did not extend our manuscript in this direction.
  
  The mention "data not shown" is used twice. Not all Review Commons-affiliated journals accept it.
  
  These mentions have been removed.
  
  Reviewer #4 (Significance (Required)): A strength of this work is the dual biochemical and genetic characterization of an SPH, an endeavor that is important for further understanding the function of this family of secreted protease-like proteins, which have so far been imperfectly studied from both perspectives (Kambris et al., CB, 2006, but see Westlake Reproducibility study on BioRxiv, Jin et al. Frontiers Immunol. 2023). Unfortunately, the two approaches fail to provide an integrated view of Skanda's function(s). A weakness is that this study does not unambiguously reveal, at this stage, the functions of Skanda in host defense against S. aureus, let alone against other pathogens controlled to some extent by the Toll pathway or melanization. The authors have not considered a possible role in disease tolerance to S. aureus. These limitations decrease the conceptual advance of this article. This article will be of interest to investigators working on the innate immunity of insects. This reviewer is an expert in the Drosophila innate immunity field.
  
  In the revised version, we have considered a role for Skanda in resilience.
  
  PeerReviewed
2. EMBOpress 26 Jun 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Referee #1
  
  Evidence, reproducibility and clarity
  
  In the manuscript entitled "The serine protease homolog Skanda modulates Toll-phenoloxidase-mediated immunity in Drosophila," Vasanth et al. characterize in detail a previously unstudied component of the insect immune response using first biochemical and then in vivo methods. Using proteins overexpressed and purified from insect cells, the authors provide evidence that Skanda could be a negative regulator of the SP cascade, impacting cleavage of proHayan and proPsh, and consequently Toll pathway and PPO1 activation. This work reaches further by transposing these findings into the D. melanogaster in vivo model. Here, however, the picture becomes more confusing, as Skanda at native levels does not appear to regulate either the Toll pathway or the melanization cascade. Only one strong phenotype was identified: decreased expression of Skanda increased susceptibility to S. aureus infection, while increased expression decreased susceptibility. The mechanism for this remains unclear. To their credit, the authors carry out an in-depth analysis to rule out all the obvious possibilities. In the discussion, the authors explore the basis of discrepancies between their biochemical and genetic findings. We would suggest that an additional one to consider is differing roles or behaviors of Skanda in the microenvironments of the local site of injury (where S. aureus may be contained when it is tolerated) and the hemolymph. In summary, this is a valuable analysis of the innate immune component Skanda, whose role has become somewhat clearer through these studies but still remains obscure.
  
  Major Comments
  
  To assess the bimodal distribution of bacterial loads within single flies in Fig 6E, the authors should either increase the sample size to allow for proper statistical assessment of different distributions among genotypes, specifically between w1118 and skanda_d107, or provide a modelling framework for statistical testing. Otherwise, the present results seem insufficient to conclude that Skanda plays a role in resistance to S. aureus.
  
  Another way to assess a role for tolerance in the Skanda mutant would be to measure BLUDs (https://doi.org/10.7554/eLife.28298 ) and/or transcription of CrebA (https://doi.org/10.1371/journal.ppat.1006847).
  
  The error bars on the qRT-PCR datasets are large, and the data points are not shown, so we do not know how many replicates were included in the graphs (Fig 5 B and C, Fig 6C, Fig 7 A and B, and Fig 8B). Bar plots are not the most faithful representation of biological datasets, as they can hide significant information regarding the distribution and variation of data points ("Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm", PLOS Biology). We advise that, particularly for datasets such as qRT-PCR, the final fold-change values be represented as individual dots, with the mean value clearly indicated, with or without an additional bar graph. Furthermore, no statistical tests were applied to determine significance. Data points should be shown and appropriate statistical tests should be applied. The number of biological replicates should be included in the analysis, and the statistical test applied should be noted in the figure legends.
  
  Although there are claims of Skanda conferring resistance to S. aureus infection, only Drs levels were tested. These conclusions could be strengthened by assessing expression levels of additional AMPs.
  
  Minor Comments
  
  Parag. 1: "(data not shown)" should be removed and, if possible, the AlphaFold prediction of Skanda conformation added. Alternatively, remove the sentence.
  
  Parag. 3: 1000 mL? Why not 1 L?
  
  Parag. 5: the "," in the last sentence should be a ".".
  
  Parag. 6: "a role at the same position..." does not convey the correct message; replace with "equivalent"?
  
  Figure axes (5D, 5E, 6D, etc.) of melanization assays are wrongly labeled "% melanisation", with an "s".
  
  Parag. 21: compound mutants (if correctly interpreted as the dataset presented in Fig. 8B) were tested at 6 h, 24 h and 48 h, not at 32 h as stated in the text.
  
  Results section "skanda is not mandatory for the activation of the Toll pathway" adopts a literal translation which would probably be better phrased as "is not essential"
  
  Discussion parag. 2: "Skanda exhibits..."
  
  Discussion last parag: "..., but also underlies..."
  
  It has been evidenced that
  
  Additional comments:
  
  The sentence on page 2 beginning with "Upon binding, these PRRs..." is very long and difficult to follow. This should be rewritten.
  
  In many places in the manuscript bacterial "dose" is used in place of bacterial burden. The dose is the amount of a substance or bacterium given to the animal.
  
  Page 11: Skanda is described as a placeholder when I think a (competitive) inhibitor would be more appropriate.
  
  Referee cross-commenting
  
  I agree with the comments of the other reviewers.
  
  Significance
  
  Strengths: The authors take a multi-disciplinary biochemical and in vivo approach to understand the molecular interactions among SPs and SPHs and thereby uncover the role of the protein Skanda, which might otherwise not have been appreciated. They have made extensive use of novel transgenic fly lines, generated in the context of this study, and have thoroughly tested their specificity and cis-acting potential. These will provide a resource to the field. In addition to the new description of Skanda, these findings strengthen previous knowledge regarding systemic infections with different bacteria (M. luteus, S. aureus) and reproduce the known redundancies of Psh and Hayan modes of action. Moreover, this research is relevant for the expansion of basic knowledge on innate immunity, particularly in the field of insect-pathogen interactions, making use of S. frugiperda cell lines and D. melanogaster adults and larvae. Although not the focus of this work, the evolutionarily conserved nature of these aspects of innate immunity across these two distant species enhances the importance of these findings.
  
  Weaknesses: Some assays do not include enough biological replicates and others do not have enough information on how many biological replicates were performed. Therefore, the conclusions drawn are difficult to assess. Lack of statistical analysis on the qPCR experiments complicates the interpretation of results.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.09.30.679548
www.biorxiv.org

The SynMuvA lin-15A licenses natural transdifferentiation by antagonizing identity safeguarding mechanisms

1
1. EMBOpress 25 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  We thank the reviewers for their insightful comments. Please find below a point-by-point response.
  
  As the authors acknowledge in the section at the end of the discussion (Limitations of this study), it is not established that LIN-15A has a cell-autonomous function in Y-to-PDA transdifferentiation. Given that LIN-15A has a cell non-autonomous function in vulval development (Herman and Hedgecock, 1990), its function here could also be cell non-autonomous. The authors have used an egl-5 promoter to rescue lin-15A through expression in rectal cells; however, all these cells are in a neighborhood. The lack of a promoter that is specific for Y has impeded answering this question (a standard genetic mosaic analysis would be problematic because of the incomplete penetrance of the mutation). Although this issue is addressed in the section at the end of the Discussion, I think most readers would like to see it acknowledged earlier in the presentation, perhaps after describing the egl-5 rescue experiment.
  
  We thank the reviewer for this comment and agree that our data do not formally demonstrate a Y-cell autonomous role for LIN-15A during Y-to-PDA transdifferentiation, as we discussed in the manuscript. As suggested, we have modified the Results section immediately after the egl-5 rescue experiment to explicitly acknowledge this limitation early on (see p8, l168-170) and retained the discussion in the "Limitations of the study" section.
  
  The experiment shown in Figure S1B is unconvincing. To show that they are detecting a LIN-15A-LIN-56 heterodimer, the authors need to show that antibodies against tags on both proteins detect the band. Mass spectrometry or biochemical purification would also be helpful. As it is, they show only that the protein(s) detected depend genetically on lin-56 and lin-15A. It was also unclear what the other bands were in the mutant backgrounds.
  
  We agree that the experiment shown in Figure S1B does not provide sufficient evidence to conclusively demonstrate the existence of a LIN-15A-LIN-56 heterodimer. While the detected species depend genetically on both lin-15A and lin-56, we agree that additional controls, such as detection through reciprocal tagging, biochemical purification, or mass spectrometry, would be required to firmly establish the molecular nature of the complex and to interpret the additional bands observed in the mutant backgrounds. As this experiment is not essential to the conclusions of the manuscript, we have removed Figure S1B and the associated statements from the revised version.
  
  Lines 207-212. The authors argue that LIN-15A and LIN-56 function as "Licensers", not "Drivers", because they are not strictly required but appear to facilitate the process. Could it be that LIN-15A and LIN-56 function as "Drivers" but that, in their absence, the fidelity of the process is compromised? There are many ways that genetic redundancy can be manifested at the biochemical level, and the concern is that there are other interpretations of the data. In this regard, the authors should consider rewriting the Abstract to focus on the genetic results underpinning the work. The current version of the Abstract focuses on an interpretation of the data, not the data itself.
  
  We thank the reviewer for this thoughtful comment. We agree that the Driver/Licenser terminology represents an interpretation of the genetic data and that additional activities acting alongside LIN-15A may exist. The lin-15A alleles used in this study correspond to null alleles (that is, total loss of lin-15A activity), and approximately half of the animals still successfully undergo Y-to-PDA transdifferentiation in these null mutants. Thus, either lin-15A activity is not strictly required to facilitate the initiation of the process (e.g., a threshold model), or other factors (besides lin-56) that remain to be identified can partially compensate for the absence of lin-15A. In line with the latter interpretation, although we retrieved several alleles for some of the genes identified in our forward genetic screen (in which lin-15A was identified), that screen may not have been saturated. Note that both hypotheses are compatible with a role for LIN-15A as a licenser of the initiation of the process.
  
  Importantly, our distinction between "Drivers" and "Licensers" is not solely based on the incomplete penetrance of lin-15A and lin-56 null mutants. First, the distinction reflects the different biological roles inferred from our genetic analyses. The previously characterized factors CEH-6, SOX-2, SEM-4, EGL-27, EGL-5 and HLH-16 are conserved plasticity factors that promote the initiation of transdifferentiation. Their loss results in a complete, or near-complete, failure of Y-to-PDA initiation, and they act within a common plasticity-promoting network. By contrast, LIN-15A and LIN-56 define a genetically distinct pathway. They are neither upstream nor downstream of the Driver cassette, and they display additive interactions with partial Driver mutants. Second, loss of LIN-15A does not affect the fidelity or outcome of transdifferentiation. In all defective animals examined, the Y cell retains its normal position, morphology and rectal markers, indicating a failure to initiate the process rather than the production of an aberrant cell type. Third, the fact that a core Driver set is involved in different transdifferentiation events (i.e., Y-to-PDA and K-to-DVB), whereas lin-15A and lin-56 are not, further argues against LIN-15A acting as a Driver. Finally, and most importantly, lin-15A and lin-56 antagonize SynMuvB chromatin regulators known to safeguard differentiated cell identities, while the Drivers do not. In fact, the transdifferentiation process is mostly restored in some lin-15A; SynMuvB double null mutants, suggesting that the main function of LIN-15A is to block the activities of these genes. We therefore favor a model in which the Driver cassette triggers transdifferentiation, whereas LIN-15A and LIN-56 facilitate the process by alleviating inhibitory constraints imposed by identity-safeguarding mechanisms.
  
  We have reformulated this in the manuscript to make it clearer and have also clarified how we define Driver and Licenser activities (see p10, l211-217; p13, l283-289 and p14, l312-315). We have further reformulated the Abstract to integrate the reviewer's comments.
  
  Minor Points
  
  1. Line 279. The authors state that "LIN-15A becomes dispensable when member of the SynMuvB factors are absent." This statement is not completely accurate, as the suppression is incomplete.
  
  Addressed; the statement has been reworded in the revised version (see p13, l285-286).
  
  2. Line 294. The number in Table S1 is 58.8%, not 65%.
  
  Addressed; thank you for spotting this. Table S1 was correct, and the typo in the Results section has been corrected (see p15, l319).
  
  3. Lines 300-301. I couldn't find the data for lin-40.
  
  - The data can be found in Fig. 3Bii (which we now indicate more clearly in the text) and SI Table 1.
  
  4. Line 363. Should be "represses cell cycle genes."
  
  - Addressed
  
  5. Line 862. AJM-1 is not a tight junction component. AJM-1 is best described as a component of apical junctions.
  
  - Absolutely! Addressed.
  
  Reviewer 2
  
  The conclusions derived from the presented data are generally comprehensible but should be phrased more carefully to grant full legitimacy. The reason is that the central mechanistic claim, that LIN-15A licenses Td by antagonizing most of the SynMuvB chromatin factors, including DREAM, rests on whole-animal ChIP-seq that cannot resolve the Y cell. The authors acknowledge that "it was not technically feasible to purify sufficient Y cells for analysis" and therefore use synchronized unstarved L1 whole-animal lysates. This is certainly legitimate, but it calls for more caution when using such a conclusion as the headline claim.
  
  We thank the reviewer for this important comment and agree that the mechanistic conclusions drawn from the ChIP-seq data should be presented more cautiously. As noted by the referee and in the manuscript, it was not technically feasible to isolate sufficient Y cells for chromatin profiling, and therefore all ChIP-seq experiments were performed on synchronized whole-animal L1 populations. We agree that these experiments cannot directly establish the mechanism operating in the Y cell. Rather, our genetic analyses demonstrate that LIN-15A antagonizes identity-safeguarding SynMuvB factors during Y-to-PDA transdifferentiation. The ChIP-seq data provide an additional and independent line of evidence suggesting that this antagonism may involve modulation of DREAM chromatin occupancy. We have rephrased the text to state this more clearly. Specifically, we have revised the Abstract (see p2, l7), Introduction (see p6, l110-112), Results (see p18-19, l405-420) and Discussion (see p23-24, l523-542 and p25, l566-575) to more clearly separate the conclusions supported by the genetic analyses from the mechanistic interpretation suggested by the ChIP-seq data. We further clarify that the relevance of this mechanism to the Y cell remains a hypothesis consistent with, but not directly demonstrated by, the available data.
  
  Also, in the context of the ChIP-Seq experiments, it is understandable that they could not be conducted in a cell-specific manner, but having only two replicates in some ChIP-Seq experiments (as stated in the Materials and Methods) is below standard.
  
  We thank the reviewer for this comment and agree that two biological replicates represent the lower end of what is generally desirable for ChIP-seq analyses. To clarify, more biological samples were initially generated than are represented in the final analysis. In total, five independent biological preparations were performed for each genotype. However, the experimental design imposed substantial technical constraints. Because the experiments required tightly synchronized fed L1 populations (i.e., without a starvation step), standard synchronization procedures could not be used, and animals instead had to be collected through successive hatch pulses, resulting in considerably lower yields. Combined with the mutant backgrounds analyzed, this led to variable ChIP-seq quality across preparations. To ensure robustness, we restricted the final analyses to datasets that passed all predefined quality-control criteria. As a result, some conditions were ultimately represented by only two high-quality biological replicates. We agree that this limitation should be made more explicit and have added this information to the Materials and Methods section (see p35, l774-779). Despite the reduced number of replicates retained for some conditions, the genome-wide binding patterns observed for LIN-15B and LIN-35 in wild-type animals closely recapitulated those reported previously by the Ahringer laboratory (Gal et al., 2022; SI table 2), supporting the overall robustness and biological validity of the datasets used in this study. More generally, we have also tempered the interpretation of the ChIP-seq experiments throughout the manuscript. We view these data as supportive evidence consistent with a chromatin-level mechanism, rather than as definitive mechanistic proof, and have revised the text to reflect this more clearly.
  
  Regarding the genetic interactions with met-2: as MET-2 works in concert with other SET domain proteins, such as SET-25, and also HPL-2, is there a possibility they may be implicated?
  
  We also considered the possibility that the interaction observed with MET-2 could reflect a broader involvement of the H3K9 methylation machinery, given the well-established functional relationships between MET-2 and other SET domain proteins. To address this possibility, we tested whether loss of SET-25 or SET-32 suppressed the lin-15A phenotype. In contrast to met-2 loss of function, neither set-25 nor set-32 mutations modified the transdifferentiation defects observed in lin-15A mutants. These observations suggest that the interaction is not a general property of all MET-2-associated SET domain proteins and may instead reflect a more specific role for MET-2 in this context. However, we have not tested triple mutant combinations, such as met-2; set-25; lin-15A or met-2; set-32; lin-15A, and therefore cannot exclude additional contributions from these factors. Nevertheless, based on the available genetic evidence, our data support a model in which the phenotype is more closely linked to the SynMuvB-centered identity-safeguarding machinery than to the canonical MET-2/SET pathways. We now mention these negative results on p14, l290-295 and in the Discussion (p22, l510-511) of the revised manuscript. HPL-2, previously reported to be a SynMuvB, was itself tested alongside the other SynMuvBs (Fig. 4Ci). Loss of HPL-2 had the same effect as loss of the other SynMuvBs. Together, these data further suggest that the canonical SynMuvB machinery, including MET-2, is at play, rather than a generic requirement for all H3K9 methyltransferases, and point toward a more specific role of MET-2 within the SynMuvB.
  
  The fact that Y-to-PDA in males (which involves a cell division) shows the same lin-15A dependence as in hermaphrodites is informative and a bit underplayed. Since this argues against a cell-cycle-coupled mechanism (an important aspect of the reprogramming field) for LIN-15A, it is worth elaborating on this in the discussion.
  
  We thank the reviewer for this insightful comment and agree that this result deserves further discussion. One of our initial hypotheses was indeed that LIN-15A might be specifically required in transdifferentiation events that occur without a cell division. Cell division and DNA replication have long been proposed to facilitate cellular reprogramming by promoting the dilution or resetting of identity-safeguarding mechanisms. In this context, it was conceivable that LIN-15A and LIN-56 might compensate for the absence of such a process during hermaphrodite Y-to-PDA transdifferentiation. However, our data do not support this model. We found that LIN-15A and LIN-56 are similarly required for Y-to-PDA transdifferentiation in males, even though this event occurs through a cell division. Conversely, neither factor is required for K-to-DVB transdifferentiation, which also occurs in the rectum at a similar developmental stage and likewise involves a cell division. Together, these observations argue that the requirement for LIN-15A is not determined by the presence or absence of cell division. Rather, they suggest that Licenser activity is context-dependent and linked to specific cellular identities. We agree that this point also strengthens the notion that Licensers are distinct from Driver factors, which function in both Y-to-PDA and K-to-DVB transdifferentiation. We have therefore modified the Discussion (see p20, l441-443 and l455-480).
  
  Minor:
  - in the legend of Figure 1 and other places, it should be "Fisher's exact" instead of "Fisher exact"
  - line 31: exhibits instead of exhibit
  - line 85: results instead of result
  - line 228: involvement instead of involvment
  - line 293: "of missing" in loss of lin-36 had no effect while loss ... lin-53 further
  - lines 297-299: check sentence; it does not read correctly
  - line 395: "with an increase"
  - line 484: "with regard"
  
  All points were addressed in the revised version.
  
  Reviewer 3
  
  Based on the observation that LIN-15A does not affect SynMuvB expression in Y (Figure S4), the authors conclude that antagonism of the SynMuvBs by LIN-15A is not likely mediated by a negative control of their expression, but rather by impacting their activity. However, as suggested by the authors, antagonistic functions on the same target genes are also a possibility. The classical approach to test this would be expression profiling. I understand that RNA-seq on single Y cells cannot be carried out for technical reasons and that bulk RNA-seq would not be informative. Importantly, the same reasoning applies to the ChIP-seq data presented in support of common regulatory functions of a subset of synMuvs and LIN-15A (Figure 6 and S6), which were obtained from whole animals. The relevance of these results to the Y-to-PDA Td process is therefore extremely limited, as the claim that LIN-15A restricts lin-35/DREAM binding on a subset of target genes is based on a reported decrease in DREAM binding in lin-15 mutants in bulk chromatin. This is especially true as both DREAM and LIN-15A are widely expressed proteins.
  
  We agree with the general limitation highlighted here. As the reviewer notes, neither expression profiling nor chromatin profiling can currently be performed specifically in the Y cell due to the extremely small number of cells involved and the lack of suitable purification strategies. Consequently, the ChIP-seq experiments were performed on synchronized, fed whole-animal L1 populations. These data do not directly establish the mechanism operating during Y-to-PDA transdifferentiation. Rather, our conclusions are based on two distinct observations. First, the genetic analyses demonstrate an antagonistic relationship between LIN-15A and multiple SynMuvB factors during transdifferentiation. Second, the ChIP-seq experiments provide independent evidence that LIN-15A can influence DREAM chromatin occupancy at the organismal level. We interpreted these observations together as supporting a model in which the genetic antagonism may involve modulation of SynMuvB/DREAM chromatin activity. We agree, however, that the ChIP-seq data do not demonstrate that these chromatin changes occur in the Y cell itself, nor do they identify the relevant target genes involved in Y-to-PDA transdifferentiation. We have therefore revised the manuscript to more clearly distinguish between the conclusions supported directly by the genetic analyses and the mechanistic interpretation suggested by the ChIP-seq experiments. Throughout the revised version, and in the Discussion in particular, we present the chromatin-level model as a hypothesis consistent with the available data rather than as a demonstrated mechanism operating in Y (see p2, l7; p6, l110-112; p18-19, l405-420; p23-24, l523-542 and p25, l566-575).
  
  In addition, there are specific issues with Figure 6, which is mislabeled: "upregulated" and "downregulated" apply to gene expression, while the numbers refer to binding peaks. Why are some numbers in red (this is not mentioned in the legend)? An example of the corresponding genome browser tracks should be shown in the supplementary material. Was a spike-in used to normalize the data?
  
  We thank the reviewer for these helpful suggestions. We agree that the terminology "upregulated" and "downregulated" is potentially confusing in the context of ChIP-seq peaks. In the revised manuscript, we have replaced these terms with "up-bound" and "down-bound" in Figure 6. Regarding the red numbers, these were originally highlighted to emphasise the relatively small number of peaks showing decreased occupancy in lin-15A mutants compared to the other genotypes analyzed. However, as this was not explained in the legend and may confuse readers, we have removed the color coding in the revised figure. Following the reviewer's suggestion, we have also added representative genome browser tracks in Figure S6E to illustrate the binding changes described in Figure 6. No exogenous spike-in controls were used in these experiments. The ChIP-seq workflow was intentionally designed to closely follow that used by Gal et al. (2022), to allow direct comparison with the published LIN-15B and LIN-35 datasets. However, several observations suggest that the patterns reported here are unlikely to result from normalization artifacts alone. First, the genome-wide binding profiles obtained for LIN-15B and LIN-35 in wild-type animals closely recapitulate those reported previously, providing an independent validation of the overall quality of the datasets. In addition, the different mutant backgrounds exhibit distinct peak gain/loss profiles rather than the common directional shift that would be expected from a systematic technical bias. Nevertheless, we acknowledge the absence of spike-in controls as a limitation of the dataset and have clarified this point in the Materials and Methods section of the revised manuscript (see p36 l84-805).
  
  Overall, the discussion is highly speculative and could be shortened and refocused on the actual findings reported. For example, the fact that GO terms associated with LIN-15B targets relate to membrane processes (mentioned above) is not sufficient to speculate that LIN-15A could increase the delaminating capacities of Y by alleviating SynMuvB repression of membrane process genes.
  
  Our intention was to discuss possible mechanisms that could connect the observed genetic interactions to the cellular events underlying Y-to-PDA transdifferentiation. We fully agree that some of these interpretations, such as the impact of the DREAM/LIN-15A antagonism on membrane remodeling, are purely speculative in nature. We have removed the following sentence: "In brief, the role of the Licensers would be to provide a favorable chromatin context for cellular processes that favor/install a plastic state, possibly through the modulation of membrane processes as suggested by our ChIP-seq analyses (Fig. S6)." and replaced it with "In this framework, Td Licensers would facilitate transdifferentiation by alleviating identity-safeguarding chromatin states, thereby creating a permissive context for the Drivers to execute the Td program." We have also removed the paragraph describing Y delamination. More generally, we have substantially shortened and refocused the Discussion section in response to the referee's comment.
  
  The classical definition of a licensing factor is a protein (or complex) that allows the start of DNA replication from a replication origin. In the field of reprogramming, the term "licenser" has been applied to pioneer factors which 'license' transcriptional reprogramming by accessing chromatin to initiate a series of events, including binding of additional, non-pioneer transcription factors and additional chromatin regulators. Here the authors apply the term 'Licensers' to LIN-15A and LIN-56 as factors that facilitate the Td process. This may lead to confusion (and implications) as to what these factors are actually doing.
  
  We thank the reviewer for raising this point. We agree that the term "licensing" has been used in several biological contexts, including DNA replication and, more recently, cellular reprogramming, where it is often associated with pioneer factors that initiate chromatin remodeling and transcriptional changes. However, our use of the term "Licenser" is intended to describe a distinct functional concept emerging from the genetic analyses presented here. We introduced this terminology to distinguish a class of factors that facilitate transdifferentiation by alleviating identity-safeguarding mechanisms from the previously identified "Driver" factors that actively promote the cell-fate transition itself. In this framework, LIN-15A and LIN-56 are not proposed to act as pioneer factors or direct initiators of transcriptional reprogramming. Rather, the genetic data support a role in creating a permissive context for transdifferentiation by antagonizing mechanisms that oppose cell-fate change. We agree, however, that this distinction was not sufficiently defined in the original manuscript and may lead to confusion. We have therefore revised the Results and Discussion to explicitly frame the term in the context of transdifferentiation ("Td Licenser"), to define what we mean by it, and to clearly distinguish this usage from previous applications of the term in DNA replication and reprogramming studies (for instance, see p10, l211-217; p13, l283-289; and p14, l312-315).
  
  Minor comments:
  
  Abstract: why are Drivers and Licensers capitalized? How is "Driver" defined?
  
  We use capital letters to signal that these terms represent two conceptual categories; however, this could be changed if it impairs readability. Drivers are defined in this study as plasticity factors whose loss completely prevents Td initiation (see p10, l211-217 and p14 l312-315).
  
  Figure Aii: no PDE, ajm-1::GFP-positive Y cells. It is not clear how the Y cell is identified; isn't ajm-1 supposed to surround the cell? The difference between the top and bottom ajm-1:egl-5 panels is not clear to a non-expert. The LIN-26 panel is missing.
  
  The Y cell is identified by its location at the ventral-most position on the anterior side of the rectal slit. AJM-1 is a component of the apical junctions; hence, it is expressed at the apical domain of the Y cell. The LIN-26 typo has been corrected: the marker used in this experiment is the rectal-specific gene egl-5, which labels the nucleus of the Y cell.
  
  Fig. 2F color scheme: Licensers are shown in pink, not yellow.
  
  Addressed: they are now yellow in the revised version.
  
  Fig. S1: more details are needed about the western blot (WB) experimental conditions: stage, conditions (reducing agents?), and the nature of the Q2015 antibody. In the absence of this information, it is hard to substantiate the claim in the text of a LIN-15/LIN-56 heterodimer.
  
  See our answer to Reviewer #1: we agree that this experiment is dispensable for the results presented in this manuscript and raises more questions than it provides useful information, so it has been removed from the revised version.
  
  Line 130: What is the nature of the LIN-56 protein? This would be useful information.
  
  Addressed. We have indicated this early on in the Introduction (p5, l95-96) and in the Results section (p7, l132-134). Note that little is known about LIN-56 except its association with LIN-15A in VPC specification and that it also possesses a THAP-like C2CH motif.
  
  Line 38: yielding
  
  Addressed
  
  Line 48: identities suggested by Blau and Baltimore (1991).
  
  Addressed
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.01.22.701184
www.medrxiv.org

Feasibility and acceptability of hepatitis C virus self-testing models among high-risk groups in Nasarawa, Nigeria; exploratory cross-sectional analysis of an implementation study

1
1. JanLouis 25 Jun 2026
  
  in PLOS Global Public Health
  
  R0:
  
  Reviewer #1:
  
  Definitely a timely article - contributing to the evidence base around HCV self-testing - an important further step in HCV elimination methods.
  
  Overall, I thought the paper to be well-written and close to publication. However, there were some areas that I thought needed either amendment or possible additional analysis. In particular, I have some real concerns about the description of acceptability analysis.
  
  Abstract:
  
  Clarify that HCVST refers only to antibody testing (this applies throughout the paper).
  
  I would clarify that the analysis is based on only 1,995 valid participants.
  
  Introduction:
  
  Perhaps just a quick sentence describing the nature of Nigeria's epidemic, i.e., is it generalised or population-specific?
  
  Methods:
  
  IMPORTANT: Needs to be more information about participant recruitment. The paper at times implies that the large sample size equates to high acceptability - but there is no information about the refusal rate. This would actually be a critical aspect of determining acceptability of self-testing. If you recruited 2,000 participants - but an additional 2,000 refused to participate (for example) - this would suggest self-testing is acceptable to only 50% of clients.
  
  IMPORTANT: Very little in methods about your qualitative interviewing - this needs expansion.
  
  IMPORTANT: There is very little satisfactory information about the acceptability questionnaire and its subsequent transformation into an acceptability score using factor analysis. Currently, it reads as if the audience is simply supposed to trust the authors that all methods were satisfactory and that, therefore, the results are valid. You don't specify which questions were included in the factor analysis, or whether it was appropriate to score each item similarly or to aggregate scores. You also don't indicate whether the questions were a previously developed tool or something developed specifically for the study, or whether the questionnaire is even available in the supplementary data. This almost feels like a separate paper, so that your method of transforming the acceptability survey can be more properly peer reviewed.
  
  IMPORTANT: For your acceptability analysis, how did you determine that the lowest 10% equals 'unacceptable'? Shouldn't people with only a 20% acceptability score be considered broadly non-accepting? Also, considering all the noted limitations and collinearity issues with site, does it even make sense to compare by site?
  
  How did you determine the 2,000 sample size?
  
  The selection criteria are difficult to follow and should be clarified, perhaps with dot points.
  
  Specify that "blood-based" testing means finger prick?
  
  Again, you need to clarify that testing is antibody testing, correct? I presume also that the RA information discussed testing procedures, i.e., what happened if the ST was positive, and its interpretation?
  
  How many of each test did you have? Why did you not have enough tests so that any participant could choose whichever test they wanted for the entirety of the study period?
  
  Presuming drug abstention is not a requirement to treatment initiation?
  
  For the exposure variables - is 'education' referring to attendance or completion?
  
  You explain later in the paper that participants could be part of multiple KP categories - but probably good to explain in the "exposure variables" section.
  
  Results:
  
  Setting aside possible refusal rates, your results suggest that of 2,000 participants, essentially 100% completed the self-test? This is a remarkable outcome, particularly for the 849 participants who were unobserved!
  
  "Five participants took HCVST kits home but did not return to complete the endline survey, therefore no demographic or clinical information was recorded for those participants, and they were excluded from the analysis." Why wasn't demographic information collected at baseline, as would be usual?
  
  Suggest re-doing Figure 2: remove the column for those 'tested', as this is inherently 100% and makes every subsequent column look small by comparison. Add percentages to columns. Could consider comparing clinic types.
  
  Following on from methods comments regarding the acceptability data - you don't specify the number of participants falling within the purported "least 10%". The sentence describing qual data “Acceptability appeared to be more associated with clients characteristics than facility type…” - also suggests there are issues with your acceptability analysis methods.
  
  Did you consider differences across test type? In particular those who elected to conduct unobserved self-testing? What were the characteristics of these individuals?
  
  Sentence on page 18 “Observations about test type choice are thus descriptive only, not inferential” should probably be in the methods – following discussion of how tests ran out.
  
  Discussion:
  
  How much do you think the demographic differences between facility-type clients affected outcomes? For example, OSS clients are much younger and more often male.
  
  OSS clients used more ‘blood’ tests. Did this have any impact on acceptability?
  
  The statement about PWID ‘surprise’ may be misplaced. How many PWID did you approach, who refused to participate? How does the number recruited compare against the OSS client list? Also, it’s 28% (PWID) compared to (31% MSM) – are you surprised about MSM?
  
  “Additionally, we did not assess whether the acceptability scale functioned equivalently across ART and OSS settings, so observed differences should be interpreted with that caveat.” - what does this statement mean? Is this another potential flaw of the acceptability scale?
  
  I'd include explanation of the SVR12 limitation somewhere higher in the paper.
  
  AE:
  
  This is an important report on controlling viral hepatitis and, in particular, hepatitis C (HCV) in Africa. The report is well written and is also a good example of how to conduct feasibility studies. The authors managed to set a clear and moving background. However, there are a few issues that must be addressed.
  
  Issues: 1. Please indicate how the 2000 sample size was determined.
  
  Line 132 is a bit unclear. Can it be rewritten?
  
  About the acceptability, it needs to be recognized that this instrument has never been tested in this population before. The EFA and CFA results are good, but it is not a guarantee that this will work forever.
  
  The continuous acceptability score is used/analysed in two ways. One is the study of its distribution using histograms and boxplots. The other is through a dichotomization of acceptability, used as the outcome of the binary logistic regression.
  
  • For the study of distribution, as in Figure 3, I would suggest a cumulative counts plot. On such a plot, you could put counts on the left y-axis and percentages on the right y-axis. Heights are easier to read than histogram areas.
  
  • About the “least 10% of acceptability” concept: can we call this the lowest-decile acceptability score?
  
  • Figure 4 successfully shows the distribution of the acceptability score for a few key variables. However, because it is based on quartiles, it fails to tell us the proportion of observations below the line of the overall lowest decile. I suggest adding a table with such proportions.
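  The table suggested above (the proportion of each subgroup falling below the overall lowest-decile cutoff) could be computed along the following lines. This is a minimal sketch, not the authors' analysis: the function name `below_lowest_decile`, the input layout (a dict mapping hypothetical group labels to lists of scores), and the use of the standard library's default ("exclusive") quantile method are all assumptions.

```python
import statistics


def below_lowest_decile(scores_by_group):
    """Return the overall 10th-percentile cutoff and, for each group,
    the proportion of scores strictly below that cutoff.

    scores_by_group: dict mapping a (hypothetical) group label to a list
    of continuous acceptability scores.
    """
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    # quantiles(n=10) returns the 9 decile cut points of the pooled data;
    # the first one is the overall 10th percentile.
    cutoff = statistics.quantiles(all_scores, n=10)[0]
    proportions = {
        group: sum(s < cutoff for s in scores) / len(scores)
        for group, scores in scores_by_group.items()
        if scores  # skip empty groups to avoid division by zero
    }
    return cutoff, proportions
```

  A group that mirrored the overall distribution would show a proportion near 0.10; values well above that flag subgroups over-represented among low scorers.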
  
  About the cascade: Table 2 is accurate and quite informative. Figure 2 is a bit deceptive, because it suggests that the denominators for each step are accurate (which is not true for the last step, for example). So I suggest (i) keeping Table 2 and (ii) computing accurate proportions (in percentages) and adding 95% confidence intervals to the cascade plot.
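  One way to produce step-wise proportions with 95% confidence intervals is sketched below. The reviewer does not specify an interval method; the Wilson score interval used here is an assumption (it behaves better than the simple normal approximation near 0% or 100%), and the helper names `wilson_ci` and `cascade_with_ci` and the `(label, numerator, denominator)` layout are hypothetical. The key point is that each step carries its own denominator, namely the number actually eligible for that step.

```python
import math


def wilson_ci(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def cascade_with_ci(steps):
    """steps: list of (label, numerator, denominator) tuples, where each
    denominator is the number eligible for that step of the cascade.
    Returns (label, percent, lower_percent, upper_percent) rows."""
    rows = []
    for label, k, n in steps:
        lo, hi = wilson_ci(k, n)
        rows.append((label, 100 * k / n, 100 * lo, 100 * hi))
    return rows
```

  The resulting rows can be plotted as bars with error bars, so the plot no longer implies a shared denominator across steps.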
  
  R1:
  
  Reviewer #1:
  
  Authors have responded to my comments
  
  Given the noted limitation of not being able to demonstrate comprehensive acceptability (without refusal rates), it's advised to further review the paper for instances of potential overstatement of findings. For example, on page 20: “…while both…facilities demonstrated high engagement…”. 1,000 clients is definitely a lot, but without the context of how many overall clinic clients there are, and how many may have refused, it would be prudent to temper the statements slightly.
  
  Importantly, the additional information provided by the authors about their bespoke acceptability measure gives me greater reservations – considering it has no prior validity testing. This is not addressed in the limitations and definitely should be.
  
  R2:
  
  Reviewer #1:
  
  Thank you to the authors for addressing my comments.

Annotators

JanLouis

URL

medrxiv.org/content/10.1101/2025.11.17.25340458v1
www.biorxiv.org

Distinct cortical encoding of acoustic and electrical cochlear stimulation

1
1. Public_Reviews 24 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Summary of revision for all referees:
  
  We thank referees for their constructive comments. To address their concerns, we performed additional statistical analyses integrating both paired and unpaired data, performed positive controls for comparisons between NH- and CI-evoked iEEG measurements, developed tools for, and collected new experimental data on, forward-masking ECAP measurements in CI-implanted rats (N=3), and reworked both manuscript text and figures to improve clarity. The most significant changes are summarized here; a complete list of responses to reviewers and corresponding changes follows.
  
  Summary of major changes to revised manuscript:
  
  (1) Statistical treatment of paired vs unpaired recordings using mixed-effects models (updates to all manuscript figures that compare NH vs CI); this largely confirmed the results reported in our original submission.
  
  (2) New analysis, controlling for information-theoretic cross-modality comparison (i.e., training with tone- and testing with cochlear implant-evoked iEEG measures, Fig. 8).
  
  (3) Clarification of methods (Supplemental Fig. 2 & manuscript text)
  
  (4) Additional experiments testing peripheral tuning of our 8-channel CI rodent model via forward masking ECAP measures across 3 animals (N=3, Supplemental Fig. 1)
  
  (5) Detailed response addressing robustness of tonotopy in NH and CI animals
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Strengths:
  
  The study poses a timely, clinically relevant question with clear implications for CI strategy. The analytical toolkit is appropriate: µECoG captures mesoscale patterns; TCA offers a transparent separation of spatial and temporal structure; and mutual-information decoding provides an interpretable measure of single-trial discriminability. Within-subject recordings in a subset of animals, in principle, help isolate modality effects from inter-animal variability. Where analyses are most direct, the acoustic condition yields higher single-trial decoding accuracy, which is a meaningful and clearly presented result.
  
  We appreciate the comments on the strengths of our analytic approaches.
  
  Weaknesses:
  
  Parts of the statistical treatment do not match the data structure: some comparisons mix paired and unpaired animals but are analysed as fully paired, raising concerns about misestimated uncertainty.
  
  Please see our response to specific comment #2 above. In short, we agree with this critique of our original analyses, and in our revised manuscript we re-analyzed all NH vs. CI comparisons using linear mixed effects models that incorporate both paired and unpaired observations within a single framework. This allows us to include all animals, account for within-animal dependence for paired experiments (normal hearing and cochlear implant data from the same animal when available), and to align the statistical tests with the data shown in the figures. In almost every case, the mixed effects models confirm our original conclusions. Two comparisons that were previously nonsignificant now reach criterion for statistical significance (Fig. 2E, p=0.048 and Fig. 6F, p=0.027). We updated the manuscript to report these values and to clarify the use of mixed effects modeling in the methods under the section titled, “Linear mixed effects modeling.”
  
  Methodological reporting is incomplete in places; essential parameters for both acoustic and electrical stimulation, as well as objective verification of implantation and deafening, are not described with sufficient detail to support confident interpretation or replication.
  
  Please see our response to comment #5 below. We have revised our manuscript to now include this information in the methods.
  
  Figure-level clarity also undermines the message. In Figure 2, non-significant slopes for CI, repeated identification of a single "best channel," mismatched axes, and unclear distinctions between example and averaged panels make the assertion of spatial organisation unconvincing; importantly, the normal-hearing panels also do not display tonotopy as clearly as expected, which weakens the key contrast the paper seeks to establish.
  
  This is an important point, thanks; please see responses to comment #1 above. We note that conventional tonotopic maps in auditory cortex are characteristic frequency maps, i.e., maps of topographic organization for responses to lowest-threshold stimuli (often presented around 20-50 dB SPL). Our maps were constructed from stimuli presented at 70 dB SPL, which blunts crisp tonotopy to some degree. Furthermore, we quantified spatial organization using a previously published method from the Polley lab (Romero & Hight et al. 2020), in which local tonotopic gradient vectors (magnitude and direction) were computed from GCaMP responses at each pixel and projected onto a unit circle. Mean vector strength across all pixels was then compared to a shuffled distribution as a measure of tonotopic organization. We applied the same procedure to our iEEG best-frequency and best-channel maps. Both map types yielded mean vector strengths that were substantially larger than those derived from shuffled maps (p < 10⁻¹⁰), indicating that our maps have a consistent tonotopic (for BFs) or cochleotopic (for CI channels) organization that is highly unlikely to arise by chance. This is now included in our revised manuscript.
  
  Finally, the decoding claims would be strengthened by simple internal controls, such as within modality train/test splits and decoding on raw ERP/high-gamma features to demonstrate that poor cross-modal transfer reflects genuine differences in the underlying responses rather than limitations of the modelling pipeline.
  
  Please see our response to comment #12 below. In short, we have now included this analysis in revised Figure 8.
  
  Reviewer #2 (Public Review):
  
  Strengths:
  
  The study includes interesting analyses of the sound and cochlear implant representation structure based on decoders.
  
  We appreciate the comment on how interesting our analyses are, thanks!
  
  Weaknesses:
  
  The observation that responses to cochlear implant stimulation (stimulation) are spatially organized is not new (e.g., Adenis et al. 2024).
  
  We agree that it is not particularly novel to report that cochlear implant stimulation evokes spatially organized responses. However, we believe that our direct comparisons (when possible, within animal) between normal-hearing and cochlear implant modality maps are unusual in the literature, including asking how decoders based on one set of responses might apply to responses evoked from the other modality. Adenis et al. (2024) is a fantastic study of pulse shape and monopolar vs bipolar stimulation modes with a 6-channel implant in guinea pig, but as far as we can tell this study also does not compare normal-hearing maps prior to deafening and implantation with the cochlear implant maps in the same animals.
  
  The claim that spatial and temporal dimensions contribute information about the sound is also not new; there is a large literature on this topic. Moreover, the results shown here are extremely weak. They show similar levels of information in the spatial and temporal dimensions, and no synergy between the two dimensions. This is however, likely the consequence of high measurement noise leading to poor accuracy in the information estimates, as the authors state.
  
  Good point, please see our response to comment #1 below.
  
  The main claim of the study - the mismatch between cochlear implant and sound representation - is not supported. The responses to each modality are measured in different animals. The authors do not show that they actually can compare representations across animals (e.g., for the same sounds). Without this positive control, there is no reason to think that it is possible to decode from one animal with a decoder trained on another, and the negative result shown by the authors is therefore not surprising.
  
  Good point, thanks; please see our response to comment #2 below, where we describe the new control we have added.
  
  Reviewer #3 (Public Review):
  
  Strengths:
  
  The model combining micro-eCoG and cochlear implantation and the methodology to extract both the Event Related Potentials (ERPs) and High-Gammas (HGs) is very well designed and appropriately analyzed. Likewise, the PCA-LDA and TCA-LDA are powerful tools that take full advantage of the information provided by the cortical ensembles. The overall structure of the paper, with a paced and exhaustive progress through each step and evolution of the decoder, is very appreciable and easy to follow. The exploration of single-trial encoding and stimulus identity through temporal and spatial domains is providing new avenues to characterize the cortical responses to CI stimulations and their central representation. The fact that single trials suffice to decode the stimulus identity regardless of their modality is of great interest and noteworthy. Although the authors confirm that iEEG remains difficult to transpose in the clinic, the insights provided by the study confirm the potential benefit of using central decoders to help in clinic settings… the reviewer wants to reiterate that the study proposed by Hight et al. is well constructed, relevant to the field, and that the overall proposal of improving patient performances and helping their adaptation in the first months of CI use by studying central responses should be pursued as it might help establish new guidelines or create new clinical tools.
  
  We thank the Reviewer for the positive comments about the thoroughness of our analyses and clear organization of our manuscript.
  
  Weaknesses:
  
  The conclusion of the paper, especially the concept of distinct cortical encoding for each modality, is unfortunately only partially supported by the results, as the authors did not adequately consider fundamental limitations of CI-related stimulation. First, the reviewer assumed that the authors stimulated in monopolar mode, which, albeit clinically relevant, notoriously generates high current spread in rodent models.
  
  Thanks, this is an important potential concern. Please see our response to comment #5 of Referee 1 and our responses to comment #3 below. We agree that monopolar stimulation would be expected to be less spatially specific than bipolar or multipolar modes. However, we chose monopolar stimulation because it is the main clinical configuration in human CI users and therefore the most relevant for translational purposes. For our revised manuscript, we made new ECAP measurements of peripheral (spatial and temporal) tuning via a forward-masking paradigm and demonstrate that monopolar stimulation is effectively tuned (Supplemental Fig. 1). Together with the additional single-animal maps in Supplementary Figure 3 and our vector-strength analysis (Response Fig. 2), these measurements demonstrate that even under acute monopolar stimulation we observe structured cochleotopic organization in cortex, rather than the extremely low-pass patterns one might expect if monopolar spread were a major contaminant.
  
  Second, comparing the averaged BF maps for iEEG (Figure 2A, C), BFs ranged from 4 to 16kHz with a predominance of 4kHz BFs. The lack of BFs at higher frequencies hints at a potential location mismatch between the frequency range sampled at the level of the cortex (low to medium frequencies) and the frequency range covered by the CI inserted mostly in the first turn-and-a-half of the cochlea (high to medium frequencies). Looking at Figure 2F (and to some extent 2A), most of the CI electrodes elicited responses around the 4kHz regions, and averaged maps show a predominance of CI-3-4 across the cortex (Figure 2C, H) from areas with 4kHz BF to areas with 16kHz BF. It is doubtful that CI-3-4 are located near the 4kHz region based on Müller's work (1991) on the frequency representation in the rat cochlea.
  
  Please see our responses to comment #3 below.
  
  Taken together with the flat Pearson correlations, the decoder examples showing a strong ability to identify CI-4 and CI-3, and Fig. 8D,E presenting a strong prediction of 4 kHz and 8 kHz for all the CI electrodes when using a pure-tone-trained decoder, it is possible that current spread ended up stimulating higher turns of the cochlea, or even the modiolus, in a non-specific manner, greatly reducing (or smearing) the place-coding/frequency resolution of each electrode. This in turn could explain the coarse topographic (or coarsely tonotopic, according to the manuscript) organization of the cortical responses. Thus, the conclusion that there are distinct encodings for each modality is biased, as it might not account for monopolar smearing. To that end, and since it is the study's main message and title, the study would have benefited from a subgroup of animals receiving bipolar stimulation (or any focused strategy, since these reduce current spread) to compare the spatial organization of iEEG responses and the performance of the different decoders, in order to rule out current spread and strengthen the conclusion.
  
  Please see our responses to comment #4 below as well as our responses related to monopolar vs bipolar stimulation. We agree that in future studies it will be important to make a head-to-head comparison of bipolar and monopolar stimulation as a function of electrode location and stimulation intensity.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  We thank the reviewer for commenting on the strengths of our manuscript, including appreciating the power and timeliness of our approach.
  
  (1a) Figure 2 does not convincingly support the claim that "tone-evoked and CI-evoked iEEG measurements are spatially organized," particularly for CI data: Figure 2C repeatedly highlights the same "best channel," and the slopes in Figures 2B and 2G are non-significant; there are also discrepancies between panels (A vs. C, F vs. H) and mismatched frequency ranges (0-16 kHz vs. up to 32 kHz), which should be clarified as exemplar versus averaged displays and harmonized in scale.
  
  (First we note that Reviewer 3 also raised related concerns about the robustness of tonotopy in our iEEG data.) We address these by comparing our maps to previously published tonotopic maps, and using an established quantitative analysis of tonotopic strength from Romero & Hight et al. (2020).
  
  First, to place our tone-evoked iEEG maps in context, we overlaid them on the same spatial scale and orientation as both single-unit tonotopy in rat primary auditory cortex (A1) from Polley et al. (2006) and iEEG maps obtained with the same surface array in Insanally et al. (2016). The rostral–caudal and dorsal–ventral axes and cortical extents are matched across panels. Our best-frequency maps (Figure 2C) qualitatively recapitulate the high-to-low frequency gradient and spatial layout reported in both of these prior studies, supporting our claim that tone-evoked iEEG captures canonical mesoscale tonotopy. We have updated the manuscript results section to directly reference these two studies, “The area and orientations of tone-evoked maps qualitatively match those published from single unit recordings (Polley et al. 2006) and published using similar iEEG arrays (Insanally et al. 2016).”
  
  Second, to quantify tonotopy in a way that is directly comparable to previous work, we reproduced the analysis of Romero & Hight et al. (2020), who examined tone-evoked GCaMP signals. In that paper, local tonotopic gradient vectors (magnitude and direction) were computed at each pixel and projected onto a unit circle; the mean vector strength across all pixels was then compared to a shuffled distribution as a measure of tonotopic organization. We applied the same procedure to our iEEG best-frequency and best-channel maps (Fig. 2C-E). Both map types yielded mean vector strengths that were substantially larger than those derived from shuffled maps (p < 10⁻¹⁰), indicating that our maps have a consistent tonotopic (for BFs) or cochleotopic (for CI channels) organization that is highly unlikely to arise by chance. We cite this paper for these analyses related to Figure 2.
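  To make the vector-strength idea concrete, here is a minimal sketch, assuming the map is stored as a rectangular grid of numbers (for example log2 best frequency, or best-channel index). Gradients are estimated by central differences on interior sites, each is normalized to a unit vector, and the length of their mean is compared with maps whose values have been randomly permuted across sites. These choices, and the function names, are illustrative assumptions; the published procedure (Romero & Hight et al. 2020) may differ in gradient estimation, weighting by gradient magnitude, masking, or how p-values are derived.

```python
import math
import random


def gradient_unit_vectors(grid):
    """Unit gradient-direction vectors of a 2D map (list of equal-length rows).

    Uses central differences on interior sites only; sites with zero
    gradient (flat neighbourhoods) have no direction and are skipped.
    """
    rows, cols = len(grid), len(grid[0])
    vectors = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (grid[r][c + 1] - grid[r][c - 1]) / 2.0
            gy = (grid[r + 1][c] - grid[r - 1][c]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude > 0:
                vectors.append((gx / magnitude, gy / magnitude))
    return vectors


def vector_strength(grid):
    """Length of the mean unit gradient vector: 1 means all local gradients
    point the same way; values near 0 mean no consistent direction."""
    vectors = gradient_unit_vectors(grid)
    if not vectors:
        return 0.0
    mean_x = sum(v[0] for v in vectors) / len(vectors)
    mean_y = sum(v[1] for v in vectors) / len(vectors)
    return math.hypot(mean_x, mean_y)


def shuffle_test(grid, n_shuffles=1000, seed=0):
    """Compare observed vector strength to maps with randomly permuted sites.

    Returns (observed_strength, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = vector_strength(grid)
    cols = len(grid[0])
    values = [v for row in grid for v in row]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        shuffled = [values[i:i + cols] for i in range(0, len(values), cols)]
        null.append(vector_strength(shuffled))
    # +1 correction so the empirical p-value is never exactly zero
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_shuffles)
    return observed, p_value
```

  A perfectly graded map (every local gradient pointing the same way) gives a vector strength of 1, while spatially shuffled maps give much smaller values. With this empirical approach the smallest attainable p-value is 1/(n_shuffles + 1).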
  
  (1b) Figure 2C repeatedly highlights the same ‘best channel’
  
  We agree that many CI-evoked maps are dominated by a single channel, as seen in our exemplar and in the additional animals shown in new Supplemental Fig. 3. In Fig. 2C, channel 5 emerges as the dominant best channel because CI-evoked activity in this animal is broad and strongest for channel 5 (Fig. 2A). This reflects a feature of iEEG signals rather than a plotting artifact. Biophysically, iEEG reflects spatially summed local field potentials that low-pass filter underlying neural activity; these far-field signals aggregate excitatory and inhibitory processes and are not expected to show the sharp single-neuron tuning seen in spike recordings. As a result, broad peaks centered on the most strongly driven channels are expected. We have added text to the Results section discussing these limitations: overall, maps derived from iEEG responses were similar in size and orientation to single-unit maps, “albeit at coarser gradients likely due to aggregate recordings of excitatory and inhibitory activity and low-pass filtering due to potentials originating far from recording sites.” We also added to the Results section a comparison of spatial correlations (Fig. 2B,G) at the extremes of stimulus separation, “electrode separations (CI 1 vs ≥5 electrodes, ERP: p=0.01, HG: p=0.04)”, as analyzed by linear mixed effects models.
  
  (1c) Mismatched frequency ranges
  
  We restricted the range of frequencies plotted in some panels (e.g., Fig. 2C, from 1.4-32 kHz to 1.4-16 kHz) to emphasize the compressed range of tonotopic gradients and patterns.
  
  (1d) The slopes in Figures 2B and 2G are non-significant
  
  We agree that non-significant group-level slopes indicate that CI-evoked tonotopy is weaker than tone-evoked tonotopy, and we now emphasize this point. At the same time, the data exhibit systematic structure: for both ERP and HG, mean spatial correlations decline monotonically with increasing CI channel separation (Fig. 2B,G). We also directly compared spatial correlations at the extremes of stimulus separation (1 vs. ≥5-channel separation) and found a significant difference. The manuscript now reads: “At the extremes, the spatial correlations were always higher for small vs. large tone separations (NH 0.5 vs ≥3.5 octaves, ERP: p<10⁻⁴, HG: p<10⁻⁴ Student’s one-tailed t-test) and electrode separations (CI 1 vs ≥5 electrodes, ERP: p=0.01, HG: p=0.04).” Together with the strong deviation from shuffled maps in the vector-strength analysis (Fig. 2E), we argue that the spatial-correlation analysis indicates that CI-evoked maps are not random but reflect a coarse underlying gradient. In addition, because tone-evoked maps exhibit tonotopy, we asked whether CI stimulation itself is at least spatially tuned in the periphery. Using ECAPs with a forward-masking paradigm (new Supplemental Fig. 1), we show that probe-evoked ECAPs are significantly more suppressed by adjacent than by distant maskers (N = 3), demonstrating functional spatial tuning of CI electrodes in the cochlea. We have also replotted these results alongside the same measurements from a human CI user (Author response image 1). This supports the interpretation that peripheral input is spatially specific and that the weaker cortical cochleotopy likely reflects the properties and resolution of iEEG and acute CI stimulation rather than a complete absence of spatial organization.
  Overall, the new comparative figures and analyses are intended to make transparent that (i) iEEG robustly captures tonotopy for acoustic tones, and (ii) CI-evoked responses exhibit coarser, but statistically non-random, cochleotopic organization.
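  The comparison of spatial correlations at the extremes of stimulus separation can be sketched as follows, assuming each stimulus map is a vector of response amplitudes (ERP or HG) across iEEG sites, with stimuli ordered along the cochlea. Pooling pairwise Pearson correlations by separation and applying SciPy's one-tailed `ttest_ind` mirrors the Student's one-tailed t-test quoted for the tone data; the function names and the choice of stimulus pairs as the unit of analysis are hypothetical, and the CI comparison in the manuscript is reported with linear mixed-effects models instead.

```python
import numpy as np
from scipy import stats

def pairwise_spatial_correlations(maps):
    """Group stimulus-pair spatial correlations by stimulus separation.

    maps: (n_stimuli, n_sites) response amplitudes, with stimuli ordered
    along the cochlea (tone frequency or CI channel number).
    """
    r = np.corrcoef(maps)               # Pearson r between every pair of maps
    n = maps.shape[0]
    by_sep = {}
    for i in range(n):
        for j in range(i + 1, n):
            by_sep.setdefault(j - i, []).append(r[i, j])
    return by_sep

def extremes_test(by_sep, near=1, far_min=5):
    """One-tailed test: are maps for nearby stimuli more correlated than distant ones?"""
    near_r = np.asarray(by_sep[near])
    far_r = np.concatenate([by_sep[s] for s in by_sep if s >= far_min])
    return stats.ttest_ind(near_r, far_r, alternative="greater")
```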
  
  Author response image 1.
  
  Here, we compare data from the new Supplemental Figure 1C,D with human data (N=1) for spatial and temporal tuning in the periphery, as assessed by forward-masking ECAP measurements. A) Spatial tuning functions averaged across all probe electrodes and 3 animals (left) and 1 human subject (right) (black, mean; gray, s.e.m.; orange, average of individual subjects). B) Temporal tuning functions averaged across all probe electrodes and 3 animals (left) and 1 human subject (right) (black, mean; gray, s.e.m.; orange, average of individual subjects). Note: the human subject is the first author, a long-term cochlear implant user (>10 years) with substantial open-set speech perception.
  
  (2) The statistical approach is inappropriate where pairing is incomplete: a Student's paired two-tailed t-test is used despite not all data being paired; a linear mixed-effects model would be more suitable, whereas an unpaired test risks reduced power.
  
  We agree with this suggestion. As the reviewer notes (also raised by Reviewer 3), our original analyses did not fully exploit the partially paired structure of the data. In the initial submission we used paired t-tests when animals contributed both normal-hearing (NH) and CI measurements, which meant that animals with only NH or only CI data were excluded from those tests.
  
  To address this, we have re-analyzed all NH vs. CI comparisons using linear mixed-effects models that incorporate both paired and unpaired observations within a single framework. This approach allows us to (i) include all available animals, (ii) appropriately account for within-animal dependence when both conditions are present, and (iii) align the statistical tests with the data shown in the figures. In nearly all cases, the mixed-effects models confirm our original conclusions. Two comparisons that were previously non-significant are now significant in the positive direction: Fig. 2E (p = 0.048) and Fig. 6F (p = 0.027, linear mixed-effects models). We have updated the manuscript to report these values and to clarify the use of mixed-effects modeling in the methods under the section titled, “Linear mixed effects modeling.”
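  A minimal sketch of such a model using the statsmodels `mixedlm` formula interface (an assumption; the software used is not specified here), with a fixed effect of hearing condition and a random intercept per animal. The simulated design, in which some animals contribute both conditions and others only one, is hypothetical and only illustrates how partially paired observations enter a single model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate_partially_paired(seed=0):
    """Hypothetical data: animals 0-3 contribute both conditions,
    animals 4-5 normal-hearing (NH) only, animals 6-7 CI only."""
    rng = np.random.default_rng(seed)
    design = {0: ("NH", "CI"), 1: ("NH", "CI"), 2: ("NH", "CI"), 3: ("NH", "CI"),
              4: ("NH",), 5: ("NH",), 6: ("CI",), 7: ("CI",)}
    rows = []
    for animal, conditions in design.items():
        intercept = rng.normal(0.0, 0.5)           # animal-specific offset
        for cond in conditions:
            for _ in range(5):                     # repeated measurements
                y = intercept + (1.0 if cond == "NH" else 0.0) + rng.normal(0.0, 0.1)
                rows.append({"animal": animal, "condition": cond, "y": y})
    return pd.DataFrame(rows)

df = simulate_partially_paired()
# Fixed effect of condition (CI is the reference level), random intercept per animal.
result = smf.mixedlm("y ~ condition", df, groups=df["animal"]).fit()
nh_effect = result.params["condition[T.NH]"]
nh_p = result.pvalues["condition[T.NH]"]
```

Because the random intercept absorbs between-animal offsets, animals with only one condition still contribute to the estimate without being forced into a paired test.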
  
  (3a) Given the surgical complexity, objective verification of implantation and deafening is needed (e.g., eABRs for implant function and post-deafening ABR thresholds).
  
  We agree that objective verification of both implant placement and deafening is critical, particularly given the surgical complexity of multichannel CI implantation in rats. Note that we previously extensively documented deafness in our cochlear implant rats with eABRs, histology of hair cell counts, and behavior (turning the implant off and seeing performance drop to chance). As we argued in Glennon et al. Nature 2023, the primary outcome measure and definition of deafness is behavioral, as anatomical and physiological markers are correlates of functional deafness but ultimately deafness must be defined in terms of behavioral performance. This is described in more detail below.
  
  Implant placement: Our primary concern during surgery is to ensure that the CI array is correctly positioned along the cochlear spiral toward the apex. As shown in Author response image 2, once the bulla is opened and the cochleostomy is made at the junction of the temporal bone and the stapedial artery, the orientation of the cochlear spiral is clearly visible under the surgical microscope. We advance the 8-channel array only in the apical direction, and we require that all 8 electrodes pass through the cochleostomy. A complete insertion of all 8 electrodes cannot be achieved with a basal-ward trajectory, so full insertion provides a strong anatomical confirmation that the array is directed apically. The white band on the array, visible just basal to the cochleostomy (Author response image 2), serves as a consistent visual marker of complete insertion. We have added text and this figure to the Methods to clarify these criteria, “We required that all eight electrodes pass through the cochleostomy, confirming that the array was inserted in the direction of the apex.”
  
  Verification of deafening: We also share the reviewer’s concern about confirming profound hearing loss, particularly because some CI animals were presented with acoustic tones to drive individual channels. We used the same mechanical-only deafening procedure described and validated in our previous work (King et al., 2016; Glennon et al., 2023), which was chosen to minimize systemic side effects and maximize post-surgical survival. Its efficacy was validated in three ways:
  
  - Histology: In N=4 deafened animals, inner hair cell loss was ~50% and outer hair cell loss was nearly complete (close to 100%) in all animals.
  
  - Physiology: For N=14 rats, acoustic ABRs were substantial before deafening but statistically similar to baseline noise after deafening.
  
  - Behavior: For N=16 deafened rats, behavioral performance with the implant on was d′ = 1.7±0.1, but when the implant was turned off in a subset of sessions, performance dropped to chance (d′ = −0.05±0.1, P < 0.0001).
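  The d′ values above follow the standard signal-detection definition, d′ = z(hit rate) − z(false-alarm rate). A minimal sketch using only the Python standard library; the log-linear correction (adding 0.5 to each response count) is an illustrative assumption to keep rates of 0 or 1 finite and may differ from the correction used in Glennon et al. (2023).

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count (log-linear correction, an assumption here) so that
    hit or false-alarm rates of exactly 0 or 1 do not give infinite z-scores.
    """
    z = NormalDist().inv_cdf                     # inverse standard-normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)
```

Equal hit and false-alarm rates give d′ of 0, which is the sense in which performance with the implant off "dropped to chance."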
  
  Author response image 2.
  
  Visual confirmation of a successful electrode insertion. The apical direction of insertion of the 8-channel array is clear under the microscope. Full insertion of all 8 channels is further confirmed by the proximity of the white band (located just basal to the most basal electrode) to the cochleostomy.
  
  This combination of histological, physiological, and behavioral evidence indicates that the mechanical-only deafening protocol produces profound hearing loss, with no functionally relevant residual hearing at intensities equal to or greater than those used in our study (70 dB SPL). Given this prior validation under identical surgical and experimental conditions, we are confident that our CI animals were effectively deafened and that the iEEG responses we report are driven by the implant rather than by residual acoustic hearing. We now clarify this in the Methods and explicitly cite our validation: “(mechanical only, as described and validated in Glennon et al. 2023).”
  
  (3b) One CI animal did not learn the task (Fig. 1C), potentially reflecting implantation efficacy.
  
  Good point, thanks. For both humans and rats, cochlear implant performance can be highly variable, reflecting a number of factors including device performance, training efficacy and motivation, and other technical or biological sources of heterogeneity. We note, however, that not all animals included in this study were behaviorally trained, and we wanted to show the full range of performance for the subset of animals that were trained (N=4 normal-hearing and N=3 cochlear implant rats; one of the 4 trained animals lost its implant before it could be re-trained on the cochlear implant version of the task). We now highlight this range of performance variability in the Results and explain why N=4 normal-hearing but only N=3 cochlear implant rats were trained.
  
  (4) The behavioural paradigm and cohort accounting are unclear: Figure 1C shows four NH-trained rats, yet subsequent analyses include only two NH-trained animals, which is confusing.
  
  We have now clarified the relation between the behavioral cohort and the iEEG cohort in the revised manuscript. The key point is that the animals in Figure 1C are defined by their behavioral training history (NH vs CI training), whereas inclusion in the iEEG analyses is defined by the specific stimuli collected during acute recordings, and these two categorizations are not always the same. In total, four rats underwent both iEEG recordings and behavioral training. Of these four, three were subsequently deafened, implanted with chronic CIs, and trained on the CI-driven task (Fig. 1C). With respect to the acute iEEG experiments, we obtained tone-only iEEG in 1 animal, CI-only iEEG in 2 animals, and both tone- and CI-evoked iEEG in 1 animal.
  
  Thus, the “NH-trained” label in Figure 1C refers to behavioral training status, not to the stimulus conditions used during iEEG recordings. All iEEG measurements were acute and performed immediately after surgery (for CI animals) or in the normal-hearing condition, before any CI behavioral training. Consequently, the behavioral cohort in Figure 1C is larger than the subset of animals that contributed to specific iEEG contrasts in later figures, which explains why some panels include only two NH animals.
  
  To clarify this, we have added a new Supplementary Figure 2 that provides a timeline for each animal, indicating when behavioral training occurred, when deafening and implantation occurred, and which stimulus conditions (tones vs CI) were used for each iEEG recording. We kept this figure in the Supplementary section because the focus of the manuscript is on evoked iEEG measurements rather than behavior, but the revised text now explicitly refers to this schematic when describing the cohorts: “The combinations of animals that underwent behavioral training and acute iEEG measurements are shown in Supplemental Fig. 2.”
  
  (5) Methods lack essential details: specify acoustic stimulus types and intensities, CI stimulation parameters (e.g., current/charge per phase, phase width, rate, loudness setting), and the recording state (awake vs. anaesthetised), which is only implied in the discussion.
  
  We agree that these details are essential, and Reviewer 3 raised similar concerns about methodological clarity. We have now expanded the Methods to specify the acoustic stimuli, CI stimulation parameters, and recording state.
  
  Acoustic stimuli: We now describe the acoustic stimulus set in the Methods, with reference to Insanally et al. (2016). Briefly, tones were pure sinusoids spanning 1.4 to 32 kHz in half-octave steps, presented at 70 dB SPL, 50 ms in duration with 2-ms cosine-squared ramps, in pseudorandom order at a rate of 1.25 Hz. These parameters are now given in the Methods under “Stimulus presentation for cortical sensory mapping in normal hearing rats.”
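  A sketch of the stimulus construction described above, assuming NumPy, a 192-kHz sampling rate, and that "1.4 kHz" denotes √2 kHz rounded, so that the half-octave series also contains the 4-kHz and 22.6-kHz tones mentioned elsewhere in this response. The constants and function names are hypothetical, and calibration to 70 dB SPL depends on the speaker and amplifier chain and is not modeled.

```python
import numpy as np

FS = 192_000      # assumed sampling rate (Hz); must exceed 2 x 32 kHz
DUR = 0.050       # tone duration (s)
RAMP = 0.002      # cosine-squared rise/fall time (s)
RATE = 1.25       # presentation rate (Hz), i.e. 0.8 s onset-to-onset

# Half-octave series from sqrt(2) kHz to 32 kHz (10 frequencies; assumption above).
FREQS = 1000.0 * 2.0 ** (np.arange(1, 11) / 2)

def tone(freq, fs=FS, dur=DUR, ramp=RAMP):
    """Unit-amplitude pure tone with cosine-squared onset and offset ramps."""
    t = np.arange(int(round(dur * fs))) / fs
    n_ramp = int(round(ramp * fs))
    rise = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2   # 0 -> ~1
    env = np.ones_like(t)
    env[:n_ramp] = rise
    env[-n_ramp:] = rise[::-1]
    return env * np.sin(2 * np.pi * freq * t)

def schedule(n_repeats, seed=0):
    """Pseudorandom presentation order and onset times at RATE Hz."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(np.repeat(FREQS, n_repeats))
    onsets = np.arange(order.size) / RATE
    return onsets, order
```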
  
  CI stimulation parameters: CI stimulation used standard clinical-style monopolar mappings. We now specify in the Methods that pulses were biphasic and charge-balanced, with 25 µs per phase and an 8-µs interphase gap (total pulse width = 58 µs); the stimulation rate was 900 pulses per second (pps); and current amplitude (and thus charge per phase) was set individually for each electrode based on its ECAP threshold. All stimulation levels were within normal and safe limits: charge densities remained below the Shannon limit and within the electrochemical “water window.”
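  The pulse parameters above can be checked with simple arithmetic. In this sketch, the total pulse width is two 25-µs phases plus the 8-µs gap, charge per phase is current × phase width, and the Shannon criterion is written as log10(D) + log10(Q) < k, with Q in µC/phase and D in µC/cm²/phase. The electrode surface area and the value of k are assumptions (values between roughly 1.5 and 2 appear in the literature, 1.5 being the more conservative), the currents are hypothetical, and the electrochemical water window is not modeled.

```python
import math

PHASE_S = 25e-6     # duration of each phase (s)
IPG_S = 8e-6        # interphase gap (s)
RATE_PPS = 900      # pulses per second

def total_pulse_width_s(phase_s=PHASE_S, ipg_s=IPG_S):
    """Biphasic pulse: first phase + interphase gap + second phase."""
    return 2 * phase_s + ipg_s

def duty_cycle(rate_pps=RATE_PPS):
    """Fraction of each inter-pulse period occupied by the pulse."""
    return total_pulse_width_s() * rate_pps

def charge_per_phase_uC(current_uA, phase_s=PHASE_S):
    """Charge per phase Q = I * t; microamps times seconds gives microcoulombs."""
    return current_uA * phase_s

def within_shannon_limit(current_uA, area_cm2, k=1.5, phase_s=PHASE_S):
    """Shannon criterion log10(D) + log10(Q) < k with D = Q / area.
    The electrode area and k are assumptions, not values from this study."""
    q = charge_per_phase_uC(current_uA, phase_s)
    d = q / area_cm2
    return math.log10(d) + math.log10(q) < k
```

For example, a hypothetical 500-µA current gives 0.0125 µC (12.5 nC) per phase; with an assumed 0.002-cm² contact, D = 6.25 µC/cm², and log10(D) + log10(Q) ≈ −1.1, well below k = 1.5.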
  
  Loudness setting: In this study, CI stimuli were presented primarily at a single level—each electrode was stimulated at its ECAP threshold level for the tone-to-CI mapping experiments. We have added these details in the methods under the “Stimulus presentation for cortical sensory mapping in cochlear implanted rats” subsection.
  
  Recording state: All iEEG recordings reported in the manuscript were acute and performed under anesthesia. This is now stated explicitly at the start of the Methods section.
  
  (6) Plasticity and training effects warrant further consideration: although the manuscript reports no difference between naïve and trained rats, Figure 3 suggests greater across-trial variability for CI than NH that is not evident in the trained subset; examining relationships among behavioural performance, decoder performance, across-trial variability, and training duration would strengthen interpretation.
  
  We agree that plasticity and training effects are central questions for cochlear implant research and that iEEG is well suited to study how cortical representations evolve with CI use. However, the current dataset was collected mainly to compare cortical encoding of acoustic versus CI stimulation under matched, acute conditions (not necessarily after behavioral training with the implant, and we note that most studies of physiological responses to cochlear implant function in non-human species also do not incorporate aspects of training). All CI-evoked iEEG recordings were obtained immediately after implantation, before any CI-based behavioral training. As a result, any training effects reflected in the iEEG data can only arise from prior normal-hearing training, not from experience with CI stimuli themselves. Only a small subset of animals (N = 3 of 10) underwent behavioral training with cochlear implants, and their training histories (duration, performance levels, CI hardware status) are not uniform. This yields insufficient statistical power to meaningfully examine correlations among behavioral performance, decoder performance, across-trial variability, and training duration. While we note the reviewer’s observation that across-trial variability appears qualitatively different in the small, trained subset, we do not believe the current data justify strong conclusions about training-related plasticity.
  
  (7) Differentiating the CI rats stimulated directly or through the microphone of the speech processor -at least in the figures - would be useful to allow the reader to assess whether both stimulation strategies give rise to similar results.
  
  We agree that it is important to distinguish between rats stimulated directly via CI hardware and those stimulated acoustically through a speech processor. We now show in new Supplementary Figure 2 which animals received direct electrical stimulation and which were driven acoustically through the processor microphone. We also now plot tonotopic and cochleotopic maps for all CI animals in Supplementary Figure 3, with the stimulation mode indicated for each animal. As discussed in our response to comment #2 of Reviewer 3, we also provide validation that acoustic tones can be used to selectively drive individual electrodes via the speech processor. However, the sample sizes for the two stimulation strategies are small (N = 4 rats with direct CI stimulation, N = 3 rats with acoustic CI stimulation). For this reason, we have chosen not to draw strong statistical conclusions about differences between direct vs acoustic CI stimulation in the present manuscript.
  
  (8) Typographical error at the end of the introduction ("To this end we have designed and manufactured..."), and in the first paragraph of the Discussion ("...that both that...").
  
  Thanks, we have updated the manuscript accordingly.
  
  (9) Inconsistent terminology: use a single form (e.g., "normal-hearing") throughout.
  
  Good suggestion, thanks. We have updated the main manuscript to use only “normal-hearing.” We found and changed two instances in which we used the acronym NH in lieu of normal-hearing, once early in the Results and once in the legend for Figure 3.
  
  (10) In Figure 3D (temporal), there appears to be an extra data point for the NH-trained group.
  
  Thank you for flagging this mis-labeling, which Reviewer 3 also pointed out. We have switched the appropriate data point in Figure 3D from ‘trained’ to ‘naïve’.
  
  (11) In Figure 4D, the yellow line is not defined; based on Figure 6D, it likely represents shuffled/chance performance and should be labeled accordingly (including beneath the chance line on the plots).
  
  We have updated Figure 6 to indicate that the yellow line does indeed reflect shuffled/chance.
  
  (12) Figure 8 would benefit from a control demonstrating that poor cross-modal decoding reflects train-test distribution differences rather than weak decoders (e.g., train on a subsample of NH and test on held-out NH), and from reporting decoding on raw ERP/HG features in addition to TCA-derived data.
  
  Good suggestion, thanks; we have now added this control. We agree that a positive control is necessary to show that poor tone→CI decoding reflects differences of underlying representations rather than a failure of the decoder or modeling approach. (Reviewer 2 raised the same point.)
  
  To validate our cross‑modal analysis pipeline, we re‑implemented the full procedure used in Figure 8, but instead of training on tone‑evoked responses and testing on CI‑evoked responses, we trained and tested on independent sets of tone‑evoked trials from the same animals (tone→tone). For each tone in each animal, we withheld 10 trials as a test set. Using the remaining trials, we fit the original TCA model to obtain spatial and temporal factors (Fig. 8A). We then fixed these factors and re‑optimized only the trial factors on the withheld tone‑evoked trials (Fig. 8B). The LDA decoder was trained on the trial factors from the original TCA fit and tested on the re‑optimized trial factors from the withheld trials, using the same classification pipeline as in the main analysis.
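  A NumPy sketch of the two TCA steps described above, assuming trials × sites × time tensors and an unconstrained CP decomposition fit by alternating least squares (the original TCA fit may have used a different solver or nonnegativity constraints). `refit_trial_factors` holds the spatial and temporal factors fixed and solves a linear least-squares problem for the trial factors of withheld trials; all names are hypothetical.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product: row i*len(b) + j holds a[i] * b[j]."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def _solve(unfolded, kr, gram):
    # Least-squares update shared by all modes: X_(n) @ KR @ pinv(Gram).
    return unfolded @ kr @ np.linalg.pinv(gram)

def cp_als(x, rank, n_iter=200, seed=0):
    """Fit x[k, s, t] ~ sum_r C[k, r] * A[s, r] * B[t, r] by alternating
    least squares. Returns trial (C), spatial (A) and temporal (B) factors."""
    rng = np.random.default_rng(seed)
    K, S, T = x.shape
    A = rng.standard_normal((S, rank))
    B = rng.standard_normal((T, rank))
    for _ in range(n_iter):
        C = _solve(x.reshape(K, S * T), khatri_rao(A, B), (A.T @ A) * (B.T @ B))
        A = _solve(x.transpose(1, 0, 2).reshape(S, K * T), khatri_rao(C, B),
                   (C.T @ C) * (B.T @ B))
        B = _solve(x.transpose(2, 0, 1).reshape(T, K * S), khatri_rao(C, A),
                   (C.T @ C) * (A.T @ A))
    return C, A, B

def refit_trial_factors(x_new, A, B):
    """Hold spatial (A) and temporal (B) factors fixed and solve for the
    trial factors of withheld trials (linear least squares)."""
    K = x_new.shape[0]
    return _solve(x_new.reshape(K, -1), khatri_rao(A, B), (A.T @ A) * (B.T @ B))
```

In the positive control, `cp_als` would be fit to the training trials and `refit_trial_factors` applied to the 10 withheld trials per tone, so that training and test trials are described in the same factor space.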
  
  As shown in the top panels of Figure 8C,D, this positive control yielded robust tone→tone generalization: predicted tone frequencies closely matched the actual tones, decoder performance was significantly above chance, and prediction errors were tightly clustered around the true stimulus, indicating that the decoder was tuned to tone frequency. In contrast, when we trained on tone‑evoked responses and tested on CI‑evoked responses, information transfer was markedly reduced (Fig. 8E-G).
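  The decoding step can be sketched with a plain NumPy linear discriminant (shared covariance, class priors from the training labels) standing in for whatever LDA implementation was used: features are trial factors, labels are tone frequencies, and a label-shuffled decoder gives the chance distribution. The class and function names, the small ridge term, and the number of shuffles are assumptions.

```python
import numpy as np

class LDA:
    """Linear discriminant analysis with a pooled (shared) covariance."""

    def fit(self, X, y, reg=1e-6):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        centered = X - self.means[np.searchsorted(self.classes, y)]
        cov = centered.T @ centered / (len(X) - len(self.classes))
        cov += reg * np.eye(X.shape[1])                  # small ridge for stability
        self.W = np.linalg.solve(cov, self.means.T)      # Sigma^-1 mu_k per class
        priors = np.array([np.mean(y == c) for c in self.classes])
        self.b = -0.5 * np.sum(self.means.T * self.W, axis=0) + np.log(priors)
        return self

    def predict(self, X):
        return self.classes[np.argmax(X @ self.W + self.b, axis=1)]

def accuracy_vs_shuffle(X_tr, y_tr, X_te, y_te, n_shuffles=200, seed=0):
    """Held-out accuracy and a null distribution from label-shuffled training."""
    rng = np.random.default_rng(seed)
    acc = np.mean(LDA().fit(X_tr, y_tr).predict(X_te) == y_te)
    null = np.array([
        np.mean(LDA().fit(X_tr, rng.permutation(y_tr)).predict(X_te) == y_te)
        for _ in range(n_shuffles)
    ])
    return acc, null
```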
  
  These results demonstrate that the TCA+decoder pipeline can reliably transfer information across independent tone‑evoked datasets, confirming that the method captures shared structure when it exists. The poor cross‑modal transfer between tone‑ and CI‑evoked activity therefore is unlikely to be due to a weak decoder or to a failure of the modeling pipeline, but instead reflects a genuine mismatch between CI and sound representations in auditory cortex. We have updated Figure 8 and the Results section to describe this positive control analysis and clarify the interpretation.
  
  (13) Perception and interpretation of signals are mentioned several times in the introduction, although perception is not explored in the manuscript (only neuronal processing). This might be confusing.
  
  We appreciate the need to distinguish between neuronal encoding and perception. We believe we have been careful not to invoke relationships to perception when presenting analyses of iEEG measurements, but we did identify an opportunity to further clarify the distinction between neuronal processing and perception by adding the following text to the Introduction: “for the auditory system to interpret patterns of evoked neural activity and inform downstream auditory areas.”
  
  (14) Figure 1C. Why is the performance of CI rats so much lower than what was previously published (Glennon et al., 2023)? Did the training duration change?
  
  The three animals that were behaviorally trained on the normal-hearing (pre-deafening) and cochlear implant (post-deafening) tasks are within the distribution of the full set of animals from Glennon et al. (2023). However, in Glennon et al. (2023), one of our behavioral criteria was days to d′ > 1; animals were trained daily until reaching that level and were not included in the initial data set if they did not reach it. Because this study of iEEG responses also included animals that were not trained at all, we felt it appropriate to include the third animal as well, which was trained for only 3 days before recordings were made. The other two animals were trained for 9 and 13 days. We have now included this information in the Methods.
  
  (15) The p-values = 0.5 should be given with an additional digit.
  
  We previously rounded all p-values greater than 0.10 to one decimal place. We have updated the figures and manuscript text to report p-values to at least two decimal places.
  
  Reviewer #2 (Recommendations for the authors):
  
  We thank the Reviewer for their thoughtful comments on our study.
  
  (1) Less noisy recording methods based on spike detection would provide stronger claims.
  
  We agree that spike recordings, particularly isolated single-unit activity, are powerful for testing hypotheses about sensory encoding in auditory cortex, and we plan to incorporate such approaches in future work. However, our decision to use iEEG arrays in the present study was deliberate and central to the scientific and translational goals of the project.
  
  First, iEEG and related population-level approaches such as scalp EEG (e.g., Lalor and Foxe, 2010; O’Sullivan et al., 2015) and fNIRS (e.g., Bortfeld et al., 2009; Peelle, 2017) are widely used in humans and have been highly successful in decoding sound- and speech-evoked responses, revealing fundamental principles of how sound and speech are encoded in the human brain. Because speech is uniquely human and cochlear implants are primarily designed to restore speech perception, aligning our recordings with clinically relevant, human-used modalities enhances the translational relevance of our work.
  
  Second, iEEG arrays provide distinct advantages over modern multi- and single-unit electrophysiology. Even with high-density probes, the spatial sampling of neuronal activity does not match the coverage of the 60-channel iEEG arrays used here, which span large extents of auditory cortex. One might instead consider optical methods such as calcium imaging to interrogate topographical encoding at single-neuron and mesoscale resolutions, as has been done in normal-hearing mice (Romero and Hight et al., 2019). However, calcium signals are intrinsically slow, limiting access to the temporal precision that is critical for CI encoding, and these tools are unlikely to be available in humans in the foreseeable future, substantially reducing their translational value.
  
  Using iEEG arrays, we show that CI-evoked responses are topographically organized, consistent with prior work (Klinke et al. 1999, Bierer and Middlebrooks 2002, Middlebrooks and Bierer 2002, including Adenis et al., 2024 now referenced in the manuscript). Our study extends these findings by exploiting simultaneous recordings across both spatial and temporal domains, which are essential for several key analyses (Figs. 3-8), including quantification of trial-by-trial variability, decoding of stimulus identity from single trials, and cross-modal comparisons between normal-hearing and CI-evoked iEEG responses.
  
  Thus, we believe that the strength of this study is due to, rather than in spite of, its use of iEEG arrays. This approach uniquely allows us to test hypotheses about CI encoding across cortical topography and time using a modality that is directly translatable to human research and clinical practice. In response to the reviewer’s concern, we have also (i) improved the statistical treatment of our data (by adopting linear mixed-effects models that incorporate both paired and unpaired observations), (ii) added additional positive controls (see response to comment #2), and (iii) collected new data that further validate our rodent CI model. Together, these additions strengthen the support for our conclusions while preserving the key advantages of the iEEG-based approach.
  
  (2) A positive control is necessary to claim the mismatch between CI and sound representations.
  
  We agree. We now have added a positive control specifically designed to validate our cross-modal analysis pipeline in our revised manuscript. As also suggested by Reviewer 1, the goal was to test whether our method can successfully transfer information when the training and test datasets are matched in modality (tone→tone), thereby ensuring that the observed failure of cross-modal transfer (tone→CI) is not an artifact of the analysis.
  
  To do this, we re-implemented the full pipeline used in Figure 8, but instead of training on tone-evoked responses and testing on CI-evoked responses, we trained and tested on independent sets of tone-evoked trials from the same animals. For each tone in each animal, we withheld 10 trials as a test set. Using the remaining trials, we fit the original TCA model to obtain spatial and temporal factors (Fig. 8A). We then fixed these factors and re-optimized only the trial factors on the withheld tone-evoked trials (Fig. 8B). The LDA decoder was trained on the trial factors from the original TCA fit and tested on the re-optimized trial factors from the withheld trials, using the same classification pipeline as elsewhere in the manuscript.
  
  As shown in the top panels of Figure 8C,D, this positive control yielded robust tone→tone generalization: predicted tone frequencies closely matched the actual tones, decoder performance was significantly above chance, and prediction errors were tightly clustered around the true stimulus, indicating that the decoder was tuned to tone frequency. In contrast, when we trained on tone-evoked responses and tested on CI-evoked responses, information transfer was markedly reduced and not different from shuffled controls (Fig. 8E-G).
  
  These results demonstrate that the TCA+decoder pipeline can reliably transfer information across independent tone-evoked datasets, confirming that the method captures shared structure when it exists. The poor cross-modal transfer between tone- and CI-evoked activity therefore cannot be attributed to a failure of the modeling pipeline but instead reflects a mismatch between CI and sound representations in auditory cortex. We have updated Figure 8, the methods, and the results section to include this new important analysis.
  
  Reviewer #3 (Recommendations for the authors):
  
  We thank Reviewer 3 for their appreciation of the study design and the appropriateness of our analyses. We also appreciate the recognition of the study's noteworthy findings, specifically that stimulus identity can be decoded on a single-trial basis, and of the potential benefit of using central decoders in clinical settings.
  
  (1a) Animal heterogeneity: It is difficult to keep track of the animals used in this study, and some received a different protocol of stimulation (sounds through the speech processor vs. direct stimulation) and were also trained in a behavioral task using different target stimuli (4kHz vs. 22.6kHz, also no mention of the CI electrode used as a target).
  
  We have now clarified the animal cohorts and stimulation protocols in our revised manuscript. We added a new Supplementary Figure 2 that schematizes, for each animal, whether it underwent behavioral training with pure tones in the normal-hearing condition, whether tone-evoked iEEG measurements were collected, whether CI-evoked iEEG measurements were collected (and whether stimulation was direct or via the speech processor), and whether it subsequently received CI-based behavioral training. Regarding the behavioral targets, we now specify in the Methods that for normal-hearing training, the target stimulus was a 22.6-kHz pure tone. For CI-trained animals, the target was either CI channel 3 (n = 2 rats) or CI channel 4 (n = 1 rat). Details about stimulus targets during behavior have been added to the Methods under “Behavioral training for tone and implant channel detection.”
  
  (1b) There is no comparison of the CI maps from rats tested with the speech processor and directly stimulated. How different were they? Was the frequency allocation of each electrode the same for each animal? Since data might already have intrinsic variability because of the grid placement, the mechanical deafening, and the cochlear implantation in each animal, such heterogeneity in the 'background' and stimulation protocol might blur the authors' results.
  
  Our study focuses on cortical encoding of single-channel CI stimulation, so it is indeed important to ensure that the stimuli are effectively delivered by a single electrode, regardless of whether they are driven acoustically via the speech processor or by direct electrical stimulation.
  
  Stimulation mode and frequency allocation: The project began with single-channel stimulation achieved by presenting pure tones to the speech processor (N=3 animals) and later transitioned to direct programmatic control of individual electrodes (N=4 animals) to simplify the experimental setup. In both cases, the goal was to activate only one CI channel at a time.
  
  For the speech-processor animals, we followed the programming and validation protocol described in Glennon et al. (2023):
  
  - Set the number of active channels in the processor to 1 (the clinical default is 8) to avoid spectral spread across electrodes.
  
  - Disabled all additional signal-processing strategies (e.g., Scan, ASC, ADRO, SNR-NR, WNR).
  
  - Used customized frequency allocation tables that mapped narrow frequency bands to individual electrodes, as shown in Glennon et al., 2023, Extended Data Fig. 2.
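  To illustrate what a single-channel frequency allocation table amounts to, the sketch below assigns each electrode a narrow, non-overlapping band around one test tone and checks that any tone falls in at most one band. The ±1/8-octave bandwidth, the half-octave centers, and the electrode numbering are hypothetical; the actual tables are those shown in Glennon et al. (2023), Extended Data Fig. 2.

```python
def make_allocation(center_freqs_hz, half_width_oct=0.125):
    """Map electrode number -> (low, high) band edges in Hz, one narrow band
    centred on each test tone (hypothetical bandwidth and numbering)."""
    return {e: (fc * 2 ** -half_width_oct, fc * 2 ** half_width_oct)
            for e, fc in enumerate(center_freqs_hz, start=1)}

def electrodes_for(freq_hz, table):
    """Electrodes whose band contains the tone; single-channel use needs len <= 1."""
    return [e for e, (lo, hi) in table.items() if lo <= freq_hz < hi]

def bands_disjoint(table):
    """True if no two bands overlap, so no tone can drive two electrodes."""
    bands = sorted(table.values())
    return all(prev_hi <= next_lo
               for (_, prev_hi), (next_lo, _) in zip(bands, bands[1:]))
```

The electrodogram check described below is the empirical counterpart of `bands_disjoint`: it confirms at the array output, rather than on paper, that each tone activates only its intended electrode.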
  
  To confirm that a given tone drove only the intended electrode, we recorded tone-evoked electrodograms—measurements of the output at each electrode—and verified that only the targeted channel was active (Glennon et al., 2023, Extended Data Fig. 2). Thus, although the initial CI drive was acoustic, the effective stimulation at the array was restricted to a single electrode with a well-defined frequency allocation.
  
  For the direct-stimulation animals, we used the same underlying frequency allocations to choose which electrode to stimulate, but the pulses were delivered programmatically rather than via the speech processor. In both modes, the center frequency associated with each electrode was therefore defined consistently across animals, and stimulation was confined to one channel at a time.
  
  Comparison of maps across stimulation modes: We now explicitly indicate the stimulation mode (speech-processor vs direct) for each CI animal in Supplementary Figure 2 and plot the maps for all animals in Supplementary Figure 3. Qualitatively, the spatial organization of CI-evoked maps is similar across the two stimulation strategies; we do not observe systematic differences in map structure that would suggest large biases introduced by the stimulation mode. However, the sample sizes for each group are small (N = 3 speech-processor, N = 4 direct). For this reason, we have not performed formal between-mode statistics and instead treat stimulation mode as a source of minor heterogeneity, alongside inevitable variability from grid placement, mechanical deafening, and cochlear insertion. Given the electrodogram validation (Glennon et al., 2023, Extended Data Fig. 2) and consistent frequency allocation tables, we are confident that both approaches produce single-channel activation with comparable effective frequency assignments.
  
  (1c) The number of animals used is also confusing. The authors report 7 NH and 7 CI animals (14 total), 4 NH and 3 CI were trained before being implanted (so 3 naïve NH and 4 naïve CI remain). Figure 1C reports that only 3 trained NH performed with the CI (let us call them 3 NH->CI). But then Figure 1E reports only 1 trained NH->CI and only 1 trained NH and 3 naïve NH that got implanted later. On the other hand, Figure 1E reports only 1 true naïve CI animal, the 3 others being naïve NH that got implanted. For the sake of clarity, I would encourage the authors to provide a timeline of the procedures/stimulation protocols coupled with a schematic distribution of the animals.
  
  To address this, we have added a new Supplementary Figure 2 that shows the following for each individual animal: a chronological timeline (NH recordings, deafening, implantation, CI recordings); whether it was behaviorally trained in the NH condition, the CI condition, or both; whether CI stimulation was delivered via the speech processor or by direct electrical stimulation; and which stimulus conditions (tone-evoked iEEG, CI-evoked iEEG) were collected. This schematic makes clear how the reported totals arise (7 NH and 7 CI for iEEG; 4 NH-trained and 3 CI-trained behaviorally). It also shows which specific animals contribute to each panel in Figure 1 and to the later iEEG analyses. We now reference Supplementary Figure 2 in the Results when introducing the cohorts, to guide readers through the animal accounting.
  
  (2a) Methods and statistics: Deafening is only mechanical, with no direct or postmortem proof that deafening was complete. The authors cite previous studies, but that would have been a good control to have, since mechanical deafening isn't as widely accepted as chemical deafening (e.g., with neomycin), especially when some of your animals were stimulated with pure tones through the speech processor.
  
  We agree that rigorous verification of deafening is essential, particularly when some CI animals are driven acoustically through the speech processor. Ototoxic approaches (e.g., systemic or local neomycin) are one established method, but their effectiveness can be sensitive to dose and delivery, and they introduce systemic side-effects that can complicate long-term survival and recovery.
  
  Our laboratory has used the mechanical deafening procedure since it was first described in King et al. (2016), and most recently in Glennon et al. (2023). In King et al., mechanical and ototoxic methods were combined, and we found that ototoxic methods provided no additional robustness in deafening beyond the mechanical lesion. Instead, the additional time required for ototoxic drug application reduced survival times in what was already a very complex and long surgical procedure for bilateral deafening and unilateral cochlear implantation.
  
  In Glennon et al. (2023) we intentionally employed mechanical-only deafening to minimize side-effects while still achieving profound hearing loss in implanted animals. Glennon et al. (2023) provides an extensive validation of this mechanical-only protocol under the same surgical and experimental conditions as the present study. As we mentioned in our response to comment #3a of Referee 1, we assessed deafness through three measures:
  
  Histology: In N=4 deafened animals, inner hair cell loss was ~50%, and outer hair cell loss was nearly complete (almost 100%) in all animals.
  
  Physiology: For N=14 rats, acoustic ABRs were substantial before deafening but not statistically different from baseline noise after deafening.
  
  Behavior: For N=16 deafened rats, behavioral performance with the implant on was d′ = 1.7 ± 0.1. When the implant was turned off in a subset of sessions, performance dropped to chance (d′ = −0.05 ± 0.1, P < 0.0001).
  
  This convergent anatomical, physiological, and behavioral evidence demonstrates that the mechanical procedure produces profound deafness, with no functionally relevant residual hearing at levels ≥90 dB SPL. As we also mentioned in response to comment #3a of Referee 1, we believe that the behavioral criterion is the most essential and also the least common in the literature. Because the tones used to drive the speech processor in the current study were presented at 70 dB SPL, we have no reason to believe that residual acoustic hearing contributed to any of the CI-evoked responses we report.
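  
  The response does not state exactly how d′ was computed. The sketch below assumes the standard yes/no signal-detection definition, d′ = z(hit rate) − z(false-alarm rate), and is meant only to show what the reported values correspond to. Under this definition, d′ ≈ 1.7 corresponds roughly to 80% hits with 20% false alarms, and chance performance gives d′ ≈ 0.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity index d' = z(H) - z(FA).

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 are
    usually corrected (e.g., with a log-linear correction) before calling this.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


print(round(d_prime(0.80, 0.20), 2))  # 1.68
print(d_prime(0.5, 0.5))              # 0.0 (chance)
```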
  
  We now cite these validation data explicitly in the Methods section “Bilateral sensorineural hearing loss” as follows: “(mechanical only, as described and validated in Glennon et al. 2023)”. This makes clear why we consider the mechanical-only approach sufficient for ensuring deafness in the present experiments.
  
  (2b) What motivated the selection of 15 principal components for the PCA? That might need to be justified, maybe by a scree plot or variance plot (eigenvalues or CEV), as if too many PCs are selected, you are at risk of losing information. Side comment for TCA: why is it important that the number of latent factors exceeds the number of tones or stimuli? Is there a way to justify this statement?
  
  We thank the reviewer for raising this point. Our choice of 15 components/latent factors was motivated by both theoretical and empirical considerations, which are now made explicit in the manuscript.
  
  For the PCA analyses, we selected 15 principal components for two reasons. First, because our decoder must discriminate between 10 tone conditions, we reasoned that providing at least as many dimensions as stimuli would be beneficial, while also allowing for the possibility that some components may carry little or no stimulus-selective information. We therefore chose a modest number of components that exceeded the number of tones (10) but avoided unnecessarily high dimensionality. Second, we empirically examined the variance explained as a function of the number of components. As shown in the new scree plots (Supplemental Fig. 4A), the cumulative variance explained enters a near-linear, low-slope regime beyond ~15 PCs, indicating diminishing returns for including additional components. Thus, 15 PCs capture a substantial fraction of the stimulus-related variance while minimizing the risk of overfitting and retaining a consistent dimensionality across animals.
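  
  The cumulative-variance curve behind a scree plot of this kind can be computed directly from the singular values of the centered data matrix. The sketch below is a generic illustration on synthetic data, not the authors' analysis code. The (observations × features) layout and the threshold rule in `n_components_for` are assumptions. The response instead describes choosing the point where the curve enters a near-linear, low-slope regime.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative fraction of total variance captured by successive principal components.

    X: array of shape (n_observations, n_features), e.g., trials x (electrodes * time bins).
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    var = s ** 2                             # proportional to the variance along each PC
    return np.cumsum(var) / var.sum()


def n_components_for(cev, fraction):
    """Smallest number of PCs whose cumulative explained variance reaches `fraction`."""
    return int(np.searchsorted(cev, fraction)) + 1


rng = np.random.default_rng(0)
# Synthetic data: 5 strong latent dimensions embedded in 60 features, plus weak noise.
X = (rng.normal(size=(200, 5)) @ rng.normal(size=(5, 60))) * 3.0 + 0.1 * rng.normal(size=(200, 60))
cev = cumulative_explained_variance(X)
print(np.round(cev[:8], 4))  # curve flattens after the 5th component
print(n_components_for(cev, 0.95))
```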
  
  For the TCA analyses, we used 15 latent factors to match the dimensionality used in PCA and to ensure that the latent space was sufficiently flexible to represent the 10 tone conditions without being under-parameterized. In practice, increasing the number of TCA components reduces reconstruction error (Williams et al., 2018), but with diminishing improvement beyond a certain point. We therefore systematically evaluated model error as a function of the number of latent factors and found that error decreased rapidly up to ~15 components and then plateaued (Supplemental Fig. 4B). This pattern parallels the PCA scree plots and supports 15 as a reasonable trade-off between model flexibility and parsimony.
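  
  TCA (Williams et al., 2018) is a CP (canonical polyadic) tensor decomposition. An error-versus-rank curve of the kind described above can therefore be sketched with a plain alternating-least-squares fit. The code below is a minimal illustration on a synthetic rank-3 tensor. The electrodes × time × trials layout, the iteration count, and the function names are assumptions. Published TCA implementations may add refinements (e.g., nonnegativity constraints or multiple random restarts) that are omitted here.

```python
import numpy as np


def cp_als(X, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array by alternating least squares.

    X: array of shape (I, J, K), e.g., electrodes x time x trials (assumed layout).
    Returns factor matrices A (I x R), B (J x R), C (K x R).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.normal(size=(n, rank)) for n in (I, J, K))
    for _ in range(n_iter):
        # Each update is the exact least-squares solution with the other two factors fixed.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


def normalized_error(X, A, B, C):
    """Relative reconstruction error ||X - X_hat|| / ||X||."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


rng = np.random.default_rng(1)
# Synthetic tensor with true rank 3 plus small noise.
factors = [rng.normal(size=(n, 3)) for n in (10, 8, 6)]
X = np.einsum("ir,jr,kr->ijk", *factors) + 0.01 * rng.normal(size=(10, 8, 6))
errors = {r: normalized_error(X, *cp_als(X, r)) for r in range(1, 6)}
for r, e in errors.items():
    print(r, round(float(e), 3))  # error should drop up to rank 3, then plateau
```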
  
  We have updated the Results to clarify these choices, as follows: “The number of components (15) was chosen based on PCA scree plots (Supplemental Fig. 4A), which showed that explained variance entered a near-linear, low-slope regime beyond this point, and on TCA reconstruction error, which showed a similar plateau (Supplemental Fig. 4B).”
  
  (2c) Legend of Figure 2E, J states that a Student's paired t-test was used, meaning that only the 'linked' points of the graph were used (thus, comparing only animals that got tested NH then implanted). This is usually the same across the manuscript. Why not include all the points with an unpaired t-test? Otherwise, why are all the points plotted if they serve no purpose? This choice should be justified.
  
  We agree with this concern, which was also raised by Reviewer 1, and have revised our statistical approach accordingly. In the original submission, we used paired t-tests when animals contributed both normal-hearing (NH) and CI data. As a result, animals with only NH or only CI measurements were excluded from those comparisons even though they were shown in the plots.
  
  To address this, we have re-analyzed all normal-hearing vs. CI comparisons using linear mixed-effects models that include both paired and unpaired data within a single framework. This approach ensures that every plotted data point contributes to the statistical tests, properly accounts for within-animal dependence when both conditions are present, and avoids the loss of power that would arise from either paired-only or purely unpaired tests.
  
  The mixed-effects results are consistent with our original interpretations, with two comparisons becoming significant in the updated analysis: Fig. 2E (p = 0.048) and Fig. 6F (p = 0.027). We have updated the Results and figure legends to describe the use of mixed-effects models and to report these revised p-values. Together with the new tonotopy and cochleotopy analyses described above, these changes strengthen the statistical support for our conclusions without altering the overall interpretation of the data.
  
  (2d) Side comment: There are inconsistencies on the bar plots of Figure 6C (Missing a purple point) and Figure 3D (Temporal has 3 purple points).
  
  Thank you for flagging this mislabeling (which Reviewer 1 also noticed). We have relabeled the affected data point from trained to naive in Fig. 3D and from naive to trained in Fig. 6C.
  
  (3a) Pure tones and CI-evoked responses maps: It is the reviewer's understanding that Figure 2 is an averaged representation for all animals. Why is the tonotopic shift so dim for ERPs? The averaged maps aren't very convincing. How were the gradients on an animal-to-animal basis since Figure 2D is only an example animal? Also, everything has been evaluated at 70dB, where selectivity might not be best. It would have been easier to follow the tonotopic gradient at the CFs where contrasts are higher.
  
  We agree that the strength and interpretation of tonotopy/cochleotopy in our iEEG data needed to be presented more clearly. Reviewer 1 raised closely related concerns, and we have substantially expanded the analyses and explanations in response. Here we highlight the points that address your specific questions.
  
  Single-animal vs. averaged maps: We included both exemplar maps and population summaries in Figure 2. The panels analogous to Figure 2D show single-animal best-frequency (BF) or best-channel maps; these were chosen because they exhibit clear, interpretable gradients. In the exemplar shown, there is a local high-frequency (HF) region along the medial edge of the array that transitions to lower frequencies toward the rostral edge. For CI-evoked best-channel maps in the same animal, we observe a parallel pattern in which basal electrodes (e.g., electrode 8, representing higher frequencies) occupy the HF region and apical electrodes (e.g., electrode 1, lower frequencies) occupy the LF region.
  
  Averaged ERP maps, by contrast, necessarily blur some of this structure because iEEG is a summed field potential and animal-to-animal differences in array placement, cochlear insertion depth, and anatomy introduce variability. We have softened the language in the text to reflect that ERP-based tonotopy is coarse and weaker at the population level, while emphasizing that robust gradients are evident in single animals and in HG-based measures.
  
  Quantitative assessment across animals: To move beyond visual impressions, we added quantitative analyses that mirror those used in Romero and Hight et al. (2020) for calcium imaging data (Romero and Hight et al. 2020 and Fig. 2). For each map we computed local tonotopic gradient vectors at every pixel and summarized their magnitude/direction on a unit circle, then compared the mean vector strength to shuffled maps. Applied to our BF and best-channel maps, this analysis shows that both are significantly more ordered than shuffled controls (p < 10⁻¹⁰), indicating that the maps are tonotopic/cochleotopic rather than random, despite the apparent dimness of the gradients in some averaged ERP plots. These new results are described in the revised manuscript and shown in Romero and Hight et al. 2020 and Fig. 2.
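  
  The gradient-direction analysis described above can be sketched in four steps:
  
  1. Compute a local gradient at each pixel.
  2. Keep only its direction, as a point on the unit circle.
  3. Take the length of the mean resultant vector as the vector strength.
  4. Compare that strength against maps whose pixels have been randomly permuted.
  
  This is an assumption-laden illustration of the general idea, with hypothetical function names. Smoothing, magnitude weighting, and the exact statistics in Romero and Hight et al. (2020) may differ.

```python
import numpy as np


def gradient_vector_strength(bf_map):
    """Length of the mean resultant of local gradient directions.

    bf_map: 2-D array holding the best frequency (e.g., in octaves) or best
    channel at each pixel. Returns a value in [0, 1]: near 1 when gradients
    point the same way everywhere, near 0 when directions are random.
    """
    g_row, g_col = np.gradient(bf_map.astype(float))
    keep = np.hypot(g_row, g_col) > 0  # direction is undefined where the gradient is zero
    angles = np.arctan2(g_row[keep], g_col[keep])
    return float(np.abs(np.mean(np.exp(1j * angles))))


def shuffle_test(bf_map, n_shuffles=1000, seed=0):
    """Compare the observed vector strength with pixel-shuffled maps.

    Returns (observed vector strength, one-sided permutation p-value).
    """
    rng = np.random.default_rng(seed)
    observed = gradient_vector_strength(bf_map)
    null = np.array([
        gradient_vector_strength(rng.permutation(bf_map.ravel()).reshape(bf_map.shape))
        for _ in range(n_shuffles)
    ])
    p = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, float(p)


# A perfectly ordered synthetic 8 x 8 map: value increases along columns.
ordered = np.tile(np.arange(8), (8, 1))
observed, p = shuffle_test(ordered, n_shuffles=200)
print(observed, p)  # expect a vector strength of 1.0 and a small p-value
```

  With n shuffles, the smallest attainable p-value is 1/(n + 1). Reporting values as small as those in the text therefore requires either a very large number of shuffles or a parametric comparison against the shuffled distribution.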
  
  Effect of intensity (70 dB SPL) and “dim” gradients: We agree that stimulus level influences the apparent sharpness of tonotopy. Higher intensities tend to broaden tuning and compress the dynamic range of BF maps. As we now discuss in more detail (adapted from our response to Reviewer 1), tones were presented at 70 dB SPL, so we expect maps to emphasize mid-frequency regions (around 8 kHz) and to show somewhat broader tuning than maps derived at threshold. For CI stimulation, we used ECAP thresholds to set intensity, which is effective in our preparation because animals can robustly discriminate individual electrodes and these electrodes evoke clear cortical activity (King et al., 2015; Glennon et al., 2023).
  
  In summary, we made three changes. We clarified which panels in Figure 2 show single-animal exemplars vs population summaries. We added quantitative analyses demonstrating that spatial correlations are greater for adjacent stimuli than for far-apart stimuli. We also expanded the discussion of how recording modality and stimulus level influence the visibility of tonotopic gradients. These changes are intended to make the evidence for tonotopy/cochleotopy in our iEEG data (and its limitations) more transparent.
  
  (3b) Since new experiments might not be available, it is the reviewer's suggestion to add a supplementary figure showing a couple of animal examples following the format of Figures 2A and 2C that have more contrasted gradients to strengthen the group data. In the case of the CI-evoked responses map, this might also provide another argument to dismiss the potential monopolar smearing.
  
  Good suggestion, thanks. We now include a new Supplementary Figure 3 that shows additional single-animal examples for both tone-evoked and CI-evoked maps, following the same format as Figure 2C.
  
  Regarding monopolar stimulation, we agree that monopolar configurations are expected to be less spatially specific than bipolar or multipolar modes. Current returns to an extracochlear reference electrode, which can broaden the spread of excitation. We nevertheless chose monopolar stimulation because it is the predominant clinical configuration in human CI users and therefore the most relevant for translational purposes. We acquired ECAP measurements of peripheral (spatial and temporal) tuning via a forward-masking paradigm and show that monopolar stimulation is effectively tuned (Supplemental Fig. 2). These measurements, together with the additional single-animal maps in Supplementary Figure 3 and our vector-strength analysis (Romero and Hight et al. 2020 and Fig. 2), demonstrate that even under acute monopolar stimulation we observe structured cochleotopic organization in cortex. This is not the fully smeared pattern one might expect if monopolar spread completely dominated.
  
  We also note that all CI-evoked iEEG measurements were made acutely, immediately after implantation and before any CI-based behavioral experience. It is possible that with longer-term use and plasticity, cortical cochleotopy could become sharper than what we observe here under acute conditions. In this sense, our data provide a conservative baseline showing that even at the earliest stages of CI use, monopolar stimulation already engages tonotopically selective regions of auditory cortex. A longitudinal comparison of acute versus chronic maps would be an interesting direction for future work but is beyond the scope of the current study.
  
  (3c) Side comments: The legends of Figures 2D and 2I should mention that this is an animal example and not group data, as the rest of the figures are group data.
  
  Thank you for this suggestion to improve figure clarity. We have updated all of our figures, where appropriate, to indicate whether data come from a single animal or a group of animals.
  
  (3d) In general, some of the legends should be revised because they are sometimes too "strong". As an example, Figure 3B, D legend states: "Variability of iEEG measurements across trials (root mean square, rms) was consistently higher for cochlear implant-evoked compared to tone-evoked activity", despite three of the statistical tests being non-significant. The manuscript is correct, on the other hand.
  
  Good point. We revised the legend for Figure 3 to be consistent with the figure and the manuscript.
  
  (3e) The example spatial map given in Figure 3A for CI might not be the best choice since it is showing a pretty reliable trial-by-trial response, while your group data proves the opposite.
  
  We understand the reviewer’s concern and agree that the exemplar CI map in Figure 3A appears relatively reliable on a trial-by-trial basis. This example was chosen deliberately from an animal in which we had both NH- and CI-evoked iEEG recordings, so that the reader could visually compare the two conditions within the same preparation. In this animal, as in the group data, the differences between NH and CI trial-by-trial responses are subtle rather than dramatic.
  
  Our group-level analysis shows that the RMS error across trials is consistently higher for CI-evoked than for NH-evoked responses, but the absolute differences are small (< 0.1) and relatively uniform across animals. The spatial maps plotted in Figure 3A are representative of this pattern: both conditions show reasonably robust evoked responses, with CI responses nonetheless showing slightly greater variability. To avoid implying a stronger qualitative difference than the data support, we have revised the text to emphasize two points: (i) CI-evoked responses remain clearly detectable on single trials, and (ii) the key effect is a small but consistent increase in variability across animals, as captured by the RMS error metrics. The revised text reads: “We noted that although the differences were qualitatively subtle (Fig. 3A, right panel), they were consistent across animals (Fig. 3B).”
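  
  One plausible way to compute the trial-to-trial RMS variability discussed above is the root-mean-square deviation of single-trial responses from the trial average. The response does not give the exact formula. The definition, the array layout, and the assumption that responses are normalized beforehand are therefore all hypothetical.

```python
import numpy as np


def trial_rms_variability(trials):
    """RMS deviation of single-trial responses from the trial-averaged response.

    trials: array of shape (n_trials, n_electrodes) for one stimulus, assumed to be
    normalized beforehand (hypothetical) so that values are comparable across animals.
    """
    mean_response = trials.mean(axis=0)
    return float(np.sqrt(np.mean((trials - mean_response) ** 2)))


# Two synthetic trials that deviate from their mean by exactly +/-1 everywhere.
print(trial_rms_variability(np.array([[1.0, 1.0], [3.0, 3.0]])))  # 1.0
```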
  
  (4a) Decoders for CI stimulation: Regarding CI stimulation, Pearson's correlations were truncated at a spacing of 5 electrodes. Likewise, none of the LDA classifiers show predictions for channels past CI-6. Again, that choice should be justified, or the missing channels should be presented.
  
  We truncated the correlations between electrodes at a spacing of 5 because the estimated means beyond that spacing are substantially noisier. Fewer pairwise correlations are available at larger spacings, which increases the standard error. For example, at the maximum stimulus spacing, the number of pairwise correlations is at most the number of animals tested (i.e., N=7). We believe it’s important to be transparent, so we have included the non-truncated version of the figure here in this public review (Author response image 3). We leave the figures in the manuscript unchanged but have updated the Figure 2 legend to justify this selection of data.
  
  Author response image 3.
  
  Expanded figures for spatial correlations and LDA performance. A) The same data from manuscript Figure 2 are re-plotted with expanded x-axes to include up to 4.5 octaves and 7 channels. Because fewer data are available at these points, the estimates of the mean spatial correlations are noisier. In all cases, the mean correlations are significantly higher for the first data point compared to the last 3 (NH, ERP p<0.001; NH, HG p<0.001; CI, ERP p=0.005; and CI, HG p=0.39, linear mixed effects models). B) The same data from manuscript Figure 4 are re-plotted with expanded x-axes to include up to ±3.5 octaves and ±6 channels.
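  
  The reason the estimates become noisy at large spacings follows from simple counting. With n stimuli ordered along the frequency or electrode axis, each animal contributes n − d pairs at spacing d, so the largest spacing contributes only one pair per animal. The sketch below groups pairwise Pearson correlations of stimulus maps by spacing. The (stimuli × pixels) layout, the function name, and the synthetic data are assumptions.

```python
import numpy as np
from collections import defaultdict


def correlation_by_spacing(maps):
    """Group pairwise spatial correlations of stimulus maps by stimulus spacing.

    maps: array (n_stimuli, n_pixels); row i is the map evoked by stimulus i
    (tones ordered by frequency, or electrodes ordered along the array).
    Returns {spacing: list of Pearson r values}.
    """
    n = maps.shape[0]
    r = np.corrcoef(maps)  # n x n matrix of Pearson correlations between rows
    by_spacing = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            by_spacing[j - i].append(float(r[i, j]))
    return dict(by_spacing)


rng = np.random.default_rng(0)
maps = rng.normal(size=(8, 100))  # 8 electrodes x 100 pixels, synthetic
pairs = {d: len(v) for d, v in correlation_by_spacing(maps).items()}
print(pairs)  # {1: 7, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2, 7: 1}
```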
  
  (4b) Finally, retrained PCA-LDA on spatial-only and temporal-only for CI are absent in Figure 3D. Since the authors were pretty consistent in showing both NH and CI alongside in the rest of the paper, it would be coherent to add the CI counterpart to Figure 3D, or maybe with a supplementary figure.
  
  We agree that consistency can be improved by including classifiers for CI-evoked measurements, though presumably for Fig. 6C and not Fig. 3D. Figure 6 has been updated accordingly.
  

Tags: AuthorResponse
Annotators: Public_Reviews
URL: biorxiv.org/content/10.1101/2025.08.01.668170v2
www.biorxiv.org

Linking Germline Telomere Removal to Global Programmed DNA Elimination in Tetrahymena Genome Differentiation

1. Public_Reviews 24 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  (1) We bioinformatically examined the repeat compositions of MLSs (Figure 3B), which clearly indicated that all MLSs are composed of repetitive sequences to a much greater extent than the rest of the genome.
  
  (2) We confirmed the blockage of chromosome breakage by the 4R-CBS mutations using a telomere-anchored PCR assay (Figure 5C-E).
  
  (3) We examined the effect of the 4R-CBS mutations on the expression of genes encoded in 4R-MDS by RNA-seq (Figure 9). This analysis unexpectedly revealed that gene expression from 4R-MDS is not significantly affected in the mutants, allowing us to extend our discussion.
  
  (4) We added two authors, Alix Lemoine and Tomoko Noto, who performed the experiments for these revisions.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this study, Nagao and Mochizuki examine the fate of germline chromosome ends during somatic genome differentiation in the ciliate Tetrahymena thermophila. During sexual reproduction, a new somatic genome is created from a zygotic, germline-derived genome by extensive programmed DNA elimination events. It has been known for some time that the termini of the germline chromosomes are eliminated, but the exact process and kinetics of the elimination events have not been thoroughly investigated. The authors first use germline-specific telomere probes to show that the loss of these chromosome ends occurs with similar timing as other DNA elimination events. By comparative analysis of the assembled germline and somatic genomes, the authors find that the ends of each of the germline chromosomes are composed of a few hundred kilobases of micronuclear limited sequences (MLS) that are removed starting around 14 hours after the start of conjugation, which initiates sexual development. They then develop an in situ hybridization assay to track the fate of one end of chromosome 4 while simultaneously following the adjacent macronuclear destined sequence (MDS) retained in the new somatic genome. This allows the authors to more clearly show that these adjacent chromosomal segments are initially amplified in the developing genome before the terminal MLS is eliminated. Finally, they mutate the chromosome breakage sequence (CBS) that normally separates the MLS terminus from the adjacent MDS region, to show that strains that develop with only one mutant chromosome can produce viable sexual progeny, but it appears that both the MLS and the MDS from the mutant chromosome are lost. If both chromosome copies have the CBS mutation, the cells arrest during development and do not eliminate many germline-limited sequences and fail to produce viable progeny. 
  
  Overall, this study provides many new insights into the fate of germline chromosome ends during somatic genome remodeling and suggests extensive coordination of different DNA elimination events in Tetrahymena.
  
  Strengths:
  
  Overall, the experiments were well executed with appropriate controls. The findings are generally robust. Importantly, the study provides several novel findings. First, the authors provide a fairly comprehensive characterization of the size of the MLS at the end of each germline chromosome. I'm not sure whether this has been published elsewhere. Second, the authors develop a novel method to study the fate of chromosome termini during development and use it to conclusively track the elimination of these termini. Third, the authors show that the elimination of these termini appears to occur concurrently with most other DNA elimination events during somatic genome differentiation. Fourth, the authors show that failure to separate these eliminated sequences from the normally retained chromosome alters the fate of the adjacent MDS and leads to the loss of the cells' ability to produce viable progeny.
  
  Weaknesses:
  
  It appears the authors did extensive analysis of the MLS chromosome ends but provided little information about their composition. If this has not been published elsewhere, it would be useful to describe the proportions of unique and repetitive sequences and provide more information about the general composition of the chromosome ends. Such information would help the reader understand the nature of these MLS and how they may or may not differ from other eliminated sequences.
  
  We have now calculated the proportions of unique and repetitive sequences for each MLS; these data are included in Figure 3B and described in the main text of the revised manuscript. A more comprehensive analysis of chromosome-end composition is beyond the scope of the current study and will be presented in a future publication. That analysis will include a detailed characterization in the context of the complete MIC genome assembly.
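  
  The proportion of repetitive sequence in an MLS can be computed as the fraction of its bases covered by merged repeat annotations, with the unique fraction as the complement. The sketch below assumes half-open genomic coordinates and a precomputed list of repeat intervals. The response does not specify the repeat-annotation tool or thresholds used, so this is only an illustration with hypothetical function names.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def repeat_fraction(region, repeats):
    """Fraction of bases in `region` covered by repeat annotations.

    region: (start, end) half-open coordinates of an MLS.
    repeats: iterable of (start, end) repeat annotations on the same chromosome.
    """
    r_start, r_end = region
    # Clip each overlapping repeat to the region so that bases outside it are not counted.
    clipped = [(max(s, r_start), min(e, r_end)) for s, e in repeats if e > r_start and s < r_end]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / (r_end - r_start)


# Toy example: a 1,000 bp region with overlapping repeat annotations.
frac = repeat_fraction((0, 1000), [(100, 300), (250, 400), (900, 1200)])
print(frac, 1 - frac)  # repetitive fraction 0.4; the unique fraction is the complement
```

  Merging before summing avoids double-counting bases that carry more than one repeat annotation.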
  
  Although the development of the novel FISH probes for large chromosome ends allowed for these novel discoveries, the signal in several images was visible, but often quite faint. I'm not sure there is anything the authors could do to improve the signal-to-noise ratio, but one needs to stare at the images carefully to understand the findings.
  
  We have submitted higher-resolution images with the revised manuscript, which we believe greatly improve the visibility of faint signals.
  
  One main weakness in the opinion of this reviewer is that the authors did very little to understand why, when a terminal MLS and the adjacent MDS fail to get separated because of failure in chromosome breakage, both segments are eliminated. The authors propose that possibly essential genes in the MDS get silenced, and the resulting lack of gene expression is the issue, but this and other possibilities were not tested. The study would provide more mechanistic insight if they had tried to assess whether the MDS on the CBS mutant chromosome becomes enriched in silencing modifications (e.g., H3K9me3). Alternatively, the authors could have examined changes in gene expression for some of the loci on the neighbouring MDS.
  
  The 4R-CBS mutation causes two distinct defects that should be considered separately: (1) co-elimination of 4R-MLS and the adjacent 4R-MDS during uniparental transmission of the 4R-CBS mutation; and (2) a global block of DNA elimination during biparental transmission of the 4R-CBS mutation.
  
  For the first defect, 4R-MLS and 4R-MDS may simply co-segregate into the nuclear compartment where DNA elimination occurs when the chromosome break that normally separates 4R-MLS from 4R-MDS is blocked. In this scenario, no additional process, such as spreading of scnRNA production, heterochromatin formation, or gene silencing, would be required to induce co-elimination. This point was not clearly stated in the previous manuscript, and we have now added a discussion of it to the revised manuscript.
  
  The possibility of gene silencing within 4R-MDS was raised as a potential explanation for the second defect. To test this possibility, we performed RNA-seq analysis of wild-type and 4R-CBS mutant cells to determine whether gene expression from 4R-MDS is affected by mutations at 4R-CBS. Contrary to our expectations, we found that genes in 4R-MDS are not significantly down-regulated in 4R-CBS mutant cells compared with other genes. This result suggests that the DNA elimination defect in these cells cannot be explained by silencing of genes located within 4R-MDS. We have added these RNA-seq data to Figure 9 and described them in the Results section. We have also revised the Discussion to propose alternative possibilities that may guide future investigations.
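  
  A comparison of this kind asks whether genes in 4R-MDS are more down-regulated than other genes. It could be sketched as a one-sided permutation test on log2 fold changes. The response does not state which test was used. This example assumes fold changes have already been estimated by a differential-expression pipeline, and the function names are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def gene_set_permutation_test(lfc_in_set, lfc_other, n_perm=10000, seed=0):
    """One-sided permutation test: are genes in the set more down-regulated?

    lfc_in_set: log2 fold changes (mutant vs wild type) for genes in the region.
    lfc_other: log2 fold changes for all other expressed genes.
    Returns (observed difference in means, permutation p-value).
    """
    rng = random.Random(seed)
    observed = _mean(lfc_in_set) - _mean(lfc_other)
    pooled = list(lfc_in_set) + list(lfc_other)
    k = len(lfc_in_set)
    at_least_as_low = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly relabel genes as in-set / other
        if _mean(pooled[:k]) - _mean(pooled[k:]) <= observed:
            at_least_as_low += 1
    return observed, (at_least_as_low + 1) / (n_perm + 1)


# Synthetic example in which the set is clearly down-regulated.
in_set = [-2.0, -2.1, -1.9]
others = [0.1, -0.1, 0.0, 0.2, -0.2, 0.05, -0.05, 0.0]
print(gene_set_permutation_test(in_set, others, n_perm=2000))
```

  A large p-value from such a test indicates no detectable preferential down-regulation of the gene set. It does not by itself prove that expression is unchanged.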
  
  The other main weakness is that since the authors only mutated the end of one germline chromosome, it is not clear whether the elimination of the MDS adjacent to the terminal MLS on chromosome 4 when the CBS is mutated is a general phenomenon, i.e., would happen at all chromosome ends, or is unique to the situation at Chromosome 4R. Knowing whether it is a general phenomenon or not would provide important insight into the authors' findings.
  
  As described in the manuscript, the short target (the 15-nt CBS) within AT-rich and repetitive regions prevents the design of gRNAs that specifically target some of the chromosome-end CBSs. We attempted to mutate the CBSs at the left ends of chromosome 3 (3L) and chromosome 5 (5L) using the strategy we used for 4R-CBS, but were unsuccessful. Systematically mutating other chromosome-end CBSs will therefore require a different strategy, such as combining CRISPR-induced DSBs with template-based repair. We have explained this technical limitation in the Discussion (page 12), where we state: “Our data support a critical role for 4R-CBS in separating 4R-MLS from 4R-MDS, but it remains unclear whether all MIC chromosome ends are strictly CBS-dependent for their elimination.”
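  
  The sketch below illustrates why a short CBS in AT-rich, repetitive sequence is hard to target. It lists SpCas9-style 20-nt protospacers followed by an NGG PAM, then counts exact matches of each across a genome sequence on both strands. The NGG rule is the standard SpCas9 PAM. The exact-match specificity check, the single-strand scan, and the function names are simplifying assumptions rather than the authors' design procedure. AT-rich sequence tends to contain few GG dinucleotides, so few candidate sites exist in the first place.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(text, pattern):
    """Count (possibly overlapping) exact occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def candidate_protospacers(target, genome, length=20):
    """List 20-nt protospacers in `target` that are followed by an NGG PAM.

    For each candidate, count exact matches in `genome` on both strands; a count
    of 1 means the protospacer is unique. Only the + strand of `target` is
    scanned, and mismatched off-targets are ignored (simplifying assumptions).
    """
    hits = []
    for i in range(len(target) - length - 2):
        protospacer = target[i:i + length]
        pam = target[i + length:i + length + 3]
        if pam[1:] == "GG":
            n = count_occurrences(genome, protospacer) + count_occurrences(genome, revcomp(protospacer))
            hits.append((i, protospacer, n))
    return hits


# Synthetic example: one candidate site, but its protospacer occurs twice in the "genome".
site = "GATTACAGATTACAGATTAC"
target = "TT" + site + "AGG" + "TT"
genome = "CCCC" + target + "CCCC" + site
print(candidate_protospacers(target, genome))  # [(2, 'GATTACAGATTACAGATTAC', 2)]
```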
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Nagao and Mochizuki investigated how the germline (MIC) telomere was removed during programmed genome rearrangement in the developing somatic nucleus (MAC). Using an optimized oligo-FISH procedure, the authors demonstrated that MIC telomeres were co-eliminated with a large region of MIC-limited sequences (MLS) demarcated on the opposite side by a sub-telomeric chromosome breakage site (CBS). This conclusion was corroborated by the latest assembly of the Tetrahymena MIC genome. They further employed CRISPR-Cas9 mutagenesis to disrupt a specific sub-telomeric CBS (4R-CBS). In uniparental progeny (mutant X WT), DNA elimination of the sub-telomeric MLS was not affected, but the adjacent MAC-destined sequence (MDS) may be co-eliminated. However, in biparental progeny (mutant X mutant), global DNA elimination was arrested, revealing previously unrecognized connections between chromosome breakage and DNA elimination. It also paves the way for future studies into the underlying molecular mechanisms. The work is rigorous, well-controlled, and offers important insights into how eukaryotic genomes demarcate genic regions (retained DNA) and regions derived from transposable elements (TE; eliminated DNA) during differentiation. The identification of chromosome breakage sequences as barriers preventing the spread of silencing (and ultimately, DNA elimination) from TE-derived regions into functional somatic genes is a key conceptual contribution.
  
  Strengths:
  
  New method development: Oligo-FISH in Tetrahymena. This allows high-resolution visualization of critical genome rearrangement events during MIC-to-MAC differentiation. This method will be a very powerful tool in this area of study.
  
  Integration of cytological and genomic data. The conclusion is strongly supported by both analyses.
  
  Rigorous genetic analysis of the role played by 4R-CBS in separating the fate of sub-telomeric MLS (elimination) and MDS (retention). DNA elimination in ciliates has long been regarded as an extreme form of gene silencing. Now, chromosome breakage sequences can be viewed as an extreme form of gene insulators.
  
  Weaknesses:
  
  The finding of global disruption of DNA elimination in 4R-CBS mutant progeny is highly intriguing, but it's mostly presented as a hypothesis in the Discussion. The authors propose that the failure to separate MLS from MDS allows aberrant heterochromatin spreading from the former into the latter, potentially silencing genes required for DNA elimination itself. While supported by prior literature on heterochromatin feedback loops, the specific targets silenced are not identified. While results from ChIP-seq and small RNA-seq can greatly strengthen the paper, the reviewer understands that direct molecular characterization may be beyond the scope of the current work.
  
  As mentioned in our reply to Reviewer #1’s comment above, we performed RNA-seq on wild-type and 4R-CBS mutant cells at 13.5 hpm and 15 hpm and found that genes in 4R-MDS are not significantly downregulated in 4R-CBS mutant cells (Figure 9), suggesting that the DNA elimination defect in these cells cannot be explained by aberrant heterochromatin spreading. Therefore, the link between the chromosome break at 4R-CBS and general DNA elimination remains elusive and will be a very interesting subject for our future research. We have added these results and revised the discussion in the manuscript.
  
  Reviewer #3 (Public review):
  
  Programmed DNA elimination (PDE) is a process that removes a substantial amount of genomic DNA during development. While it contradicts the genome constancy rule, an increasing number of organisms have been found to undergo PDE, indicating its potential biological function. Single-cell ciliates have been used as a prominent model system for studying PDE, providing important mechanistic insights into this process. Many of those studies have focused on the excision of internally eliminated sequences (IES) and the subsequent repair using non-homologous end joining (NHEJ). These studies have led to the identification of small RNAs that mark retained or eliminated regions and the domesticated transposases that generate double-strand breaks.
  
  In this manuscript, Nagao and Mochizuki examined the other type of breaks in ciliates that were healed with telomere addition. They specifically focused on the sequences at the ends of the germline (MIC) chromosomes, which have received relatively less attention due to the technical challenges associated with the highly repetitive nature of the sequences. The authors used the Tetrahymena model and developed a set of new tools. They used a novel FISH strategy that enables the distinction between germline and somatic telomeres, as well as the retained and eliminated DNA near the chromosome ends. This allows them to track these sequences at the cellular level throughout the development process, where PDE occurs. They also analyzed the more comprehensive germline and somatic genomes and determined at the sequence level the loss of subtelomeric and telomere sequences at all chromosome ends. Their result is reminiscent of the PDE observed in nematodes, where all germline chromosome ends are removed and remodeled. Thus, the finding connects two independent PDE systems, a protozoan and a metazoan, and suggests the convergent evolution of chromosome end removal and remodeling in PDE.
  
  The majority of sites (8/10) at the junctions of retained and eliminated DNA at the chromosome ends contain a chromosome breakage sequence (CBS). The authors created a set of mutants that modify the CBS at the ends of chromosome 4R. CBS regions are challenging for CRISPR due to their AT-rich sequences, making the creation of the 4R-CBS mutants a significant breakthrough. They used the FISH assay to determine if PDE still occurs in these mutant strains with compromised CBS. Surprisingly, they found that instead of blocking PDE, its adjacent retained DNA is now eliminated, suggesting a co-elimination event when the breakage is impaired. Furthermore, in biparental mutant crosses, no PDE occurred, and no viable progeny were produced, indicating that the removal of chromosome ends is crucial for proper PDE and sexual progeny development. Overall, the work demonstrates a critical role for 4R-CBS in separating retained and eliminated DNA.
  
  We appreciate Reviewer 3’s assessment.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  All reviewers agree that this study makes an important contribution to the field; however, they also offered several suggestions for how the manuscript could be improved. In particular, we draw your attention to the comments from Reviewer #1, who suggests that the manuscript could benefit from additional information on the general composition of germline chromosome ends, where available.
  
  As noted in our response to Reviewer #1 in the Public Reviews above, we have included an analysis of the fraction of repetitive sequences for each MLS as Figure 3B in the revised manuscript, highlighting the highly repetitive nature of MLSs compared with the rest of the genome.
  
  Reviewer #1 (Recommendations for the authors):
  
  As mentioned in the weaknesses section, the authors could provide more information regarding the nature of the sequences that make up the terminal MLS. There have been reports that these are highly repetitive; is that the case? Also, did the authors identify common repeats that are not internal to MIC chromosomes that could be used to track all terminal segments of the five chromosomes? This would complement their MIC-telomere probe.
  
  As noted in our response to Reviewer #1’s Public Review above, we have added an analysis of the fraction of repetitive sequences for each MLS as Figure 3B in the revised manuscript, which confirms that MLSs are highly repetitive.
  
  Apart from the moderately conserved Telomere Associated Sequence (TAS), described by Kirk and Blackburn (1995) and of unknown function, we were unable to identify any obvious shared repeats unique to MLSs that could support the development of pan-MLS-specific probes.
  
  One major weakness is that the authors did little to determine the cause of the elimination of the adjacent MDS along the 4R-MLS when the CBS was mutated. It would really improve the study if the authors could show that:
  
  (1) Expression of genes on the MDS is reduced in 4R-CBS mutant progeny.
  
  (2) Heterochromatin modifications are unexpectedly acquired on the MDS in mutants relative to wild-type chromosomes.
  
  (3) scnRNAs specific to the MDS region appear in the mutant progeny during development, but not in wild-type crosses.
  
  Any data that would help support the authors' hypothesis regarding how the MDS region is eliminated when the CBS is mutant would definitely strengthen the conclusions of the study.
  
  As noted in our response to Reviewer #1’s Public Review above, we performed RNA-seq on wild-type and 4R-CBS mutant cells at 13.5 hpm and 15 hpm. Our analysis showed that genes within the 4R-MDS are not significantly downregulated in 4R-CBS mutant cells (Figure 9), suggesting that the DNA elimination defect in these cells cannot be attributed to aberrant heterochromatin spreading. Therefore, the connection between the chromosome break at 4R-CBS and general DNA elimination remains unclear and represents an important avenue for future investigation. We have incorporated these results and revised the discussion accordingly in the updated manuscript.
  
  The other main weakness is that by mutating the CBS of only one chromosome arm, one can't know whether the loss of the MDS with the MLS in the mutants is generalizable to all chromosome arms or is unique to 4R. The authors noted that they were unable to make any other mutated CBSs. Another way to address this question is to try to rescue the mutant by inserting a new CBS into the 4R arm such that some MLS remains linked to the 4R-MDS, and to see whether removing the MIC telomere is the issue, or whether a block of MLS attached to the 4R-MDS would be sufficient to cause its elimination. I'm not sure exactly where to put the new CBS, but it is worth thinking about.
  
  To introduce a new CBS into 4R-MLS, we would need to insert a CBS-containing construct into the MIC by homologous recombination during conjugation and then select engineered transformants using a drug resistance marker expressed from the derived MAC. However, because 4R-MLS is still eliminated in the progeny of 4R-CBS mutants, the introduced marker would be lost from the MAC even if homologous recombination were successful. Therefore, although the strategy suggested by this reviewer is very interesting, several technical innovations are required to make such experiments feasible, leaving this approach for a future project.
  
  It seems somewhat curious that the mutation of the CBS completely blocks nuclear development. In Paramecium, the failure to complete internal DNA elimination events can lead to alternative telomere addition. The caveat being that, in Paramecium, telomere addition appears more promiscuous than in Tetrahymena. It would be helpful to know how absolute the failure to produce progeny is in these mutants. Is it zero progeny in 10⁶, 10⁷, 10⁸, ... mated cells? Can the authors provide an estimate of the lowest possible frequency?
  
  The viability tests were performed using bulk mating of 2.5 × 10⁴ cells for each cross. Because ~70-80% of mating pairs complete the conjugation process and produce exconjugants under our standard culture conditions, and because we did not detect any 6-mp-resistant progeny from MUT × MUT crosses, we estimate that the frequency of viable progeny in these crosses was below 1 per ~2 × 10⁴ mating pairs. The number of cells used for the viability assay is described in the “Viability Test of Sexual Progeny” section of Materials and Methods, and the estimated frequency of progeny production from the mutants is now stated in the Results section of the revised manuscript.
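The frequency bound above is simple arithmetic. The sketch below reproduces it under one assumption that the text does not state explicitly: that 2.5 × 10⁴ cells of each parental strain were mixed, so at most 2.5 × 10⁴ pairs could form. The function names are illustrative only.

```python
def productive_pairs(cells_per_strain: float, completion_rate: float) -> float:
    """Pairs expected to complete conjugation.

    Assumption (not stated in the text): each parental strain contributes
    `cells_per_strain` cells, so at most that many pairs can form.
    """
    return cells_per_strain * completion_rate


def zero_event_bound(n_pairs: float) -> float:
    """With no viable progeny among n_pairs, the frequency is below 1 / n_pairs."""
    return 1.0 / n_pairs


if __name__ == "__main__":
    for rate in (0.70, 0.80):  # ~70-80% of pairs complete conjugation
        pairs = productive_pairs(2.5e4, rate)
        print(f"{rate:.0%} completion: ~{pairs:,.0f} pairs -> frequency < {zero_event_bound(pairs):.1e}")
```

At 70% to 80% completion this gives roughly 1.75 × 10⁴ to 2 × 10⁴ productive pairs, consistent with the ~2 × 10⁴ figure quoted above.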
  
  One implication of the study is that chromosome breakage and DNA elimination, two different events, are coupled. In most mutants that block scnRNA-directed DNA elimination, both IES excision and chromosome breakage occur. In the study by McDaniel, SL. et al (2016). DRH1, a p68-related RNA helicase, is required for chromosome breakage in Tetrahymena. Biology Open pii: bio.021576. doi: 10.1242/bio.021576, germline knockouts of DRH1 could complete IES excision, but not chromosome breakage, indicating that the processes can be uncoupled. It may be useful for the authors to discuss this previous work in relation to their finding that failure in chromosome breakage can lead to DNA elimination of neighboring sequences.
  
  So far, DRH1 is the only gene reported to be required for chromosome breakage without affecting DNA elimination in Tetrahymena. However, McDaniel SL et al. (2016) examined chromosome breakage at only two CBSs (distinct from 4R-CBS), and thus it remains unclear how broadly chromosome breakage, including that at 4R-CBS, is affected in the absence of DRH1. In addition, McDaniel SL et al. (2016) assessed DNA elimination at three different IESs using PCR, whereas our study examined elimination of the repetitive Tlr1 transposon using FISH. Therefore, without further analysis of the similarities and differences in chromosome breakage and DNA elimination phenotypes between DRH1 knockout cells and 4R-CBS mutants, it is difficult to draw meaningful conclusions. Accordingly, we have limited ourselves to stating the following in the Discussion of the revised manuscript: “Moreover, chromosome breakage can be inhibited without disrupting DNA elimination, as shown in cells lacking zygotic expression of the p68-like RNA helicase Drh1 (McDaniel et al., 2016).”
  
  Minor corrections:
  
  Page 7, line 3: the text "......inducing chromosome break" should either be "......inducing chromosome breaks" or "......inducing a chromosome break".
  
  Corrected as “inducing a chromosome break”.
  
  Page 13, line 13: "......large block...." should be "......large blocks......".
  
  Corrected as suggested.
  
  Reviewer #2 (Recommendations for the authors):
  
  The authors can experimentally validate that chromosome breakage at 4R-CBS is indeed disrupted by the mutations. A PCR-based assay testing de novo telomere addition is a standard tool. In addition, MLS-linked telomere should only appear transiently during conjugation in WT cells.
  
  Because it was previously unknown whether de novo telomere addition occurs at the ends of MLSs upon chromosome breakage, we tested this using a PCR-based assay. We detected telomere-added chromosome ends of 4R-MLS and 3L-MLS, which were undetectable until 10.5 hpm, appeared at 12 hpm, and gradually decreased by 18 hpm in wild-type cells (WT × WT cross). Importantly, the appearance of the telomere-added 4R-MLS end, but not the 3L-MLS end, was blocked in 4R-CBS mutants (Mut × Mut crosses), strongly supporting the conclusion that the 4R-CBS mutations specifically disrupt chromosome breakage at 4R-CBS. These new data are shown in Figure 5C–E and described in the Results section.
  
  The high FISH background during conjugation may be caused by the abundant presence of dsRNA, which is resistant to RNase A treatment but may be degraded by RNase III.
  
  The high FISH background was observed in the parental MAC at 9 and 12 hpm (Figures 2, 4, and S2), where dsRNA accumulation was not detected in previous studies (Woo et al. 2016; Shehzada et al. 2024). In contrast, the MIC at 3 hpm and the new MAC at 9 and 12 hpm, where strong dsRNA accumulation was detected, showed much weaker background FISH signals (Figures 2, 4, and S2). Therefore, we believe that dsRNA is not the main cause of the high FISH background.
  
  It is likely that the long MIC telomere is treated as an IES and targeted for DNA elimination. Indeed, telomere-specific scnRNA is abundantly produced during conjugation (http://www.ncbi.nlm.nih.gov/pubmed/19460867).
  
  We have cited the suggested literature, and the following description has been added to the Discussion to relate the reported telomere-derived scnRNAs to the abundant scnRNAs produced from MIC chromosomal ends: “In addition, telomere-complementary scnRNAs were reported to be produced specifically during conjugation (Cao et al. 2009).”
  
  Global disruption of DNA elimination may be a direct effect (DNA excision machinery affected) or indirect (unrepaired DSB and checkpoint activation).
  
  It has been reported that unrepaired DSBs caused by loss of Ku80 (Tku80) do not block DNA elimination in Tetrahymena (Lin et al. 2012). Therefore, checkpoint activation by unrepaired DSBs, if it occurs, is unlikely to explain the DNA elimination defect observed in the progeny of 4R-CBS mutants. Nonetheless, this direct-versus-indirect issue would be relevant when considering whether disruption of specific 4R-MDS-encoded genes in 4R-CBS mutants could cause the DNA elimination defect. Our new RNA-seq analysis, however, suggests that this possibility is unlikely. Therefore, we did not add further discussion of this direct-versus-indirect issue.
  
  Minor points:
  
  The zoom-in boxes in most images are barely visible.
  
  We have modified the zoom-in boxes to make them clearer.
  
  Page 13: scnRNA precursors (Cai et al., 2025) (Cai et al., in press). Is it one paper or two?
  
  These are two separate papers, and the latter was published recently. We have updated the citation.
  
  Reviewer #3 (Recommendations for the authors):
  
  The manuscript is well-written, with clear data, thoughtful discussion, and concise presentation. I have only a few minor comments below.
  
  For Figure 4 and others, the right panel shows the stats and percentages, with positive and negative labels. It's a bit confusing at first glance. I think it can be clarified what positive and negative mean in the legend.
  
  The legends of Figure 4, Figure 6, and Supplementary Figure S2 have been modified to read: “The presence (Positive) or absence (Negative) of the 4R-MLS FISH signal in new MAC (An) in 50 cells per time point was examined.”
  
  The quality of the FISH images is low at their current resolution. It is difficult to get a clear view.
  
  In the initial version, some images were rendered at low resolution when we combined them into a single PDF file for review. In the revised manuscript, these have been replaced with high-resolution images.
  
  The co-elimination of neighboring 4R-MDS when 4R-CBS is mutated, can this be viewed as a fail-safe mechanism to ensure the elimination of the chromosome ends? Regardless, the result begs the question of the significance of end removal and remodeling of PDE. Some speculations in the discussion might be helpful.
  
  Because the neighboring 4R-MDS contains approximately 100 predicted genes, its co-elimination would likely be too risky to evolve as a fail-safe mechanism for ensuring chromosome-end elimination in every generation. Instead, we interpret this as an erroneous process that can still be compensated for through endoreplication of the remaining, normally processed 4R-MDS from the non-mutated copy.
  
  We further speculate that the connection between chromosome breakage at 4R-CBS and the essential PDE process may serve as an evolutionary pressure to preserve the 4R-CBS locus in a chromosome breakage-competent state. We have added the following discussion to the revised manuscript (Page 15): “The observed link between chromosome breakage at 4R-CBS and the essential DNA elimination process may reflect the biological significance of MLSs and the importance of their removal from the MAC. Coupling these processes may have evolved as a mechanism to ensure that only functional chromosome-end CBS loci are preferentially transmitted to future generations.”
  
  Figure 1, legend, line 3, "the sexual reproduction process", do you mean "the sexual reproduction proceeds or initiates"?
  
  We meant “conjugation” = “the sexual reproduction process”. To make this clearer, we have revised the legend as “conjugation, which is the sexual reproduction process of Tetrahymena”.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.21.677574v6
www.biorxiv.org

Medial prefrontal cortex encodes but is not required to generate goal-directed actions under threat

1
1. Public_Reviews 23 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  The authors conclude that mPFC is not required for avoidance, based on the minimal behavioral effects of optogenetic inhibition. While this interpretation is supported by the data, the choice of viral constructs could lead to an underestimation of the mPFC's role for several reasons. First, the efficacy of eArch3.0 inhibition was not verified beyond histology, and its non-cell-type-specific nature could lead to disinhibition or compensatory activity in downstream regions. Although the authors' use of visual cortex (V1) inhibition as a control suggests that broad cortical inhibition does not impair avoidance, subcortical compensation cannot be ruled out. Second, Vgat-ChR2 targets only GABAergic neurons, potentially missing glutamatergic contributions. Addressing these limitations in the Discussion section would strengthen the manuscript.
  
  We thank the reviewer for these points. First, although we did not perform direct electrophysiological verification of eArch3.0 efficacy in mPFC in the present study, this construct has been extensively validated in prior work and is widely used to produce robust neuronal inhibition. In our experiments, the lack of behavioral effect with eArch3.0 inhibition converged with the results obtained using the independent Vgat-ChR2 approach, which we directly validated, supporting the conclusion that mPFC inhibition does not impair avoidance under these conditions. Our results are also consistent with previous studies showing that mPFC lesions do not impair avoidance behavior.
  
  Second, we agree that manipulating mPFC activity will necessarily influence downstream circuits, including subcortical regions, given the interconnected nature of these networks. Our goal was to test whether inhibiting mPFC activity alters avoidance behavior, not to isolate it from its targets. In this context, the absence of behavioral effects indicates that avoidance behavior can be supported without mPFC activity. While compensation is always possible, it usually reveals itself as an impairment that persists while compensation develops, and we did not observe such effects. Our results are consistent with the idea that subcortical circuits normally mediate these behaviors.
  
  Finally, regarding Vgat-ChR2, activating GABAergic neurons is a well-established approach to suppress cortical activity, as these interneurons provide strong inhibition onto local glutamatergic neurons. Thus, this manipulation is expected to broadly reduce excitatory output in cortex. Indeed, the robust suppression of cortical activity we observed with GABAergic activation makes it unlikely that major glutamatergic contributions were missed.
  
  These points are addressed in the paper, including in the Discussion.
  
  Reviewer #2 (Public review):
  
  (1) There are few details on the linear mixed models in the methods. This section could be improved by including a mathematical description. More importantly, the reader never learns how accurately the models capture the data. Given that most conclusions rely on the models, it seems central to address this point carefully. For example, what is the explained variance, marginal, and conditional? Were the nested models compared to non-nested ones (e.g., AIC), what are the specific outputs of the likelihood ratio tests briefly mentioned in the methods?
  
  Model structure was defined a priori by the experimental design and hypotheses rather than selected through model comparison, but we verified the contribution of key model components (e.g., covariates, interactions, and random effects) using likelihood ratio tests comparing nested models. Regarding model performance, we now report for each model the marginal and conditional R² values (Nakagawa), which quantify the variance explained by the fixed effects alone and by the full mixed model including random effects, respectively. In addition, likelihood ratio test results for all fixed effects and interactions (χ² statistics) were already reported in the manuscript.
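For readers less familiar with these statistics, the sketch below shows how marginal and conditional R² are defined for a Gaussian linear mixed model, together with the likelihood ratio statistic. It is an illustration under stated assumptions, not the analysis code used in the manuscript: variance components and log-likelihoods are taken as inputs (in practice they come from the fitted models), the chi-square tail is computed in closed form for integer degrees of freedom so that only the standard library is needed, and all function names are hypothetical.

```python
import math


def nakagawa_r2(var_fixed: float, var_random: list[float], var_resid: float) -> tuple[float, float]:
    """Marginal and conditional R² for a Gaussian LMM (Nakagawa-style definitions).

    var_fixed:  variance of the fixed-effect predictions (X @ beta)
    var_random: variance components of the random effects
    var_resid:  residual variance
    """
    var_rand_total = sum(var_random)
    total = var_fixed + var_rand_total + var_resid
    r2_marginal = var_fixed / total                      # fixed effects only
    r2_conditional = (var_fixed + var_rand_total) / total  # fixed + random effects
    return r2_marginal, r2_conditional


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variate, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * normal tail + 2 * phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * pdf * total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int) -> tuple[float, float]:
    """LRT for nested models; for fixed-effect tests both should be fit by ML, not REML."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)
```

For example, dropping a two-parameter interaction that lowers the log-likelihood by 2 gives χ² = 4 on 2 degrees of freedom.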
  
  (2) For several figures, there is a disconnect with the main text, in the sense that it is difficult to understand how statements in the main text connect with specific figure panels or bars in their graphs. This is particularly the case for the most complex figures, e.g., Figures 3, 4, and their supplements. It would be beneficial to introduce subfigure labels (A1, etc.) and state explicitly in the main text which figure panel is described (in parentheses). Alternatively, break down the figures into multiple ones, decreasing ambiguity. This is important because it will help the reader better assess the strength of the results.
  
  We have significantly revised the manuscript to reduce ambiguity and thank the reviewer for each of their (28) requests, which we have implemented in full. We also added additional figure references to the Results to assist with readability. This has significantly improved clarity and readability.
  
  (3) It does not appear that the code and data used to produce the figures are made available. That would be very beneficial, given the complexity of the analysis and dataset collection procedures. It would also help readers better understand the results and probe their validity.
  
  As usual, we will share the full dataset via Dryad with the version of record (VOR) once the revision is completed.
  
  Reviewer #3 (Public review):
  
  The main weakness, in my view, lies in the Results section. In the figures, the authors do not present any raw data, and the plots are shown as mean ± SEM without displaying the distribution of individual data points.
  
  We thank the reviewer for the recommendations. Individual data points are shown where appropriate (e.g., Fig. 1). However, most of our analyses involve repeated-measures, hierarchical data with multiple levels (cells and sessions nested within animals), where simple point overlays can be misleading or difficult to interpret without explicit linking across levels. We therefore use mean ± SEM visualizations for clarity in these summary figures, while preserving the full hierarchical structure in the statistical analysis through mixed-effects models. All data will be made available in the VOR to allow full inspection of the underlying distributions.
  
  It is both a strength and a weakness that the authors do not attempt to guide the reader through the Results section and instead present the findings with very little emphasis on the key outcomes of the GLM. While this approach is arguably the most transparent way to report results, it also makes the section quite difficult to follow and may discourage readers.
  
  I would recommend rewriting the Results section to make it more accessible to a broader audience. A similar issue applies to the figures: presenting all plots reflects a commendable commitment to transparency, but it would greatly benefit from a clearer narrative. As it stands, it is difficult to grasp the message of each figure by simply browsing through them.
  
  The full description (complexity) of the models is entirely in the legends and supplemental figures. This was done to make the results easier to follow. We have made all the changes noted above to facilitate readability while ensuring there is enough transparency to assess the data. We think readability has significantly improved.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  Below are a few specific suggestions related to the main weaknesses mentioned above.
  
  (1) P4 L9: The sentence starting with "However, most ..." sounds more like a statement than a contrast with the previous sentence. Therefore, please delete "However" and please add references to justify the statement.
  
  Done.
  
  (2) P8: Definition of movement peaks. It would be great to have three videos illustrating the mouse behavior for the three different types of movement peaks. This would allow the reader to better understand the differences between no peaks 3 sec prior, more than 5 seconds, and one example that does not fit these two categories. In addition, what percentage of all peaks do the “no peaks 3 sec prior” and “more than 5 sec” categories represent?
  
  We added the percentages. The “3 sec prior” category represents ~23% of peaks and the “5 sec” category ~31%. However, we do not think adding a single video for each of these three cases would be useful, as the dataset is composed of thousands of these movements.
  
  (3) P8: Last paragraph. When you state that you performed a linear fit between ΔF/F and movement, do you mean speed? In addition, the statement "integrating both signals over a 200 ms window" is incomplete. How is the window selected? Is the window 200 ms around movement onset or movement peak speed?
  
  Yes, the movement variable used in the linear fit corresponds to speed. Regarding the 200 ms window, this analysis does not focus on specific behavioral events such as movement onset or peak speed. Instead, both ΔF/F and speed signals were segmented into consecutive 200 ms windows across the entire recording session, and the linear relationship was computed across these paired segments. Thus, the analysis captures the overall relationship between neural activity and ongoing movement, rather than event-aligned dynamics. We have revised the text to clarify both the use of speed and the implementation of the 200 ms window.
  
  (4) P14: Discussion of AA19 and AA39 tasks: It would be helpful to clearly specify what percentage of actions you would expect given no learning, is it the 23% action dashed line indicated in the top panel of Figure 2B?
  
  The expected percentage of actions under no learning is not fixed, as it depends on the rate of spontaneous (non–cue-driven) crossings. In these tasks, we estimate this baseline using behavior during the noUS condition, where the action rate is ~23% (Fig. 2B). In the AA19 and especially AA39 tasks, this baseline decreases because spontaneous inter-trial crossings (ITCs) are progressively reduced, leading to lower expected action rates under no-learning conditions. Thus, the expected no-learning action rate in the AA19/39 tasks is lower than the 23% baseline derived from noUS. In other studies, we explicitly included NoCS (no-cue) trials to estimate chance performance; however, in the present design we rely on the noUS baseline and the observed changes in ITC rate. We have clarified this point in the text.
  
  (5) P15 L2: "Considering tone intensity (Fig. 2B), CS1 avoids latencies increased at medium and high intensities but not a low intensity." This is confusing. Are you referring to the AA39 triangles under CS1 in the middle panel, left? They are all above the dashed reference line. So the plot seems to contradict the statement. If you are referring to AA19, the red dots also seem to show the opposite of the statement.
  
  The dashed reference line reflects latency during the noUS condition and is included for visual reference; however, these values are not directly comparable to those in the AA tasks, as noUS latencies are largely unconstrained and reflect baseline behavior rather than learned responding. The statement in the text refers specifically to changes across AA conditions, consistent with our analysis approach throughout the manuscript, where values are compared to the immediately preceding condition. In this case, we are referring to AA39 (triangles) relative to AA19 (circles). Under this comparison, CS1 avoidance latencies increase at medium and high intensities, but not at low intensity, consistent with the statistical contrasts. We have revised the text to clarify the points.
  
  (6) P17: "Movement and neural measures subtract the baseline from the other three windows at a trial level." Do you mean to say that for each measure, the baseline was subtracted? How is baseline defined (over which time window)?
  
  The baseline is defined in that same paragraph as the −0.5 to 0 s pre-CS window. To improve clarity, we have revised the text to explicitly restate this definition in the sentence describing baseline subtraction.
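A minimal sketch of trial-level baseline subtraction, assuming each window's value is the mean of the samples it contains. The −0.5 to 0 s pre-CS baseline comes from the text; the bounds of the other windows below are hypothetical placeholders, and in the manuscript some windows may be aligned to action onset rather than CS onset.

```python
from statistics import fmean

BASELINE = (-0.5, 0.0)  # seconds relative to CS onset, as stated in the text

# Hypothetical window bounds (seconds relative to CS onset), for illustration only
EXAMPLE_WINDOWS = {"orienting": (0.0, 0.3), "action": (0.3, 2.0), "from_action": (2.0, 3.0)}


def window_mean(times: list[float], values: list[float], start: float, end: float) -> float:
    """Mean of samples with start <= t < end."""
    selected = [v for t, v in zip(times, values) if start <= t < end]
    if not selected:
        raise ValueError(f"no samples in window [{start}, {end})")
    return fmean(selected)


def baseline_corrected(times: list[float], values: list[float], windows: dict) -> dict:
    """For one trial: mean in each window minus mean in the pre-CS baseline window."""
    base = window_mean(times, values, *BASELINE)
    return {name: window_mean(times, values, s, e) - base for name, (s, e) in windows.items()}
```

Applied to every trial separately, this removes trial-to-trial offsets before the windows are compared across conditions.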
  
  (7) P17: "Fig. 2-Supplement 2A,B shows model-derived marginal means of movement averaged across tone intensities." Some explanation needs to be provided, since the previous figures show a dependence of behavior on tone intensity. Are you doing this based on Fig. 2-S1?
  
  Yes, these results are derived from the same model of the full data shown in Fig. 2–S1. In this particular analysis, tone intensity was included in the model but not retained when computing marginal means and contrasts, effectively averaging across intensity levels. The rationale for this approach is that tone intensity was primarily used to increase behavioral variability, particularly error rates, which are otherwise low in this task. Averaging across intensity therefore improves statistical power and allows us to more clearly isolate the effects of the primary factors of interest. We have clarified this point in the text.
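To illustrate what averaging across intensity means for model-derived marginal means, the sketch below takes model predictions for each condition × intensity combination and averages over intensity with equal weights, the usual convention for estimated marginal means. The condition labels and values are hypothetical, and the sketch assumes every condition has a prediction at every intensity level.

```python
from collections import defaultdict
from statistics import fmean


def average_over_factor(grid: dict[tuple[str, str], float]) -> dict[str, float]:
    """Collapse {(condition, intensity): prediction} to {condition: mean}.

    Each intensity level gets equal weight, regardless of how many trials
    were observed at that level (unlike a raw mean of the data).
    """
    by_condition: dict[str, list[float]] = defaultdict(list)
    for (condition, _intensity), value in grid.items():
        by_condition[condition].append(value)
    return {condition: fmean(values) for condition, values in by_condition.items()}
```

Because intensity remains in the fitted model, its effect is still estimated; it is only collapsed when the marginal means and contrasts are reported.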
  
  (8) P18: "Orienting magnitude was strongly dependent on tone intensity...". However, in Figure 2-S2, there is no information about tone intensity. So how is the reader supposed to see this? Same issue on P19 when discussing the action window. Generally, the description of Figure 2-S1 and S2 is difficult to follow and should be improved. It is not clear that all panels are referred to in the text.
  
  We have revised the start of the Movement section to clarify how tone intensity is treated across analyses and figures. Specifically, tone intensity is included as a factor in all statistical models; however, for clarity of presentation, it is sometimes collapsed in figures to reduce dimensionality and to emphasize other task-related factors. This manipulation was introduced primarily to increase behavioral variability (particularly error rates), thereby improving sensitivity for estimating the effects of the other task variables.
  
  We have also clarified in the Fig. 2–S2 legend that, although intensity is not displayed in the figure for visualization purposes, it is included in the underlying model and its effects are reported in the supplement.
  
  (9) P22, 23: Windows are mentioned, but not defined or indicated in figures.
  
  We have clarified in the text that the same time windows defined for movement analyses (baseline, orienting, action, and from-action) were also used for the neural analyses.
  
  (10) P22: "Covariates were standardized within each window so that estimated marginal means reflected ΔF/F at average covariate values." It is unclear what was done exactly. What do you mean by "standardized"? Maybe give an example here and elaborate in the methods.
  
  By “standardized within each window,” we mean that covariates were z-scored within each analysis window (i.e., each covariate was transformed to have a mean of 0 and a standard deviation of 1 within that window). This ensures that estimated marginal means correspond to ΔF/F evaluated at the average covariate values within each window. We have clarified this in the Methods and Results.
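
  Concretely, the within-window z-scoring can be sketched as follows. This is a hypothetical illustration: the row layout, the covariate name `speed`, and the use of the sample standard deviation (as R's `scale()` does) are assumptions, not details taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_window(rows, covariate):
    """Return copies of rows with one covariate z-scored within each window.

    Uses the sample standard deviation (n - 1 denominator); whether the
    authors used this or the population SD is an assumption.
    """
    by_window = defaultdict(list)
    for row in rows:
        by_window[row["window"]].append(row[covariate])
    stats = {window: (mean(vals), stdev(vals)) for window, vals in by_window.items()}
    standardized = []
    for row in rows:
        mu, sd = stats[row["window"]]
        new_row = dict(row)
        new_row[covariate] = (row[covariate] - mu) / sd
        standardized.append(new_row)
    return standardized


rows = [{"window": "orienting", "speed": s} for s in (1, 2, 3)] + \
       [{"window": "action", "speed": s} for s in (10, 20, 30)]
print([r["speed"] for r in zscore_within_window(rows, "speed")])
# prints [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

  Because each window is centered on its own mean, a marginal mean evaluated at covariate = 0 corresponds to the average covariate value of that window, not of the whole session.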
  
  (11) P24-25: Indicating spurious action on Figure 3-S2 (and in Figure 3) would help the reader follow the argument in the main text.
  
  We clarified this in the legends by indicating that actions not classified as AA, PA, Escape, or PA Error are spurious actions.
  
  (12) P25: "After controlling for ..., but this includes the effects of aversive stimulation." The second part of this sentence was not clear.
  
  We have clarified this sentence to indicate that avoidance errors are followed by aversive stimulation (i.e., errors are punished).
  
  (13) P34L3: "Classs" -> "Class".
  
  Fixed.
  
  (14) P42 top paragraph: There are two references to Figure 5-S1 panel D, but there is no panel D on the figure.
  
  Fixed.
  
  (15) P57: The sentence starting with "Random effects were specified ..." is very difficult to follow.
  
  We have revised this sentence to improve clarity by separating the description of the random-effects structure from the model syntax.
  
  (16) P57: The windows analyzed are finally defined at the bottom of this page. The information also needs to be included early in the results to improve comprehension.
  
  This is now included in the main text when windows are first used in the movement section.
  
  (17) P58: Several R packages are mentioned by name, but without specifying that they are R packages, which would facilitate reading.
  
  We now identify these as R packages in the text.
  
  (18) P58 top paragraph: "Tuckey's correction", do you mean "Tukey's HSD test"?
  
  We thank the reviewer for noting this. We used Holm-adjusted p-values for multiple comparisons (as implemented in emmeans) and have revised the text.
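
  For readers unfamiliar with the Holm procedure, the step-down adjustment can be sketched in a few lines of Python. This is a generic illustration of the method, not the emmeans implementation; R's `p.adjust(p, method = "holm")` can serve as a cross-check.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m - r), capped at 1 ...
        candidate = min(1.0, (m - rank) * pvalues[i])
        # ... and adjusted values are forced to be non-decreasing in rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
# approximately [0.03, 0.06, 0.06]
```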
  
  (19) P63: "features extracted from F/F" do you mean "DF/F"?
  
  Yes, fixed.
  
  (20) Figure 1B speed plots: it is not possible to visualize the lines at the movement peak because they overlap completely. You can either add an inset on the left of the peak (for each panel), magnifying that region, or play with the transparency of the traces to improve visibility. There is a similar issue in Figure 5A, B. (Alternatively, if it is not possible to solve the issue graphically, explicitly state that traces overlap.)
  
  We have fixed this by making some traces dashed in Figure 1 and Figure 1–S1, which reveals the underlying traces. We also state that the traces overlap completely at peak speed. In Figure 5, we state that the traces overlap, as expected; transparency or dashing does not work well with the colors used in Figure 5, and the overlap in fact emphasizes the similarity of the movements.
  
  (21) Legend 1A: abbreviation CCF not defined. Is it anterior to the left? Abbreviation WM not defined. The right panels are unclear. The legend states that they show a schematic of the location of the optical fibers, but that was not clear. Do the dots indicate the location of the fibers? Is the green region indicative of V1? Same for dark gray in the mPFC panel. What are the lighter grey regions and the blue region? Does 'lateral' mean 'lateral from midline'? Please clarify these points.
  
  CCF is defined in Methods, and the typesetting process will adjust abbreviations as needed per the journal. We have defined WM and clarified all the other points in the legend.
  
  (22) 1B: "peaks taken at a fixed interval > 5 s", this is a bit confusing. If the interval is fixed, the exact time interval should be given. If it is > 5 s, then this suggests that it is not fixed. Do you mean "at intervals > 5 s"?
  
  Yes, fixed.
  
  (23) Figure 1-S1C: is the area the integral of the z-scored DF/F above zero DF/F? If so, it should have units of seconds (integral over dt of a dimensionless variable). Similarly, the Peak is a z-score value? In addition, is the time to peak in seconds? What is zero? Peak time of movement?
  
  We thank the reviewer for raising these points. We have clarified the terminology in the text and figure. Specifically, “area” was inaccurately labeled and refers to the mean z-scored ΔF/F within each analysis window (not a time integral). Peak values correspond to the maximum z-scored ΔF/F within the window, and time to peak is reported in seconds relative to the alignment point. We have also clarified the definition of time zero and included these definitions in Methods.
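
  Under the corrected definitions, the three window summaries could be computed as in the sketch below. It is hypothetical: the function name and window bounds are illustrative, the trace is assumed to be already z-scored, and the window is treated as half-open.

```python
def window_metrics(times, zscored, start, end):
    """Summaries of a z-scored dF/F trace within one analysis window [start, end).

    times are in seconds relative to the alignment event (t = 0), so the
    returned time to peak is also relative to that event.
    """
    pairs = [(t, v) for t, v in zip(times, zscored) if start <= t < end]
    if not pairs:
        raise ValueError("no samples fall inside the window")
    values = [v for _, v in pairs]
    peak_time, peak_value = max(pairs, key=lambda pair: pair[1])
    return {
        "mean": sum(values) / len(values),  # the quantity previously labeled "area"
        "peak": peak_value,                 # maximum z-scored dF/F in the window
        "time_to_peak": peak_time,          # seconds from the alignment point
    }


print(window_metrics([0.0, 0.1, 0.2, 0.3], [0.5, 2.0, 1.0, -1.0], start=0.0, end=0.3))
```

  A mean (rather than a time integral) keeps the summary in z-score units and makes windows of different lengths directly comparable.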
  
  (24) Figure 2-S1: It is not clear if this figure is obtained by averaging across all animals. Please explain in the legend.
  
  We clarified that values represent averages across mice.
  
  (25) Figure 2-S2: Are the speeds in A and B in units of cm/s (vertical axis)? This needs to be indicated.
  
  We have clarified in the figure legend that movement speed is expressed in cm/s.
  
  (26) Figure 5A, scale bar: It looks like a Delta is missing in front of F because the label reads 0.5 F/F instead of 0.5 DF/F. I am unclear why there are three colored traces for the speed panels. If the colors denote neuron classes, does this mean they were recorded in different sessions, allowing the authors to distinguish activation speed for each class separately?
  
  We fixed the scale bar typo. The speed traces in the bottom panels are shown to illustrate that movement is highly similar across activation types within each avoidance mode, indicating that the observed large differences in neural activity cannot be attributed to differences in movement. Minor differences in the speed traces arise because activation types are composed of neurons that can be recorded in the same or different sessions, and each activation type may not be present in every session. We added several sentences to this section that should fully clarify the issue.
  
  (27) Figure 4-S1 legend B: Please indicate why the two panels are missing for the PA case (for the confused reader).
  
  We have clarified in the legend that panels are not shown for correct CS2 passive avoids because these trials do not involve an action, and therefore from-action alignment cannot be defined.
  
  (28) Figure 5-S A, B: Units missing for speed.
  
  Fixed.
  
  Reviewer #3 (Recommendations for the authors):
  
  I cannot assess the scientific validity of the study design as it is too far away from my direct field of expertise. But I found the authors' arguments convincing, and the results sound pretty consistent with the little I know of the field. The recording methods are good and the statistical analysis robust. So my only recommendation for the authors would be to work on the figures to improve clarity.
  
  Thank you. We have introduced various changes that we hope will facilitate readability for a wider audience while preserving the necessary details.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.25.690391v3
www.biorxiv.org

Dominant α-tubulin mutations rescue tauopathy neurodegenerative phenotypes in C. elegans

1. Public_Reviews 23 Jun 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study identifies mutations in alpha-tubulin that suppress Tau-induced neurodegeneration using the C. elegans model of Tauopathy, suggesting a potentially interesting role for microtubule properties in modulating Tau toxicity. These missense mutations cluster in the C-terminal Tau-interacting helix 12 region of alpha-tubulin genes (tba-1, tba-2, and mec-12). Further analysis, particularly using the strongest suppressor tba-2, shows that it rescues Tau-induced behavioral deficits and neuronal loss without significantly altering bulk tau-phosphorylation, aggregation, or binding to soluble tubulin. The authors suggest that altered microtubule properties underlie the neuroprotective effects, and manipulating microtubule properties may have therapeutic potential.
  
  Strengths:
  
  The study is conceptually interesting as it shows that Tau-induced neurotoxicity can, in this model, be partially uncoupled from canonical pathological hallmarks such as Tau-hyperphosphorylation and aggregation. The identification of multiple independent mutations in the same structural region of three alpha-tubulin genes provides support for the functional relevance of helix 12 in modulating Tau-induced toxicity. The authors demonstrate significant rescue of behavioral deficits (using motility and manual thrashing assays) and neuronal loss in both WT-tau and FTLD-associated TauV337M in combination with mutant alpha-tubulins, suggesting a general mechanism for tubulin-regulated modulation of Tau-toxicity. Moreover, the correlation between mutant tubulin expression levels and the extent of rescue supports a causal relationship.
  
  Weaknesses:
  
  One of the major claims of this manuscript is that altered microtubule properties suppress Tau toxicity. The only supporting evidence in this context provided by the authors is reduced taxol-stabilized microtubule mass, which does not fully explain neuronal loss or the rescue of behavioral deficits. What remains unclear is whether these mutations alter microtubule dynamics, catastrophe, lattice stability, or axonal transport.
  
  We agree with Reviewer #1’s critique that the evidence presented does not fully explain neuronal loss and requires further investigation. This first manuscript characterized the mutations discovered through forward genetic screening and provided data supporting the positive correlation between mutant expression and the level of suppression. We believe the studies and data presented here help formulate the next testable hypotheses and guide the next lines of experimentation. We are encouraged by Reviewer #1’s assessment that exploration of microtubule dynamics, catastrophe, lattice stability, and axonal transport will be critical to testing the hypothesis that mutant tubulin drives suppression of tau toxicity through changes to microtubule properties. These suggestions are highly relevant and align with our priorities; we recently submitted an application for a 5-year research award to support these key questions.
  
  To address this specifically, the reviewer recommended “The microtubule-dependent axonal transport should be examined in tubulin mutants and compared with mutant tubulin + Tau conditions. Imaging of mitochondrial or synaptic vesicle markers, along with appropriate quantifications (velocity or run length), may provide a functional readout linking microtubule changes to neuronal survival.”
  
  We agree with the reviewer that these experiments will be highly valuable for further understanding the mechanisms underlying suppression, and we plan to complete them upon receipt of funding that would directly support this work.
  
  The authors show that mutant tba-2 reduces total tau levels by ~45%. This level of reduction is likely significant but underexplored in the manuscript. Why are the Tau levels reduced? How is Tau being cleared? Is autophagy enhanced, or is the ubiquitin-proteasome pathway upregulated, in tba-2 + Tau animals? Or are one or more Tau species not detectable by the antibodies used in this study? The observation that the mec-12 mutant rescues Tau-induced phenotypes without altering Tau levels suggests that suppression can occur through Tau-independent mechanisms. This raises an important unresolved question regarding the extent to which suppression is Tau-dependent vs Tau-independent across different mutant alpha-tubulin genes, complicating the interpretation of the rescue phenotypes.
  
  We think the reviewer has raised an important point: there may be both tau-dependent and tau-independent mechanisms at work here, and we will add greater nuance to this in our discussion. Additionally, we agree these two potential mechanistic pathways merit further exploration. To address this, we plan to conduct experiments using reporter C. elegans lines crossed with our mutant tubulin/tau-transgenic lines to detect potential upregulation of these pathways as mechanisms for tau clearance.
  
  Given that Tau primarily associates with the microtubule lattice in vivo, measuring interactions with soluble tubulin may not fully capture biologically relevant binding dynamics and therefore does not exclude the possibility that these mutations alter tau-microtubule interactions at the lattice level or may affect the binding of other MAPs/regulators, thereby altering stability or trafficking.
  
  In the Discussion we acknowledge the limitation of examining only the binding affinity between soluble tubulin and tau, and we intend to complete further studies with polymerized microtubules containing mutant α-tubulin. We will expand the discussion of this in the text. In line with Reviewer #1’s comments, we have also concluded that the next line of experimentation will focus on mutant alpha-tubulin effects on the microtubule polymer, such as changes to MAP interactions, stability, and trafficking. We have applied for and hope to receive funding to address these questions in the near future.
  
  To address this concern specifically, we plan to conduct these experiments using C. elegans extracts to polymerize microtubules and subsequently test the binding of recombinant human tau. These co-sedimentation experiments are expected to be included in the revised manuscript.
  
  A large body of conclusions is drawn from behavioral rescue and biochemical assays. This limits the understanding of how molecular changes in tubulin might affect cellular mechanisms of neuroprotection. Are there changes in the neuronal microtubule organization, Tau localization, or its redistribution in the mutant alpha-tubulin background? Are there differences in soluble vs oligomeric vs insoluble Tau in mutant tba-2 and mec-12 animals?
  
  The reviewer raises relevant questions regarding elucidation of the mechanisms underlying mutant tubulin-mediated suppression at the cellular level. To address this concern we will analyze the cellular distribution of tau in neurons from mutant and non-mutant C. elegans.
  
  Ultimately, our goals are to identify and connect the underlying biochemical mechanisms with the observed prevention of cell death as Reviewer 1 has identified. Their suggestion to explore cellular-level changes such as mutant tubulin effects on tau distribution is highly relevant. We therefore plan to test this directly by imaging neurons in C. elegans strains expressing fluorescently labeled tau and/or immunohistochemical techniques to stain for tau in C. elegans neurons.
  
  The suppression of behavior in the co-pathology model is interesting but mechanistically insufficient, mainly because the underlying basis of suppression is not examined in these models. Moreover, it remains unclear whether tubulin-Tau genetically interacts with Aβ or TDP-43, and what cellular mechanisms account for the partial rescue observed in these co-pathology models.
  
  In agreement with Reviewer #1’s assessment, we have concluded these data, while interesting, do not substantially expand our understanding apart from the existing data. Without additional information regarding the underlying mechanisms, they do not provide substantial novel insights and we have therefore chosen to remove the co-pathology data sets from the revised version of the manuscript to refine the scope of the data and hypotheses discussed in this work.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript by Benbow et al. identifies, through a genetic screen, key tubulin mutants that, with high confidence, rescue tau-mediated ND phenotypes. This manuscript is well written, and the experimental results strongly support the authors' claims that these tubulin mutants can rescue ND-linked phenotypes in C. elegans while having little to no direct effect on Tau aggregation.
  
  Strengths:
  
  Benbow et al. use a relatively unbiased forward genetic screen to identify mutations associated with phenotypes that suppress tauopathy-related defects. The authors then logically focus on the various α-tubulin missense mutations identified in H12, which are known to localize to the external face of microtubules. The authors also carefully compare their established tauopathy-associated phenotypes in the WT TauH model, with and without specific α-tubulin mutations, using appropriate controls throughout. Lastly, the authors provide partial mechanistic insight into the α-tubulin mutant-mediated rescue, showing that these effects are independent of tau aggregation and tau phosphorylation, and instead suggest that the α-tubulin mutations may confer altered microtubule assembly properties based on the sedimentation assays.
  
  Weaknesses:
  
  While the claims are largely supported by the experimental outcomes, the authors at times do not provide enough detail in the text for readers to interpret the data sets independently. In addition, some claims appear to be slightly overstated relative to the data or the degree of error associated with those data.
  
  We appreciate the feedback regarding the need for additional clarity for independent analysis of the datasets. We will revise the figures and text to increase clarity for the readers. We will review statements and edit language in accordance with their degrees of error as appropriate.
  
  The authors measure tau binding affinities using soluble tubulin but do not assess tau binding to assembled microtubules. This is an important limitation, as the physiologically relevant interaction involves α/β-tubulin heterodimers, either free or incorporated into the microtubule lattice. Furthermore, the binding analysis appears to focus only on the D429N α-tubulin mutant, which further limits physiological relevance, as β-tubulin, which is also required for normal tau binding, is not explicitly considered.
  
  We acknowledge that only limited conclusions can be drawn from soluble tubulin interactions with tau and that additional analysis with polymerized microtubules will be useful in understanding tau-microtubule binding affinity. The analysis was completed with isolated pools of tubulin from C. elegans, not recombinant mutant tubulin, so this is a heterogeneous mixture of tubulin composed of α/β heterodimer subunits, with the mutant isotype present within a larger pool of wild-type isotypes. While this further complicates the analysis and is the likely source of variability, it incorporates the normal heterodimer subunit biochemistry.
  
  Given that tau prominently binds the microtubule lattice, we agree with the reviewers’ assessment that experiments with polymerized microtubules containing mutant tubulin would offer a greater understanding of the effects of mutant alpha-tubulin on microtubule properties and of potential mechanisms of toxic tau suppression. To test this directly, we intend to complete co-sedimentation experiments using extracts from wild-type and mutant-tubulin-expressing C. elegans incubated with recombinant human tau.
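
  One common way to summarize such co-sedimentation data is to express tau in the pellet as a fraction of total tau and fit a single-site hyperbola against microtubule concentration. The sketch below is hypothetical and is not the authors' stated analysis: it assumes densitometry values for pellet and supernatant, ignores depletion of free tau, and uses a simple grid search over Kd instead of a nonlinear least-squares library.

```python
def fraction_bound(pellet, supernatant):
    """Fraction of tau in the microtubule pellet, from band intensities."""
    return pellet / (pellet + supernatant)


def fit_single_site(mt_conc, bound, kd_grid):
    """Fit bound = Bmax * c / (Kd + c) by scanning Kd over a grid.

    For a fixed Kd the model is linear in Bmax, so Bmax has a closed-form
    least-squares solution; the Kd with the smallest squared error wins.
    """
    best = None
    for kd in kd_grid:
        shape = [c / (kd + c) for c in mt_conc]
        bmax = sum(y * s for y, s in zip(bound, shape)) / sum(s * s for s in shape)
        sse = sum((y - bmax * s) ** 2 for y, s in zip(bound, shape))
        if best is None or sse < best[2]:
            best = (kd, bmax, sse)
    return best[0], best[1]


# Synthetic check: data generated with Kd = 2 and Bmax = 0.8 (arbitrary units).
conc = [0.5, 1.0, 2.0, 4.0, 8.0]
data = [0.8 * c / (2.0 + c) for c in conc]
kd, bmax = fit_single_site(conc, data, [k / 10 for k in range(1, 101)])
print(kd, round(bmax, 3))
```

  On this noise-free synthetic data the scan recovers Kd = 2.0 and Bmax ≈ 0.8; comparing fitted Kd between wild-type and mutant-tubulin extracts would then quantify any change in lattice binding.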
  
  In conclusion, the thoughtful commentary and suggestions from reviewers will help improve the manuscript. We plan to complete the following experiments to address their concerns.
  
  (1) Assess tau localization in mutant tba-2 and mec-12 C. elegans as compared to tau-transgenic C. elegans without tubulin mutations. We plan to use immunohistochemical techniques and/or imaging of Dendra2-labeled tau to assess the sub-compartmental distribution of tau in C. elegans neurons. This addresses Reviewer #1’s question of whether the mutant tubulin changes tau localization in neurons.
  
  (2) Assess mutant-tubulin-driven changes in tau affinity for polymerized microtubules. To address both reviewers’ concerns regarding the limitations of binding experiments with tau and soluble tubulin, we plan to use C. elegans extracts to test whether microtubule polymers containing mutant alpha-tubulin alter tau-microtubule co-sedimentation.
  
  (3) Using C. elegans reporter lines we plan to assess whether tau clearance occurs in tba-2 mutant tubulin C. elegans through the upregulation of autophagy or ubiquitin degradation pathways.
  
  (4) Evaluate the neuroprotective effects of mutant alpha-tubulin in cholinergic neurons using a C. elegans strain expressing a fluorescent label specifically in cholinergic neurons.
  
  We plan to make textual revisions to increase clarity, aid in independent analysis of the presented datasets, and better address the possibility of both tau-dependent and tau-independent mechanisms. We appreciate the reviewers’ attentive reading and thoughtful feedback for improving this manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.18.712642v1
www.biorxiv.org

Training neural networks from scratch in a videogame leads to brittle brain encoding

1. Public_Reviews 23 Jun 2026
  
  in eLife
  
  Reviewer #3 (Public review):
  
  Summary
  
  In this paper, the authors have 5 human subjects learn to play Super Mario Bros while undergoing fMRI for 15 hrs each. They compare a reinforcement learning (RL) model (PPO), an imitation learning (IL) model, and a vision model (ResNet) in their ability to play the game, match human behavior, and, critically, explain human brain activity.
  
  The key findings can be summarized as follows:
  
  (1) RL, IL, and vision models explain similar amounts of variance in the BOLD signal (Fig 2a), with a significant but small trend of RL > IL > ResNet (Table 1).
  
  (2) Untrained models with the same architecture explain a smaller but very similar amount of variance (Figure 2a, Table 1).
  
  (3) The brain maps across all models (and layers) are strikingly similar, with the strongest effects in visual, parietal, and motor regions (Figures 2b, 2d; Supplementary Material II).
  
  (4) Behavioral and neural performance are correlated across model checkpoints (but not levels), such that later checkpoints in training have better behavioral and neural encoding performance (Figures 3 & 4), although the neural effect plateaus pretty quickly.
  
  (5) Out-of-distribution performance is quite poor, both behaviorally (Figure 5a) and neurally (Figure 5b).
  
  I believe this work will be of interest to neuroscientists, cognitive scientists, and AI researchers alike. There has been a growing trend in neuroscience to adopt AI models as cognitive models of complex perception and action, while at the same time, AI researchers are increasingly looking at the brain for inspiration. The key finding of this paper -- that these models fail to generalize to out-of-distribution levels -- questions the core assumptions of this whole enterprise.
  
  Strengths:
  
  Unlike previous studies applying machine learning to naturalistic game-play, the authors take great care to make sure their models are evaluated on an equal footing, using equivalent or similar architectures/number of parameters and training data.
  
  While the number of subjects (5) is relatively small, the amount of data per subject (15 hours) is impressive, which is important for fitting the imitation learning & ResNet models and for obtaining reliable encoding performance for each individual subject. The authors employed a train/val/test split and held-out sets, the gold standard in the literature.
  
  Overall, the paper was well-written and easy to follow. The figures clearly illustrate the main findings.
  
  Weaknesses:
  
  (1) Missing statistical tests
  
  I think the main weakness of the paper is that many of the claims are qualitative in nature and lack appropriate statistical tests, for example:
  
  - "The conv3 layer has the highest brain encoding score";
  - "Robust association between task performance and brain encoding";
  - "Level patterns strongly predict brain encoding";
  - "Brain encoding performance was severely degraded";
  - "Effect of training on brain encoding was apparent".
  
  While these effects are indeed qualitatively visible in the figures, it is unclear which of these differences are significant (with the notable exception of Table 1). I believe the paper would benefit substantially if these effects were quantified and every claim were supported by the appropriate statistical tests. As an example, with the exception of Table 1 and the corresponding paragraph, I could not find any p-values in the results section.
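
  As one example of the kind of test requested, a paired comparison across subjects (say, conv3 versus another layer's encoding score) could use an exact sign-flip permutation test. The sketch below is hypothetical and its numbers are made up. Note that with five subjects there are only 2^5 = 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625; a test across subjects alone cannot reach p < 0.05.

```python
from itertools import product


def sign_flip_pvalue(differences):
    """Exact two-sided sign-flip test for paired per-subject differences.

    Under the null hypothesis each difference is equally likely to be positive
    or negative, so all 2**n sign patterns are enumerated.
    """
    n = len(differences)
    observed = abs(sum(differences))
    extreme = 0
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, differences)))
        if flipped >= observed - 1e-12:  # tolerance guards against float round-off
            extreme += 1
    return extreme / 2**n


# Made-up per-subject score differences (e.g., conv3 minus another layer).
print(sign_flip_pvalue([0.02, 0.03, 0.01, 0.04, 0.02]))
# prints 0.0625
```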
  
  (2) Missing model performance and human-likeness
  
  Also absent from the results is an assessment of model performance on the task and similarity to human performance/behavior. From Figures 3 and 4, we can see that the game score of PPO is around 500-1000 - how does that compare to the humans? We can also see that the imitation scores for IL are around 0.4-0.7, but what does that mean? Such results would be crucial to assess if the models have indeed learned to play the games and/or imitate the humans, and therefore, whether they would be good candidates as cognitive models (before even looking at brain activity). At minimum, plotting the human versus model game scores (see e.g. Tomov et al. 2023 Neuron, Figure 2) would be helpful; or, if you'd like to dig deeper, showing that human actions are more valuable or more likely under those models (see e.g. Cross et al. 2022 Neuron, Figure 2). It might also be helpful to look at imitation scores for the RL model and game performance of the imitation model -- I suspect they will both be bad, but they can at least serve as informative baselines for their counterparts.
  
  (3) Possible undertraining
  
  Relatedly, one possible explanation for why the Untrained model does so well is that all the models may be effectively undertrained. For example, while there are no training curves in the paper, it seems from the spacing of the checkpoint game scores (x-axis on Figure 3c) that the RL model may not have converged yet (it would be helpful if those were somehow colored by training epoch). Showing training curves would be helpful (i.e., something similar to Figure 3a, except with performance on the y-axis).
  
  Additionally, it would be great to provide more details regarding the PPO training protocol. How many episodes? How many steps per episode? How many steps for all of the training? Similarly, for the imitation learning model: batch size, number of epochs, optimizer, scheduler, etc.
  
  (4) Mysterious poor encoding performance of Untrained and ResNet models on the held-out set
  
  Critically, and related to that, I'm a little confused about the Untrained model results on the held-out set (Figure 5b, top row on the right). Why should those be any different from the test set results with the Untrained model (Figure 2a, right, fourth row from the top)? It makes sense why the other models are worse on the held-out set -- they have never been trained on any frames from those levels. However, the untrained model has not been trained on *any* frames from *any* levels, including the test set and the held-out set.
  
  The same is true for the ResNet model, which is pre-trained on a completely separate data set and yet similarly shows worse performance on the held-out set compared to the test set.
  
  This cannot be explained by the ridge regression, which has no parameters or hyperparameters fitted on either the test set or the held-out set.
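
  The sketch below illustrates the reviewer's point with a generic ridge encoding pipeline: the penalty is chosen on the validation split only, the weights are fit on the training split only, and the test (or held-out) split is used solely for scoring. It is a minimal NumPy sketch under stated assumptions, not the authors' code: features and BOLD are assumed centered (so no intercept), the score is assumed to be the per-voxel Pearson r, and the data are synthetic.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge weights, (X'X + alpha I)^-1 X'Y, for all voxels at once."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxel_r(Y_true, Y_pred):
    """Pearson correlation between observed and predicted BOLD, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt**2).sum(axis=0) * (yp**2).sum(axis=0))


def encode(train, val, test, alphas):
    """Pick alpha on the validation split only, then score once on the test split."""
    (X_tr, Y_tr), (X_va, Y_va), (X_te, Y_te) = train, val, test
    best_alpha = max(alphas, key=lambda a: voxel_r(Y_va, X_va @ fit_ridge(X_tr, Y_tr, a)).mean())
    weights = fit_ridge(X_tr, Y_tr, best_alpha)
    return best_alpha, voxel_r(Y_te, X_te @ weights)


# Synthetic example: 10 "layer features" linearly driving 3 "voxels" plus noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
Y = X @ rng.standard_normal((10, 3)) + 0.1 * rng.standard_normal((300, 3))
alpha, r = encode((X[:200], Y[:200]), (X[200:250], Y[200:250]), (X[250:], Y[250:]), [0.1, 1.0, 10.0])
print(alpha, r.round(2))
```

  Scoring a held-out level would simply pass its features and BOLD as the third split; nothing in `encode` is fitted to it.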
  
  The big discrepancy in the untrained model & ResNet results between the test and the held-out set makes me think that there is something substantially different about the levels in that held-out set; that they are truly out of distribution compared to the other 20 levels (e.g., maybe they're the last 2 hardest levels and look completely different? e.g. ResNet proxy in Fig 5c shows worse performance than the mean, which is indicative of an anti-correlation). Alternatively, it may be some issue with the analysis pipeline. The poor generalization results are central to the claims of the paper, so I believe this should be clarified.
  
  (5) Brittleness conclusion rationale
  
  I'm not quite on board with the authors' rationale that "[poor model performance on the out-of-distribution levels] demonstrates that the models we tested are limited in scope and may not provide a valid inference of brain-like processing, as human behavior remains robust and generalizable across levels".
  
  For one, unlike the models, humans were actually trained on those levels, so it would not be surprising if they perform just as well on them as on the other levels (but do they? Again, it would be great to see some behavioral data from the humans and the models).
  
  Second, as the authors themselves show, task performance and human-likeness do not really correlate with neural encoding across levels (Fig 4a & b, respectively), so even if model performance remained "robust and generalizable" on the held-out levels, that will not necessarily translate to good neural encoding.
  
  Thirdly, and perhaps most importantly, unless the test set and held-out set were sampled exclusively from the practice phase when the subjects have mastered all the levels (that doesn't seem to be the case, but the authors should clarify), then the humans are continuously learning, which means that their own internal representations of the game are evolving. That's not the case for the models, which I assume are in "inference mode" when their representations are extracted for neural encoding. That is, their weights are frozen. So there's a fundamental mismatch between the mode in which humans are operating (continuously learning and executing) and the mode in which the models are operating (just executing). While this is true for all the levels, it may partially account for the discrepancy in the held-out set specifically.
  
  Review 3

Tags

Review 3

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.28.691119v1
www.biorxiv.org

Apparent cooperativity between human CMV virions introduces errors in conventional methods of calculating multiplicity of infection

1
1. Public_Reviews 22 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  Public Review:
  
  Reviewer #1 (Public review):
  
  Suggestions to clarify the study:
  
  In the revised version, the authors carefully consider these suggestions and provide further details, clarifications and even some new results. Regarding the question of how infection of a cell with one virus could lead to lower probability for a secondary infection, I think that it is possible that infected cells activate antiviral programs that lead, for example, to lower expression of surface receptors. This has been considered at least in hepatitis C virus infection. However, this is a minor point.
  
  Yes, the possibility that infection of a cell by a virion would reduce the chance of infection by another virion was allowed in our model. However, such a process will not result in apparent cooperativity (n>1) in our model, and thus is irrelevant to the issue of apparent cooperativity we identified.
  
  Reviewer #2 (Public review):
  
  In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, in which the same quantity of virions is always required to infect a cell, does not match empirical observations from an in vitro human cytomegalovirus infection model, and how this would have practical impacts on experimental protocols.
  
  Strengths:
  
  The use of a very simple and robust experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. This convincingly showed how the proportion of infected cells differed from a "single hit" model which they simulated using a simple mathematical model ("power-law model"), and better fitted a model where virions need to cooperate to infect cells.
  
  The use of different cell types and virus strains, which allows some generalizations to be drawn.
  
  The exploration of the mechanisms that could explain this apparent cooperation, using biologically plausible simulations.
  
  The practical consequences that this phenomenon has for lab virologists as well as modelers.
  
  Thank you.
  
  Weaknesses:
  
  The inability to discriminate between biological mechanisms is an important limitation of this study and calls for experimental designs that can address this question further.
  
  The outcome of the virion-clumping analysis remains highly sensitive to the choice of the clump size distribution, which is itself very difficult to estimate, especially at high dilution.
  
  The inability to fit the mathematical models directly to the data limits them to a qualitative discussion.
  
  Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between contexts seems robust. The putative biological explanations would require further exploration.
  
  This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge this is the first time that it was explored for a non-segmented virus, and in the context of MOI estimation.
  
  Thank you. We would note, however, that the inability to discriminate between alternative models is not a weakness per se. It shows that our work goes beyond the somewhat typical approach in mathematical modeling of offering a single explanation for the phenomenon in question; instead, we offer several alternative explanations rather than focusing on discriminating between them, which is often hard to do.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) I now understand the graphical abstract better. I think my eye was drawn too much to the increase in specific infectivity that you see above 1 genome/cell, which is not the point of your paper. I wonder whether you should guide the reader even more, by pointing out that the initial decline in specific infectivity represents apparent cooperativity.
  
  Let’s hope that readers are smart enough to understand what to focus their eyes on. In the end, this is a graphical abstract that is not supposed to have much text explaining where to look.
  
  (2) For your one-inflated geometric distribution, I agree that the estimates would remain very hypothetical because you would have to make many assumptions. However, I think a hurdle model, in which you fit P(clump size = 1) = f1 and let P(clump size = i) for i > 1 follow a one-truncated geometric distribution, would be more appropriate because it would lead to a distribution closer to your PDF in Figure S11C.
  
  The issue is that our data are not in clump sizes but in clump diameter D. This is why we opted for a mixture of continuous distributions, not a mixture of discrete distributions. We are sharing the DLS data, so others are welcome to try fitting other types of distributions to the data.
  
  (3) For the DLS data, I understand your choice to include all the datapoints, however I find the interpretation confusing: if I understand correctly, you consider that f1, the fraction of the smaller distribution, represents clumps of one virion. However, its median size is 10 times smaller than a virion. So, the number of clumps with one virion would be overestimated. I think it would be helpful for the reader to clarify this aspect, either in the results around lines 503-512, or in the discussion. Could it be that at higher dilution, what is represented by this smaller distribution would almost only be debris because the virions are so rare?
  
  When fitting a mixture of two log-normal distributions, f1 represents the proportion of clumps of larger size (as described in the Materials and methods). The actual estimated value of f1 is not highly relevant when calculating the change in the PDF of the distribution only for D ≥ d (230 nm), as shown in Suppl Fig S11C. But we now realize that this variable f1 may be confused with the variable f1 used to denote the fraction of clumps with virion size = 1 (in Fig 5C). We now mention this in the caption of Supp Fig S10.
  
  (4) For the dashed diagonal lines of Fig 2, what I don't understand is the choice of the intercept, which seems a bit random. I wonder whether it would be more helpful to place the dashed line so that it intersects the observation at 1 genome/cell, which could then be interpreted as a deviation from the "single hit" model extrapolated beyond 1 genome/cell?
  
  The diagonal lines in Fig 2 are exactly the same in ALL panels, as are the x/y axis ranges; the slope of the line (equal to 1) makes it possible to see visually when the regression (shown by thick black lines) deviates from slope = 1, i.e., indicates apparent cooperativity. We will keep the lines as they are. Thank you for the suggestion, though.
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of the virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to doubling the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimes (inoculum concentrations), infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and constructed mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but they can't be conclusive as to what the correct explanation is. In any case, this advances our knowledge of the system, and it is very important when quantitative experiments involving MOI are performed.
  
  Strengths:
  
  Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.
  
  Weaknesses:
  
  There are minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have a substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.
  
  Suggestions to clarify the study:
  
  (1) Mathematically, it is clear what "increase linearly" or "increase faster than linearly" (e.g., line 94) means. However, it may be confusing for some readers to then look at plots such as in Figure 2, which appear linear (but on the log-log scale) and about which the authors also say (line 326) "data best matching the linear relationship on a log-log scale".
  
  This is a good point. We included a clarification to indicate that a linear relationship on the log-log scale does not imply a linear relationship on the linear-linear scale. We wrote:
  
  “Because most data did not exhibit a linear relationship between virion concentration and infection probability, we fitted the models to subsets of data best matching a linear relationship on a log-log scale. Note that a linear relationship on a log-log scale may still be nonlinear (on a linear-linear scale) when n ≠ 1.”
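  To make the log-log point concrete, here is a minimal Python sketch. It assumes (our reading, not a formula quoted from the paper) that eqn (1) has the form P(G) = 1 - exp(-λ·G^n), with G the genomes/cell; this is consistent with the single-genome probability p(1) = 1 - exp(-λ) discussed under point (6). When λ·G^n is small, P ≈ λ·G^n, so log P ≈ log λ + n·log G is a straight line of slope n on log-log axes, yet doubling G multiplies P by about 2^n rather than 2. The values λ = 0.05 and n = 1.8 are illustrative only.

```python
import math


def p_infect(genomes_per_cell: float, lam: float, n: float) -> float:
    """Assumed form of eqn (1): P(G) = 1 - exp(-lam * G**n)."""
    return -math.expm1(-lam * genomes_per_cell ** n)


def loglog_slope(g_low: float, g_high: float, lam: float, n: float) -> float:
    """Slope of log P versus log G between two genomes/cell values."""
    rise = math.log(p_infect(g_high, lam, n)) - math.log(p_infect(g_low, lam, n))
    return rise / (math.log(g_high) - math.log(g_low))


lam, n = 0.05, 1.8  # illustrative values, not estimates from the paper

single_genome = p_infect(1.0, lam, n)                      # equals 1 - exp(-lam)
slope = loglog_slope(0.01, 0.02, lam, n)                   # close to n at small G
doubling = p_infect(0.2, lam, n) / p_infect(0.1, lam, n)   # close to 2**n, not 2
print(single_genome, slope, doubling)
```

  A straight line on the log-log plot therefore identifies n, but only n = 1 corresponds to proportional (linear-scale) growth.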
  
  (2) One of the main issues that is unclear to me is whether the authors assume that cell infection is independent of other cells. This could be a very important issue affecting their results, both when analyzing the experimental data and when running the simulations. One possible outcome of infection could be the generation of innate mediators that could protect nearby cells (alter their resistance). I can imagine two opposite results of this: i) resistance would lead to lower infection frequencies, which would result in apparent sub-linear infection (contrary to the observations); or ii) inocula with more virus lead to faster infection, which doesn't allow enough time for the "resistance" (innate effect) to spread (potentially leading to results similar to the observations, supra-linear infection).
  
  In our models we assumed cells to be independent of each other (see also responses to other similar points). Because we measure infection in individual cells, assuming cells are independent is a reasonable first approximation. However, the reviewer makes an excellent point that there may be some between-cell signaling happening in the culture that “alerts” or “conditions” cells to change their “resistance”. It is also possible that at higher genome/cell numbers, exposure of cells to virions or virion debris may change the state of cells in the culture, and more cells become “susceptible” to infection. This is a good point that we now list in the Limitations subsection of the Discussion; it is a good hypothesis to test in our future experiments. We write:
  
  “Accrued damage model is also consistent with the idea that at higher genome/cell values, the inoculum itself (including cell and/or virion debris) may impact overall susceptibility of all cells in the well, for example, making them more susceptible to infection. It may be expected, though, that exposing cells to debris would increase cell resistance to infection; this would result in n < 1 that we did not observe at small genomes/cell values.”
  
  (3) Another unclear aspect of cell infection is whether each cell only has one chance to be infected or multiple chances, i.e., do the authors run the simulation once over all the cells or more times?
  
  Each cell has only one chance to be infected. Algorithm 1 clearly states that; we will add an extra sentence in “Agent-based simulations” to indicate this point.
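  The one-chance rule can be illustrated with a small null-model simulation in Python. This is a hypothetical sketch, not Algorithm 1: cell resistance and virion infectivity are drawn from normal distributions with made-up parameters, and the per-virion success probability uses a logistic function of (infectivity - resistance) in place of the paper's potential-barrier term P_i. Each cell is exposed to a Poisson number of virions with mean equal to genomes/cell and is scored exactly once; with no cooperativity built in, doubling genomes/cell at low values roughly doubles the infected fraction.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_null(n_cells: int, genomes_per_cell: float, seed: int = 1) -> float:
    """Fraction of cells infected when every cell gets exactly one infection attempt."""
    rng = random.Random(seed)
    infected = 0
    for _ in range(n_cells):
        resistance = rng.gauss(0.0, 1.0)  # hypothetical cell resistance
        for _ in range(poisson_draw(rng, genomes_per_cell)):
            infectivity = rng.gauss(-2.0, 1.0)  # hypothetical virion infectivity
            p_success = 1.0 / (1.0 + math.exp(-(infectivity - resistance)))
            if rng.random() < p_success:
                infected += 1  # the cell is scored once and never re-exposed
                break
    return infected / n_cells


low = simulate_null(200_000, 0.05)
high = simulate_null(200_000, 0.10)
print(low, high, high / low)  # ratio near 2: no cooperativity without a mechanism for it
```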
  
  (4) On the other hand, the authors address the complementary issue of the virus acting independently or not, with their clumping model (which includes nice experimental measurements). However, it was unclear to me what the assumption of the simulation is in this case. In the case of infection by a clump of virus or "viral compensation", when infection is successful (the cell becomes infected), how many viruses "disappear" and what happens to the rest? For example, one of the viruses of the clump is removed by infection, but the others are free to participate in another clump, or they also disappear. The only thing I found about this is the caption of Figure S10, and it seems to indicate that only the infected virus is removed. However, a typical assumption, I think, is that viruses aggregate to improve infection, but then the whole aggregate participates in infection of a single cell, and those viruses in the clump can't participate in other infections. Viral cooperativity with higher inocula in this case would be, perhaps, the result of larger numbers of clumps for higher inocula. This seems in agreement with Figure S8, but was a little unclear in the interpretation provided.
  
  This is a good point. We did not remove the clump if one of the virions in the clump manages to infect a cell, and indeed, this could be the reason why in some simulations we observe apparent cooperativity when modeling viral clumping. We have explored this in the revision and found that it does not really impact how infection rate scales with the genomes/cell (e.g., see Suppl Fig S8).
  
  (5) In algorithm 1, how does P_i, as defined, relate to equation 1?
  
  These are unrelated because eqn.(1) is a phenomenological model that links infection per cell to genomes per cell. P_i in algorithm 1 is “physics-inspired” potential barrier.
  
  (6) In line 228, and several other places (e.g., caption of Table S2), the authors refer to the probability of a single genome infecting a cell p(1)=exp(-lambda), but shouldn't it be p(1)=1-exp(-lambda) according to equation 1?
  
  Indeed, it was a typo, p(1)=1-exp(-lambda) per eqn 1. Thank you, it has been corrected in the revised paper.
  
  (7) In line 304, the accrued damage hypothesis is defined, but it is stated as a triggering of an antiviral response; one would assume that exposure to a virion should increase the resistance to infection. Otherwise, the authors are saying that evolution has come up with intracellular viral resistance mechanisms that are detrimental to the cell. As I mentioned above, this could also be a mechanism for non-independent cell infection. For example, infected cells could signal to neighboring cells to "become resistant" to infection. This would also provide a mechanism for saturation at high levels.
  
  We do not know how exposure of a cell to one virion would change its “antiviral state”, i.e., to become more or less resistant to the next infection. If a cell becomes more resistant, there is no possibility to observe apparent cooperativity in infection of cells, so this hypothesis cannot explain our observations with n>1. Whether this mechanism plays a role in saturation of cell infection rate at lower than 1 value when genome/cell is large is unclear but is a possibility. We added this point to Discussion in revision (see our text above that includes this point).
  
  (8) In Figure 3, and likely other places, t-tests are used for comparisons, but with only n = 5 (experiments). Many would prefer a non-parametric test.
  
  We repeated the analyses in Fig 3 with Mann-Whitney test, results were the same, so we would like to keep results from the t-test in the paper.
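  With n = 5 per group, an exact Mann-Whitney test can be computed by enumerating all C(10, 5) = 252 ways of splitting the pooled values, which also exposes its floor: the smallest attainable two-sided p-value is 2/252 ≈ 0.008. The sketch below uses only the Python standard library and hypothetical values; it is not the analysis the authors ran.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x against y; ties count one half."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_two_sided_p(x, y):
    """Exact permutation p-value: share of all relabelings at least as extreme as observed."""
    pooled = list(x) + list(y)
    n_x = len(x)
    centre = n_x * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_x):
        picked = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in range(len(pooled)) if i not in picked]
        total += 1
        if abs(mann_whitney_u(xs, ys) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements for two conditions, five experiments each.
condition_a = [1.4, 1.9, 1.6, 2.2, 1.8]
condition_b = [1.0, 1.1, 0.9, 1.3, 1.2]
p_value = exact_two_sided_p(condition_a, condition_b)
print(p_value)  # 2/252: the smallest two-sided p attainable with 5 vs 5
```

  Because of this floor, a non-parametric test at n = 5 can only reject at α = 0.05 when the groups are nearly or completely separated.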
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The strains of HCMV used have a fluorescent reporter "in place of the US11 gene". Can you provide a brief comment on whether and how this gene deletion affects HCMV replication?
  
  US11 is a resident ER protein that is considered an "immune evasion factor". It promotes ERAD of MHC I and has no observable effect on replication of HCMV in cultured cells (Berger 2000 JVI, Wiertz 1996 Cell). We now add this information in Materials and methods section of the paper. We write:
  
  “All BAC clones were modified to express green fluorescent protein (GFP) or the monomeric red fluorescent protein mCherry (mCherry) with En passant recombineering by replacing US11 with the eGFP or mCherry gene, respectively. US11 is a resident ER protein that is considered an “immune evasion factor”. It promotes ERAD of MHC I and has no observable effect on replication of HCMV in cultured cells [27, 28]. Infectious HCMV was recovered by electroporation of BAC-DNA into MRC5 cells which were then co-cultured with either HFFCs (TB and TR) or HFF-tet cells (ME).”
  
  (2) I didn't understand what the section "Virus titer assays" refers to. When was this used? How or why is this different from the "Virus stock dilution and dose-response assay"? Also in this section, you refer to NHDF cells - can you provide more information about these? And how does a different type of cell affect the titer assay (here measured as infected cells), since this is one of the main points of your paper?
  
  Apologies for the confusion. In the Ryckman lab we routinely generate viral stock and titrate it using a specific cell type, Normal (or neonatal) Human Dermal Fibroblasts (NHDF). This way, the titer of the stock is consistent between experiments by different researchers in the lab. We then use standard 10-fold dilutions to define the number of infectious units per mL of the stock. We now name this subsection “Quantification of viral stock infectivity using standard 10-fold dilutions”. After the stock was quantified, we used it in our actual experiments with a very small dilution factor df, which allowed us to detect deviations of the rate of infection from the single hit model.
  
  (3) In many places, "powerlaw" is written. This is usually written as two words, "power law".
  
  Because “power law” always appears together with “model”, we decided to use “power-law model”.
  
  (4) Line 75: "have" instead of "has"?
  
  (5) Line 84: "with" repeated.
  
  Corrected, thank you.
  
  (6) Line 116: This section "Cell lines" seems to describe three cell lines, "HFF cells and MRC5 cells" and then "EC" cells.
  
  HFF cells are fibroblasts used in our main experiments and MRC5 cells are another type of fibroblasts. We used MRC5 cells in the first step of recovering infectious HCMV from BAC DNA (electroporation). We clarified this in Materials and methods. We write:
  
  “Cell lines. Human foreskin fibroblast cells (HFFCs or fibroblasts) and MRC5 cells (also fibroblasts) were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Sigma) supplemented with 5% heat-inactivated fetal bovine serum (FBS, Rocky Mountain Biologicals, Missoula, MT, USA) and 5% Fetalgro® (Rocky Mountain Biologicals, Missoula, MT, USA). We used MRC5 cells in the first step of recovering infectious HCMV from BAC DNA (electroporation). For main experiments we used HFFCs as fibroblasts. Human retinal pigment epithelial cells (ECs or ARPE-19, American Type Culture Collection, Manassas, VA, USA) were cultured in a 1:1 mixture of DMEM and Ham’s F-12 medium (DMEM:F-12, Gibco) and supplemented with 10% FBS.”
  
  (7) Line 188: Because the virus is double-stranded, do you have to divide the qPCR result by 2 to get genomes?
  
  This is typically accounted for in our calculations of genome/cell.
  
  (8) Line 200: Typically, one would write "500g" and not "500xg".
  
  Corrected.
  
  (9) Line 248: It would be clearer to write "cell type C different from cell type C2".
  
  Here C and C_2 refer to actual numbers of cells in the titration/growth experiments, so the relationship compares numbers, not cell types. We kept the relationship as it is.
  
  (10) Definition of cell class: what is n in p_n, the total number of cells, or are these divided into n classes of resistance?
  
  This part was incorrectly copied from an earlier version; both cell resistance and virion infectivity were sampled from normal distributions with different means and variances (see Table 1). We corrected the text to reflect this.
  
  (11) Line 272 to 273: Something seems to be missing, as the change of line doesn't make sense.
  
  Thank you. Edited to improve readability. Now it reads
  
  “Clumping hypothesis. In the basic model the number of virions a given cell is exposed to follows a Poisson distribution. However, it is well recognized that as virions are produced by infected cells, they may form clumps/aggregates; the number of virions per clump/aggregate may deviate from, for example, the Poisson distribution [33].”
  
  (12) Line 283: How lambda is chosen is not indicated here, only later (line 424), but at this point, one can confuse it with lambda in equation 1. Is it the same? It also doesn't seem to be indicated in your Table 1.
  
  The mean of the Poisson distribution in clump simulations lambda is not the same as lambda in eqn 1; we re-named the mean of Poisson distribution as lambda_c which is estimated by fitting a Poisson distribution to clump size distribution estimated from DLS experiments. Because it was dependent on the virus stock dilution, it is not listed in Table 1. However, we did perform additional simulations assuming lambda_c=2 (Suppl Fig S10).
  
  (13) Equation 6: I understand that you mostly used kappa=0, but in equation 6, would it be positive or negative (if not zero)?
  
  We probably expect kappa to be negative but we did not fully explore this extension of the model.
  
  (14) Line 350: Instead of "infection rates" would "infection frequencies" be better?
  
  We agree. Changed (also changed in the sentence above that line).
  
  (15) Line 366: I found this sentence a bit awkward.
  
  We edited it to the best of our ability to improve it.
  
  “Importantly, for most HCMV strain-target cell combinations we estimated n>1 (Figure 2 and Supplemental Table S2). With n>1, an increase in virion concentration (i.e., higher genomes/cell values) results in a higher-than-linear increase in the probability of a cell being infected (eqn. (1)), indicating cooperation between virions in infecting cells. We call this phenomenon “apparent cooperativity”.”
  
  (16) Figure 2, panel L: I wonder if it would be better to include the panel with the name of the experiment, but no data. Currently, it takes a while to find what you are talking about in panel L (or at the very least, indicate the panel in the caption).
  
  Changed
  
  (17) Figure 2: When you say that experiments were done at least twice, are you referring to the GFP and mCherry versions of the experiment, or replicates within each of those fluorescent labels?
  
  Replicates within each of those labels.
  
  (18) Figure 3: What is the number on top of the black bars? I think it is the average of the paired fold change. Is this right? Why, in panel E, is it 1.32 when only one goes up?
  
  Yes, fold change. Indeed, 1.32 was a typo, it is 0.70, thank you for noting.
  
  (19) Line 408: delete the word "there".
  
  Done. Thank you.
  
  (20) Line 412: Instead of "The", it should be "Then".
  
  Done. Thank you.
  
  Reviewer #2 (Public review):
  
  In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where one virion is required to infect a cell, does not match empirical observations from an in vitro human cytomegalovirus infection model, and how this would have practical impacts on experimental protocols.
  
  They first used a very simple experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. From this, they could elegantly show how the proportion of infected cells differed from a "single hit" model, which they simulated using a simple mathematical model ("powerlaw model"), and better fit a model where virions need to cooperate to infect cells. They then explore which mechanism could explain this apparent cooperation:
  
  (1) Stochasticity alone cannot explain the results, although I am unsure how generalizable the results are, because the mathematical model chosen cannot, by design, explain such observations only by stochasticity.
  
  Our null model simulations are not just about stochasticity; they also include variability in virion infectivity and cell resistance to infection. We agree that simulations cannot truly prove that such variability cannot result in apparent cooperativity; however, we also provide a mathematical proof that increase in frequency of infected cells should be linear with virion concentration at small genome/cell numbers.
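  The linearity argument can be sketched as follows, assuming (as in the basic model) that the number of virions a cell is exposed to is Poisson with mean G (genomes/cell) and that each virion acts independently, succeeding with a probability q(r) that depends on the cell's resistance r after averaging over virion infectivity. This is a reconstruction for illustration, not the authors' proof. By Poisson thinning, the number of successful virions in a cell with resistance r is Poisson with mean G·q(r):

```latex
\Pr(\text{infected} \mid r) = 1 - e^{-G\,q(r)}, \qquad
P(G) = 1 - \mathbb{E}_r\!\left[e^{-G\,q(r)}\right]
     = G\,\mathbb{E}_r[q(r)] - \frac{G^{2}}{2}\,\mathbb{E}_r\!\left[q(r)^{2}\right] + O(G^{3}),

P''(G) = -\mathbb{E}_r\!\left[q(r)^{2}\,e^{-G\,q(r)}\right] \le 0, \quad P(0) = 0
\;\Longrightarrow\; \frac{d\log P}{d\log G} = \frac{G\,P'(G)}{P(G)} \le 1 .
```

  Under these assumptions the log-log slope tends to 1 as G → 0, and heterogeneity in resistance or infectivity only pushes it below 1, because P is concave in G with P(0) = 0; a slope n > 1 requires some dependence between virions or a change in cell state.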
  
  (2) Virion clumping did not seem sufficient either to explain such a pattern in general. To show this, they first use a mathematical model showing that the apparent cooperation would be small. However, I am unsure how extreme the simulated virion-clumping scenario is. They then used dynamic light scattering to measure the distribution of clump sizes. From these estimates, they show that virion clumps cannot reproduce the observed virion cooperation in serial dilution assays. However, the authors remain imprecise about how the uncertainty in this clump size distribution would affect the results, as most clumps are smaller than a single virion, leaving only a limited number of clumps that truly contain virions.
  
  As we stated in the paper, clumping may explain apparent cooperativity in simulations, depending on how stock dilution affects the distribution of virions/clump. This could be explored further; however, better experimental measurements of virions/clump would be highly informative (but we do not have the resources to do these experiments at present). Our point is that the degree of apparent cooperativity depends on the target cell used (n is smaller on epithelial cells than on fibroblasts), which is difficult to explain by clumping, a virion property. Per the comment by reviewer 1, we have done more analyses of the clumping model to investigate the importance of clump removal per successful infection on the detected degree of apparent cooperativity. We found that it was not critical to our conclusions (Suppl Fig S8).
  
  The two models remain unidentifiable from each other but could explain the apparent virion cooperativity: either due to an increase in susceptibility of the cell each time a virion tries to infect it, or due to viral compensation, where lesser fit viruses are able to infect cells in co-infection with a better fit virion. Unfortunately, the authors here do not attempt to fit their mathematical model to the experimental data but only show that theoretical models and experimental data generate similar patterns regarding virion apparent cooperation.
  
  In the revision we now provide examples of our earlier simulations that “match” experimental data with a relatively high degree of apparent cooperativity (Supp Fig S9).
  
  Finally, the authors show that this virions cooperation could make the relationship between the estimated multiplicity of infection and viruses/cell deviate from the 1:1 relationship. Consequently, the dilution of a virion stock would lead to an even stronger decrease in infectivity, as more diluted virions can cooperate less for infection.
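  The consequence for MOI estimates can be sketched numerically. Assuming eqn (1) has the form P(G) = 1 - exp(-λ·G^n) (our assumption; λ = 0.05 and n = 1.8 are illustrative), the classical single-hit estimate MOI = -ln(1 - P) returns λ·G^n, so the MOI per genome grows as G^(n-1). A titer read at 10 genomes/cell and scaled down proportionally then overestimates the MOI at 1 genome/cell by a factor of 10^(n-1) ≈ 6.3.

```python
import math


def observed_fraction(genomes_per_cell: float, lam: float, n: float) -> float:
    """Assumed power-law form of eqn (1): P(G) = 1 - exp(-lam * G**n)."""
    return -math.expm1(-lam * genomes_per_cell ** n)


def single_hit_moi(fraction_infected: float) -> float:
    """Classical single-hit (Poisson) estimate: MOI = -ln(1 - P)."""
    return -math.log1p(-fraction_infected)


lam, n = 0.05, 1.8  # illustrative values only

for g in (1.0, 3.0, 10.0):
    moi = single_hit_moi(observed_fraction(g, lam, n))
    print(f"G={g:>4}: single-hit MOI={moi:.4f}, MOI per genome={moi / g:.4f}")

# Naive proportional extrapolation of a titer read at 10 genomes/cell down to 1 genome/cell.
per_genome_at_10 = single_hit_moi(observed_fraction(10.0, lam, n)) / 10.0
actual_at_1 = single_hit_moi(observed_fraction(1.0, lam, n))
overestimate = per_genome_at_10 * 1.0 / actual_at_1
print(f"overestimate at 1 genome/cell: {overestimate:.2f}x")
```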
  
  Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between contexts seems robust. The putative biological explanations would require further exploration.
  
  This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge, this is the first time that it was explored for a nonsegmented virus, and in the context of MOI estimation.
  
  Thank you.
  
  Reviewer #2 (Recommendations for the authors):
  
  Major comments:
  
  Two aspects of the work would benefit from further thought:
  
  (1) The simulation of virion clumps: in both cases (Poisson distribution or one-inflated geometric distribution), the proportion of clumps containing more than one virion will be small. For the Poisson distribution, you fit the power-law model over the range below ~3 genomes/cell (Figure 4B), and I wonder to what extent this explains the sudden rise in infections/cell you observe above that limit. It would be interesting to plot the (cumulative) distribution of the clump sizes at different dilution levels to get a better idea.
  
  The reviewer has a good eye; indeed, the relationship between infection frequency and genomes/cell is linear up to a point, and we believe the inflection point reflects the genomes/cell values at which clumps contain more than 1 virion. Here are the results of simulations with the distribution of virions/clump plotted:
  
  Similarly, for the one-inflated geometric distribution, the proportion of clumps of size 1 is the sum of two events: f1, plus (1 - f1) times the probability that the geometric distribution is zero, if I follow the methods on lines 287-294. I wonder whether this is appropriate given the estimates made with DLS. In particular, Figure 5C shows that the proportion of clumps of size 1 is more than ~half of all the clumps, and does not seem to follow the same distribution as the estimates in Figure S9C. Maybe a hurdle model would be more appropriate?
  
  This is a fair point. In our analyses we found that modeling the clump size distribution is tricky and requires various assumptions. The issue with the DLS data is that we do not really know the distribution of intact virions per clump, so how to relate the size of a clump to the number of virions in it is an open question; we explored several possibilities and found that the answer (whether clumping results in apparent cooperativity) depends on how clumps are modelled (e.g., compare Fig 4B and Suppl. Fig S11). A hurdle model is not appropriate for clumps because, by our definition, a clump must have at least 1 virion. Our key observation, however, is that the degree of apparent cooperativity depends on the target cell type, and thus should be independent of virion clumping (unless there is viral cooperativity within the clumps). Overall, we decided that exploring more clumping models would take extra effort, and it is unclear whether it would bring any benefit to our conclusions.
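  The two clump-size models in this exchange differ only in how probability mass is assigned to size 1. In this Python sketch (a hypothetical parameterization: f1 is the weight placed at one virion and q is the success probability of a geometric distribution), the one-inflated form adds the geometric's own mass at size 1 to f1, whereas the hurdle form sets P(size = 1) = f1 exactly and spreads 1 - f1 over sizes ≥ 2 with a one-truncated geometric. Both keep every clump at one virion or more.

```python
def one_inflated_geometric_pmf(size: int, f1: float, q: float) -> float:
    """Clump size = 1 + K, K ~ Geometric(q) on {0, 1, ...}, plus extra weight f1 at size 1."""
    if size < 1:
        return 0.0
    geometric = q * (1.0 - q) ** (size - 1)  # P(K = size - 1)
    return (f1 if size == 1 else 0.0) + (1.0 - f1) * geometric


def hurdle_geometric_pmf(size: int, f1: float, q: float) -> float:
    """P(size = 1) = f1 exactly; sizes >= 2 follow a geometric truncated to start at 2."""
    if size < 1:
        return 0.0
    if size == 1:
        return f1
    return (1.0 - f1) * q * (1.0 - q) ** (size - 2)


f1, q = 0.3, 0.4  # hypothetical parameters
print(one_inflated_geometric_pmf(1, f1, q))  # mass f1 + (1 - f1) * q at size 1
print(hurdle_geometric_pmf(1, f1, q))        # mass f1 at size 1
```

  In the hurdle form, f1 can be matched directly to an estimated single-virion fraction without the geometric tail adding to it.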
  
  The analysis of the clump size distribution using dynamic light scattering, in Figure S8. If I interpret correctly, events with size < 230 nm should be excluded as they do not represent clumps of virions but rather media impurities or cell debris. Therefore, I don't understand the choice of fitting the whole set with a combination of two normal distributions, as even the larger normal distribution covers clumps < 230 nm. If the f1 indicated here is the one used in the methods line 287-294, this is then wrong because it does not represent the fraction of clumps of size 1, but rather debris.
  
  We used two normal (on log-scale) distributions when quantifying clump distribution data (Supp Fig S10) to avoid sub-selection of the data; in this way, the two distributions fit the whole dataset with excellent quality. An alternative approach would be to sub-select data with size ≥ 230 nm and fit a normal (or similar) distribution to the clumps; such an approach may generate biases and/or unreliable estimates at high dilutions due to the small number of clumps with large size (e.g., see Supp Fig S10S-X). In our simulations of clump distribution and infection (Fig 5), we attempted to reproduce the estimated clump size distribution (Suppl Fig S11C) only approximately. Again, because our measurements do not give the number of virions per clump, we believe that efforts to model the clump size distribution exactly are not going to give full answers.
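  For the DLS fit, the quantity that matters for virion-sized clumps is the mass of the two-component log-normal mixture at D ≥ d = 230 nm. The Python sketch below computes it from the log-normal survival function, with f1 as the weight of the larger-size component as the authors describe; the medians and log-scale spreads are made-up values, not the fitted parameters.

```python
import math


def lognormal_sf(d: float, median: float, sigma: float) -> float:
    """P(D >= d) for a log-normal with the given median and log-scale standard deviation."""
    z = (math.log(d) - math.log(median)) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fraction_at_least(d: float, f1: float, large: tuple, small: tuple) -> float:
    """Mixture tail: weight f1 on the larger component, 1 - f1 on the smaller one."""
    return f1 * lognormal_sf(d, *large) + (1.0 - f1) * lognormal_sf(d, *small)


# Hypothetical parameters: (median in nm, sigma on the log scale).
f1 = 0.2
large = (300.0, 0.5)
small = (25.0, 0.6)
above = fraction_at_least(230.0, f1, large, small)
print(above)  # share of particles at or above the 230 nm virion threshold
```

  Under these made-up values nearly all of the mass above 230 nm comes from the larger component, which is why the exact value of the smaller component's weight matters little for the D ≥ d tail.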
  
  (2) Figure 4 and Results, lines 419-465: Why didn't you try to fit the different models to the data, instead of qualitatively comparing the estimate of n in simulations with arbitrary parameters to the one for the empirical data? Your models match the expectation of virion cooperation by design, so they are no more convincing to a virologist than logical, non-quantitative reasoning. In my opinion, they would provide stronger evidence if you could show how well they fit the data. You could then directly compare the models' fits using goodness-of-fit metrics and decide whether one is better than another or whether they all explain the observations equally well.
  
  We have 11 different relationships between infection rate and genomes/cell; finding parameter combinations that would match all the data with at least 2 alternative models seems excessive at present, but it is a good direction to pursue as we obtain additional funding to continue this work. It is also difficult to search extensively for parameter values that would make the stochastic simulations fit the data perfectly, since methods for fitting agent-based models to data are not fully developed. However, following this suggestion, we now show results of simulations for the two alternative models (accrued damage and viral compensation) that, we believe, match the experimental data to some extent (see new Suppl Fig S9).
  
  Minor comments:
  
  (1) Graphical abstract: This requires more context as it is too rough here to help me understand the general idea of the paper. Plus, why does specific infectivity first decrease with genome/cell?
  
  We added a few elements to the graphical abstract, including the strain and target cell used. The decrease in specific infectivity at lower genomes/cell is due to apparent cooperativity.
  
  (2) Equation (7): It would be beneficial for the reader if the reasoning behind the likelihood computation were further described.
  
  This is a relatively standard approach to modeling a binary outcome (here, whether a cell is infected) and estimating its parameters; see, e.g., Wikipedia: https://en.wikipedia.org/wiki/Logistic_regression
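To spell out the reasoning behind a binary-outcome likelihood: each cell is either infected or not, so with N_i cells exposed to x_i genomes/cell and k_i of them infected, the log-likelihood is the binomial sum of k_i log p(x_i) + (N_i - k_i) log(1 - p(x_i)). Logistic regression uses the logit link for p; the sketch below instead assumes, purely for illustration, p(x) = 1 - exp(-(x/V0)^n), which reduces to the Poisson single-hit curve when n = 1. This functional form, the grid ranges and all names are assumptions and may differ from the paper's Equation (7); a real analysis would use a numerical optimiser rather than a coarse grid.

```python
import math


def infection_prob(x, v0, n):
    """Assumed model: probability that a cell is infected at x genomes/cell."""
    # -expm1(-h) equals 1 - exp(-h) but stays accurate when h is tiny.
    return -math.expm1(-((x / v0) ** n))


def log_likelihood(params, doses, infected, totals):
    """Binomial log-likelihood of infected/total cell counts across doses."""
    v0, n = params
    ll = 0.0
    for x, k, N in zip(doses, infected, totals):
        # Clip p away from 0 and 1 so the logs stay finite.
        p = min(max(infection_prob(x, v0, n), 1e-12), 1.0 - 1e-12)
        ll += k * math.log(p) + (N - k) * math.log(1.0 - p)
    return ll


def grid_fit(doses, infected, totals):
    """Maximise the log-likelihood over a coarse (V0, n) grid."""
    best = None
    for i in range(1, 101):
        v0 = 10 ** (-1 + 3 * i / 100)  # V0 from ~0.1 to 100 on a log grid
        for j in range(10, 61):
            n = j * 0.05  # n from 0.5 to 3.0
            ll = log_likelihood((v0, n), doses, infected, totals)
            if best is None or ll > best[0]:
                best = (ll, v0, n)
    return best[1], best[2]
```

Applied to counts simulated from the same assumed curve, `grid_fit` should land close to the generating V0 and n; the clipping step matters because at the highest doses essentially every cell is infected.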
  
  (3) Line 352-357: could the drop in infectivity also be enhanced/explained by increased cell mortality? Did you gate on cell viability during FCM?
  
  The infection rate was measured in live cells only, so increased cell mortality may be an explanation.
  
  (4) Figure 2: I don't understand the dashed diagonal lines: what do they represent exactly? Especially, wouldn't the single-hit model depend on p(1), in which case it should vary by cell x virus?
  
  As the caption to Figure 2 states, the diagonal dashed lines show slope = 1 (i.e., the single-hit model), so one can compare how far the data and/or the model fit deviate from a slope of 1. The note for p(1) in panel A illustrates how p(1) is calculated; p(1) varies by strain-cell combination, as indicated in Suppl. Tab S2.
  
  (5) Fig3G: Is it not surprising to find a positive relationship between p(1) and n? I would have intuitively expected that the stricter the environment is, the more cooperation you observe. But maybe these viruses did not evolve in this context, and therefore, this relationship is different from what you expect from an evolutionary optimum.
  
  We simply don’t know. The relationship suggests that there is a connection between the infectivity of a single virion and the degree of apparent cooperativity. We are not certain in what context these viruses have evolved.
  
  (6) Flow cytometry assay: could it be possible that cells infected by more virions generate more fluorescent proteins and are therefore less likely to be false negatives? Maybe you could compare the fluorescence intensity distribution among infected cells in the context of low MOI vs high MOI?
  
  This is an interesting point. From the presented flow cytometry plots (e.g., Suppl Fig S3), the MFI of infected cells does not seem to depend on the dilution (or genomes/cell).
  
  (7) Figure S9B: I did not understand this figure. Are the axes labels correct? How is it possible to have less than 1 virion/well?
  
  The y axis shows a scaled number calculated by integrating the estimated clump size distribution; we assume 1 “scaled” virion/well at the highest virion/cell values. Because of this scaling, it is indeed possible to have less than 1 virion/well.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors dilute fluorescent HCMV stocks in small steps (df ≈ 1.3-1.5) across 23 points, quantify infections by flow cytometry at 3 dpi, and fit a power-law model to estimate a cooperativity parameter n (n > 1 indicates apparent cooperativity). They compare fibroblasts vs epithelial cells and multiple strains/reporters, and explore alternative mechanisms (clumping, accrued damage, viral compensation) via analytical modeling and stochastic simulations. They discuss implications for titer/MOI estimation and suggest a method for detecting "apparent cooperativity," noting that for viruses showing this behavior, MOI estimation may be biased.
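The dilution design and the slope-based reading of n can be illustrated as follows. With a constant dilution factor df, a 23-point series spans df^22, which for df = 1.4 is about 1.6 × 10^3 (roughly three orders of magnitude); a 10-fold series covers the same span with only three or four dilutions. On log-log axes a power law f ∝ x^n is a straight line with slope n, so n can be estimated by least squares in the low-dose window. This is a sketch under assumptions: the top dose, df = 1.4, the 0.1 cut-off defining the window, and the synthetic response curve are illustrative choices, not the authors' parameters or custom scripts.

```python
import math


def dilution_series(top_dose, df, n_points):
    """Genomes/cell for a serial dilution with constant dilution factor df."""
    return [top_dose / df ** i for i in range(n_points)]


def loglog_slope(doses, freqs, max_freq=0.1):
    """OLS slope of log(freq) on log(dose), using only the low-dose window.

    In that window an assumed power law freq ~ (dose / V0) ** n is roughly a
    straight line on log-log axes, so the slope estimates n.
    """
    pts = [(math.log(x), math.log(f)) for x, f in zip(doses, freqs) if 0 < f < max_freq]
    if len(pts) < 2:
        raise ValueError("need at least two points in the fit window")
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return sxy / sxx
```

Because f ≈ (x/V0)^n only holds while f is small, the estimated slope drifts below n as the upper cut-off of the window is raised; this is one reason the choice of fit window matters.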
  
  Strengths:
  
  (1) High-resolution titration & rigor: The small-step dilution design (23 serial dilutions; tailored df) improves dose-response resolution beyond conventional 10× series.
  
  (2) Clear quantitative signal: Multiple strain-cell pairs show n > 1, with appropriate model fitting and visualization of the linear regime on log-log axes.
  
  (3) Mechanistic exploration: Side-by-side modeling of clumping vs accrued damage vs compensation frames testable hypotheses for cooperativity.
  
  Thank you.
  
  Weaknesses:
  
  (1) Secondary infection control: The authors argue that 3 dpi largely avoids progeny-mediated secondary infection; this claim should be strengthened (e.g., with entry inhibitors or control infections), or the authors should add sensitivity checks showing that the results are robust to a small secondary-infection contribution.
  
  This is an important point. We believe that current knowledge about HCMV virion production time (it takes 3-4 days to produce virions according to multiple papers; see Fig 7 in Vonka and Benyesh-Melnick JB 1966; Fig 3B in Stanton et al JCI 2010; and Fig 1A in Li et al. PNAS 2015) is sufficient to justify our experimental design, but we agree that an additional control blocking new infections would be useful. We previously performed experiments with an HCMV TB-gL-KO that cannot make infectious virions (although stock virions can be made in complemented target cells). We will investigate whether our titration experiments with this strain have sufficient resolution to detect apparent cooperativity. However, at present we do not have the resources to perform new experiments.
  
  (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism.
  
  Excellent suggestion. Because infection of a cell results from both viral infectivity and cell resistance, it may be hard to discriminate between these alternatives unless we specify them as particular molecular mechanisms. Nevertheless, we tried, and we list potential future experiments in the revised version of the paper. Specifically, we write:
  
  “Second, while we have proposed alternative mechanisms that may result in apparent cooperativity, at present we could not discriminate between these alternatives, in part, because the models lacked specifics – e.g., if virions interacting with a cell reduce its resistance to infection, what does it mean exactly [12]? If virions in a collection augment their infectivity (which may be expected for segmented viruses), how does that viral compensation actually work? Designing experiments that would discriminate between these alternatives would require focusing on a specific mechanism. For example, it may be that the initiation of gene expression is difficult but is more efficient when there are more virions bringing in more tegument transactivators like pp72/ppUL35 [59]. Alternatively, it may be that there is a bona fide resistance mechanism at play here (e.g., “interferon”) that is antagonized by a viral tegument protein (like TRS1/IRS1 that acts against PKR and 2’5’OAS) [60]. The accrued damage model is also consistent with the idea that at higher genome/cell values, the inoculum itself (including cell and/or virion debris) may impact overall susceptibility of all cells in the well, for example, making them more susceptible to infection. It may be expected, though, that exposing cells to debris would increase cell resistance to infection; this would result in n < 1 that we did not observe at small genomes/cell values. Addressing these hypotheses is an area of future research that will require funding.”
  
  (3) Decline at high genomes/cell: Several datasets show a downturn at high input. Hypotheses should be provided (cytotoxicity, receptor depletion, and measurement ceiling) and any supportive controls.
  
  Another good point. We do not have a good explanation, but we do not believe this is due to saturation of available target cells. It seemed to occur only (or was most pronounced) with the ME stocks, which typically have lower titers, so the highest MOIs used nearly undiluted stock. It may be an effect of the conditioned medium. Alternatively, non-infectious particles such as dense bodies (enveloped particles that lack a capsid and genome) and non-infectious enveloped particles (NIEPs) may compete for receptors or otherwise damage cells, and these are not diluted out at the higher doses. We have included the point about cell death in the Discussion of the revised version of the paper. Specifically, we write:
  
  “We also do not have a clear explanation of why infection frequency declines at high genomes/cell values for some strain-cell combinations (e.g., Figure 2A, C, D, I, J). Because we measured cell infection in live cells, an increase in cell death at higher genomes/cell values may result in a decrease in the number of viable cells.”
  
  (4) Include experimental data: In Figure 6, please include the experimentally measured titers (IU/mL), if available.
  
  This is a model-simulated scenario, so there are no measured titers.
  
  (5) MOI guidance: The practical guidance is important; please add a short "best-practice box" (how to determine titer at multiple genomes/cell and cell densities; when single-hit assumptions fail) for end-users.
  
  Good suggestion. In the revised version of the paper, we now include a best-practice box based on guidelines developed in the Ryckman lab over the years. It reads:
  
  “Match viral titration methods to the experiment as far as possible. This includes using the same dilution of the viral stock, the cell type, duration of inoculation, and readout of infection.
  
  When possible, determine the degree of apparent cooperativity (“n”-value, eqn. (1)) for each virus strain/cell type pair being studied.
  
  If n = 1 (no cooperativity), it is reasonable to calculate experimental MOI based on stock infectivity value determined from a convenient stock dilution.
  
  If n > 1 or unknown, then stock infectivity should be determined at a dilution resulting in an MOI as close as possible to the desired experimental MOI. Alternatively, the inoculum size can be empirically determined to yield the desired number of infected cells. In these ways different virus/cell type pairs can be compared more fairly.
  
  Box 1: Recommendations on titrating viral stocks and on performing experiments when comparing different viral strains.”
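The reasoning behind Box 1 can be made concrete with a short calculation. Assume, for illustration only, that at low doses the infected fraction follows f(x) = (x/V0)^n. If infectivity per genome is measured at a titration dose x_t and then scaled linearly to an experimental dose x_e (the single-hit assumption), the predicted fraction exceeds the true one by a factor (x_t/x_e)^(n-1). This factor is 1 only when n = 1, which is why the box recommends titrating close to the intended MOI when n > 1 or unknown. The function names and numbers below are hypothetical.

```python
def infected_fraction(x, v0, n):
    """Assumed low-dose power law: fraction of cells infected at x genomes/cell."""
    return (x / v0) ** n


def single_hit_prediction(x_exp, x_titration, v0, n):
    """Fraction infected at x_exp predicted by linearly scaling a titration.

    Infectivity per genome is measured at x_titration and assumed constant,
    which is only correct when n == 1.
    """
    per_genome = infected_fraction(x_titration, v0, n) / x_titration
    return per_genome * x_exp


def overestimate_factor(x_exp, x_titration, n):
    """Ratio of predicted to true infected fraction: (x_titration / x_exp) ** (n - 1)."""
    return (x_titration / x_exp) ** (n - 1)


if __name__ == "__main__":
    # Titrate at 1 genome/cell, then use the stock at 0.1 genomes/cell.
    for n in (1.0, 1.5, 2.0):
        pred = single_hit_prediction(0.1, 1.0, 10.0, n)
        actual = infected_fraction(0.1, 10.0, n)
        print(f"n={n}: predicted {pred:.2e}, actual {actual:.2e}, ratio {pred / actual:.2f}")
```

In this example, titrating at 1 genome/cell and diluting tenfold leaves the prediction exact for n = 1 but overestimates infection tenfold for n = 2.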
  
  Reviewer #3 (Recommendations for the authors):
  
  FROM PUBLIC REVIEWS (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism.
  
  This is a good point but to propose a good experiment we need to narrow down the “generic” mechanism to specific processes/genes. We put forward some ideas but clearly more work is needed here:
  
  “Second, while we have proposed alternative mechanisms that may result in apparent cooperativity, at present we could not discriminate between these alternatives, in part, because the models lacked specifics – e.g., if virions interacting with a cell reduce its resistance to infection, what does it mean exactly [12]? If virions in a collection augment their infectivity (which may be expected for segmented viruses), how does that viral compensation actually work? Designing experiments that would discriminate between these alternatives would require focusing on a specific mechanism. For example, it may be that the initiation of gene expression is just difficult but is more efficient when there are more virions bringing in more tegument transactivators like pp72/ppUL35 [59]. Alternatively, it may be that there is a bona fide resistance mechanism at play here (e.g., “interferon”) that is antagonized by a viral tegument protein (like TRS1/IRS1 that acts against PKR and 2’5’OAS) [60]. The accrued damage model is also consistent with the idea that at higher genome/cell values, the inoculum itself (including cell and/or virion debris) may impact overall susceptibility of all cells in culture, for example, making them more susceptible to infection. It may be expected, though, that exposing cells to debris would increase cell resistance to infection; this would result in n < 1 that we did not observe at small genomes/cell values. Addressing these hypotheses is an area of future research that will require funding.”
  
  (1) Methods transparency: Include raw spreadsheets or tables of dilution factors and per-well genome estimates used for Figure 1A; this will help reproducibility of the df = 1.3-1.5 pipeline.
  
  Provided as a supplemental .xlsx file.
  
  (2) Epithelial vs fibroblast contrast: Since n is lower on epithelial cells, expand on cell-intrinsic barriers that could dampen apparent cooperativity, and if this argues against simple clumping.
  
  Indeed, this is a point we raised in the Discussion. Since ECs show lower n than fibroblasts, this observation argues against clumping. Going forward, the contrast between cell types will be one approach to understanding the mechanism. One difference is the entry pathway: entry into ECs involves endocytosis and endosome acidification, whereas entry into fibroblasts does not. Different receptors are clearly involved as well, although they are not well characterized. One recent report that might be relevant is Ohman 2024 PNAS, which shows that the gH/gL/UL128-131 complex (aka, "pentamer") is not just dispensable for entry into fibroblasts but inhibitory. The authors suggest that the pentamer might bind to a receptor on fibroblasts that activates a pathway acting against viral IE expression. In this situation, more virions could genuinely help to overcome that block, whatever it is. We have now updated this point in the Discussion.
  
  (3) Visualization: In Figure 2, consider showing confidence bands for the fitted slope (n) within the colored fit window and reporting n ± SE in the panels.
  
  Because we used custom scripts to fit models to data, showing bands of model predictions was somewhat complex and would obscure the data points. However, we now show 95% CIs for the estimated value of n (these are also listed in Suppl. Tab S2).
  
  (4) Symbols: Define all symbols (e.g., V₀, n) on first use in the main text, not only in Methods.
  
  Done.
  
  (5) Plot axes check: Explain non-uniform axis labeling ("genomes/cell," "infections/cell").
  
  This comment was unclear to us: which labels were not “uniform”? Genomes/cell indicates the expected number of genomes (or virions) to which a cell is, on average, exposed; infections/cell indicates the probability that a cell actually gets infected.
  
  (6) Confidence interval for estimated parameters: Figure 3 A-C, please report estimated parameter intervals.
  
  These are listed in Suppl. Tab S2. Adding CIs for all estimates would clutter the figure, making it hard to tell which CIs belong to which estimate. However, we show the CIs for the estimated parameter n in Figure 2.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.23.650360v2
www.biorxiv.org

Intracellular growth of Chlamydia trachomatis leads to global histone hypermethylation by impairing demethylation

1
1. Public_Reviews 22 Jun 2026
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Public Review):
  
  This study by Charendoff et al provides interesting observations related to global histone hypermethylation in host cells during Chlamydia trachomatis infections. The core observation they report is that host histones are highly hypermethylated during infection, and this appears to be an amplifying effect due to continuous inhibition of demethylases, in part due to a metabolic shift in the host in which the amount of succinate (which inhibits demethylases) increases. The authors claim that this is specifically due to the bacteria, since antibiotic treatment prevents histone hypermethylation (but this leaves one wondering about cause/consequence relationships).
  
  The core observation of hypermethylation is very interesting and well documented. There are a number of points to consider, though, in order to fully substantiate the findings and close out loose ends. My comments are broad and built around the interpretations (vs the data presented).
  
  (1) Related to the observations in Fig 1C etc., and connecting to Fig 3: the hypermethylation appears to affect different protein Arg/Lys residues and is not histone specific. So, is it just a consequence of high SAM pools and flux in infected cells? That is, the bacterial infection increases SAM pools in cells and provides larger substrate pools for the methyltransferases, leading to protein hypermethylation. The approach used here only measures steady-state SAM amounts (not SAM flux or utilisation).
  
  For example, reduced SAM amounts in nuclei could be due to increased utilisation of SAM. The experiments done with the demethylase do not actually answer this question: if you decrease demethylase activity, you will get an increase in net methylation. The authors see an increase in net methylation in the infected cells; this would suggest that, in addition to reduced demethylase activity (or perhaps as the primary cause), there could be much higher SAM utilisation/flux. Again, the overexpression of JMJ proteins does not resolve this problem.
  
  This is an important point. Indeed, one limitation of the initial version of the paper was that we had measured SAM concentration only at one time point (40 hpi) and on the whole population. During revision, we used a ratiometric sensor to measure SAM concentration in cells (PMID 34937909). We observed cell-to-cell heterogeneity in SAM levels in HeLa cells, as previously reported in other cell lines. Chlamydia inclusions develop asynchronously, which allows us to observe, at 40 hpi, a continuum from early (low bacterial load) to late (high bacterial load) stages of infection. We observed no correlation between bacterial load and SAM level, and SAM levels were globally similar when comparing infected and non-infected cells. This experiment strongly supports the hypothesis that protein hypermethylation is not due to an increase in SAM during infection. The data were added in the New Fig. 3. Note that the former Fig. 3 is now split into New Fig. 3 and New Fig. 4.
  
  (2) Adding to this: what happens to SAM pools in cells treated with the inhibitors? This may not actually look like the slightly reduced SAM pool observed in infected cell nuclei. Also, what is the SAM/SAH ratio (a very useful indicator of methylation activity)?
  
  Based on the high cell-to-cell heterogeneity of SAM levels observed with the ratiometric probe, we reasoned that measuring the SAM/SAH ratio without single-cell resolution would not bring crucial information. Also, the discrepancy between data displayed in new Fig. 3A (nuclear extracts) and 3C (live cell imaging) indicates that SAM might be less stable in cellular extracts from infected cells than in those from non-infected ones, which would complicate interpretation of the data. Therefore, we did not perform LC-MS/MS on nuclear extracts to measure the SAM/SAH ratio.
  
  (3) There is a correlation/implication issue in Fig 2: cells with C. trachomatis infection show hypermethylation, but these are the only cells with high C. trachomatis. So it is somewhat disingenuous to say that histone hypermethylation correlates with bacterial proliferation. The cells without bacteria don't have hypermethylation, and that has nothing to do with bacterial proliferation.
  
  In Fig. 2B, we compared the methylation signal within the population of infected cells only (excluding the uninfected cells). We edited the text to clarify this point. “We observed that, within the population of infected cells, the sum intensity of the mCherry signal was higher in cells that displayed hypermethylation of H3K9me3 than in cells with low level of H3K9me3, indicating that histone hypermethylation correlated with bacterial load (Fig. 2B).”
  
  (4) The claim that demethylase activity is reduced in infected cells again comes primarily from the increased succinate amounts (2-fold) in infected nuclei, correlated with experiments in which succinate and (permeable) a-KG are supplemented in excess. While I personally like the hypothesis that the hypermethylation might result from an imbalance in cofactors (succinate vs a-KG) in infected cells, the data presented are too preliminary to support that conclusion. Again, steady-state measurements of succinate alone cannot provide a clear answer to that question. For example, is there a clear allocation/flux difference between a-KG leading out to glutamate/glutamine vs flux through the TCA cycle and increased succinate accumulation? Is there a bottleneck/build-up of succinate in cells that might lead to the increase in nuclei? This also opens another possible direction of regulation: increased histone succinylation. When you see a large increase in succinate in the nucleus, then before looking at demethylase activity it becomes an obvious question whether succinate itself increases histone succinylation (through HATs).
  
  Our work confirms the accumulation of succinate in cells infected by C. trachomatis, previously reported in Rother et al 2018. The reason for this accumulation remains to be investigated in detail. We have previously shown that OxPhos is relatively stable in infected cells (PMID 35931114), indicating that flux through the TCA cycle of the eukaryotic host proceeds normally. As mentioned in our Discussion, the TCA cycle of the bacteria is disrupted, with several enzymes missing, although not at the step immediately downstream of succinate/fumarate production. Still, synthesis of succinate and fumarate (fumarate accumulation was observed in the Rother 2018 study) by bacterial enzymes might contribute to their accumulation in infected cells. The approach we chose to measure methylation at the proteome level is not suitable for detecting histone succinylation, because of the diversity of post-translational modifications on histones, which occur in combinations. However, following this reviewer’s comment, we reanalysed the proteomic data to compare protein succinylation levels in infected and non-infected samples. We detected 41 succinylated peptides in the infected samples, against 23 in the uninfected samples. For many of these, we did not have quantitative data in all conditions, and only one protein, transportin 1 (TNPO1), reached statistical significance, with a 4-fold increase in succinylation in infected samples. Thus, while essentially qualitative, this analysis fully supports the hypothesis that succinate accumulates in infected cells. These data were added to Table S1 and to the Results section.
  
  (5) What might the authors hypothesise about why this hypermethylation happens? It appears in some ways that hypermethylation happens potentially due to a metabolic bottleneck that the bacteria trigger (with a build-up of SAM and/or succinate, and altered flux out of a-KG). The methylation is just a visible outcome but may not be central to pathogenesis or viability.
  
  We discussed this question in the penultimate paragraph of the Discussion, giving some elements of an answer to the question “Does it benefit the host or the bacteria?”. In our study, we showed that protein hypermethylation affected the transcriptional response of the host. We did not investigate whether the activity of some of the host proteins engaged in the response to infection was affected. It might be the case, considering that methylation is a common PTM regulating protein activity. Still, we agree with this reviewer that hypermethylation might not be central to pathogenesis or viability. Addressing this question would require a complex model in which protein methylation levels could be controlled experimentally.
  
  Reviewer #2 (Public Review):
  
  Strengths:
  
  (1) Because the study compares genuinely infected cells with uninfected cells within the same infected cell population, it enables a clearer and more rigorous comparison.
  
  (2) By using multiple Chlamydia species and cells from multiple host species (human and mouse), and obtaining consistent findings across these systems, the study demonstrates the generality of bacterium-induced epigenomic alterations.
  
  (3) The study shows that the epigenomic changes are caused by reduced activity of JMJC domain-containing lysine demethylases, demonstrating through multiple complementary approaches (including the use of a demethylase inhibitor, overexpression of target-specific demethylases, and analysis from the perspective of cofactors required for JMJC domain-containing demethylases) that decreased lysine demethylase activity constitutes the molecular mechanism underlying the increased H3 methylation levels induced by Chlamydia infection.
  
  (4) By performing ChIP-seq analyses of H3K4me3 and H3K9me3, the study clearly delineates, on a genome-wide scale, how infection leads to increased levels of these epigenomic marks.
  
  Weakness:
  
  (1) Reduction of cofactors such as Fe2+ or a-KG decreases the activity of JMJC domain-containing lysine demethylases (thereby directly affecting histone H3 lysine methylation). However, these cofactors are also involved in the activities of other epigenetic regulators, such as TET enzymes that contribute to DNA demethylation and SIRT family proteins that mediate histone deacetylation. Therefore, it cannot be excluded that modulation of these factors indirectly leads to the changes in H3 lysine methylation dynamics targeted in this study.
  
  Indeed, reducing the concentrations of Fe2+ and aKG is expected to have other consequences in addition to the inhibition of the JMJC-domain containing lysine demethylases on which we focus in this study. As a matter of fact, we reported a decrease in the methylation level of host DNA in infected cells, and in the Discussion we provided some elements that might explain the discrepancy between DNA and histone methylation status (e.g., infected cells display enhanced expression of GADD45, which recruits TET enzymes and thus facilitates DNA demethylation). This example illustrates the complexity of host/pathogen interplay, which affects many parameters simultaneously. Indeed, we cannot rule out that modulation of enzymatic activities other than JMJC-domain containing lysine demethylases contributes significantly to the hypermethylation phenotype.
  
  (2) Related to point 1, although overexpression of JMJC-type demethylases has been shown to reduce the Chlamydia infection-induced increase in H3 lysine methylation, it is well known that overproduction of these enzymes, while target-specific, also leads to a genome-wide reduction of lysine methylation. Thus, a decrease in lysine methylation upon expression of these demethylases does not necessarily demonstrate that the infection-induced increase in H3 lysine methylation is caused by impaired JMJC-type demethylase activity.
  
  We fully agree. We included this experiment to show that increasing the expression of one demethylase restored demethylation of its cognate target only. This supports the hypothesis that, if the hypermethylation is due to poor demethylase activity, several demethylases likely show impaired activity (as opposed to a scenario in which failure of a single demethylase would indirectly affect all other methylation marks).
  
  Reviewer #3 (Public Review):
  
  In this manuscript, the authors explore a molecular basis for hypermethylation of histones in epithelial cells infected with the obligate intracellular bacterial pathogen Chlamydia trachomatis. This is of particular interest given that Chlamydia is known to drastically alter host cell gene transcription, and histone hypermethylation would suggest a new way by which Chlamydia interferes with gene expression of its host. Histone methylation was previously implicated in the introduction of dsDNA breaks in infected cells, and the chlamydial effector NUE was reported to methylate histones, but the role of this modification in dictating host cell gene transcription has been unexplored. The authors use a suite of tools to approach this question, including various -omics techniques, genetic approaches, and biochemical assays. Overall, the manuscript provides many interesting pieces of data, though some of them are difficult to reconcile, which may reflect methodological hurdles that are not fully addressed in the current version of the manuscript. My major concerns regard the rationale/interpretation for various mechanistic experiments and that the heterogeneity of the histone hypermethylation phenotype is not addressed which I believe may explain some apparent inconsistencies in the results.
  
  We thank this reviewer for the insightful comments. We addressed these two major concerns during revision and provide some elements in our responses below.
  
  Using an immunofluorescent approach, the authors show that a subpopulation of the nuclei in Chlamydia-infected cells (~10-20%) exhibit high amounts of methylated histone species. This occurs during the late stages of infection, near the time when Chlamydia would lyse the host cell and positively correlates with bacterial burden.
  
  Accordingly, halting chlamydial growth blocks the onset of histone hypermethylation. Exogenously supplying cofactors for histone demethylases, the low activity of which is implicated in the histone hypermethylation phenotype, reduces histone hypermethylation. In general, these data are compelling and raise interesting questions about the role of histone methylation in governing chlamydial egress from infected cells. Interestingly, these behaviors seem to arise independently of NUE, the secreted chlamydial histone methyltransferase, supporting the notion that a metabolic reprogramming may underlie the hypermethylation phenomenon.
  
  As noted above, the authors propose that hypermethylation arises due to decreased demethylase activity in infected cells. However, the data do not conclusively support this interpretation. For example, the approaches used to probe demethylase activity rely on (i) a direct biochemical measure of demethylase activity, (ii) pharmacological inhibition of demethylases, and (iii) heterologous expression of a specific demethylase. With the exception of (i), these approaches would be expected to alter histone methylation regardless of the source. That is, inhibition of demethylases should increase histone methylation regardless of whether the source of methylation is increased methylase or decreased demethylase activity. Similarly, overexpression of a demethylase would be expected to reduce cognate histone methylation arising either from increased methylase or from decreased demethylase activity.
  
  We agree with the reviewer’s comments. The experiment using pharmacological inhibitors (ii) shows that infected cells are sensitized to these inhibitors but does not provide direct mechanistic insight. The experiment using heterologous expression of demethylases (iii) was included to show that increasing the expression of one demethylase restored demethylation only of its cognate target. This supports the hypothesis that several demethylases show impaired activity (as opposed to a scenario in which failure of a single demethylase would indirectly affect all other methylation marks).
  
  The most direct evidence for impaired demethylase activity comes from the direct measurement of H3K4me3 demethylation in nuclear extracts (i). It is strengthened by indirect evidence that metabolite concentrations hinder demethylase activities late in infection: (1) iron and DMKG supplementation diminish hypermethylation of histone lysine residues, and (2) succinate levels (succinate being a competitor of aKG) are two-fold higher in nuclei isolated from infected cells. This latter finding was confirmed during revision, as we identified more succinylated proteins in infected samples than in non-infected ones.
  
  We also considered the possibility that infected cells display increased histone methyltransferase (HMT) activity. This would be compatible with decreased KDM activity and could contribute to histone hypermethylation. Unfortunately, this hypothesis cannot be tested directly (as we did for H3K4me3 demethylation activity). SAM is notoriously labile, and in vitro HMT assays require the addition of exogenous SAM to cell extracts to detect any activity, which would not allow us to test activity driven by endogenous SAM levels.
  
  Instead, we used a ratiometric sensor to measure SAM concentration in cells (PMID 34937909). Chlamydia inclusions develop asynchronously, which allows us to observe, at 40 hpi, a continuum from early (low bacterial load) to late (high bacterial load) stages of infection. There was no correlation between bacterial load and SAM level, and SAM levels were globally similar in infected and non-infected cells. This experiment supports our hypothesis that protein hypermethylation is not due to an increase in SAM during infection.
  
  This experiment was also informative because it revealed high cell-to-cell heterogeneity in SAM levels in HeLa cells. Thus, SAM might be limiting in some cells, which could explain why only a fraction of cells display histone hypermethylation.
  
  Still, we cannot fully rule out the possibility that SAM availability increases late in the infectious cycle in some cells and that the additional SAM is immediately consumed through protein methylation, resulting in no net increase in [SAM]. The discussion was expanded to take these comments into consideration.
  
  Altogether, we think that the evidence for decreased KDM activity in infected cells late in infection is strong. Our data do not rule out the possibility that additional mechanisms may contribute.
  
  Moreover, the authors report that the effect of the demethylase inhibitor on histone hypermethylation is significantly potentiated by infection, suggesting that infected cells have greater methylase activity than uninfected cells, because the latter barely respond to the presence of demethylase inhibitor. In other words, a dramatic increase in histone methylation in the presence of demethylase inhibitor is most parsimoniously explained by increased methylation (no longer being removed by demethylase), not decreased demethylation (which would be analogous to treatment with demethylase inhibitor). The authors do not directly assay methylase activity. These concerns extend to the rationale used to justify experiments with infected mice, which the authors treat with the demethylase inhibitor.
  
  The observation that the same concentration of JIB-04 increases histone methylation in infected cells but not in non-infected cells is consistent with the data showing that aKG or iron supplementation diminishes histone hypermethylation in infected cells. The inhibitor is taken up similarly by infected and uninfected cells, but its potency depends partly on the levels of iron, aKG and succinate in the cellular milieu. The same concentration of inhibitor may therefore inhibit demethylase activity in cells with higher succinate, low aKG and low iron, but fail to do so in cells with higher iron or aKG or lower succinate. In other words, high iron, high aKG or low succinate will “buffer” JIB-04 and make it less potent, since JIB-04 acts partly by competing with iron (competitive inhibition) and with aKG (mixed competitive inhibition) (PMID 23792809). The same phenomenon is expected for SD70 and TACH101, which likewise partly compete with aKG and/or iron in the catalytic site.
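  A minimal numerical sketch of this buffering argument, not an analysis from the study: it uses the textbook rate equation for a purely competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]), with [S] standing in for aKG. Treating the inhibitor as purely competitive is a simplifying assumption (the response describes JIB-04 as mixed competitive with respect to aKG), and Km, Ki and all concentrations are arbitrary, hypothetical values.

```python
def fractional_activity(s, i, km=1.0, ki=1.0):
    """Enzyme activity with inhibitor, relative to activity without it.

    Assumes a purely competitive inhibitor (simplification); km, ki, s and i
    are arbitrary illustrative values in the same concentration units.
    Vmax cancels out of the ratio, so it is omitted.
    """
    v_inhibited = s / (km * (1 + i / ki) + s)  # competitive inhibition raises apparent Km
    v_free = s / (km + s)                      # Michaelis-Menten without inhibitor
    return v_inhibited / v_free


# Same inhibitor dose, increasing substrate (aKG-like) concentration.
for s in (0.5, 5.0, 50.0):
    print(f"[S]={s}: remaining activity {fractional_activity(s, i=2.0):.2f}")
```

  With these arbitrary values, the fraction of activity remaining at a fixed inhibitor concentration rises from about 0.43 at [S] = 0.5 to about 0.96 at [S] = 50: abundant substrate blunts the same dose of inhibitor.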
  
  The authors perform experiments to characterize the consequence of hypermethylation genome-wide. Because the authors do not enrich for those cells which exhibit histone hypermethylation, the results reflect the mixed population, and therefore presumably dilute out important signal related to the phenomena under investigation. For example, the proteomic analysis of post-translational modifications identifies only one methylated histone species, whereas the immunofluorescent approach shows consistent effects across five different methylated histone species. Moreover, the chromatin immunoprecipitation analysis indicates that there is unexpectedly a lower density of methylated histones at regions which are also enriched in uninfected cells. The authors argue that this suggests increased methylation is happening "outside" of these histone-dense regions, but direct evidence in support of this claim is lacking.
  
  The caveat of bulk analyses, as opposed to single-cell resolution, is indeed important to consider when analysing the ChIP-seq data, and we emphasized this point in the revised manuscript. We could have sorted the cells with high bacterial burden; this would probably have yielded stronger differences between the two samples. Still, the change in distribution of H3K4me3 in infected samples was very clear and statistically significant. A change in H3K9me3 distribution would be more difficult to detect, as this mark is more widespread.
  
  In sum, this paper provides compelling evidence in support of the notion that histones are hypermethylated at various residues late in chlamydial infection, that this process is modulated by known cofactors of demethylases, and is the result of high levels of bacterial replication in the cell. That histone hypermethylation governs host gene transcription during chlamydial infection suggests a relatively novel mechanism by which Chlamydia subverts the host cell to establish a replicative niche or egress to infect a new cell. The information obtained regarding the methylation status of host proteins and host gene transcription controlled by a metabolic cofactor during infection will be a useful resource for other researchers. However, in the current version of the manuscript, the mechanistic basis for these behaviors is relatively unclear.
  
  We thank this reviewer for constructive feedback. We believe that the mechanistic conclusions of our report have been strengthened during revision with additional experiments and text clarification.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.04.597420v2
www.biorxiv.org

Shared and organ-specific gene expression programs of fibrotic diseases

1
1. EMBOpress 22 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  General Statements
  
  Thank you for providing an assessment of our manuscript. Below, we outline our revision plan. The revisions address four main areas: the relationship between the identified molecular signatures and fibrosis severity or disease etiology; the criteria used to identify disease-associated fibroblasts; the interpretation of the genes and biological processes highlighted by our analyses; and the broader biological insights supported by the study.
  
  As part of the revisions implemented, we have:
  
  - Associated organ-specific fibrotic molecular signatures with the fibrosis severity scores available in the clinical metadata, helping to relate the identified transcriptional patterns to biologically meaningful aspects of fibrosis.
  - Extended the supplementary figures to present more clearly the decision-making process used to identify fibroblast subpopulations associated with fibrosis.
  - Revised the methods, figures, legends, and captions in response to the reviewers' suggestions to improve clarity.
  - Expanded the discussion of the results by incorporating the literature suggested by the reviewers, thereby providing additional context for the identified fibrotic signatures.
  - Extended our spatial analysis using a more robust identification of fibrotic regions.
  
  We plan to:
  
  - Extend our cell-cell communication and spatial analyses using deconvolution methods.
  - Compare our unsupervised multicellular factor analysis of multiple studies with our supervised fibrotic signatures to ensure coherence between analyses.
  - Perform additional comparisons between specific pairs of organs and additional cell types, instead of focusing solely on the simultaneous comparison of all organs.
  - Expand the results and discussion to clarify the relevance and limitations of our study.
  
  We believe these revisions will strengthen our resource manuscript and help us provide a robust and reliable description of fibrotic processes across organs.
  
  Description of the planned revisions
  
  Reviewer #1
  
  Reviewer #1, major comment 1: The group has been developing cutting-edge bioinformatic tools for the community. The authors also provided scripts and the processed data for reproducibility. I have no doubt about their implementation of the methodology. I also understand the reasons for the objective tone throughout the manuscript. However, the authors made very few claims of biological significance. The conclusion of the study is vague, with almost nothing mentioned in the abstract. What are the cross-organ effects in fibrosis identified in this study? I believe some additional claims would help readers with less technical knowledge grasp the study better.
  
  We understand the reviewer's concern regarding the lack of an explicit discussion of biological significance in the abstract and other parts of the manuscript, as most of the manuscript focuses on comparing studies at different levels. Our study defines which fibrosis-associated transcriptional patterns are reproducibly detectable across the currently available public single-cell datasets, while also identifying where cross-organ interpretation remains limited. We observed that some disease-associated transcriptional patterns recur across organs and studies, particularly in mesenchymal and endothelial compartments. In contrast, other compartments, including myeloid cells, showed weaker cross-organ agreement, which may reflect either greater tissue-context dependence or stronger sensitivity to differences in disease stage, sampling, and annotation. Finally, we observed a convergence of fibrotic signals in a subset of mesenchymal cells and identified genes that are specifically expressed in actively scarring regions across organs; TIMP1, in particular, was consistently found to be highly expressed by disease-associated fibroblasts in fibrotic regions across tissues and modalities.
  
  Our results should be interpreted as robust and reproducible cross-dataset fibrosis signatures rather than definitive evidence for a specific pathophysiological mechanism. Therefore, we believe that the primary contribution of this study lies not in assigning causal roles to individual genes or pathways, but in providing a systematic framework for identifying fibrosis-associated programs that are reproducibly observed across studies, organs, and disease etiologies. As our analysis is entirely computational, we intentionally avoid making strong mechanistic claims without experimental validation. Instead, we envision this resource as a means to prioritize candidates and generate hypotheses for future functional studies.
  
  To address the reviewer's concern, we will state our observations more explicitly in the abstract and throughout the text to make our intentions and conclusions clearer. We will be more explicit about what information our resource provides and how it can best be leveraged. We will further include the conclusions about cross-organ agreement described above, as well as specific observations from our analyses that help the reader better grasp the study.
  
  Reviewer #1, major comment 3: The authors performed multicellular factor modeling in each organ and identified factors that are distinct in fibrotic and reference tissue in Fig. 2B, e.g., factors 1 and 2 in heart. Are these factors driven by specific biological pathways? Could these factors also be used to identify common biological functions in fibrotic tissue across organs?
  
  We agree that, in principle, the latent factors identified by the multicellular factor models could be interrogated for their biological interpretation. Each factor is associated with a gene-weight vector per cell type, which can be analyzed similarly to a differential expression signature to identify enriched pathways and biological processes.
  
  However, we chose not to pursue a systematic factor-level interpretation for three reasons. First, as shown in Suppl. Figure 3, the contribution of individual factors to the separation between fibrotic and reference samples varies substantially across organs. In some organs, the distinction is largely captured by a single factor, whereas in others it is distributed across multiple factors. Second, because the models were trained independently for each organ, there is no direct correspondence between factor identities across organs, making cross-organ comparisons of individual factors difficult to interpret. Finally, we were not able to capture a fibrosis-related transcriptomic program from all organs.
  
  We therefore used the multicellular factor analysis primarily as an unsupervised approach to assess whether common fibrosis-associated variation could be detected across datasets. The observation that fibrotic and reference samples consistently separated along latent factors suggested the presence of shared disease-associated signals. For the subsequent biological interpretation, however, we opted for a supervised analysis framework based on differential expression and downstream functional enrichment, which allowed more direct and robust comparisons across organs and disease contexts.
  
  We will revise the manuscript to make this rationale more transparent to the reader. In addition, we will include an analysis demonstrating that the gene weights associated with the disease-relevant latent factors closely resemble the corresponding organ effect sizes in heart, kidney, and lung, illustrating that biological interpretation at the factor level yields conclusions that are highly consistent with those obtained from the supervised differential expression analysis. This further supports our decision to base the downstream functional analyses on the organ effect sizes, which provide a more straightforward framework for cross-organ comparison.
  
  Reviewer #1, major comment 4: Despite strong organ-specific effects, the authors detected similar transcriptional changes in endothelial and mesenchymal cells in heart and lung in Fig. 3B. The analysis of disease-associated fibroblasts also showed a much higher overlap between heart and lung than between, e.g., liver and kidney in Fig. 4C. Are there additional shared fibrosis features or functions in mesenchymal cells or disease-associated fibroblasts in heart and lung?
  
  Reviewer #1, major comment 5: There seems to be a certain degree of similarity among the epithelial cells in kidney and lung in Fig. 4B.
  
  Shared response for comments 4 and 5:
  
  Given the high number of combinations of comparisons, we decided to focus on the most shared signals (mesenchymal and endothelial) in our manuscript. However, as the reviewer notes, there are other comparisons, such as the one between epithelial cells from kidney and lung, or in endothelial cells between heart and lung, that may be important to report. We plan to revise the text in section "Fibrotic disease programs within tissues" to explicitly discuss the observed similarity and plan to additionally show the shared genes driving these similarities in a supplementary Figure in the manuscript.
  
  Reviewer #1, major comment 8: TNC appears near the bottom of the list in Fig. 6C. It is unclear why TNC was ultimately chosen as a broad therapeutic target.
  
  We agree that the original wording may have implied that TNC was selected because it was the top-ranked candidate in Figure 6C. This was not our intention. Rather, we chose TNC as an illustrative example because it emerged from our analysis without prior manual prioritization, has already been linked to fibrosis in specific disease contexts, and has been explored experimentally as a therapeutic target. At the same time, its role has not been investigated broadly across fibrotic diseases, making it a useful example of how the presented framework can identify candidates that may have relevance beyond the settings in which they were originally studied.
  
  We will revise the text to clarify that TNC is presented as one representative example from the set of prioritized candidates rather than as the single most highly ranked therapeutic target.
  
  Reviewer #1, minor comment 1: Is there an additional measure that accounts for the datasets with lower RNA counts shown in Fig. S1?
  
  We thank the reviewer for highlighting this potential source of technical variation. We did not apply an additional correction specifically to account for datasets with lower RNA counts. Instead, to minimize the impact of differences in sequencing depth and cell-level sparsity across datasets, the majority of our analyses were performed on pseudobulk profiles rather than individual cells. Pseudobulk aggregation substantially reduces the influence of variation in RNA counts between cells and datasets, providing more robust estimates of gene expression. We therefore believe that differences in RNA counts had a limited impact on the main conclusions of the study. To illustrate this point, we plan on showing additional quality control summary plots for our pseudobulked data.
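  A minimal sketch of pseudobulk aggregation, assuming a cells × genes raw count matrix and one sample label per cell (toy data; not the study's actual pipeline, which also aggregates per cell type and applies count thresholds). Counts are summed over all cells of a sample and then scaled to counts per million (CPM), so cell-level sparsity and differences in sequencing depth are absorbed into the per-sample library size.

```python
import numpy as np


def pseudobulk_cpm(counts, sample_ids):
    """Sum raw counts per sample, then scale each sample to counts per million.

    counts: (n_cells, n_genes) array of raw counts.
    sample_ids: sequence of length n_cells with one sample label per cell.
    Returns (samples, cpm), where cpm has shape (n_samples, n_genes).
    """
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)
    # Aggregate: one summed profile per sample (the "pseudobulk").
    summed = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    # Normalize by library size so depth differences between samples cancel.
    library_sizes = summed.sum(axis=1, keepdims=True)
    cpm = summed / library_sizes * 1e6
    return samples, cpm


# Toy data: sample "B" has half the sequencing depth of sample "A".
counts = np.array([[2, 0, 2],
                   [0, 2, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
samples, cpm = pseudobulk_cpm(counts, ["A", "A", "B", "B"])
print(samples, cpm)
```

  In the toy example, sample B carries half the total counts of sample A, yet both yield identical CPM profiles after aggregation.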
  
  Reviewer #2
  
  Reviewer #2, major comment 3: Fig. 4B-C: the full list of organ-specific and overlapping genes should be given in a supplemental table.
  
  We thank the reviewer for this suggestion. We agree that providing the complete lists of organ-specific and overlapping genes improves the transparency and utility of the analysis. We will provide the full gene lists underlying Figures 4B-C as supplementary tables in the revised manuscript. These tables will provide the complete set of genes used for the reported overlap analyses and allow readers to further explore the identified organ-specific and shared fibrotic programs.
  
  Reviewer #2, major comment 6: Cell-cell communications analysis: It would be informative to add a circos plot highlighting the best cell-cell communication candidates in each organ. The authors should also provide the full list of predicted interactions in a supplementary table, including scores for each organ for each interaction. Additionally, it would be important to focus specifically on ligand-receptor pairs associated with growth factors and cytokines. While incorporating Visium data is very interesting and challenging, it may reduce sensitivity due to its relatively poor capture efficiency. This could particularly overemphasize the importance of collagens and other ECM-related factors, which are highly expressed.
  
  We agree that additional visualization and data availability would improve the presentation of the cell-cell communication analysis. First, we will add organ-specific visualizations highlighting the highest-confidence cell-cell communication candidates within each organ, providing a more intuitive overview of the predicted interactions. Second, we plan to include the complete list of predicted ligand-receptor interactions as supplementary tables, including the corresponding scores for each organ and gene annotations (e.g., cytokine, growth factor), allowing readers to explore the full set of predictions underlying the analyses.
  
  We also agree that highly expressed extracellular matrix components, such as collagens and proteoglycans, can dominate CCC analyses, especially when investigating fibrotic diseases. Indeed, this consideration motivated our final therapeutic target prioritization strategy (Figure 6). In this analysis, we specifically excluded collagens and proteoglycans, thereby enriching for extracellular signaling molecules that are more likely to represent biologically informative and therapeutically actionable cell-cell communication events. We will modify the results section to clarify our rationale for this analysis.
  
  Reviewer #2, major comment 8: Visium Dataset Analysis: It would be interesting to compare fibrotic areas across different organs by performing niche or topic analyses using supervised deconvolution approaches (such as RCTD). This would allow for a better estimation of cell composition and functional annotations of fibrotic and inflammatory areas.
  
  We agree that a cell type deconvolution would provide an informative framework for characterizing the cellular composition of fibrotic niches and its association with the fibrotic signatures we derived from single-cell data. We plan to address the reviewer's suggestion by running a cell type deconvolution analysis of the Visium datasets to estimate the enrichment of major cell populations within scar regions and compare them across organs. We hope that these additional analyses will provide complementary information on the cellular composition of these areas.
  
  Reviewer #2, minor comment 1: p11: the authors conclude that "cell proportions differed not only between patients and organs, but also that there was no uniform abundance change in disease". This result may reflect technical variability, particularly due to dissociation biases from very different organs or the use of different platforms. This limitation should be discussed.
  
  We agree that differences in cell type proportions may not only reflect biological variation but can also be influenced by technical factors, including organ-specific dissociation biases, differences in tissue processing, and the use of distinct sequencing platforms. We will expand the text to explicitly acknowledge these potential confounding factors and to emphasize that the observed differences in cell abundances should be interpreted with appropriate caution.
  
  Reviewer #2, minor comment 3: Panel E in Fig. 5 is difficult to read and needs to be improved.
  
  To improve the readability of the figure, we will include fewer ligand-receptor pairs and add grey background boxes to help the reader distinguish the ligand-receptor pairs from each other.
  
  Reviewer #3
  
  Reviewer #3, minor comment 1: P5: Some context regarding expected differences between single cell and single nuclei datasets here would be good (especially if some differences are potentially important).
  
  We agree that adding context regarding the expected differences between single cell and single nuclei datasets would add value to the manuscript. These differences have been investigated in the past and were shown to have an impact on the RNA-sequencing results and their interpretations (Van Melkebeke et al. 2024; Lake et al. 2023; Feng et al. 2026; Denisenko et al. 2020; Litviňuková et al. 2020; Koenitzer et al. 2020). We therefore plan to include more background information, including the distinct capture biases and transcriptomic characteristics, to highlight that these differences should be considered when comparing datasets generated using different protocols.
  
  Reviewer #3, minor comment 6: P12: Please clarify whether the multicellular factor model is fit jointly across all datasets within an organ, or separately per dataset followed by comparison. If fit jointly, how are batch/study effects handled? If fit separately, how are factors aligned across invocations?
  
  Is it possible to say how much of this consistency across datasets is due to non-fibrotic or non-disease state regulation? Are the disease-associated factors driven by coordinated changes across multiple cell types, or primarily by one dominant cell type? And if the latter, is this related to expression magnitude, or cell type abundance?
  
  We agree that the description of the multicellular factor model in the original manuscript did not provide sufficient methodological detail.
  
  The multicellular factor model was fitted jointly across all datasets within each organ, resulting in one model per organ (four models in total). Following the strategy proposed in the MOFA+ framework (Argelaguet et al. 2020), individual studies were treated as groups within the model, allowing the integration of multiple datasets while accounting for study-specific effects. Because the model uses cell type-specific pseudobulk profiles as separate views, the inferred factors reflect coordinated transcriptional changes across cell types rather than differences in single-cell abundance. Pseudobulk aggregation substantially reduces the influence of cell number variation, and we applied quality control thresholds to ensure that only samples with sufficient counts for each cell type were included.
  
  To further clarify the relationship between latent factors and fibrosis, we plan to add an analysis showing the proportion of variance explained (R²) by each factor across studies and cell types. The R² can be used as a proxy for the importance of a cell type in defining a latent factor. Whereas many latent factors capture sources of biological or technical variation unrelated to disease, only a subset consistently separates fibrotic from reference samples. These disease-associated factors therefore represent fibrosis-specific variation rather than general transcriptional structure, and they are the factors we highlighted in the manuscript to support the conclusion that different studies share a consistent disease signal.
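  For readers unfamiliar with this metric, a minimal sketch of the per-factor R² as defined for MOFA-type models (Argelaguet et al. 2020), assuming centered data for one view (cell type) and one group (study): R² = 1 - Σ(y - z_k w_k)² / Σ y², i.e. the fraction of the data's sum of squares recovered by factor k alone. The toy matrices are hypothetical; in practice these values are reported by the fitted model itself.

```python
import numpy as np


def variance_explained_per_factor(Y, Z, W):
    """Per-factor R^2 for one view (cell type) and one group (study).

    Y: (n_samples, n_genes) centered expression matrix.
    Z: (n_samples, n_factors) factor values.
    W: (n_genes, n_factors) gene weights.
    """
    total = np.sum(Y ** 2)
    r2 = np.empty(Z.shape[1])
    for k in range(Z.shape[1]):
        # Reconstruct Y from factor k alone and measure what is left over.
        residual = Y - np.outer(Z[:, k], W[:, k])
        r2[k] = 1.0 - np.sum(residual ** 2) / total
    return r2


# Toy example: factor 1 generates Y exactly, factor 2 contributes nothing.
z = np.array([1.0, -1.0])
w = np.array([2.0, 0.0, -2.0])
Y = np.outer(z, w)
Z = np.column_stack([z, [0.0, 0.0]])
W = np.column_stack([w, [1.0, 1.0, 1.0]])
print(variance_explained_per_factor(Y, Z, W))
```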
  
  We will incorporate these clarifications into the manuscript to make the modeling framework and its interpretation more transparent and add additional analyses showing the variance explained as extra insights into the models.
  
  Reviewer #3, minor comment 12: P23: What conclusions should be drawn from the broad cell-type communication comparisons between organs in Fig. 5A? The text reports which broad cell-type pairs account for many upregulated ligand-receptor interactions, but it is not clear whether these comparisons identify fibrosis-specific communication or mainly reflect broad tissue architecture, cell-type abundance, etc.
  
  If the broad categories were chosen because finer cell-state annotations are not consistently available across studies, it would be helpful to state this limitation explicitly.
  
  We agree that the rationale and interpretation of the broad cell-cell communication analysis should be described more clearly in the manuscript.
  
  The analysis shown in Figure 5A is based on the organ-specific mixed-effects differential expression models and therefore reflects disease-associated changes in ligand and receptor expression between fibrotic and reference samples, rather than absolute expression levels. Figure 5A thus shows which cell type pairs increase their communication in fibrosis, based on the number of ligand-receptor pairs that are differentially expressed above a threshold. Because the mixed-effects models are run separately per cell type, an increase in the proportion of one cell type is unlikely to produce more upregulated communication events with another cell type in this analysis. Overall, we do not see a correlation between the increase in cell type proportion in the tissue (Figure 2A) and the number of upregulated genes in the mixed-effects models (Figure 4A). We therefore do not think that cell type proportions have a strong effect on this particular analysis.
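  A minimal sketch of the counting step described above, under stated assumptions: a ligand-receptor pair is counted for a sender-receiver cell-type pair only if both the ligand (in the sender) and the receptor (in the receiver) have a disease effect size above a threshold. The both-partners rule, the threshold of 0.5, and the toy effect sizes are hypothetical choices for illustration, not the study's parameters.

```python
from collections import Counter


def count_upregulated_pairs(lr_pairs, effect_sizes, threshold=0.5):
    """Count ligand-receptor pairs upregulated in disease per cell-type pair.

    lr_pairs: iterable of (ligand, receptor) gene names.
    effect_sizes: dict mapping cell type -> {gene: disease effect size},
        each taken from that cell type's own model (hypothetical input).
    Returns a Counter keyed by (sender, receiver) cell type.
    """
    counts = Counter()
    for sender, sender_fx in effect_sizes.items():
        for receiver, receiver_fx in effect_sizes.items():
            for ligand, receptor in lr_pairs:
                # Assumed rule: both partners must exceed the threshold.
                if (sender_fx.get(ligand, 0.0) > threshold
                        and receiver_fx.get(receptor, 0.0) > threshold):
                    counts[(sender, receiver)] += 1
    return counts


lr_pairs = [("TGFB1", "TGFBR2"), ("TNC", "ITGB1")]
effect_sizes = {
    "fibroblast": {"TNC": 1.2, "TGFB1": 0.1, "ITGB1": 0.8},
    "endothelial": {"TGFBR2": 0.9, "ITGB1": 0.2},
}
print(count_upregulated_pairs(lr_pairs, effect_sizes))
```

  Because each cell type's effect sizes come from its own model, the count reflects disease-associated expression changes rather than how many cells of each type are present in the tissue.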
  
  We also agree that the use of broad cell type categories warrants clarification. These categories were chosen because they can be robustly harmonized across the diverse datasets included in this meta-analysis, whereas finer cell-state annotations are not consistently available or comparable across studies and organs. We plan to revise the manuscript to clarify both the interpretation of Figure 5A and the rationale for using broad cell type categories in this analysis.
  
  Reviewer #3, minor comment 14: P31: The therapeutic suggestions should come with some discussion that this is association rather than causation, as it's not established that these are causal drivers. MOXD1 seems compelling, especially if this has been observed to have a potential therapeutic effect in other fibrotic diseases, and this is an excellent outcome that justifies the meta-analysis approach. TNC is somewhat more speculative in this regard, so if there is any mechanistic or other motivations, it would be good to include them here.
  
  We agree that the therapeutic implications of our findings should be interpreted with appropriate caution, as our analyses identify associations rather than causal drivers of fibrosis.
  
  These candidates were selected based on the combination of our computational prioritization results and the existing literature, rather than a causal role that has been established by our analysis. Our intention was to provide representative examples of how the presented framework can recover biologically plausible candidates with existing experimental support while simultaneously suggesting their potential relevance across a broader range of fibrotic diseases. We plan to revise the discussion to more clearly emphasize that the proposed therapeutic candidates represent hypothesis-generating observations that require experimental validation.
  
  Reviewer #3, minor comment 16: P31: It would be nice to have what you think the issues are with the lack of patient metadata, and how these issues might manifest in the analyses (this links with the previous comment regarding disease stage).
  
  The lack of detailed clinical and histological metadata substantially limits the range of biological and clinical questions that can be addressed, thereby reducing the value that can be extracted from the considerable effort and cost associated with large-scale tissue sequencing studies. In the current study, we are mostly restricted to comparing fibrotic and reference samples because information such as disease stage, fibrosis severity, time since diagnosis, medication, treatment history, tissue sampling location, and other clinical covariates is largely unavailable or inconsistently reported across studies. If these metadata were available, they could be explicitly incorporated into the statistical models, allowing analyses that relate transcriptional changes to clinically relevant variables such as fibrosis severity or disease progression rather than simply disease status.
  
  Furthermore, additional patient metadata would allow potential confounding factors to be accounted for or controlled in the analysis. For example, treatment effects or other clinical characteristics could be modeled directly, or specific patient groups could be excluded where appropriate, leading to a clearer separation of disease-associated biology from technical or clinical confounders.
  
  We will expand the Discussion to more explicitly describe these limitations and their potential impact on the interpretation of our results.
  
  Description of the revisions that have already been incorporated in the transferred manuscript
  
  To facilitate review of the revised manuscript, we have grouped our responses into two categories. First, we address comments that resulted in substantial new analyses, figures, or modifications to the interpretation of the results. Second, we address minor and editorial comments, which have already been directly incorporated into the revised manuscript.
  
  3.1 Comments requiring additional analyses or substantial revisions
  
  Reviewer #1
  
  Reviewer #1, major comment 2: The authors have pooled the data from at least five different diseases per organ to identify the pan-fibrosis signature across diseases. Some of the diseases (e.g., pneumonitis, ICM, MI, MCD, ALD) may present more acute remodeling compared to the rest, which might exhibit distinct features that mask the analysis. The extent of fibrosis also varies very significantly. A correlation with histological data is required.
  
  We agree that fibrotic diseases differ substantially with respect to disease etiology, disease stage, extent of remodeling, and the degree of fibrosis present in the tissue. We had highlighted this as a key limitation of the study in the discussion:
  
  "Second, the limited availability of patient metadata leaves many aspects unresolved, including the exact diagnosis, disease severity, tissue sampling location, and the extent of fibrosis. If these aspects were better documented, they could be accounted for in the analysis and could allow a clearer distinction of physiological from pathophysiological fibrotic processes. Third, we treated all disease etiologies collectively under the term "fibrosis". However, the degree of fibrotic remodeling likely varies between conditions, and the dataset remains imbalanced in terms of sample representation across organs."
  
  While comprehensive histological and disease severity information was not consistently available across the published datasets included in our meta-analysis, we were able to further investigate this question in the subset of studies for which fibrosis-related metadata were available. Specifically, we derived organ-specific fibrosis signatures, scored these signatures across patients, and performed a per-study normalization. In these datasets, our derived organ fibrosis scores correlated with available fibrosis severity measurements, supporting the biological relevance of the identified programs (Figure S5A-D).
  
  In addition, these analyses indicate that fibrosis signature scores vary across disease etiologies, consistent with the reviewer's suggestion that different diseases may exhibit distinct degrees of fibrotic remodeling (Figure S5E). However, given that most of the etiologies are covered by a single study, it is not possible to disentangle these results from the type of controls used by each study and technical variability.
  
  Nevertheless, because detailed histological and clinical metadata are available only for a limited subset of studies, we believe that a comprehensive analysis of fibrosis severity, disease chronicity, and etiology-specific remodeling is not possible with the currently available data. Future studies with more uniformly annotated patient cohorts will be well-positioned to address these questions in greater depth. Our findings should therefore be interpreted as identifying molecular programs consistently associated with fibrotic disease across diverse conditions, rather than as a direct measure of fibrosis severity itself. We have included these observations in the results section "Identification of shared gene programs per tissue":
  
  "As multiple disease etiologies and disease stages were integrated in each organ, we asked whether the extracted organ-consensus genes were associated with fibrosis severity. However, fibrosis severity measurements were unavailable for the majority of studies, preventing a systematic assessment of severity across the integrated dataset. To nevertheless evaluate whether the identified programs captured biologically meaningful aspects of fibrosis, we derived organ-specific fibrosis signatures, scored these signatures across patients, and performed a per-study normalization. In datasets containing fibrosis severity measurements, our derived fibrosis signature scores correlated with fibrosis severity, supporting the biological relevance of the identified programs (Figure S5A-D). Furthermore, we observed differences in signature scores across disease etiologies (Figure S5E). However, because disease etiologies were unevenly distributed across studies, it remains difficult to distinguish true biological differences from study-specific technical effects. Overall, these results suggest that part of the fibrotic program is shared across most tissues, primarily in endothelial, mesenchymal, and epithelial cells. Furthermore, our findings indicate that the identified organ-consensus programs capture biologically meaningful aspects of fibrosis."
  
  To explain our methodology, we further added this section to our methods:
  
  "Fibrosis severity scoring
  
  To associate the organ-consensus gene signature with fibrosis severity, we first extracted an organ-consensus gene set per organ from the organ-specific gene ranking. Specifically, for each cell type and organ, genes were ranked based on the random-effects meta-analysis estimate obtained from differential expression analyses across studies. Only genes detected in at least three studies were considered for downstream analyses. Positively associated genes were required to have a non-negative upper confidence interval bound and were ranked by decreasing effect size, whereas negatively associated genes were required to have a non-positive upper confidence interval bound and were ranked by increasing effect size. The top 200 positively associated genes and the top 100 negatively associated genes were retained for each cell type-organ combination.
  
  To give each sample a fibrosis score, pseudobulk profiles were generated for each study by aggregating raw counts across all annotated cells per sample, excluding samples with fewer than three annotated cell types. Pseudobulk count matrices were normalized to 10,000 counts per sample, followed by log-transformation. Gene set activities were inferred per sample using decoupler's (124) (v1.9.0) univariate linear model (ULM) with curated organ-consensus gene sets, yielding enrichment scores for each sample.
  
  Finally, these enrichment scores were normalized per study: For each study, the mean and standard deviation of enrichment scores were calculated for all control samples. Sample-level scores were then centered against the corresponding study-specific control mean and additionally converted to standardized scores by dividing by the control standard deviation."
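  The procedure quoted above (consensus gene selection, pseudobulk scoring, and per-study control normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code and not decoupler's API: all column and function names are hypothetical, `ulm_scores` re-implements only the idea of a univariate linear model (regress each sample's expression on a 0/1 gene-set membership vector and report the slope's t-value), the confidence-bound filters follow the Methods wording literally, and the standard deviation uses pandas' default sample estimate because the text does not specify one.

```python
import numpy as np
import pandas as pd


def select_consensus_genes(meta, n_up=200, n_down=100, min_studies=3):
    """Return (up, down) gene lists for one cell type-organ combination.

    `meta` has one row per gene with hypothetical columns 'gene',
    'estimate' (random-effects effect size), 'ci_upper' and 'n_studies'.
    The bound filters mirror the Methods wording as written.
    """
    detected = meta[meta["n_studies"] >= min_studies]
    # Positive set: non-negative upper bound, largest effect sizes first
    up = detected[detected["ci_upper"] >= 0].nlargest(n_up, "estimate")
    # Negative set: non-positive upper bound, most negative effect sizes first
    down = detected[detected["ci_upper"] <= 0].nsmallest(n_down, "estimate")
    return up["gene"].tolist(), down["gene"].tolist()


def pseudobulk(counts, sample_ids, cell_types, min_cell_types=3):
    """Sum raw counts (cells x genes) per sample; drop samples with fewer
    than `min_cell_types` distinct annotated cell types."""
    sample_ids, cell_types = np.asarray(sample_ids), np.asarray(cell_types)
    kept, rows = [], []
    for s in dict.fromkeys(sample_ids.tolist()):  # unique, first-seen order
        mask = sample_ids == s
        if len(set(cell_types[mask].tolist())) >= min_cell_types:
            kept.append(s)
            rows.append(counts[mask].sum(axis=0))
    return kept, np.vstack(rows)


def normalize_log(pb, target_sum=1e4):
    """Scale each sample (row) to `target_sum` total counts, then log1p."""
    return np.log1p(pb / pb.sum(axis=1, keepdims=True) * target_sum)


def ulm_scores(expr, in_set):
    """Per sample, fit expr ~ intercept + slope * membership and return
    the slope's t-value (positive means set genes are higher)."""
    xc = np.asarray(in_set, dtype=float)
    xc = xc - xc.mean()
    sxx, n = np.sum(xc ** 2), xc.size
    scores = []
    for y in np.asarray(expr, dtype=float):
        yc = y - y.mean()
        slope = np.sum(xc * yc) / sxx
        rss = np.sum((yc - slope * xc) ** 2)
        scores.append(slope / np.sqrt(rss / (n - 2) / sxx))
    return np.array(scores)


def normalize_per_study(df):
    """Center scores on each study's control mean ('centered'), then divide
    by the control SD ('z'); expects columns 'study', 'condition', 'score'."""
    stats = (df[df["condition"] == "control"]
             .groupby("study")["score"].agg(["mean", "std"]))
    out = df.join(stats, on="study")
    out["centered"] = out["score"] - out["mean"]
    out["z"] = out["centered"] / out["std"]
    return out.drop(columns=["mean", "std"])
```

  In use, the positive list from `select_consensus_genes` would define `in_set`, `normalize_log(pseudobulk(...)[1])` would supply `expr`, and the per-sample scores would be placed in a `score` column before calling `normalize_per_study`.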
  
  Reviewer #1, major comment 7: The graphs in Fig. S6A do not clearly present how the disease-associated fibroblasts are identified. The true identities of disease should also be plotted in these UMAPs. The results indicating these cells expressed myofibroblast signature should also be shown confirming that these cells are not other mesenchymal cells, e.g., pericytes or smooth muscle cells.
  
  We agree that the original supplementary figures did not sufficiently illustrate how disease-associated fibroblast populations were identified and distinguished from other mesenchymal cell types. To improve transparency, we have replaced the original Figures S6A-C with four organ-specific supplementary figures (Figures S7-S10). For each organ, we now provide:
  
  (A) Cluster-level compositional analyses showing changes in abundance between healthy and fibrotic samples.
  (B) Percentage of mesenchymal cells labeled as disease-associated fibroblasts (blue) or "rest" per study.
  (C) Expression of canonical marker genes for myofibroblasts, pericytes, and smooth muscle cells across clusters, and the top marker genes for the cluster(s) selected as disease-associated fibroblasts.
  (D-G) UMAP visualizations colored by disease etiology, disease condition (fibrosis vs. control), study, and the original author-provided cell state annotations, including myofibroblast/activated fibroblast annotations where available.
  (H) UMAP visualization colored by the final annotations used in the subsequent analysis.
  
  These additions make the selection procedure substantially more transparent and provide multiple independent lines of evidence supporting the identification of disease-associated fibroblast populations.
  
  The rationale for the selected clusters is now evident from the revised supplementary figures. In the lung, the selected cluster 3 exhibits a clear increase in abundance in fibrotic samples, expresses canonical myofibroblast markers, and corresponds closely to activated fibroblast/myofibroblast annotations provided in the original studies. In the heart, the selected cluster 1 was the only population showing a robust disease-associated expansion together with strong myofibroblast marker expression and agreement with published annotations. Although another small cluster (cluster 4) displayed partial myofibroblast characteristics, its very low abundance would have a negligible impact on our pseudobulk-based analyses. In the liver, the selected cluster showed consistent expansion across studies and expressed canonical myofibroblast markers, although author-provided annotations were not available for direct comparison. Finally, the kidney datasets presented the greatest integration challenges, likely due to differences between single-cell and single-nucleus protocols. Here, we selected two clusters (cluster 0 and cluster 4) that increased in fibrosis and expressed fibroblast-associated markers, while excluding another expanding cluster (cluster 2) that showed a pericyte-like expression profile. Overall, our final annotations were broadly consistent with the original study annotations wherever such information was available.
  
  Changes in the manuscript:
  
  "We integrated the mesenchymal cell population per organ and identified a disease-associated cluster by compositional analysis (Figure 4A, Figures S7-S10)."
  
  Furthermore, we added the following section to our methods to clarify our methodology:
  
  "Candidate clusters were required to show consistent enrichment in fibrotic samples and a transcriptional profile characteristic of activated fibroblasts/myofibroblasts. In cases where multiple candidate populations were present, clusters with low abundance or expression profiles inconsistent with myofibroblast identity (e.g., pericyte-like populations) were excluded. Final cluster assignments were validated against the original study annotations whenever available."
  
  Reviewer #2
  
  Reviewer #2, major comment 1: Fig.4A: Fibroblast Population Analysis. The authors integrated the fibroblast populations per organ to identify a disease-associated cluster by compositional analysis. In some models, more than one pathological clusters are revealed by the analysis. Shouldn't they be included as pathological, or at least excluded, from the reference population used as a control for differential expression?
  
  We thank the reviewer for this important comment. We agree that, in some organs, more than one cluster shows features associated with disease and that the selection of disease-associated fibroblast populations should therefore be carefully justified. To improve transparency, we have substantially expanded the supplementary analyses and replaced the original Figures S6A-C with four organ-specific supplementary figures (Figures S7-S10), as described in our answer to Reviewer #1, major comment 7.
  
  Regarding the reviewer's suggestion to exclude additional potentially pathological clusters from the reference population, we chose not to do so. In many cases, the identity of these secondary clusters is less clear, and excluding them would introduce an additional layer of subjective decision-making that may not necessarily improve robustness. Instead, we used a conservative strategy in which only well-supported disease-associated fibroblast populations were explicitly selected. Furthermore, all downstream analyses of disease-associated fibroblasts were performed using pseudobulk profiles. Because pseudobulk aggregation emphasizes broad transcriptional trends, we expect the resulting signatures to be relatively robust to the inclusion or exclusion of small, ambiguously annotated subpopulations. For these reasons, we believe that retaining the remaining mesenchymal populations in the reference group provides the most objective and reproducible framework for the differential expression analysis.
  
  For the associated changes in the manuscript, please see our answer to Reviewer #1, major comment 7.
  
  Reviewer #2, major comment 7: Scar-specific cell-cell communication: Using only COL1A1 as a marker may not be the best option, as this gene is also expressed in normal areas. Suggestion: Use a score combining the best fibrosis-associated genes across the four organs to define fibrotic areas more accurately?
  
  We thank the reviewer for this suggestion. We agree that COL1A1 is not exclusively expressed in fibrotic regions and can also be detected in normal tissue. To make the analysis more robust, we revised our approach and no longer rely on a single marker gene. Instead, we now compute an enrichment score based on a broader set of established extracellular matrix components, including all collagens and proteoglycans collected by Naba et al. (2012), thereby identifying regions characterized by active matrix deposition rather than expression of COL1A1 alone.
  
  We then assess the spatial colocalization of candidate ligands and receptors with these ECM-enriched regions across the entire tissue section and focus on the strongest colocalization signals. Importantly, this spatial analysis is subsequently integrated with the disease-associated fibroblast analysis, allowing us to prioritize genes that are both enriched in disease-associated fibroblasts and localized to ECM-rich regions.
  
  We acknowledge that ECM-rich regions are not necessarily equivalent to fibrotic scar tissue and that some physiologically matrix-producing regions may also be captured by this approach. However, because the analysis is performed across entire tissue sections and multiple independent samples, we expect such regions to contribute primarily as background signal for fibrotic slides. By focusing on the strongest and most consistently colocalizing ligands and receptors across samples, the analysis is designed to identify signals robustly associated with ECM-rich regions rather than being driven by isolated areas of physiological matrix expression.
  
  We considered the reviewer's suggestion of defining fibrotic regions using fibrosis-associated genes derived from our single-cell analyses. However, we chose not to pursue this strategy because it would introduce circularity into the analysis: the same fibrosis-associated genes would first be used to define fibrotic regions and to evaluate their spatial association with candidate ligands and receptors, and would then be used again in the gene expression ranking of disease-associated fibroblasts. Our aim, by contrast, is to compare the genes identified in our meta-analysis against an independent data modality. By instead using an independent ECM-based definition of scar regions, we avoid this potential bias and maintain a clearer separation between the identification of fibrotic regions and the prioritization of disease-associated signaling molecules.
  
  We compared the previous results (COL1A1-to-gene colocalization) with the new results (ECM enrichment-to-gene colocalization) and found high correlations between the two for each organ (Review Plan Figure 1). To support our expectation that the pathophysiological ECM signature largely overshadows physiological ECM expression, we quantified ECM enrichment scores per slide (Figure 6B). We consider the new analysis more robust than the previous one, as it combines several genes into a single score.
  
  We have updated Figure 6 and its text with these new results in our manuscript:
  
  "To refine these insights, we next focused on identifying ligands and receptors that are specifically expressed in actively scarring regions. We prioritized these molecules because, as extracellular signaling factors and cell-surface proteins, they are directly accessible to therapeutic intervention and therefore represent particularly attractive candidate targets. Structural extracellular matrix molecules were excluded as candidate genes in this analysis and were used instead for the identification of fibrotic scar regions.
  
  Accordingly, we calculated an ECM enrichment score for each spatial spot, based on a broad set of established structural extracellular matrix components, consisting of all collagens and proteoglycans collected by Naba et al. (2012). We then computed the spatial colocalization of all remaining ligands and receptors with the identified scarring regions (see methods). Finally, we compared the scar localization of each gene per organ to the organ-consensus scores of disease-associated fibroblasts (Figure 6A). ECM enrichment scores were significantly elevated in fibrotic compared with control samples across all four organs (Wilcoxon rank-sum test: heart p = 0.005; lung p = 0.002; liver p = 0.014; kidney p = 4e-6, Figure 6B), indicating that pathological extracellular matrix production substantially exceeds physiological ECM turnover. Overall, we observed a low correlation between the scar localization of ligands and receptors and the organ effect size in each organ (R in heart = 0.32, liver = 0.38, lung = 0.09, kidney = 0.11), suggesting that several cell types and states contribute to scar-tissue gene expression, or that fibrotic gene expression changes extend beyond the scar area (Figure 6C). When comparing the overlap between top-ranked genes per organ (upper 20th percentile in both gene regulation and colocalization), we observed 8 genes that were identified in 3 out of 4 organs (VIM, TIMP1, FSTL1, CCN2, ANXA2, FBN1, FN1, THBS2), and 2 genes (TIMP2, MRC2) that were identified in all four organs (Figure 6D)."
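  The overlap step described above (per organ, keep genes in the upper 20th percentile of both gene regulation and scar colocalization, then count in how many organs each gene recurs) can be sketched with pandas. This is an illustrative sketch, not the authors' code: the column names `gene`, `effect`, and `coloc`, the reading of "upper 20th percentile" as at or above the 0.8 quantile, and the function names are all assumptions.

```python
from collections import Counter

import pandas as pd


def top_genes(df, q=0.8):
    """Genes at or above the q-quantile of BOTH the effect size and the
    scar colocalization columns (q=0.8 read as the upper 20th percentile)."""
    eff_cut = df["effect"].quantile(q)
    col_cut = df["coloc"].quantile(q)
    hits = df[(df["effect"] >= eff_cut) & (df["coloc"] >= col_cut)]
    return set(hits["gene"])


def organ_counts(per_organ, q=0.8):
    """Map each gene to the number of organs in which it is a top hit.

    `per_organ` is a dict of organ name -> DataFrame with hypothetical
    columns 'gene', 'effect' and 'coloc'.
    """
    counts = Counter()
    for df in per_organ.values():
        counts.update(top_genes(df, q))  # each gene counted once per organ
    return counts
```

  Genes with a count of at least 3 would correspond to the cross-organ hits reported in the text; a `Counter` returns 0 for genes that never pass both thresholds.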
  
  Furthermore, we updated Supplementary Figure 12 to include ECM enrichment scores instead of COL1A1 expression.
  
  Finally, we updated the methods section:
  
  "To identify actively scarring regions, we performed an enrichment analysis of the gene set consisting of collagens and proteoglycans using decoupler's (124) (v1.9.0) univariate linear model (ULM). The spatial colocalization of scarring regions and targets of interest was estimated with the bivariate Moran's R metric implemented in LIANA+ (130) v1.5.0 per target and Visium slide."
  
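  To make the colocalization idea concrete, the sketch below computes a generic global bivariate Moran's statistic between an ECM enrichment score and a ligand or receptor across spots. It is not LIANA+'s implementation: LIANA+ builds its own spatial connectivity, and its Moran's R may differ in weighting, normalization, and in whether local or global values are reported. The k-nearest-neighbour weights, the value of `k`, and the function names are hypothetical.

```python
import numpy as np


def knn_weights(coords, k=6):
    """Row-standardized k-nearest-neighbour spatial weights (k is an
    illustrative choice, not a LIANA+ default)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a spot is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    w = np.zeros_like(d)
    w[np.arange(len(coords))[:, None], nn] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def bivariate_morans(x, y, w):
    """Global bivariate Moran's value: (1 / S0) * sum_ij w_ij zx_i zy_j,
    where zx and zy are x and y z-scored across spots and S0 = sum of w."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(zx @ w @ zy / w.sum())
```

  Positive values indicate that spots with high ECM enrichment tend to neighbour spots with high target expression; negative values indicate spatial segregation.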
  Reviewer #3
  
  Reviewer #3, minor comment 13: P30: The staging or severity of each of the diseases seems like quite a strong confounder, especially if there is a bias for sampling tissues that are late stage. It would be nice to see this addressed more explicitly in the results, perhaps with some comparisons between those that are identified as earlier and later stage in the respective fibrotic diseases (if these annotations exist).
  
  We thank the reviewer for raising this important point. We agree that disease stage and severity are potential confounding factors in any meta-analysis of fibrotic diseases and that a bias toward sampling late-stage disease could influence the molecular programs identified.
  
  Unfortunately, disease staging and fibrosis severity annotations were not consistently available across the published datasets included in our analysis. As a result, we were unable to systematically stratify samples into early- and late-stage disease groups across all organs and disease etiologies. We have therefore highlighted this limitation in the discussion:
  
  "Second, the limited availability of patient metadata leaves many aspects unresolved, including the exact diagnosis, disease severity, tissue sampling location, and the extent of fibrosis. If these aspects were better documented, they could be accounted for in the analysis and could allow a clearer distinction of physiological from pathophysiological fibrotic processes."
  
  Nevertheless, we sought to address this concern in the subset of studies for which fibrosis-related severity measurements were available. Specifically, we derived organ-specific fibrosis signatures, scored these signatures across patients, and performed per-study normalization. In these datasets, fibrosis signature scores correlated with available fibrosis severity measurements, supporting the biological relevance of the identified programs (Figure S5A-D). In addition, these analyses indicate that fibrosis signature scores vary across disease etiologies, consistent with the reviewer's suggestion that different diseases may exhibit distinct degrees of fibrotic remodeling (Figure S5E).
  
  However, because detailed histological and clinical metadata are available only for a limited subset of studies, we believe that a comprehensive analysis of fibrosis severity, disease chronicity, and etiology-specific remodeling is beyond what the currently available data allow, and that these metadata are insufficient to robustly compare early- and late-stage disease across the full collection of datasets. We agree that a systematic investigation of stage-specific fibrotic programs would be highly valuable and represents an important direction for future studies using more comprehensively annotated patient cohorts.
  
  For the associated changes in the manuscript, please see our answer to Reviewer #1, major comment 2.
  
  3.2 Editorial corrections or clarity improvements
  
  Reviewer #1
  
  Reviewer #1, major comment 6: The authors focused on the common functions between mesenchymal and endothelial cells among organs in Fig. 3H and I. Are there cell type specific effects here but shared across organs?
  
  We thank the reviewer for this question. The results shown in Figures 3H and 3I already represent cell type-specific functional enrichments, as the analyses were performed independently for each cell type before identifying pathways that are consistently altered across organs. Thus, the reported enrichments correspond to cell type-specific effects that are shared across fibrotic diseases in different tissues.
  
  At the same time, we agree with the reviewer that an interesting observation emerging from these analyses is the overlap in the enriched biological processes identified across different cell types. This suggests that, despite clear cell type-specific transcriptional responses, multiple cell populations converge on a common set of fibrosis-associated pathways. To avoid potential confusion, we have revised the text to clarify that Figures 3H and 3I display cell type-specific enrichments and that the overlap between cell types reflects convergence on shared biological processes rather than identical gene-level responses. Furthermore, we now highlight one difference visible in the plots: the enrichment of neuronal development and axonogenesis pathways in mesenchymal cells.
  
  "This association with development was further supported by the functional characterization of upregulated genes per organ and cell type."
  
  [...] "In addition, enrichment of neuronal development and axonogenesis pathways points to activation of projection-related programs, which were not present in the endothelial cell population (Figure 3I)."
  
  Reviewer #1, major comment 9: It is unclear why only known ligands and receptors are included in the therapeutic target identification analysis in Fig. 6B.
  
  Our intention was to focus the therapeutic target identification analysis on known ligands and receptors, while excluding major extracellular matrix (ECM) components, because ligands and receptors are generally more amenable to therapeutic intervention and therefore represent particularly attractive candidate targets. To clarify this rationale, we have revised the manuscript text to explicitly describe the criteria used for target selection and the motivation for restricting the analysis to this subset of genes. The corresponding clarification has been added to the results section "Scar-specific cell-cell communication":
  
  "To refine these insights, we next focused on identifying ligands and receptors that are specifically expressed in actively scarring regions. We prioritized these molecules because, as extracellular signaling factors and cell-surface proteins, they are directly accessible to therapeutic intervention and therefore represent particularly attractive candidate targets. Structural extracellular matrix molecules were excluded as candidate genes in this analysis and were used instead for the identification of fibrotic scar regions."
  
  Reviewer #1, minor comment 2: The description or legend for the colors is missing in Fig. 3A
  
  We thank the reviewer for this comment. The color legend was included in the original version of Figure 3A; however, we agree that its placement did not make it sufficiently prominent and may have reduced its visibility. To improve clarity, we have revised the figure layout and repositioned the legend of Figure 3A above the plot so that the color annotation is more readily identifiable.
  
  Reviewer #1, minor comment 3: FAP appears to be the top gene with robust upregulation in fibrotic heart, lung, liver, and kidney in Fig. 3E, which is also a well-established surrogate of fibroblast activity and tissue fibrosis in clinical settings (for instance, PMID: 38279381) but not mentioned anywhere in the text.
  
  We thank the reviewer for highlighting the upregulation of FAP across fibrotic organs. We agree that FAP is a well-established marker of activated fibroblasts and tissue fibrosis and therefore deserves explicit mention at this stage of the analysis. We have revised the text accompanying Figure 3E to highlight FAP as one of the most consistently upregulated genes across organs and to note its established relevance in fibrotic disease:
  
  "One of the most robustly upregulated genes across organs was prolyl endopeptidase FAP (FAP), a well-established marker gene of activated fibroblasts that has been shown to be functionally relevant in fibrotic diseases in several clinical settings."
  
  Reviewer #1, minor comment 4: Although it is clear that this study was performed at a much larger scale, the additional gain compared to the previous attempt on identification of shared feature in fibrotic heart, lung, liver, and kidney should be mentioned (PMID: 41752153).
  
  We thank the reviewer for pointing out this relevant study (PMID: 41752153). We agree that it represents an important previous effort to identify shared features across fibrotic diseases and should be discussed. We have therefore revised the Introduction to acknowledge this work and clarify how the present study extends beyond it. Specifically, while the previous study compared fibrotic heart, lung, liver, and kidney tissues, it was based on a limited number of studies and disease contexts per organ. In contrast, our analysis integrates a substantially larger collection of datasets spanning multiple disease etiologies within each organ, enabling a more systematic assessment of conserved and tissue-specific fibrotic programs across diverse fibrotic diseases.
  
  "Recent studies have sought to define shared molecular features across fibrotic diseases affecting the heart, lung, liver, and kidney (15). However, these analyses were based on a single study and a limited number of disease contexts per organ, restricting their ability to systematically assess the robustness and generalizability of shared fibrotic programs across diverse disease etiologies."
  
  Reviewer #2
  
  Reviewer #2, major comment 4: Fig.4D: Among this top list, DNM3OS has been indeed characterized as a regulator of the TGF-β pathway in lung fibrosis and should be cited (PMID: 30964696). Interestingly, this lncRNA encodes a cluster of miRNA, including miR-199a-5p, that has been found deregulated in various fibrotic models including lung, kidney and liver (PMID: 23459460).
  
  We thank the reviewer for highlighting the functional relevance of DNM3OS in fibrosis. Having reviewed the literature, we agree that its role as a regulator of TGF-β signaling and the involvement of its associated miRNA cluster, including miR-199a-5p, provide important context for interpreting our findings.
  
  We have therefore expanded the discussion of Fig. 4D and DNM3OS in the manuscript and added the suggested references. Specifically, we now note that DNM3OS was consistently upregulated across organs and that both DNM3OS and its associated miRNA miR-199a-5p have been implicated as downstream effectors of TGF-β signaling involved in myofibroblast activation in lung fibrosis, as well as in experimental models of liver and kidney fibrosis.
  
  "Furthermore, long noncoding RNA dynamin 3 opposite strand (DNM3OS) was consistently upregulated across organs. DNM3OS and its associated miRNA, miR-199a-5p, have been identified as downstream effectors of TGF-β signaling and implicated in myofibroblast activation in lung fibrosis (76), as well as in experimental mouse models of liver and kidney fibrosis (77)."
  
  Reviewer #2, major comment 5: Fig. 3F-I and Fig. 4E: the list of the predicted downstream genes for each TF should be provided in a supplemental table
  
  The transcription factor target gene sets used in these analyses were not generated as part of this study but were obtained from previously published and publicly available regulatory network resources. Because these target gene lists are extensive and already available through the original resource, we did not include them as supplementary tables. To improve transparency and reproducibility, we have revised the manuscript to clearly state the source of these regulatory networks and provide the corresponding reference(s) and access information, allowing readers to retrieve the complete target gene sets used in our analyses. Therefore, in the section "Common aspects of fibrosis across tissues in endothelial and mesenchymal cells", we now state that the collection is publicly available and refer to the methods section:
  
  "From organ effect sizes, we also inferred transcription factor (TF) activities per organ using CollecTRI (54), a curated publicly available collection of TF-targets, and identified the most commonly upregulated TFs based on the up- or downregulation of the genes they regulate across organs (see methods)."
  
  In addition, we specifically state in the methods section how the regulons can be accessed:
  
  "CollecTRI regulons are publicly accessible as described in the original publication (64), for instance at https://zenodo.org/records/8192729?preview_file=CollecTRI_regulons.csv."
  
  Reviewer #2, minor comment 2: Several panels (Fig.3F-I, Fig.4E-F) need to be improved, in particular the dot plots. with the same order for organs than for the other panels and another range for the size of the dots (-log10 pvalue) to reduce the max size of the dot as well as the enrichment score to expand the value of the z-score.
  
  We thank the reviewer for these suggestions regarding figure presentation. To improve the readability and consistency of the dot plots, we have made several changes to the figures. We believe these changes substantially improve the interpretability of the figures while preserving the underlying biological signal.
  
  First, we reordered the organs in Figures 3F-I and 4E-F (see above, in answer to Reviewer #1, minor comment 2, and below, respectively) to match the ordering used throughout the remainder of the manuscript. Second, we expanded the displayed enrichment score range from [−2, 2] to [−4, 4]. While many values remain relatively homogeneous, this reflects the fact that these panels were specifically designed to highlight the most consistently shared and strongly regulated signals across organs. Third, we adjusted the dot size scaling for the adjusted p-values. To further improve the visualization of statistical significance, we now explicitly indicate significance using circle outlines: features with an adjusted p-value
  
  Reviewer #2, minor comment 4: The study is meticulously designed and clearly presented, employing a robust combination of computational approaches. To the reviewer's knowledge, this is the first systematic, cross-organ meta-analysis of fibrosis, offering a comprehensive characterization of both organ-specific and shared gene programs associated with fibrotic processes. A particularly commendable aspect of this work is the provision of a rich and accessible dataset through an interactive data browser, which will serve as a valuable resource for the scientific community at large. The impact of this study is broad and multidisciplinary, benefiting not only computational biologists but also experimental biologists and clinicians working in the field of fibrosis.
  
  We appreciate the positive assessment of our work and would like to thank the reviewer for recognizing the value of the systematic cross-organ analysis and the interactive data browser. We are pleased that the reviewer considers the study to be a useful resource for the fibrosis research community and appreciates its potential relevance to computational and experimental researchers, as well as clinicians.
  
  Reviewer #3
  
  Reviewer #3, minor comment 2: P6: 43 ± 9% - this looks a little strange as a percentage; leaving it as a count would probably be clearer, as it's quite a small number. Please clarify what 'feature count' refers to here.
  
  We agree that the notation "43 ± 9%" may be less intuitive. However, we chose to retain the percentage because it summarizes the proportion of female samples across datasets rather than the total number of samples, which varies substantially between studies. To improve clarity, we removed the variability term and now report only the percentage of samples in the section Data curation for a cross-organ comparison of fibrotic diseases (p.6):
  
  "In studies with available gender information (16/22 datasets), 43 % of samples were female on average (Figure 1D)."
  
  In addition, we clarified the meaning of "feature count" by replacing this term with "gene count" throughout the text and in Suppl. Figure 1B.
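  To make the distinction concrete: the reported figure averages per-study proportions, so each study contributes equally regardless of its size. A minimal Python sketch of that calculation, assuming a hypothetical input format (study name mapped to female and total sample counts, or None when gender information is missing) and using made-up counts:

```python
from statistics import mean, stdev


def female_fraction_summary(studies):
    """Mean and SD of the per-study female percentage.

    `studies` maps a study name to (n_female, n_total), or to None when
    gender information is unavailable; those studies are skipped.
    """
    pct = [100 * v[0] / v[1] for v in studies.values() if v is not None]
    return mean(pct), stdev(pct)


# Illustrative counts only, not the data analysed in the manuscript.
example = {"study_a": (3, 10), "study_b": (30, 50), "study_c": None}
m, sd = female_fraction_summary(example)
print(round(m, 1), round(sd, 1))
```

  Here the per-study mean is 45%, whereas pooling all 60 samples would give 33/60 = 55%, which is why the two summaries can differ when study sizes vary.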
  
  Reviewer #3, minor comment 3: P8: Caption: Could you expand a bit upon this 'molecular change severity' in the text?
  
  We thank the reviewer for pointing this out. We agree that at this point in the manuscript, the concept of "molecular change severity" is not yet clear. It is described later in the manuscript at the beginning of the section "Fibrotic disease programs within tissues" and refers to our analysis with scDist. To make this clearer at its first mention, we have revised the caption to explicitly direct readers to the relevant section and figures. The caption now states:
  
  "Studies displayed in grey were excluded after an initial assessment of molecular change severity between patient groups, as discussed in the section 'Fibrotic disease programs within tissues' (Figure S2A-D & methods)."
  
  We believe this addition improves clarity while avoiding duplication of the more detailed explanation provided later in the manuscript.
  
  Reviewer #3, minor comment 4: Do the author-annotated cell types correspond reasonably well with your cell type labels in those datasets where they are present?
  
  We would like to clarify that, except in two studies, we did not perform de novo cell type annotation. Instead, we used the cell type annotations provided by the original study authors and harmonized them into broader cell type categories based on their names to enable comparisons across studies and organs. The mapping between the original study annotations and these harmonized categories is already provided in Supplementary Table 1. To make this more explicit, the text now states:
  
  "To enable a comparison across tissues, we grouped cells into five broad categories based on the author's annotations: endothelial-, epithelial-, mesenchymal-, lymphoid-, and myeloid cells (mappings available in Suppl. Table 1)."
  
  Furthermore, the consistency of these annotations was assessed by examining the expression of cell type marker genes, as shown in Figure 1F, which supports the validity of the harmonized cell type labels used throughout the study.
  
  Reviewer #3, minor comment 5: P11: A little more information on scDist and what the distances are calculated based on would be good here.
  
  We thank the reviewer for this suggestion. We agree that the original description did not sufficiently explain how ScDist quantifies molecular differences between conditions. We have therefore expanded the text to clarify that ScDist is a mixed-effects modeling framework and that larger distances correspond to stronger disease-associated transcriptional perturbations:
  
  "To do so, we applied ScDist (36), a mixed-effects modeling framework that quantifies transcriptomic differences between conditions while accounting for donor-to-donor variability (see methods). For each cell type, ScDist estimates a distance in gene expression space between healthy and fibrotic cells, with larger values indicating stronger disease-associated transcriptional changes."
  
  Furthermore, we added to the methods:
  
  "To assess disease-associated transcriptional shifts within each cell type, we applied scDist (v1.1.2) (117) to estimate transcriptional distances between fibrotic and control samples. ScDist assesses disease-associated transcriptional shifts within each cell type by using a linear mixed-effects model that separates condition-associated transcriptional changes from inter-individual variability by including the disease condition as a fixed effect and donor-specific variation as a random effect."
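  The general idea of measuring a condition effect while not treating cells from the same donor as independent observations can be illustrated with a deliberately simplified sketch. It does not reproduce scDist's mixed-model estimator; instead it substitutes a cruder pseudo-bulk approach, averaging cells per donor so that donors are the units being compared, and reports the Euclidean distance between the fibrotic and control donor means. The function name and inputs are hypothetical, and each donor is assumed to belong to a single condition.

```python
import numpy as np


def pseudobulk_distance(expr, condition, donor):
    """Euclidean distance between fibrotic and control donor-level means.

    expr: cells x genes array (e.g. log-normalised expression)
    condition: per-cell labels, 1 = fibrotic, 0 = control
    donor: per-cell donor identifiers (each donor has a single condition)
    """
    expr = np.asarray(expr, dtype=float)
    condition = np.asarray(condition)
    donor = np.asarray(donor)
    donor_ids = np.unique(donor)
    # One averaged profile per donor, so donors with many cells do not dominate.
    means = np.stack([expr[donor == did].mean(axis=0) for did in donor_ids])
    donor_cond = np.array([condition[donor == did][0] for did in donor_ids])
    diff = means[donor_cond == 1].mean(axis=0) - means[donor_cond == 0].mean(axis=0)
    return float(np.linalg.norm(diff))


expr = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0],
                 [3, 4, 0], [3, 4, 0], [3, 4, 0], [3, 4, 0]])
condition = [0, 0, 0, 1, 1, 1, 1]
donor = ["A", "A", "B", "C", "D", "D", "D"]
d = pseudobulk_distance(expr, condition, donor)
```

  A mixed model, as used by scDist, additionally estimates the donor-level variance explicitly instead of simply averaging it away, which matters when donors contribute very different numbers of cells.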
  
  Reviewer #3, minor comment 7: P14: Are these genes known to be implicated in fibrotic diseases? I know that this is discussed further later, but a few words here would be good.
  
  We added context for several of the mentioned genes to the text:
  
  "Notably, several of the highest-ranked genes by our analysis are well-established stress-response and fibrosis markers, such as POSTN (38, 39), SPP1 (40), VCAN (41, 42), COL15A1 (21), C3 (43, 44), FABP4 (45), and VWF (46, 47), providing confidence that the identified signatures capture true disease processes instead of study-specific occurrences."
  
  Reviewer #3, minor comment 8: P17: Fig 3H: enrichment -> enrichment score? (same elsewhere)
  
  We thank the reviewer for noting this ambiguity. We agree that the term "enrichment" was imprecise in this context. To improve clarity and consistency, we have revised the figure legends of Fig 3 F-I and Fig 4 E-F to explicitly refer to the reported metric as the enrichment score rather than simply enrichment. The updated figure 3 can be found in our answer to Reviewer #1, minor comment 2, the updates to Figure 4 in our answer to Reviewer #2, minor comment 2.
  
  Reviewer #3, minor comment 9: P19: ULM is used a few times in the captions, but only ever defined in the methods.
  
  We agree that the abbreviation ULM was not sufficiently defined in the main text and figure legends. To improve readability, we now define the term ULM as univariate linear model at its first occurrence in the figure legends (Figure 3I, page 18).
  
  The figure caption now reads:
  
  "For F-I: Dots show the enrichment score (positive: upregulated in fibrosis), while sizes show the -log10 of the adjusted p-values of univariate linear model (ULM) enrichments."
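  For readers unfamiliar with univariate linear model (ULM) enrichment, a minimal sketch of the idea: for one TF, the per-gene statistic is regressed on the TF's regulon weights (0 for non-targets), the slope's t-value is used as the enrichment score, and p-values across TFs are then Benjamini-Hochberg adjusted before taking -log10 for the dot sizes. This is written in the spirit of the decoupler ULM method rather than as its code; the function names and simulated inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def ulm_score(gene_stats, weights):
    """t-value and two-sided p-value of the slope in gene_stats ~ weights.

    gene_stats: per-gene statistic (e.g. a disease effect size) for all genes
    weights: regulon weight of each gene for one TF (0 for non-targets)
    """
    res = stats.linregress(np.asarray(weights, dtype=float),
                           np.asarray(gene_stats, dtype=float))
    return res.slope / res.stderr, res.pvalue


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


rng = np.random.default_rng(1)
weights = np.zeros(200)
weights[:20] = 1.0  # 20 hypothetical targets of one TF
gene_stats = 1.5 * weights + 0.2 * rng.standard_normal(200)
t, p = ulm_score(gene_stats, weights)
dot_size = -np.log10(bh_adjust([p, 0.04, 0.5]))
```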
  
  Reviewer #3, minor comment 10: P20: 'disease relevant cell states' - this might need rewording to better reflect the compositional analysis, and not imply that this identifies cell states rather than clusters of cells.
  
  We agree that compositional analysis formally identifies cell clusters enriched in disease rather than directly establishing biological cell states. We have revised the text to refer to disease-associated mesenchymal populations/clusters identified through compositional analysis rather than "disease-relevant cell states":
  
  "To identify disease-associated mesenchymal subpopulations in our datasets, we integrated the mesenchymal cell population per organ and identified a disease-associated cluster by compositional analysis"
  
  "We also explored disease-associated mesenchymal subpopulation-specific gene expression and the spatial localization of ligands and receptors."
  
  Reviewer #3, minor comment 11: P22: Fig 4D: This could do with more dynamic range on the colour axis, as most things are near or above the scale.
  
  We thank the reviewer for this suggestion and agree that the original color scale provided limited visual separation between highly concordant features. We note that this is, in part, a consequence of the panel's design, as Figure 4D specifically highlights genes that are consistently and strongly regulated across organs and therefore exhibit relatively similar effect sizes. Nevertheless, to improve visual discrimination, we have adjusted the color scale of Figure 4D (and similarly, Figure 3 D and E) to provide greater dynamic range and enhance the visibility of differences between genes while preserving the underlying data. We believe this modification improves the interpretability of the figures. The new figures 3 and 4 can be found in our answers to Reviewer #1, minor comment 2 and Reviewer #3, minor comment 8, respectively.
  
  Reviewer #3, minor comment 15: It would be nice to keep the gene naming schemes consistent (i.e., MOXD1 and TNC), especially within the same discussion.
  
  We thank the reviewer for this suggestion and agree that consistent gene nomenclature improves readability. We have therefore revised the discussion text to use a consistent naming.
  
  Reviewer #3, minor comment 17: 'some studies have highlighted the disease-relevance of specific cell states' -> please cite
  
  To support this statement, we have added the appropriate references describing disease-relevant cell states in fibrotic tissues:
  
  "Lastly, with exception to the mesenchymal cell population, our analysis primarily focused on broad cell type categories, even though some studies have highlighted the disease-relevance of specific cell states (22, 73,74,7,75,33)".
  
  Reviewer #3, minor comment 18: Code availability: I think the 'fi' digraph in the link for https://github.com/saezlab/organfibrosis breaks it, but after correcting it manually I can access the repository.
  
  We thank the reviewer for noting this issue. The hyperlink functions correctly in the submitted manuscript PDF, but we are not sure in which format the reviewer received the manuscript. We will work with the editorial team during the publishing process to ensure that the repository link will be displayed correctly and remains fully accessible in the published version.
  
  Description of analyses that authors prefer not to carry out
  
  Reviewer #1
  
  -
  
  Reviewer #2
  
  Reviewer #2, major comment 2: Myeloid Cell Analysis: given the importance of myeloid cells in fibrotic processes, particularly the origin of pathological cells (often monocyte-derived macrophages), it would be highly informative to adopt a similar approach to determine whether myeloid subpopulations differ depending on the affected organ.
  
  We thank the reviewer for this suggestion and agree that myeloid cells play a critical role in fibrosis. A systematic comparison of disease-associated myeloid states across organs would therefore be highly valuable. In the present study, however, we chose to focus our state-level analysis on mesenchymal cells because they represent the principal effector population responsible for extracellular matrix deposition and scar formation across fibrotic diseases and because they showed a promising overlap between tissues at the broad cell type level. In contrast, our cross-organ analyses indicate weaker transcriptional conservation among myeloid cells (highest cross-organ disease score prediction AUROC mesenchymal: 0.88; myeloid: 0.72), suggesting that organ-specific immune responses may contribute more strongly than shared fibrosis-associated programs.
  
  Moreover, our integrated dataset combines both single-cell and single-nucleus sequencing studies, which are known to differ in transcript capture and cell type recovery, especially in immune cells (Feng et al. 2026; Van Melkebeke et al. 2024b; Denisenko et al. 2020). These technical differences already complicated the robust comparison of mesenchymal populations, and we expect they would present an even greater challenge for the identification and comparison of fine-grained myeloid cell states across studies and organs. We therefore chose to focus our detailed state-level analysis on mesenchymal populations, where the biological question was most directly aligned with the central objective of identifying conserved fibrogenic programs across organs.
  
  Extending the same analysis to myeloid populations would thus require a comprehensive integration, annotation, and validation effort that would substantially expand the scope of the current study.
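  AUROC values such as those quoted above (0.88 for mesenchymal and 0.72 for myeloid cells) summarize how well a per-sample disease score separates fibrotic from control samples. A minimal sketch of the Mann-Whitney formulation of AUROC, i.e. the probability that a randomly chosen fibrotic sample scores higher than a randomly chosen control, with ties counted as one half; how the cross-organ disease scores themselves are constructed is not shown, and the function name is hypothetical.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    pos, neg = s[y], s[~y]
    # Compare every positive with every negative sample.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

  A value of 0.5 corresponds to chance-level separation, so the gap between 0.88 and 0.72 reflects a substantially weaker transfer of the disease signal across organs for myeloid cells.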
  
  Reviewer #3
  
  -
  
  References
  
  Argelaguet, Ricard, Damien Arnol, Danila Bredikhin, et al. 2020. "MOFA+: A Statistical Framework for Comprehensive Integration of Multi-Modal Single-Cell Data." Genome Biology 21 (1): 111. https://doi.org/10.1186/s13059-020-02015-1.
  
  Denisenko, Elena, Belinda B. Guo, Matthew Jones, et al. 2020. "Systematic Assessment of Tissue Dissociation and Storage Biases in Single-Cell and Single-Nucleus RNA-Seq Workflows." Genome Biology 21 (1): 130. https://doi.org/10.1186/s13059-020-02048-6.
  
  Feng, Xue, Yu Feng, Sayed Haidar Abbas Raza, Yun Ma, and Hongyu Deng. 2026. "Single Cell and Single Nucleus RNA Sequencing in Liver Tissues: Applications and Prospects in Model and Non-Model Organisms." Frontiers in Genetics 17 (April): 1781941. https://doi.org/10.3389/fgene.2026.1781941.
  
  Koenitzer, Jeffrey R., Haojia Wu, Jeffrey J. Atkinson, Steven L. Brody, and Benjamin D. Humphreys. 2020. "Single-Nucleus RNA-Sequencing Profiling of Mouse Lung. Reduced Dissociation Bias and Improved Rare Cell-Type Detection Compared with Single-Cell RNA Sequencing." American Journal of Respiratory Cell and Molecular Biology 63 (6): 739-47. https://doi.org/10.1165/rcmb.2020-0095MA.
  
  Lake, Blue B., Rajasree Menon, Seth Winfree, et al. 2023. "An Atlas of Healthy and Injured Cell States and Niches in the Human Kidney." Nature 619 (7970): 585-94. https://doi.org/10.1038/s41586-023-05769-3.
  
  Litviňuková, Monika, Carlos Talavera-López, Henrike Maatz, et al. 2020. "Cells of the Adult Human Heart." Nature 588 (7838): 466-72. https://doi.org/10.1038/s41586-020-2797-4.
  
  Naba, Alexandra, Karl R. Clauser, Sebastian Hoersch, Hui Liu, Steven A. Carr, and Richard O. Hynes. 2012. "The Matrisome: In Silico Definition and In Vivo Characterization by Proteomics of Normal and Tumor Extracellular Matrices." Molecular & Cellular Proteomics 11 (4): M111.014647. https://doi.org/10.1074/mcp.M111.014647.
  
  Van Melkebeke, Lukas, Jef Verbeek, Dora Bihary, et al. 2024a. "Comparison of the Single-Cell and Single-Nucleus Hepatic Myeloid Landscape within Decompensated Cirrhosis Patients." Frontiers in Immunology 15 (February). https://doi.org/10.3389/fimmu.2024.1346520.
  
  Van Melkebeke, Lukas, Jef Verbeek, Dora Bihary, et al. 2024b. "Comparison of the Single-Cell and Single-Nucleus Hepatic Myeloid Landscape within Decompensated Cirrhosis Patients." Frontiers in Immunology 15 (February). https://doi.org/10.3389/fimmu.2024.1346520.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.03.09.709232
www.biorxiv.org www.biorxiv.org

TSvelo: Comprehensive RNA velocity by modeling cascade of gene regulation, transcription and splicing

1
1. Public_Reviews 19 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In the paper, the authors propose a new RNA velocity method, TSvelo, which predicts the transcription rate as a linear function of the RNA expression levels of transcription factors. This framework extends the authors' recent work, TFvelo, by including unspliced reads and designing a coherent neuralODE framework. Improved performance was demonstrated in six diverse datasets.
  
  Strengths:
  
  Overall, this method introduces innovative solutions to link cell differentiation and gene regulation, with a balance between model complexity (neuralODE) and interpretability (raw gene space).
  
  We thank the reviewer for the positive evaluation of our work and for recognizing the novelty of the proposed framework. We appreciate the reviewer’s summary highlighting that TSvelo extends our previous method TFvelo by incorporating unspliced reads and introducing a coherent neuralODE framework to model transcription dynamics.
  
  We are encouraged that the reviewer recognizes the potential of our approach to link cell differentiation with gene regulatory mechanisms, while maintaining a balance between model expressiveness and interpretability in the gene expression space. In the revised manuscript, we have further clarified several methodological details and strengthened the presentation to better highlight these aspects.
  
  Weaknesses:
  
  While it seems to provide convincing results, there are multiple technical concerns for the authors to clarify and double-check.
  
  (1) The authors should clarify and discuss the TF-target map: here, the TF-target genes map is predefined by the TF binding's ChIP-seq data. This annotation is largely incomplete and mostly compiled from a set of bulk tissues. Therefore, for a certain population, the TF-target relation may change. This requires clarification and discussion, possibly exploring how to address this in the model. In addition, a regulon database could be added, e.g., DoRothEA?
  
  We thank the reviewer for this important comment. The TF–target maps used in TSvelo (e.g., derived from ChIP-seq-based resources such as ENCODE) reflect aggregated TF binding evidence collected across diverse bulk cell types and experimental conditions. As such, they are inherently incomplete and do not capture fully context-specific regulatory activity in a given primary tissue. In TSvelo, we therefore do not treat these annotations as fixed or cell-type-specific ground truth regulatory relationships. Instead, they are used as a permissive prior that encodes a broad set of potential regulatory interactions.
  
  Within the TSvelo framework, the contribution of each TF–target interaction is learned from data through weight estimation, allowing the model to down-weight or effectively ignore prior edges that are inconsistent with the observed single-cell expression dynamics. This design enables TSvelo to remain robust even when the prior TF–target map is noisy, incomplete, or derived from heterogeneous bulk contexts.
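  The principle of a permissive prior whose edge weights are learned from data can be illustrated with a simplified linear stand-in. This is not TSvelo's neural ODE: here a per-cell signal for one target gene is fitted by least squares using only the TFs that the prior allows, so a prior edge that the data do not support receives a weight close to zero, and TFs outside the prior stay at exactly zero. Function and variable names and the simulated data are hypothetical.

```python
import numpy as np


def fit_masked_weights(tf_expr, target_signal, prior_mask):
    """Least-squares TF weights for one target gene, restricted to prior edges.

    tf_expr: cells x TFs expression matrix
    target_signal: per-cell signal for the target (e.g. a transcription-rate proxy)
    prior_mask: boolean vector over TFs; True where the prior lists an edge
    """
    weights = np.zeros(tf_expr.shape[1])
    allowed = np.flatnonzero(prior_mask)
    # TFs outside the prior keep weight 0; the allowed ones are fitted freely.
    coef, *_ = np.linalg.lstsq(tf_expr[:, allowed], target_signal, rcond=None)
    weights[allowed] = coef
    return weights


rng = np.random.default_rng(0)
tf_expr = rng.random((500, 3))
# Only TF 0 truly drives the target; TF 1 is a spurious edge in the prior.
target_signal = 2.0 * tf_expr[:, 0] + 0.01 * rng.standard_normal(500)
w = fit_masked_weights(tf_expr, target_signal, np.array([True, True, False]))
```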
  
  Following the reviewer’s suggestion, we additionally incorporated the DoRothEA regulon database as an alternative prior with confidence-level filtering. We further performed ablation studies on the pancreas dataset and the gastrulation erythroid dataset using different TF–target resources, including ChEA, ENCODE, and their combinations with DoRothEA.
  
  The results on the pancreas dataset and the gastrulation erythroid dataset, shown in Figure S13 and Figure S14 respectively, lead to the same conclusion. We observed highly consistent results across most TF–target prior combinations, including ChEA, ENCODE, ChEA+ENCODE, ChEA+DoRothEA, ENCODE+DoRothEA, and ChEA+ENCODE+DoRothEA. Using the pancreas dataset as an example, the mean velocity consistency ranged from 0.985 to 0.995, the mean in-cluster coherence ranged from 0.983 to 0.992, and the mean cross-boundary direction correctness ranged from 0.719 to 0.740 across all settings. These consistently high and tightly bounded metrics indicate that TSvelo is largely insensitive to the specific choice of TF–target prior.
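  For readers less familiar with cross-boundary direction correctness, a minimal sketch following its common definition in RNA velocity benchmarks: for each cell of a source cluster that has neighbours in the target cluster, take the mean cosine similarity between its velocity and the displacement vectors to those neighbours, then average over such boundary cells. The embedding, neighbour graph, and function name here are assumptions; the settings used for the reported numbers are not specified in this text.

```python
import numpy as np


def cross_boundary_correctness(emb, vel, clusters, neighbors, source, target):
    """Mean cosine between velocities of boundary cells in `source` and their
    displacements towards neighbours in `target`.

    emb, vel: cells x dims embedding coordinates and velocity vectors
    clusters: per-cell cluster labels
    neighbors: per-cell lists of neighbour indices (e.g. from a kNN graph)
    """
    emb = np.asarray(emb, dtype=float)
    vel = np.asarray(vel, dtype=float)
    clusters = np.asarray(clusters)
    scores = []
    for c in np.flatnonzero(clusters == source):
        nb = [j for j in neighbors[c] if clusters[j] == target]
        if not nb:
            continue  # not a boundary cell
        disp = emb[nb] - emb[c]
        norms = np.linalg.norm(disp, axis=1) * np.linalg.norm(vel[c]) + 1e-12
        scores.append((disp @ vel[c] / norms).mean())
    return float(np.mean(scores)) if scores else float("nan")


emb = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
vel = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
clusters = ["A", "A", "B"]
neighbors = [[1], [0, 2], [1]]
cbdir = cross_boundary_correctness(emb, vel, clusters, neighbors, "A", "B")
```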
  
  The only configuration showing reduced stability was the use of DoRothEA alone, particularly in terms of cross-boundary direction correctness. This is likely due to its comparatively limited coverage of TF–target interactions. For instance, in the pancreas dataset, only 81 out of 2000 highly variable genes (HVGs) could be associated with TFs based on DoRothEA, corresponding to 102 TF–target links in total, which may restrict downstream regulatory modeling. In contrast, ChEA covered 1793 genes with 13,976 TF–target links, and ENCODE covered 1854 genes with 33,076 links. These results further suggest that integrating multiple TF–target resources could improve performance, likely due to increased coverage and complementary regulatory information.
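  Coverage figures of this kind (genes with at least one regulating TF, and the number of TF–target links) can be computed directly from edge lists, and taking the union of edge sets shows how combining resources increases coverage. A minimal sketch, assuming each resource is a list of (TF, target) pairs and that only the target has to be an HVG; whether TSvelo also requires the TF itself to pass HVG selection is not stated here. Names and example edges are hypothetical.

```python
def prior_coverage(edges, hvgs):
    """Return (number of HVGs with >= 1 TF, number of TF-target links among them).

    edges: iterable of (tf, target) pairs from one or more prior resources
    hvgs: set of highly variable gene names
    """
    links = {(tf, tgt) for tf, tgt in edges if tgt in hvgs}  # set removes duplicates
    covered = {tgt for _, tgt in links}
    return len(covered), len(links)


hvgs = {"G1", "G2", "G3"}
chea = [("TF1", "G1"), ("TF1", "G2")]
dorothea = [("TF1", "G1"), ("TF2", "G3")]
combined = prior_coverage(chea + dorothea, hvgs)
```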
  
  We further acknowledge that regulatory interactions are inherently context-dependent, and that no static TF–target resource can fully capture tissue-specific regulatory programs. In the revised Discussion, we explicitly clarify this limitation and highlight that incorporating context-specific regulatory data (e.g., single-cell chromatin accessibility or perturbation-based regulatory maps) represents an important direction for future improvement.
  
  (2) The authors should clarify how example genes are selected. This is particularly unclear in Figure 2d.
  
  We thank the reviewer for raising this point. The example genes shown in Fig. 2d were selected to illustrate representative scenarios where our method provides advantages, particularly cases in which the unspliced–spliced 2D phase portrait exhibits mixed or overlapping patterns that are difficult to model using conventional RNA velocity approaches. These examples are therefore intended to demonstrate the types of transcriptional dynamics that TSvelo is designed to better capture.
  
  To avoid the impression of selective presentation, we note that our conclusions are based on systematic evaluation across all genes and datasets. Additional visualizations for a broader set of genes on this dataset are provided in Fig. S3. We have clarified the example gene selection criteria in the revised manuscript.
  
  (3) The authors should clarify confidence in the statement in lines 179-180, that ANXA4 should initially decrease. This is particularly concerning, as TSvelo didn't capture the cell cycle transitions well during the initial part.
  
  We thank the reviewer for raising this point. The statement that ANXA4 initially decreases is based on the observed expression pattern in the dataset rather than on cell-cycle–related dynamics inferred by the model. Specifically, ANXA4 shows higher expression in Ductal cells compared to Ngn3 EP cells, and Ductal represents an earlier stage in the developmental trajectory. Therefore, along the Ductal to Ngn3 EP transition, ANXA4 naturally exhibits an initial decrease in expression. We have clarified this point in the revised manuscript.
  
  (4) A support reference should be added for the statement in line 260 that "neuron migrations are inside-out manner". There is no reference supporting this, and this statement is critical for the model assessment.
  
  We thank the reviewer for this suggestion. This pattern has been reported in previous studies [1,2], which we now cite in the revised manuscript.
  
  To improve clarity, we have also revised the statement in the manuscript as follows:
  
  “During cortical development, neurons follow an inside-out layering pattern in which earlier-born neurons populate the deep cortical layers, whereas later-born neurons migrate past them to occupy more superficial layers.”
  
  (1) Nadarajah, B., Parnavelas, J. Modes of neuronal migration in the developing cerebral cortex. Nat Rev Neurosci 3, 423–432 (2002).
  
  (2) Li, C., Virgilio, M.C., Collins, K.L. et al. Multi-omic single-cell velocity models epigenome–transcriptome interactions and improves cell fate prediction. Nat Biotechnol 41, 387–398 (2023).
  
  (5) The comparison to scMultiomics data is particularly interesting, as MultiVelo uses ATAC data to predict the transcription rate. It would be very insightful to add a direct comparison of the estimated transcription rate between using ATAC and directly using TFs' RNA expressions.
  
  We thank the reviewer for suggesting this highly interesting comparison between ATAC-derived regulatory activity and TF RNA-based proxies for transcription rate estimation.
  
  We have conducted the requested analysis by computing the gene-wise chromatin accessibility rate used in MultiVelo and the learned transcription rate from TSvelo, and evaluated their correlation across genes. As shown in Figure S15, the two estimates exhibit almost no global correlation across genes, indicating that they capture substantially different aspects of regulatory information.
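  The text does not spell out the exact form of this comparison. One plausible reading is a Pearson correlation per gene, computed across cells between the two rate matrices and then summarized over genes; a minimal sketch under that assumption, with both inputs as cells x genes arrays aligned on the same cells and genes (the function name is hypothetical):

```python
import numpy as np


def genewise_pearson(a, b):
    """Pearson correlation between matching columns (genes) of two cells x genes arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    denom = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (a_c * b_c).sum(axis=0) / denom  # NaN for genes constant in either input


chromatin_rate = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
transcription_rate = np.array([[2.0, 3.0], [4.0, 2.0], [6.0, 1.0]])
r = genewise_pearson(chromatin_rate, transcription_rate)
```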
  
  This discrepancy is not unexpected and reflects the fundamental differences between these modalities. scATAC-seq measures chromatin accessibility, which provides a proxy for cis-regulatory potential of genomic regions. However, ATAC signals are inherently sparse and often exhibit a near-binary structure, limiting their ability to directly capture fine-grained temporal regulatory dynamics. In contrast, TF RNA expression reflects downstream transcriptional output, which is shaped by multiple regulatory layers, including post-transcriptional regulation, protein activity, temporal delays, and indirect regulation through intermediate transcriptional or signaling pathways. As a result, these two modalities are expected to capture complementary but not directly comparable aspects of gene regulation.
  
  Overall, this result suggests that ATAC-based and TF RNA-based signals capture distinct aspects of gene regulation. This further implies that integrating both modalities may be beneficial for future models that aim to more comprehensively characterize transcriptional regulation. We have added this discussion to the supplementary information.
  
  (6) In Figure 6g, it should be clarified how the lineage was determined. Did the authors use the LARRY barcodes, predicted cell fate, or any other methods? Here, the best way is probably using the LARRY barcodes for individual clones.
  
  We thank the reviewer for this suggestion. The lineage assignment used in Fig. 6g is described in the Methods section (“Lineage segmentation and pseudotime initialization”). Briefly, lineages are inferred from the transcriptomic structure of the data by performing Leiden clustering followed by PAGA-based connectivity analysis. Starting from an initial Leiden cluster, the filtered PAGA graph defines the shortest paths to other clusters, which are considered as the detected lineages, and diffusion pseudotime (DPT) is then used to initialize pseudotime along each lineage. Thus, in this analysis lineages are determined from the expression-derived trajectory structure. We have clarified this point in the revised manuscript and refer readers to the Methods section.
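  The path-finding step can be sketched in plain Python. Under the assumptions made here, edges of a cluster connectivity matrix below a threshold are dropped, path length is counted in hops, and lineages end at leaf clusters of the filtered graph; the actual filtering rule and endpoint definition are described in the Methods, and the diffusion pseudotime initialization is not shown. In a scanpy workflow, the connectivity matrix would typically be `adata.uns['paga']['connectivities']` converted to a dense array. The function name is hypothetical.

```python
from collections import deque


def lineages_from_cluster_graph(conn, root, threshold):
    """Shortest paths (in hops) from `root` to each leaf cluster of a filtered graph.

    conn: symmetric cluster-by-cluster connectivity matrix (e.g. PAGA connectivities)
    threshold: edges with connectivity below this value are dropped
    """
    n = len(conn)
    adj = [[j for j in range(n) if j != i and conn[i][j] >= threshold] for i in range(n)]
    parent = {root: None}
    queue = deque([root])
    while queue:  # breadth-first search yields shortest hop paths
        i = queue.popleft()
        for j in adj[i]:
            if j not in parent:
                parent[j] = i
                queue.append(j)
    leaves = [k for k in parent if k != root and len(adj[k]) == 1]
    paths = []
    for leaf in sorted(leaves):
        path, k = [], leaf
        while k is not None:  # walk back to the root
            path.append(k)
            k = parent[k]
        paths.append(path[::-1])
    return paths


conn = [[0, 0.9, 0, 0, 0],
        [0.9, 0, 0.8, 0.7, 0],
        [0, 0.8, 0, 0.05, 0],
        [0, 0.7, 0.05, 0, 0.6],
        [0, 0, 0, 0.6, 0]]
lineages = lineages_from_cluster_graph(conn, root=0, threshold=0.1)
```

  With the weak 2-3 edge filtered out, cluster 0 yields two lineages, one ending in cluster 2 and one in cluster 4.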
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Li et al. propose TSvelo, a computational framework for RNA velocity inference that models transcriptional regulation and gene-specific splicing using a neural ODE approach. The method is intended to improve trajectory reconstruction and capture dynamic gene expression changes in scRNA-seq data. However, the manuscript in its current form falls short in several critical areas, including rigorous validation, quantitative benchmarking, clarity of definitions, proper use of prior knowledge, and interpretive caution. Many of the authors' claims are not fully supported by the evidence.
  
  We thank the reviewer for the careful evaluation of our manuscript and for the constructive comments. We appreciate the concerns regarding validation, benchmarking, methodological clarity, and interpretation. In the revised manuscript, we have carefully addressed these points by adding additional analyses, clarifying methodological details, and moderating several claims to ensure they are fully supported by the data. Detailed responses to each comment are provided below.
  
  Major comments:
  
  (1) Modeling comments
  
  (a) Lines 512-513: How does the U-to-S delay validate the accuracy of pseudotime? Using only a single gene as an example is not sufficient for "validation."
  
  We thank the reviewer for this important comment. In the revised manuscript, we have rephrased this part to clarify that Fig. 1a serves only as an illustrative example showing the U-to-S delay for a single gene. Accordingly, we have corrected our statement to indicate that the U-to-S delay is used to infer trajectory orientation, rather than to validate the accuracy of pseudotime.
  
  In addition, we have expanded the description to explain that U-to-S delay signals are aggregated across all genes to provide a more robust and comprehensive assessment for this purpose. Additional analysis is provided in our response to the next comment.
  
  (b) Lines 512-518: The authors propose a strategy for selecting the initial state, but do not benchmark how accurate this selection procedure is, nor do they provide sufficient rationale. While some genes may indeed exhibit U-to-S delay during lineage differentiation, why does the highest U-to-S delay score indicate the correct initiation states? Please provide mathematical justification and demonstrate accuracy beyond using a single gene example. Maybe a simulation with ground truth could help here, too.
  
  We thank the reviewer for this insightful comment. In the revised manuscript, we have clarified both the intuition and justification of this approach. Briefly, along a correctly oriented trajectory, unspliced (U) expression is expected to precede spliced (S) expression due to transcriptional dynamics. Ideally, this U-to-S delay would be observable at the level of individual genes. However, due to the high noise inherent in scRNA-seq data, such delays are often not consistently detectable on a per-gene basis. To address this, we aggregate U-to-S delay signals across all genes and determine the lineage orientation by maximizing a global delay score. Under this criterion, the cluster from which all outgoing lineages exhibit the highest aggregated U-to-S delay is inferred to correspond to the initial state.
  
  We emphasize that this approach relies on genome-wide aggregation rather than any single gene. Moreover, the same strategy is applied uniformly across all six datasets using identical parameter settings, demonstrating its robustness and stability. To further address the reviewer’s concern, we additionally present the U-to-S delay scores for each Leiden cluster when treated as the initial state across all datasets (Author response image 1). The results on all datasets suggest that the highest U-to-S delay scores can be used to detect the initial cluster.
  
  Author response image 1.
  
  The U-to-S delay scores for each Leiden cluster when treated as the initial state across all datasets.
  
Following your suggestion, we have also added a simulation study. We generated synthetic single-cell RNA velocity datasets using a mechanistic transcriptional dynamics model with one or multiple developmental branches. The system included 200 genes, among which 30 were designated as transcription factors (TFs).
  
For each branch, we independently sampled a TF–target regulatory matrix $W \in \mathbb{R}^{30 \times 200}$ from a standard normal distribution to simulate distinct GRN structures. Gene expression dynamics were modeled using a coupled ordinary differential equation (ODE) system describing unspliced and spliced RNA abundances:
  
  where u and s denote unspliced and spliced RNA levels, respectively. The transcription rate α was computed as a nonlinear function of TF expression, defined as a weighted sum of spliced TF abundance, followed by clipping to ensure bounded activation.
  
  Each branch is initialized from the same randomly sampled initial condition drawn from a gamma distribution, allowing controlled divergence of trajectories driven solely by branch-specific regulatory programs.
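The ODE itself did not survive extraction. The sketch below assumes the standard splicing kinetics du/dt = α - βu and ds/dt = βu - γs, with α a clipped weighted sum of spliced TF levels and all branches starting from one shared gamma-distributed state. The rates β and γ, the clipping bound, the Euler step, the gamma parameters and the convention that the first 30 genes are the TFs are hypothetical choices, not the authors' settings.

```python
import numpy as np


def simulate_branch(W, u0, s0, beta=1.0, gamma=0.5, alpha_max=5.0,
                    dt=0.01, n_steps=1000, n_tf=30):
    """Euler-integrate assumed splicing kinetics for one branch.

    du/dt = alpha - beta * u,  ds/dt = beta * u - gamma * s,
    alpha = clip(s_TF @ W, 0, alpha_max); the first n_tf genes are TFs.
    Returns (n_steps, n_genes) arrays of unspliced and spliced levels.
    """
    u, s = u0.copy(), s0.copy()
    U, S = np.empty((n_steps, u.size)), np.empty((n_steps, s.size))
    for i in range(n_steps):
        alpha = np.clip(s[:n_tf] @ W, 0.0, alpha_max)  # TF-driven transcription
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u, s = u + dt * du, s + dt * ds
        U[i], S[i] = u, s
    return U, S


rng = np.random.default_rng(0)
n_genes, n_tf = 200, 30
u0 = rng.gamma(shape=2.0, scale=0.5, size=n_genes)  # shared initial state
s0 = rng.gamma(shape=2.0, scale=0.5, size=n_genes)
branches = []
for _ in range(2):  # each branch gets its own TF-target matrix W
    W = rng.standard_normal((n_tf, n_genes))
    branches.append(simulate_branch(W, u0, s0, n_tf=n_tf))
```

Because every branch shares `u0` and `s0`, any divergence between trajectories comes only from the branch-specific W, which matches the controlled-divergence design described above.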
  
  To simulate observed sequencing counts, we introduced technical noise by scaling latent expression levels with cell-specific library sizes drawn from a log-normal distribution. The resulting expression counts were generated using a negative binomial sampling model:
  
where θ controls overdispersion, with smaller values corresponding to higher noise levels. The final datasets consist of paired unspliced (U) and spliced (S) count matrices with realistic transcriptional stochasticity and branching gene regulatory dynamics. For each branch, cells were further divided into three developmental stages for downstream analysis.
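The sampling equation is also missing. A minimal sketch, assuming a mean parameterization NB(μ, θ) with μ equal to library size times latent expression and variance μ + μ²/θ; the log-normal parameters are hypothetical. NumPy's `Generator.negative_binomial(n, p)` is parameterized by (n, p), so the sketch converts with n = θ and p = θ/(θ + μ), which gives mean μ.

```python
import numpy as np


def add_sequencing_noise(latent, theta, rng, lib_mean=0.0, lib_sd=0.3):
    """Sample counts ~ NB(mean = library_size * latent, dispersion theta).

    latent: (n_cells, n_genes) non-negative expression levels.
    Library sizes are log-normal per cell (hypothetical parameters).
    Smaller theta means larger variance: var = mu + mu**2 / theta.
    """
    lib = rng.lognormal(mean=lib_mean, sigma=lib_sd, size=(latent.shape[0], 1))
    mu = lib * latent
    # NumPy uses (n, p); n = theta, p = theta / (theta + mu) gives mean mu.
    p = theta / (theta + mu)
    return rng.negative_binomial(theta, p)


rng = np.random.default_rng(1)
latent = rng.gamma(2.0, 5.0, size=(500, 50))
noisy_low = add_sequencing_noise(latent, theta=50.0, rng=rng)   # mild noise
noisy_high = add_sequencing_noise(latent, theta=1.0, rng=rng)   # heavy noise
```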
  
We evaluated TSvelo on multiple simulated datasets with varying numbers of branches and noise levels. Each dataset contains two or three branches that start from the same root cell group (Branch 1: stage 0 - stage 1 - stage 2; Branch 2: stage 0 - stage 3 - stage 4; Branch 3: stage 0 - stage 5 - stage 6). The results of initial state identification based on the unspliced-to-spliced (U-to-S) delay, along with the corresponding 2D velocity stream visualizations, are presented in Supplementary Figure S1. These results demonstrate that U-to-S delay–based initialization is robust and consistently identifies the cells of the earliest developmental stage (“stage 0”) across different simulation settings. All additional results have been included in the Supplementary Information.
  
  (c) Equation (8): The formulation looks to be incorrect. If $$W \in \mathbb{R}^{G\times G}$$ and $$W' - \Gamma' \in \mathbb{R}^{K\times K}$$, how can they be aligned within the same row? Please clarify.
  
  We thank the reviewer for pointing this out. This was a typographical error in the manuscript. In the third line of Equation (8), the term should be W’ instead of W. We have corrected this in the revised manuscript to ensure dimensional consistency.
  
  (d) The use of prior knowledge graphs from ENCODE or ChEA to constrain regulation raises concerns. Much of the regulatory information in these databases comes from cell lines. How can such cell-line-based regulation be reliably applied to primary tissues, as is done throughout the manuscript? Additional experiments are needed to test the robustness of TSvelo with respect to prior knowledge.
  
  We thank the reviewer for this important comment. In TSvelo, TF–target networks from resources such as ENCODE and ChEA are incorporated as priors that guide the model toward biologically plausible regulatory structures. Importantly, the contribution of each TF–target interaction is learned from the data, allowing the model to down-weight or override potentially inaccurate or context-mismatched regulatory links. By aggregating signals across a large number of genes, the model further reduces sensitivity to noise and incompleteness in any single prior network.
  
  To evaluate robustness with respect to prior knowledge, we incorporated the DoRothEA regulon resource as an alternative TF–target prior with confidence-level filtering. We further performed ablation studies on the pancreas dataset and the gastrulation erythroid dataset using different TF–target resources, including ChEA, ENCODE, and their combinations with DoRothEA.
  
The results on the pancreas dataset and the gastrulation erythroid dataset, shown in Figure S13 and Figure S14 respectively, lead to the same conclusion. We observed highly consistent results across most TF–target prior combinations, including ChEA, ENCODE, ChEA+ENCODE, ChEA+DoRothEA, ENCODE+DoRothEA, and ChEA+ENCODE+DoRothEA. Using the pancreas dataset as an example, the mean velocity consistency ranged from 0.985 to 0.995, the mean in-cluster coherence ranged from 0.983 to 0.992, and the mean cross-boundary direction correctness ranged from 0.719 to 0.740 across all settings. These consistently high and tightly bounded metrics indicate that TSvelo is largely insensitive to the specific choice of TF–target prior. Notably, these results further suggest that even when the underlying regulatory resources differ in origin (e.g., cell-line-derived vs. curated or aggregated datasets), the inferred dynamics remain stable.
  
  The only configuration showing reduced stability was the use of DoRothEA alone, particularly for cross-boundary direction correctness. This is likely due to its comparatively limited coverage of TF–target interactions. For instance, in the pancreas dataset, only 81 out of 2000 highly variable genes (HVGs) could be associated with TFs based on DoRothEA, corresponding to 102 TF–target links in total, which may limit downstream regulatory modeling. In contrast, ChEA covered 1793 genes with 13,976 TF–target links, and ENCODE covered 1854 genes with 33,076 links. These results further suggest that integrating multiple TF–target resources can improve performance, likely due to increased coverage and complementary regulatory information.
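To make the coverage numbers concrete, the sketch below combines TF–target resources by taking the union of their edges and counts how many HVGs receive at least one regulator. Treating a combination as an edge-set union and filtering on targets only are assumptions about how the priors were merged; the gene names are toy placeholders.

```python
def combine_priors(*edge_sets, hvgs):
    """Union of (TF, target) edge sets, keeping only HVG targets.

    Returns the merged edges and the number of HVGs with >= 1 regulator.
    """
    hvgs = set(hvgs)
    edges = set().union(*edge_sets)
    edges = {(tf, target) for tf, target in edges if target in hvgs}
    covered = {target for _, target in edges}
    return edges, len(covered)


# Toy resources: G9 is not an HVG, and G4 has no regulator in either one.
chea = {("TF1", "G1"), ("TF1", "G2")}
encode = {("TF2", "G2"), ("TF2", "G3"), ("TF3", "G9")}
edges, n_covered = combine_priors(chea, encode, hvgs=["G1", "G2", "G3", "G4"])
```

Under a union, coverage can only grow as resources are added, which is consistent with the observation that a sparse resource such as DoRothEA performs better in combination than alone.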
  
  We agree that regulatory interactions derived from resources such as ENCODE and ChEA may not fully generalize to primary tissues due to their context-dependent nature. In the revised Discussion, we explicitly clarify this limitation, particularly their inability to capture tissue-specific regulatory programs. We further highlight that incorporating context-specific regulatory data, such as single-cell chromatin accessibility or perturbation-based regulatory maps, represents an important direction for future improvement.
  
  (e) Lines 579-580: How is the grid search performed? More methodological details are required. If an existing method was used, please provide a citation.
  
The grid search over the time step evaluates the loss in Equation (10) for every candidate value of $t_{\text{step}}$ in the set $\{0, 1, 2, \dots, 999\}$ and selects the value with the lowest loss. This strategy was originally adopted in scVelo for optimizing the time step parameter. We have now added the corresponding citation to scVelo in the revised manuscript.
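A generic sketch of this exhaustive search, with a toy squared-error loss standing in for Equation (10), whose exact form is not reproduced here; the trajectory `traj` and observation `obs` are hypothetical.

```python
from typing import Callable, Iterable, Optional

import numpy as np


def grid_search(loss_fn: Callable[[int], float],
                candidates: Iterable[int] = range(1000)) -> Optional[int]:
    """Return the candidate t_step with the smallest loss (first on ties)."""
    best_t, best_loss = None, np.inf
    for t in candidates:
        loss = loss_fn(t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t


# Toy 2D model trajectory sampled at t_step = 0, ..., 999.
t_grid = np.arange(1000)
traj = np.stack([np.sin(np.pi * t_grid / 1000), t_grid / 1000], axis=1)
obs = traj[500] + 1e-4  # a slightly perturbed observation near t_step = 500
t_hat = grid_search(lambda t: float(np.sum((obs - traj[t]) ** 2)))
```

An exhaustive grid avoids local minima in a non-convex loss at the cost of 1,000 loss evaluations per search.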
  
  (2) Application on pancreatic endocrine datasets
  
  (a) Lines 140-141: What is the definition of the final pseudotime-fitted time t or velocity pseudotime?
  
  There is no distinction between “final pseudotime”, “fitted time t” and “velocity pseudotime”. All of them refer to the same quantity in our framework. To eliminate any potential ambiguity, we have standardized the terminology by replacing “final pseudotime” with “pseudotime”.
  
  (b) Lines 143-144: The use of the velocity consistency metric to benchmark methods in multi-lineage datasets is incorrect. In multi-lineage differentiation systems, cells (e.g., those in fate priming stages) may inherently show inconsistency in their velocity. Thus, it is difficult to distinguish inconsistency caused by estimation error from that arising from biological signals. Velocity consistency metrics are only appropriate in systems with unidirectional trajectories (e.g., cell cycling). The abnormally high consistency values here raise concerns about whether the estimated velocities meaningfully capture lineage differences.
  
We thank the reviewer for raising this important point regarding the use of the velocity consistency metric in multi-lineage systems. Velocity consistency was initially introduced by scVelo [1] and implemented as scvelo.tl.velocity_confidence() in its package. Velocity consistency provides one of the few widely adopted quantitative criteria for benchmarking RNA velocities [2]. We agree that it is especially suitable for single-lineage processes. For datasets with clear multi-lineage differentiation (Fig. 5 and Fig. 6), we do not use this metric, precisely to avoid the issue highlighted by the reviewer.
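For reference, per-cell velocity consistency can be sketched as the mean Pearson correlation between a cell's velocity and the velocities of its k nearest neighbors, which is our understanding of what `scvelo.tl.velocity_confidence()` reports. The neighbor indices are assumed to be precomputed, and this is an illustration rather than scVelo's code.

```python
import numpy as np


def velocity_consistency(V, neighbors):
    """Per-cell mean Pearson correlation with the neighbors' velocities.

    V: (n_cells, n_genes) velocities; neighbors: (n_cells, k) indices.
    """
    Vc = V - V.mean(axis=1, keepdims=True)          # center each vector
    norms = np.linalg.norm(Vc, axis=1, keepdims=True)
    Vn = Vc / np.where(norms > 0, norms, 1.0)       # unit-normalize
    # Correlate each cell with each of its k neighbors, then average.
    corr = np.einsum("ij,ikj->ik", Vn, Vn[neighbors])
    return corr.mean(axis=1)


V_same = np.tile(np.arange(5.0), (4, 1))            # identical velocities
same = velocity_consistency(V_same, np.array([[1, 2], [0, 3], [0, 1], [1, 2]]))
V_opp = np.array([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])  # opposite velocities
opp = velocity_consistency(V_opp, np.array([[1], [0]]))
```

The toy cases show the two extremes: identical neighbor velocities score 1, and opposite ones score -1, which is why a fate-priming cell between diverging lineages would legitimately score low.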
  
However, the pancreatic endocrine dataset (Fig. 2) exhibits minimal branching, which makes velocity consistency more appropriate there. As noted in the veloVI study, RNA velocities are expected to change smoothly over the phenotypic manifold [3]. Higher consistency indicates that neighboring cells have compatible velocity directions, reflecting the stability and coherence of the inferred velocity field. Additionally, multiple previous studies have used velocity consistency to evaluate model performance on this pancreas dataset [2,3,4], providing a standard point of comparison.
  
  To better address your concerns, we have replaced the corresponding panel in Fig. 2 of the main text with an evaluation of cell-type separability in both the traditional 2D (unspliced–spliced) phase portrait and the learned 3D (α–unspliced–spliced) phase portrait by TSvelo (Author response image 4 in our response to your subsequent question). We appreciate your suggestions, as the comparison more clearly highlights the novelty and contribution of TSvelo and helps explain its improved performance. Now, the velocity consistency panel has been moved to the Supplementary Information. In addition, we have added a clearer explanation of the cross-boundary correctness metric in the revised manuscript.
  
[1] Bergen, V., Lange, M., Peidli, S., Wolf, F. A., & Theis, F. J. (2020). Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology, 38(12), 1408-1414.

[2] Luo, Y., Ren, J., Yang, Q. ... & Li, Q. (2026). Benchmarking RNA velocity methods across 17 independent studies. Cell Reports Methods, 101367.

[3] Gayoso, A., Weiler, P., Lotfollahi, M., Klein, D., Hong, J., Streets, A., ... & Yosef, N. (2024). Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nature Methods, 21(1), 50-59.

[4] Li, J., Pan, X., Yuan, Y., & Shen, H. B. (2024). TFvelo: gene regulation inspired RNA velocity estimation. Nature Communications, 15(1), 1387.
  
  (c) The improvement of TSvelo over other methods in terms of cross-boundary direction correctness looks marginal; a statistical test would help to assess its significance.
  
  We thank the reviewer for this insightful comment. In the revised manuscript, we have added statistical tests for evaluated metrics, including velocity consistency, cross-boundary direction correctness, and in-cluster coherence.
  
  As shown in Author response image 2, TSvelo significantly outperforms all baseline methods in terms of velocity consistency across both datasets. For in-cluster coherence, TSvelo achieves significantly better performance on the gastrulation (erythroid) dataset, while on the pancreas dataset it performs comparably to the best-performing baselines (UniTVelo and TFvelo) and significantly outperforms several competing methods, including CellDancer, Dynamo, and scVelo.
  
  For cross-boundary direction correctness, TSvelo shows consistent improvements in mean performance on the pancreas dataset (Author response image 3), and significantly outperforms Dynamo and scVelo on the gastrulation dataset. Although not all pairwise comparisons on cross-boundary direction correctness reach statistical significance, this is likely influenced by the limited number of independent samples (n = 7 and n = 4 for the two datasets, respectively), which reduces statistical power for detecting differences. Importantly, TSvelo still achieves the best average performance among all methods, indicating a consistent overall trend in favor of TSvelo.
  
We have added these results to the revised manuscript.
  
  Author response image 2.
  
  The quantitative comparison between TSvelo and baseline approaches on the pancreas dataset (panel a) and the gastrulation erythroid dataset (panel b). In each plot, methods are ranked in descending order of their mean values. Numbers at the bottom indicate the sample size for each metric. Significance is determined using a one-sided Mann–Whitney U test. *****, ***, ** and * represent p < 0.00001, 0.0001 ≤ p < 0.001, 0.001 ≤ p < 0.01, and 0.01 ≤ p < 0.05, respectively.
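With only n = 7 or n = 4 samples per method, the one-sided Mann–Whitney U test can be computed exactly by enumerating every split of the pooled sample. The sketch below does this with NumPy and the standard library; for samples without ties it should agree with `scipy.stats.mannwhitneyu(x, y, alternative="greater")` using the exact method. The sample values are made up for illustration.

```python
import itertools

import numpy as np


def u_statistic(x, y):
    """Mann-Whitney U for x vs y (ties count one half)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float((x > y).sum() + 0.5 * (x == y).sum())


def one_sided_mwu_exact(x, y):
    """Exact p-value for H1: values in x tend to exceed values in y.

    Enumerates all ways to split the pooled sample into groups of len(x)
    and len(y); only feasible for small samples such as n = 4 or n = 7.
    """
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    observed = u_statistic(x, y)
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(x)):
        mask = np.zeros(len(pooled), dtype=bool)
        mask[list(idx)] = True
        total += 1
        if u_statistic(pooled[mask], pooled[~mask]) >= observed - 1e-12:
            count += 1
    return count / total


tsvelo = [0.95, 0.96, 0.97, 0.98]      # illustrative scores only
baseline = [0.90, 0.91, 0.92, 0.93]
p_greater = one_sided_mwu_exact(tsvelo, baseline)
p_reverse = one_sided_mwu_exact(baseline, tsvelo)
```

Even perfect separation of four TSvelo scores from four baseline scores gives p = 1/70 ≈ 0.014, the smallest p-value attainable with n = 4 per group, which illustrates the limited statistical power noted above.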
  
  Author response image 3.
  
  The comparison of mean cross-boundary direction correctness on the pancreas dataset.
  
  (d) Lines 177-178: Based on the figure, TSvelo does not appear to clearly distinguish cell types. A quantitative metric, such as Adjusted Rand Index (ARI), should be provided.
  
  We thank the reviewer for this helpful suggestion. To quantitatively assess whether TSvelo can distinguish cell types, we evaluated the separability of cell-type labels in both the 2D (unspliced–spliced) phase portrait adopted by previous RNA velocity approaches, and the 3D (α–unspliced–spliced, α denotes the transcriptional rate) phase portrait introduced by TSvelo.
  
Specifically, we evaluated how well each embedding preserves cell-type information using k-nearest-neighbor (kNN) classification accuracy with 5-fold cross-validation. Given an embedding matrix $X \in \mathbb{R}^{n \times d}$, where $n$ is the number of cells and $d$ is 2 or 3, and the corresponding cell-type labels $y \in \{1, \dots, C\}^n$, we partition the data into five folds. For each fold $k$, a kNN classifier with $K = 5$, denoted $f^{(k)}$, is trained on the training subset and evaluated on the held-out test subset $\mathcal{T}_k$. The classification accuracy for the $k$-th fold is defined as

$$\mathrm{Acc}^{(k)} = \frac{1}{n_k} \sum_{i \in \mathcal{T}_k} \mathbb{1}\left(f^{(k)}(x_i) = y_i\right),$$

where $n_k$ is the number of samples in the test set and $\mathbb{1}(\cdot)$ is the indicator function. The final score is obtained by averaging across all folds:

$$\mathrm{Acc} = \frac{1}{5} \sum_{k=1}^{5} \mathrm{Acc}^{(k)}.$$
  
  This metric directly assesses whether cells of the same type are positioned close to each other in the embedding space, and is widely used to quantify representation quality.
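The accuracy defined above can be sketched in NumPy, assuming Euclidean distance, random (unstratified) folds and majority voting with ties broken toward the smallest label. scikit-learn's `KNeighborsClassifier` with `cross_val_score(..., cv=5)` computes a similar score, but with stratified, unshuffled folds and its own tie handling, so numbers may differ slightly.

```python
import numpy as np


def _majority(labels):
    """Most frequent label; ties go to the smallest label (np.unique order)."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]


def knn_cv_accuracy(X, y, n_neighbors=5, n_folds=5, seed=0):
    """Mean of the per-fold kNN accuracies Acc^(k) over random folds."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Squared Euclidean distances between test and training cells.
        diff = X[test_idx][:, None, :] - X[train_idx][None, :, :]
        nn = np.argsort((diff ** 2).sum(axis=-1), axis=1)[:, :n_neighbors]
        pred = np.array([_majority(y[train_idx][row]) for row in nn])
        accs.append(np.mean(pred == y[test_idx]))  # Acc^(k)
    return float(np.mean(accs))                    # average over folds


# Two well-separated toy cell types in a 2D embedding.
rng = np.random.default_rng(0)
X2 = np.vstack([rng.normal(0.0, 0.1, (100, 2)), rng.normal(10.0, 0.1, (100, 2))])
y2 = np.repeat(["A", "B"], 100)
acc = knn_cv_accuracy(X2, y2)
```

The same function applies unchanged to a 2D (u, s) or 3D (α, u, s) embedding, since only the column count of X differs.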
  
Using this evaluation, we observed that the 3D phase portrait consistently achieves higher accuracy than the 2D phase portrait (Author response image 4). The improvement is highly statistically significant (one-sided Mann–Whitney U test, $p = 4.37 \times 10^{-10}$), demonstrating that the 3D representation provides substantially better separation of cell types.
  
  We have added these quantitative results to the revised manuscript to complement the visual evidence and to clarify that TSvelo effectively distinguishes cell types in the learned representation.
  
  Author response image 4.
  
  The evaluation of the separability of cell-type labels in both the 2D (unspliced–spliced) phase portrait and the 3D (α–unspliced–spliced) phase portrait for the pancreas dataset.
  
  (e) Lines 179-183: The claim that traditional methods cannot capture dynamics in the unspliced-spliced phase portrait is vague. What specific aspect is not captured-the fitted values or something else? Evidence is lacking. Please provide a detailed explanation and quantitative metrics to support this claim.
  
We thank the reviewer for this important comment. We have revised the text to illustrate this point more clearly using a representative example gene: “For instance, ANXA4 shows higher expression in Ductal cells than in Ngn3 low EP cells, which means that its expression first decreases and then increases. Such dynamics are not easily captured in the conventional unspliced–spliced phase portrait used by previous approaches, as many baseline methods implicitly assume an increasing-then-decreasing expression pattern. By comparison, TSvelo can still fit this expression pattern by using additional information from the 3D phase portrait.”
  
We also clarify that the 2D u–s representation has limited capacity to separate heterogeneous dynamic cell states, which can affect downstream velocity field estimation. In the conventional 2D u–s phase portrait, cells from different dynamic regimes may overlap in the same region of the embedding space. This overlap reduces the identifiability of underlying transcriptional states and makes the inferred local dynamics more ambiguous. In contrast, TSvelo introduces an additional latent variable α, forming a 3D (α, u, s) phase portrait, which helps disentangle these mixed trajectories and yields a more structured and separable representation of cell dynamics. We have provided quantitative evidence in the previous response (Author response image 4). Briefly, the proposed 3D representation achieves consistently higher kNN classification accuracy (5-fold cross-validation, K = 5) for cell state identification compared to the 2D u–s embedding.
  
  (3) Application to gastrulation erythroid datasets
  
  (a) Lines 191-194: The observation that velocity genes are enriched for erythropoiesis-related pathways is trivial, since the analysis is restricted to highly variable genes (HVGs) from an erythropoiesis dataset. This enrichment is expected and therefore not informative.
  
  We thank the reviewer for this comment and agree that such enrichment is expected given the use of HVGs from an erythropoiesis dataset. This analysis was included only as a preliminary sanity check to support the plausibility of the inferred velocity genes, rather than as a main result. We have accordingly simplified the description and clarified that this analysis serves only as a preliminary check in the revised manuscript.
  
  (b) Lines 227-228: It remains unclear how TSvelo "accurately captures the dynamics." What is the definition of dynamics in this context? Figure 3g shows unspliced/spliced vs. fitted time plots and phase portraits, but without a quantitative definition or measure, the claim of superiority cannot be supported. Visualization of a single gene is insufficient; a systematic and quantitative analysis is needed.
  
We thank the reviewer for this important comment. We have revised the text to illustrate this point more clearly using representative example genes: “HSP90AB1 exhibits a counter-clockwise pattern in the unspliced–spliced phase portrait, in contrast to the clockwise dynamics typically assumed by most baseline approaches; previous methods therefore struggle to capture this behavior, whereas TSvelo can still model such patterns. For genes such as RPS26, which plays a critical role in the development from blood progenitors to erythroid cells (ref. 40), the unspliced–spliced data are so noisy that cells of different types overlap in the phase portrait. TSvelo can still capture the gene dynamics and reveal differences in transcription rates across cell types.”
  
  In addition, we explicitly emphasize the role of the 3D (α, u, s) phase portrait, which provides a more structured and separable representation of transcriptional states compared to the conventional 2D u–s space. This improved representation is the key factor underlying the advantages of TSvelo in modeling transcriptional processes. In the conventional 2D u–s phase portrait, cells from different transcriptional states may overlap, leading to reduced separability. In contrast, introducing the latent variable α expands the representation to a 3D space, which helps disentangle these mixed states and yields a clearer phase structure. Similar to our previous response in Author response image 4, we provide quantitative evidence on this gastrulation erythroid dataset in Figure S7, showing that the 3D representation achieves consistently higher kNN classification accuracy for cell state separation compared to the 2D u–s embedding (one-sided Mann–Whitney U test, p-value = 0.002).
  
  (4) Application to the mouse brain and other datasets
  
(a) Lines 280-281: The authors cannot claim that velocity streams are smoother in TSvelo than in MultiVelo based solely on 2D visualization. Similarly, claiming that one model predicts the correct differentiation trajectory from a 2D projection is over-interpretation, as has been discussed in prior literature (see PMID: 37885016).
  
  We thank the reviewer for this important comment. Consistent with other RNA velocity studies, TSvelo employs the 2D UMAP stream plot for visualizing the results. We agree that conclusions based solely on 2D visualizations may lead to over-interpretation. Our intention was to provide an intuitive visualization rather than a rigorous quantitative comparison. Accordingly, we have revised the text to avoid making definitive claims about smoothness or correctness of differentiation trajectories based solely on 2D projections.
  
  (b) Lines 304-306: Beyond transcriptional signal estimation, how is regulation inferred solely from scRNA-seq data validated, especially compared with scATAC-seq data? Are there cases where transcriptome-based regulatory inference is supported by epigenomic evidence, thereby demonstrating TSvelo's GRN inference accuracy?
  
  We thank the reviewer for this important question regarding the validation of regulatory inference derived from scRNA-seq data and its comparison to scATAC-seq-based evidence.
  
  We would like to first clarify the scope of TSvelo. Similar to existing RNA velocity methods, the primary goal of TSvelo is to model transcriptional dynamics and accurately infer cell state transitions and cell fate trajectories. In this context, gene regulatory information is not inferred de novo from data, but incorporated as prior knowledge from curated TF–target databases to guide and constrain the dynamics modeling process, as described in our Introduction.
  
We have conducted the requested analysis by computing the gene-wise chromatin accessibility rate used in MultiVelo and the transcription rate learned by TSvelo, and evaluating their correlation across genes. As shown in Figure S15, the two estimates exhibit almost no global correlation across genes, indicating that they capture substantially different aspects of regulatory information.
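A minimal sketch of the across-gene comparison, assuming each method's estimate has already been summarized to one value per gene. The response does not say whether Pearson or Spearman correlation was used, so both are shown; ties are broken by order in the rank transform, which is adequate for continuous rates.

```python
import numpy as np


def _ranks(x):
    """Ranks 0..n-1 (ties broken by order; fine for continuous values)."""
    r = np.empty(len(x))
    r[np.argsort(x, kind="stable")] = np.arange(len(x))
    return r


def genewise_correlation(accessibility_rate, transcription_rate):
    """Pearson and Spearman correlation across genes of two per-gene rates."""
    a = np.asarray(accessibility_rate, dtype=float)
    b = np.asarray(transcription_rate, dtype=float)
    pearson = np.corrcoef(a, b)[0, 1]
    spearman = np.corrcoef(_ranks(a), _ranks(b))[0, 1]
    return float(pearson), float(spearman)


# Toy check: a monotone but nonlinear relation has Spearman 1, Pearson < 1.
genes = np.arange(10.0)
pearson, spearman = genewise_correlation(genes, genes ** 3)
```

Reporting the rank correlation alongside Pearson guards against a near-zero Pearson value that merely reflects a nonlinear, rather than absent, relationship.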
  
  This discrepancy is not unexpected and reflects the fundamental differences between these modalities. scATAC-seq measures chromatin accessibility, which provides a proxy for cis-regulatory potential of genomic regions. In contrast, TF RNA expression reflects downstream transcriptional output, which is shaped by multiple regulatory layers, including post-transcriptional regulation, protein activity, temporal delays, and indirect regulation through intermediate transcriptional or signaling pathways. As a result, these two modalities are expected to capture complementary but not directly comparable aspects of gene regulation.
  
  We acknowledge that scATAC-seq provides valuable complementary information on chromatin accessibility and regulatory potential, and will consider incorporating matched multi-omics data in future work. In the revised manuscript, we further clarify that TSvelo is an RNA velocity method that incorporates prior knowledge from curated TF–target databases, and we have added a discussion on the potential use of scATAC-seq data for future extension of our framework.
  
  (c) The claim that TSvelo can model multi-lineage datasets hinges on its use of PAGA for lineage segmentation, followed by independent modeling of dynamics within each subset. However, the procedure for merging results across subsets remains unclear.
  
  We thank the reviewer for pointing out that the merging step was not sufficiently described. After modeling dynamics independently within each lineage-specific subset, TSvelo integrates the results via a weighted aggregation procedure at the cell level.
  
  For each cell and each inferred quantity (e.g., velocity or other dynamic variables), we collect the estimates obtained from different lineage-specific models and combine them using a weighted average. The weights are defined by the size of each lineage, reflecting its statistical support. We have clarified details about this merging procedure in the Methods section.
  
  This aggregation reconciles multiple lineage-specific estimates for the same cell into a single value and mitigates discontinuities that could arise from directly combining independent lineage analyses. The resulting values define a unified set of dynamics for each cell across lineages.
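The merging step can be sketched as a size-weighted average per cell, assuming that a lineage's weight is its number of cells and that each lineage returns one estimate per member cell. The function name and data layout are illustrative.

```python
import numpy as np


def merge_lineage_estimates(n_cells, lineage_cells, lineage_values):
    """Size-weighted average of per-lineage estimates for each cell.

    lineage_cells:  list of index arrays, one per lineage subset.
    lineage_values: list of (len(cells), d) estimates from that subset.
    A cell in several lineages gets sum_l w_l * v_l / sum_l w_l,
    with w_l the number of cells in lineage l.
    """
    d = lineage_values[0].shape[1]
    num = np.zeros((n_cells, d))
    den = np.zeros((n_cells, 1))
    for cells, values in zip(lineage_cells, lineage_values):
        w = float(len(cells))                 # lineage size as weight
        num[cells] += w * values
        den[cells] += w
    merged = np.full((n_cells, d), np.nan)    # cells in no lineage stay NaN
    covered = den[:, 0] > 0
    merged[covered] = num[covered] / den[covered]
    return merged


# Cell 0 is shared by a 3-cell lineage (value 1) and a 2-cell lineage (value 4).
lineage_cells = [np.array([0, 1, 2]), np.array([0, 3])]
lineage_values = [np.ones((3, 1)), np.array([[4.0], [2.0]])]
merged = merge_lineage_estimates(4, lineage_cells, lineage_values)
```

For the shared cell the result is (3 · 1 + 2 · 4) / 5 = 2.2, so the larger lineage pulls the merged estimate toward its own value.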
  
  Reviewer #3 (Public review):
  
  Despite the abundance of RNA velocity tools, there are still major limitations, and there is strong skepticism about the results these methods lead to. In this paper, the authors try to address some limitations of current RNA velocity approaches by proposing a unified framework to jointly infer transcriptional and splicing dynamics. The method is then benchmarked on 6 real datasets against the most popular RNA velocity tools.
  
  While the approach has the potential to be of interest for the field, and may present improvements compared to existing approaches, there are some major limitations that should be addressed, particularly concerning the benchmark (see major comment 1).
  
  Major comments:
  
  (1) My main criticism concerns the benchmarking: real data lack a ground truth, and are absolutely not ideal for comparing methods, because one can only speculate what results appear to be more plausible.
  
  A solid and extensive simulation study, which covers various scenarios and possibly distinct data-generating models, is needed for comparing approaches. The authors should check, for example, the simulation studies in the BayVel approach (Section 4, BayVel: A Bayesian Framework for RNA Velocity Estimation in Single-Cell Transcriptomics). Clearly, all methods should be included in the simulation.
  
  Following your recommendation, we have added the simulation analysis to compare TSvelo with existing RNA velocity approaches. We generated synthetic single-cell RNA velocity datasets using a mechanistic transcriptional dynamics model with one or multiple developmental branches. The system included 200 genes, among which 30 were designated as transcription factors (TFs).
  
For each branch, we independently sampled a TF–target regulatory matrix $W \in \mathbb{R}^{30 \times 200}$ from a standard normal distribution to simulate distinct GRN structures. Gene expression dynamics were modeled using a coupled ordinary differential equation (ODE) system describing unspliced and spliced RNA abundances:
  
  where u and s denote unspliced and spliced RNA levels, respectively. The transcription rate α was computed as a nonlinear function of TF expression, defined as a weighted sum of spliced TF abundance, followed by clipping to ensure bounded activation.
  
  Each branch is initialized from the same randomly sampled initial condition drawn from a gamma distribution, allowing controlled divergence of trajectories driven solely by branch-specific regulatory programs.
  
  To simulate observed sequencing counts, we introduced technical noise by scaling latent expression levels with cell-specific library sizes drawn from a log-normal distribution. The resulting expression counts were generated using a negative binomial sampling model:
  
where θ controls overdispersion, with smaller values corresponding to higher noise levels. The final datasets consist of paired unspliced (U) and spliced (S) count matrices with realistic transcriptional stochasticity and branching gene regulatory dynamics. For each branch, cells were further divided into three developmental stages for downstream analysis.
  
We evaluated TSvelo and the splicing-based RNA velocity approaches on multiple simulated datasets with varying numbers of branches and noise levels. Each dataset contains one, two, or three branches that start from the same cell group (Branch 1: stage 0 - stage 1 - stage 2; Branch 2: stage 0 - stage 3 - stage 4; Branch 3: stage 0 - stage 5 - stage 6). We primarily assessed performance using the cross-boundary direction correctness (CBDir) metric, because it directly evaluates inferred trajectories against ground-truth cell-stage annotations and has been widely adopted in RNA velocity studies such as VeloAE and UniTVelo. CBDir assesses the accuracy of transitions from a source cluster to a target cluster by examining the boundary cells, and therefore requires ground-truth annotations. We directly ran the function unitvelo.evaluate() provided in UniTVelo to obtain CBDir, which is calculated as follows:
  
$$\mathrm{CBDir} = \frac{1}{|C_A|} \sum_{c \in C_A} \frac{1}{|N(c)|} \sum_{c' \in N(c)} \frac{v_c \cdot (x_{c'} - x_c)}{\lVert v_c \rVert \, \lVert x_{c'} - x_c \rVert}$$
  
where $C_A$ denotes the set of boundary cells in the source cluster A (cells with at least one neighbor in the target cluster B), $N(c)$ represents the neighbors of cell $c$ that belong to B, $v_c$ and $x_c$ denote the low-dimensional velocity and state vectors of cell $c$, respectively, and $x_{c'}$ denotes the state vector of the neighboring cell $c'$.
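To make the metric concrete, the sketch below computes CBDir for a single source-to-target edge following the definition above, as we understand the VeloAE/UniTVelo formulation. The reported numbers come from `unitvelo.evaluate()`; this illustration is not that implementation, and the toy data are hypothetical.

```python
import numpy as np


def cross_boundary_correctness(X, V, labels, neighbors, source, target):
    """Mean cosine between v_c and (x_c' - x_c) over boundary cells.

    X, V:      (n_cells, d) low-dimensional states and velocities.
    neighbors: (n_cells, k) kNN indices in the same embedding.
    Boundary cells belong to `source` and have >= 1 neighbor in `target`;
    only those target neighbors enter the inner average.
    """
    scores = []
    for c in np.flatnonzero(labels == source):
        nbrs = neighbors[c][labels[neighbors[c]] == target]
        if nbrs.size == 0:
            continue                          # not a boundary cell
        disp = X[nbrs] - X[c]                 # x_c' - x_c
        cos = disp @ V[c] / (np.linalg.norm(disp, axis=1)
                             * np.linalg.norm(V[c]) + 1e-12)
        scores.append(cos.mean())
    return float(np.mean(scores)) if scores else float("nan")


# Toy data: ten cells on a line, stage 0 on the left, stage 1 on the right.
X = np.stack([np.arange(10.0), np.zeros(10)], axis=1)
labels = np.array(["stage 0"] * 5 + ["stage 1"] * 5)
D = np.abs(X[:, None, 0] - X[None, :, 0])
np.fill_diagonal(D, np.inf)                   # exclude each cell itself
neighbors = np.argsort(D, axis=1, kind="stable")[:, :3]
V = np.tile([1.0, 0.0], (10, 1))              # every velocity points right
forward = cross_boundary_correctness(X, V, labels, neighbors, "stage 0", "stage 1")
backward = cross_boundary_correctness(X, -V, labels, neighbors, "stage 0", "stage 1")
```

Velocities pointing from stage 0 toward stage 1 score close to +1 and reversed velocities close to -1, so the metric rewards direction across the boundary rather than smoothness.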
  
  As shown in Figure S2, TSvelo consistently achieves the highest accuracy across all simulation settings, particularly in scenarios with complex branching structures, which pose significant challenges for baseline methods.
  
  (2) Related to the above: since a ground truth is missing, the real data analyses need to be interpreted with caution. I recommend avoiding strong statements, such as "successfully captures the correct gene dynamics", or "accurately infer", in favour of milder statements supported by the data, such as "... aligns with the biological processes described" (as in page 12), or "results are compatible with current biological knowledge", etc...
  
  We thank the reviewer for this helpful comment. We agree that analyses on real datasets should be interpreted with appropriate caution because definitive ground truth is typically unavailable. Following the reviewer’s suggestion, we have revised the wording throughout the manuscript to avoid overly strong claims. For example, statements such as “successfully captures the correct gene dynamics” and “accurately infer” have been replaced with more cautious descriptions such as “consistent with known biological processes”.
  
  (3) Many methods perform RNA velocity analyses. While there is a brief description, I think it'd be useful to have a schematic summary (e.g., via a Table) of the main conceptual, mathematical, and computational characteristics of each approach.
  
  We thank the reviewer for this insightful suggestion. We agree that a structured summary of existing RNA velocity methods would improve clarity and accessibility. We have added a new summary table (Table S1) that systematically compares representative RNA velocity approaches in the supplementary information.
  
  (4) Related to the above: I struggled to identify the main conceptual novelty of TSvelo, compared to existing approaches. I recommend explaining this aspect more extensively.
  
  We thank the reviewer for this insightful comment. We agree that the conceptual novelty of TSvelo can be more clearly articulated.
  
  In the revised manuscript, we have expanded the discussion at the beginning of the Results section to explicitly highlight the key distinctions between TSvelo and existing approaches. Specifically, we now clarify that most existing RNA velocity methods predominantly focus on splicing dynamics and typically operate in a gene-wise manner, without capturing coordinated dynamics across genes. In contrast, TSvelo models the full cascade of transcriptional regulation, transcription, and splicing within a unified framework, and estimates RNA velocity jointly across all genes, thereby capturing their coordinated dynamics at the system level.
  
  (5) A computational benchmark is missing; I'd appreciate seeing the runtime and memory cost of all methods in a couple of datasets.
  
  We thank the reviewer for this helpful suggestion regarding computational benchmarking. In the revised manuscript, we have added a systematic comparison of runtime and GPU memory usage across TSvelo and baseline methods using simulated datasets of increasing scale (600, 1200, and 1800 cells) on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.
  
  Table S2 shows differences in computational efficiency and resource requirements among methods. Specifically, classical methods such as scVelo and Dynamo exhibit very fast runtimes (10–24 seconds) and do not rely on GPU acceleration, reflecting their relatively lightweight modeling strategies. In contrast, deep learning–based approaches, including UniTVelo, cellDancer, and TSvelo, have higher computational costs due to their increased model complexity.
  
  TSvelo exhibits a stable GPU memory footprint (~1.26 GB) across different dataset sizes, indicating that its memory usage is primarily determined by model architecture rather than the number of cells. This level of memory consumption is well within the capacity of modern GPUs and does not pose practical limitations. In terms of runtime, TSvelo scales approximately linearly with dataset size. The higher computational cost of TSvelo is mainly due to its EM-style optimization procedure, where each M-step also involves multiple optimization updates to infer gene regulatory effects in a global model. This design enables TSvelo to explicitly incorporate regulatory priors and jointly model gene interactions, which is not supported by these baseline methods.
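A benchmark of this kind can be organized as a small harness. It times one fit per dataset size and estimates the runtime-per-cell slope, which is a simple way to check approximately linear scaling. This is a sketch under assumptions. `fit` is a hypothetical wrapper that simulates `n_cells` cells and fits one method. Memory here is peak CPU memory traced by Python's `tracemalloc`, not GPU memory. For a PyTorch-based method (an assumption), the GPU peak could instead be read with `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`.

```python
import time
import tracemalloc
from typing import Callable, Dict, List

import numpy as np


def benchmark(fit: Callable[[int], None], sizes: List[int]) -> Dict[int, Dict[str, float]]:
    """Time one call of the hypothetical `fit(n_cells)` per size.

    Also records peak Python-level CPU allocations (tracemalloc); GPU memory
    is not measured by this sketch.
    """
    results: Dict[int, Dict[str, float]] = {}
    for n in sizes:
        tracemalloc.start()
        t0 = time.perf_counter()
        fit(n)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[n] = {"seconds": elapsed, "peak_mb": peak / 1e6}
    return results


def linear_scaling_slope(results: Dict[int, Dict[str, float]]) -> float:
    """Least-squares slope of runtime versus number of cells (seconds per cell)."""
    sizes = sorted(results)
    n = np.array(sizes, dtype=float)
    t = np.array([results[s]["seconds"] for s in sizes])
    slope, _intercept = np.polyfit(n, t, deg=1)
    return float(slope)
```

Usage would be `res = benchmark(my_fit, [600, 1200, 1800])` followed by `linear_scaling_slope(res)`, where `my_fit` is whatever wrapper calls the method under test.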
  
  To further improve runtime efficiency, TSvelo allows flexible control of the number of EM iterations. As shown in Figure S16 and Table S3, we evaluated performance under different iteration settings on the simulation dataset. The EM framework of TSvelo employs an early stopping strategy that terminates fitting if the loss has not decreased over the last 3 iterations. Results show that convergence is typically achieved within 3 iterations for this dataset, and increasing the maximum number of iterations beyond this does not further change the results. Notably, even a single iteration already yields competitive performance, likely benefiting from the strong initialization based on the unspliced-to-spliced temporal delay.
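The early-stopping rule described above can be expressed as a generic EM loop. In this sketch, `e_step` and `m_step` are hypothetical callables standing in for the latent-variable update and the inner optimization of the M-step. Only the patience-of-3 stopping logic is taken from the text. Returning the parameters with the lowest loss is an assumption.

```python
from typing import Callable, List, Tuple


def run_em(e_step: Callable[[object], object],
           m_step: Callable[[object, object], Tuple[object, float]],
           params: object,
           max_iter: int = 20,
           patience: int = 3) -> Tuple[object, List[float]]:
    """Alternate E and M steps; stop once the loss has not decreased for `patience` iterations.

    Hypothetical interface: e_step(params) returns latent quantities, and
    m_step(params, latent) returns (updated_params, loss) after its own
    inner optimization updates.
    """
    best_loss = float("inf")
    best_params = params
    stale = 0  # consecutive iterations without a new best loss
    history: List[float] = []
    for _ in range(max_iter):
        latent = e_step(params)
        params, loss = m_step(params, latent)
        history.append(loss)
        if loss < best_loss:
            best_loss, best_params, stale = loss, params, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement in the last `patience` iterations
    return best_params, history
```

Setting `max_iter=1` corresponds to the single-iteration setting mentioned above.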
  
  Overall, these results highlight a trade-off between computational efficiency and modeling expressiveness. While TSvelo is more computationally demanding than classical approaches, it provides a more flexible framework for incorporating regulatory information and capturing complex gene interactions, which we believe justifies the additional computational cost in scenarios requiring accurate dynamical inference.
  
  (6) I think BayVel (mentioned above) should be added to the list of competing methods (both in the text and in the benchmarks). The package can be found here: https://github.com/elenasabbioni/BayVel_pkgJulia.
  
  We thank the reviewer for suggesting BayVel and for providing the repository link. We carefully reviewed the available resources, including both BayVel_pkgJulia and BayVel_notebooks, and we appreciate the authors’ efforts in making their code and data publicly available.
  
  We note that the BayVel repositories primarily provide scripts and data for reproducing the figures and results reported in their manuscript. However, at present, the available resources do not yet provide a complete guideline or standardized pipeline for applying BayVel to new datasets. To ensure a fair and reproducible comparison, we therefore used the BayVel results officially released by the authors. We are grateful that the BayVel results on the pancreas dataset have been released on the BayVel_notebooks page: https://github.com/elenasabbioni/BayVel_notebooks/tree/main/real%20data/Pancreas/moments/output.
  
  Based on these results, we conducted comparisons across all methods on the pancreas dataset, with quantitative evaluations shown in Author response image 5. In each plot, methods are ranked in descending order of their mean values. Numbers at the bottom indicate the sample size for each metric. Statistical significance is assessed using a one-sided Mann–Whitney U test, where *****, ****, ***, **, and * denote p < 0.00001, 0.00001 ≤ p < 0.0001, 0.0001 ≤ p < 0.001, 0.001 ≤ p < 0.01, and 0.01 ≤ p < 0.05, respectively.
  
  BayVel has now been included in the Introduction, and corresponding comparisons have been added in the revised manuscript.
  
  Author response image 5.
  
  The quantitative comparison between TSvelo and baseline approaches on the pancreas dataset. In each plot, methods are ranked in descending order of their mean values. Numbers at the bottom indicate the sample size for each metric. Significance is determined using a one-sided Mann–Whitney U test. *****, ****, ***, **, and * represent p < 0.00001, 0.00001 ≤ p < 0.0001, 0.0001 ≤ p < 0.001, 0.001 ≤ p < 0.01, and 0.01 ≤ p < 0.05, respectively.
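The significance labels in this legend can be reproduced with a small helper. The one-sided p-values themselves could come from SciPy's `scipy.stats.mannwhitneyu` with `alternative="greater"`. This is a sketch, not the authors' plotting code. Two choices are assumptions: the direction of the alternative (first sample tends to be larger) and the `"ns"` label for p ≥ 0.05.

```python
from typing import Sequence

# Star levels from the legend: ***** p < 1e-5, **** p < 1e-4, *** p < 1e-3,
# ** p < 1e-2, * p < 0.05. The "ns" label for larger p is an assumption.
_LEVELS = [(1e-5, "*****"), (1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (0.05, "*")]


def significance_stars(p: float) -> str:
    """Map a p-value to the star label of the first threshold it falls below."""
    for threshold, label in _LEVELS:
        if p < threshold:
            return label
    return "ns"


def one_sided_mwu(a: Sequence[float], b: Sequence[float]) -> float:
    """One-sided Mann-Whitney U p-value for H1: values in `a` tend to exceed those in `b`."""
    # Imported lazily so significance_stars works without SciPy installed.
    from scipy.stats import mannwhitneyu
    return float(mannwhitneyu(a, b, alternative="greater").pvalue)
```

Because the thresholds are checked from smallest to largest, a p-value exactly at a boundary falls into the next level. For example, p = 0.00001 gets `****`, matching "0.00001 ≤ p < 0.0001".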
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Please carefully proofread the text. Some typos:
  
  (1) Line 110: differentia -> differential.
  
  (2) Line 280: ".," to be corrected.
  
  (3) Line 566: optimize -> optimizes.
  
  We thank the reviewer for carefully proofreading the manuscript and for pointing out these typographical errors. We have corrected the identified typos in the revised manuscript.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Regarding Major Comment 1 in the Public Review, I contacted the BayVel authors, who told me that they'll upload all their scripts here within a few days: https://github.com/elenasabbioni/BayVel_notebooks
  
  Thank you very much for reaching out to the BayVel authors. We sincerely appreciate the BayVel authors’ efforts to make their scripts and results publicly available through BayVel_notebooks. We believe this is a valuable contribution that will greatly benefit the community.
  
  We have followed the repository and have now included BayVel in the revised manuscript, with corresponding comparisons added to both the main text and the benchmarking results.
  
  (2) Page 9 mentions "consistency", "coherence", and "correctness". Instead of these qualitative (and potentially subjective) evaluations, I'd appreciate using quantitative metrics or visual descriptions when differences are visually clear.
  
  We thank the reviewer for this insightful comment. The terms “velocity consistency,” “in-cluster coherence,” and “cross-boundary correctness” used in our manuscript are not intended as subjective descriptions. They correspond to commonly used evaluation criteria in this field and have been adopted as quantitative metrics in previous studies, such as VeloAE [1] and UniTVelo [2]. We have incorporated the following updated definitions into the Methods section.
  
  (1) Velocity consistency (VCon). We used the scvelo.tl.velocity_confidence() function from scVelo to evaluate velocity consistency, interpreting the results as a measure of how consistent velocities are among neighboring cells. Velocity consistency is especially suitable for evaluating RNA velocity modeling on a single lineage. For each cell c, the velocity consistency is calculated as follows:
  
  Here, N(c) represents the neighboring cells of a given cell c, and v<sub>c</sub> and v<sub>c’</sub> denote the low-dimensional velocity vectors of cell c and its neighboring cell c’.
  
  (2) Cross-boundary direction correctness (CBDir). Cross-boundary direction correctness assesses the accuracy of transitions from a source cluster to a target cluster by examining the boundary cells, and it requires ground-truth annotations. We directly ran the function unitvelo.evaluate() provided in UniTVelo to obtain the cross-boundary direction correctness. In detail, CBDir is calculated as follows:
  
  Here, C<sub>A</sub> denotes the set of cells in the target cluster A, N(c) represents the neighboring cells of a given cell c, and v<sub>c</sub> and v<sub>c’</sub> denote the low-dimensional velocity and state vectors of cell c and its neighboring cell c’.
  
  (3) Within-cluster velocity coherence (ICCoh). Within-cluster velocity coherence measures the coherence of velocities within a single cluster using a cosine similarity score between cell velocities. We applied the function unitvelo.evaluate() provided by UniTVelo to directly compute the within-cluster velocity coherence. Using the same notation as defined above, ICCoh is calculated as follows:
  
  [1] Qiao, C. & Huang, Y. Representation learning of RNA velocity reveals robust cell transitions. Proceedings of the National Academy of Sciences 118, e2105859118 (2021).
  
  [2] Gao, M., Qiao, C. & Huang, Y. UniTVelo: temporally unified RNA velocity reinforces single-cell trajectory inference. Nature Communications 13, 6586 (2022).
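For readers who want to recompute the three metrics defined above (VCon, CBDir, ICCoh) outside scVelo and UniTVelo, the following is an illustrative NumPy re-implementation under stated assumptions. It is not the code of either package. The exact formulas are assumptions:

- **VCon**: the mean Pearson correlation between a cell's velocity and those of its neighbors, similar in spirit to scVelo's velocity confidence.
- **ICCoh**: the mean cosine similarity between a cell's velocity and those of its same-cluster neighbors.
- **CBDir**: the mean cosine similarity between v<sub>c</sub> and the displacement x<sub>c’</sub> − x<sub>c</sub> toward target-cluster neighbors. This follows the general idea of VeloAE's cross-boundary metric.

The neighbor lists, the embedding `X`, and the velocities `V` are assumed inputs, and every cell is assumed to have at least one neighbor.

```python
from typing import List, Sequence

import numpy as np


def _cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def velocity_consistency(V: np.ndarray, neighbors: List[List[int]]) -> np.ndarray:
    """Per-cell mean Pearson correlation between v_c and v_c' for c' in N(c).

    Each velocity is centered across its dimensions first, so the cosine of
    the centered vectors equals their Pearson correlation.
    """
    Vc = V - V.mean(axis=1, keepdims=True)
    return np.array([np.mean([_cos(Vc[c], Vc[n]) for n in nbrs])
                     for c, nbrs in enumerate(neighbors)])


def in_cluster_coherence(V: np.ndarray, neighbors: List[List[int]],
                         labels: Sequence[str], cluster: str) -> float:
    """Mean cosine similarity between v_c and v_c' over same-cluster neighbors."""
    scores = []
    for c, nbrs in enumerate(neighbors):
        if labels[c] != cluster:
            continue
        same = [n for n in nbrs if labels[n] == cluster]
        if same:
            scores.append(np.mean([_cos(V[c], V[n]) for n in same]))
    return float(np.mean(scores))


def cross_boundary_correctness(X: np.ndarray, V: np.ndarray, neighbors: List[List[int]],
                               labels: Sequence[str], source: str, target: str) -> float:
    """Mean cosine between v_c and x_c' - x_c for source cells c and target neighbors c'."""
    scores = []
    for c, nbrs in enumerate(neighbors):
        if labels[c] != source:
            continue
        boundary = [n for n in nbrs if labels[n] == target]
        if boundary:  # c lies on the source/target boundary
            scores.append(np.mean([_cos(V[c], X[n] - X[c]) for n in boundary]))
    return float(np.mean(scores))
```

A CBDir value near 1 means boundary velocities point toward the target cluster, and a value near -1 means they point away from it.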
  
  (3) On page 3, some objects are not defined after formula (3):
  
  ReLU function, and w_gi
  
  Additionally, the parentheses of the ReLU function should be bigger.
  
  We thank the reviewer for pointing this out. In the revised manuscript, we have explicitly defined the ReLU activation function and clarified that w<sub>gi</sub> represents the regulatory weight of TF i on the target gene g. In addition, we have adjusted the formatting of Eq. (3) by enlarging the parentheses in the ReLU function to improve readability.
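To illustrate how a ReLU-gated regulatory term can feed the transcription → splicing → degradation cascade, here is a sketch under explicit assumptions. It is not TSvelo's Eq. (3). The assumed form is α<sub>g</sub> = ReLU(Σ<sub>i</sub> w<sub>gi</sub> s<sub>i</sub> + b<sub>g</sub>), with du<sub>g</sub>/dt = α<sub>g</sub> − β<sub>g</sub>u<sub>g</sub> and ds<sub>g</sub>/dt = β<sub>g</sub>u<sub>g</sub> − γ<sub>g</sub>s<sub>g</sub>. In this form, `W[g, i]` plays the role of w<sub>gi</sub>, and ds/dt is the RNA velocity. The bias `b`, the use of spliced TF abundance, and explicit Euler integration are illustrative choices.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(x, 0.0)


def simulate(W: np.ndarray, b: np.ndarray, beta: np.ndarray, gamma: np.ndarray,
             u0: np.ndarray, s0: np.ndarray, dt: float = 0.01, n_steps: int = 1000):
    """Explicit-Euler integration of an assumed regulated cascade.

    alpha_g = ReLU(sum_i W[g, i] * s_i + b_g)   # W[g, i]: effect of TF i on gene g
    du_g/dt = alpha_g - beta_g * u_g             # transcription minus splicing
    ds_g/dt = beta_g * u_g - gamma_g * s_g       # splicing minus degradation (velocity)
    Returns arrays U, S of shape (n_steps + 1, n_genes).
    """
    U = np.empty((n_steps + 1, len(u0)))
    S = np.empty_like(U)
    U[0], S[0] = u0, s0
    for t in range(n_steps):
        alpha = relu(W @ S[t] + b)
        U[t + 1] = U[t] + dt * (alpha - beta * U[t])
        S[t + 1] = S[t] + dt * (beta * U[t] - gamma * S[t])
    return U, S
```

Without regulation (`W = 0`), each gene relaxes to the steady state u = b/β and s = b/γ. The ReLU keeps the transcription rate non-negative when the regulatory input is negative.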
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.24.630058v4
www.biorxiv.org

Clonal stochasticity in early NK cell response to mouse cytomegalovirus is generated by mature subsets of varying proliferative ability

1. Public_Reviews 19 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The objective of this study was to infer the population dynamics (rates of differentiation, division and loss) and lineage relationships of NK cell subsets during an acute immune response and under homeostatic conditions.
  
  Strengths:
  
  A rich dataset and a detailed analysis of a particular class of stochastic models.
  
  Weaknesses (relating to initial submission):
  
  The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number, or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.
  
  As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes, and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and timepoints) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.
  
  Comments on revisions:
  
  (1) The authors have put in a lot of effort to address the reviews and have explored alternative models carefully.
  
  We appreciate the reviewer’s comments.
  
  (2) In the sections relating to homeostasis and the endogenous response, as far as I can tell you are estimating net growth rates (the k parameters) throughout - this is to be expected if you're working with just cell numbers and no information relating to proliferation. In these sections there are many places where you refer to proliferation rates and death rates when I think you just mean net positive or net negative growth rates. It's important to be precise about this even if the language can get a bit repetitive. (These net rates of growth or loss relate to clonal rather than cellular dynamics, which may be worth explaining). Later, you do use data relating to dead cells, which in principle can be used to get independent measures of death rates, but these data were not used in the fitting.
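The identifiability point in this comment can be stated in one line. Consider a homogeneous population with per-capita division rate ρ and death rate δ (symbols introduced here for illustration). Cell numbers then follow a single exponential whose exponent is only their difference:

```latex
\frac{dN}{dt} = \rho N - \delta N = k N, \qquad k = \rho - \delta, \qquad N(t) = N(0)\, e^{k t}
```

Any pair (ρ, δ) with the same difference k gives the same N(t). Fitting cell counts alone therefore constrains only the net rate k. Separating ρ and δ requires an additional observable, such as a proliferation marker or dead-cell counts.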
  
  We have modified the main text to address the comment.
  
  (3) There is so much evidence that T and B cell differentiation are often contingent on division that it would be very reasonable to consider it as a possibility for NK cells too. (Differentiation could be asymmetric, as you explored, or simply symmetric with some probability per division). These processes can be cast into simple ODE models but no longer allow you to aggregate division and death rates - so for parameter estimation you need to add measures of proliferation (Ki67 or similar) or death. This may be worth some discussion?
  
  We have modified the main text (lines 242-245) to address the comment.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection. Starting from a single immature Ly49+CD27+ NK cell, a two-state linear model (with a death variant) explained the negative correlation between clone size at 8 dpi and the CD27+ fraction, but failed to reproduce the first and second moments of CD27+ and CD27− NK cell populations at 8 dpi. To address this limitation, the authors added an intermediate maturation state, yielding a three-stage model (CD27+Ly6C− → CD27−Ly6C− → CD27−Ly6C+) that fits the first and second moments under two constraints: CD27+ NK cells proliferate faster than CD27− NK cells, and clone size is negatively correlated with the CD27+ fraction (upper bound of −0.2). The model predicts high proliferation in the intermediate state and high death in mature CD27−Ly6C+ cells, and it was validated using Adams et al. (2021) NK reporter mice tracking CD27+/− populations after tamoxifen, allowing discrimination between bone marrow-derived and pre-existing peripheral NK cells. To test the prediction that mature CD27− NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mouse spleen at different time points post-MCMV infection. Data confirmed lower viability of mature (CD27−) than immature (CD27+) cells during days 4-8 post-infection, and a model variant supported that higher CD27− death increases their proportion in the dead cell compartment. 
Altogether, the authors propose a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells with the trajectory CD27+Ly6C− (immature) → CD27−Ly6C− (mature I) → CD27−Ly6C+ (mature II), highlighting high proliferation in the mature I state and increased death in the mature II state.
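The following minimal Gillespie sketch makes this class of models concrete. It simulates a linear three-stage model (CD27+Ly6C− → CD27−Ly6C− → CD27−Ly6C+) with first-order division, death, and differentiation, started from one CD27+ cell. All rate values are hypothetical placeholders, not the fitted parameters. The sketch only shows how clone sizes, CD27+ fractions, and their correlation can be computed from simulated clones.

```python
import random
from typing import List, Sequence, Tuple

# Stage indices: 0 = CD27+Ly6C- (immature), 1 = CD27-Ly6C- (mature I), 2 = CD27-Ly6C+ (mature II)


def simulate_clone(birth: Sequence[float], death: Sequence[float], diff: Sequence[float],
                   t_end: float, rng: random.Random) -> List[int]:
    """Gillespie simulation of one clone started from a single stage-0 cell.

    A stage-k cell divides at rate birth[k], dies at rate death[k] and, for
    k < 2, differentiates to stage k + 1 at rate diff[k]. Returns the counts
    per stage at time t_end.
    """
    n = [1, 0, 0]
    t = 0.0
    while True:
        events = []  # (propensity, stage, kind) for every reaction channel
        for k in range(3):
            events.append((birth[k] * n[k], k, "div"))
            events.append((death[k] * n[k], k, "die"))
            if k < 2:
                events.append((diff[k] * n[k], k, "diff"))
        events = [e for e in events if e[0] > 0]
        if not events:
            return n  # clone extinct: no reaction can fire
        total = sum(p for p, _, _ in events)
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            return n
        r = rng.random() * total  # choose an event with probability p / total
        for p, k, kind in events:
            if r < p:
                break
            r -= p
        if kind == "div":
            n[k] += 1
        elif kind == "die":
            n[k] -= 1
        else:
            n[k] -= 1
            n[k + 1] += 1


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def clone_statistics(n_clones: int, birth, death, diff, t_end: float = 8.0,
                     seed: int = 0) -> Tuple[List[int], List[float]]:
    """Sizes and CD27+ (stage-0) fractions of the clones that survive to t_end."""
    rng = random.Random(seed)
    sizes, frac_cd27 = [], []
    for _ in range(n_clones):
        n = simulate_clone(birth, death, diff, t_end, rng)
        size = sum(n)
        if size > 0:  # extinct clones leave no observable progeny
            sizes.append(size)
            frac_cd27.append(n[0] / size)
    return sizes, frac_cd27


if __name__ == "__main__":
    # Hypothetical per-day rates: fast-dividing mature I, high death in mature II.
    sizes, frac = clone_statistics(100, birth=[0.5, 1.5, 0.3], death=[0.1, 0.1, 0.8],
                                   diff=[0.8, 0.4, 0.0], t_end=8.0)
    print(len(sizes), pearson(sizes, frac))
```

Scanning the rate vectors and recording the correlation for each setting is one way to map out the negative-correlation regime discussed in the reviews.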
  
  Strengths:
  
  Models explaining correlations and first and second moments, supported by analytical investigations, stochastic simulations, and model selection, identify key processes in antigen-specific NK expansion and maturation. The work distinguishes expansion, contraction, and memory in NK cells from CD8+ T cells and informs NK therapy development.
  
  Weaknesses (relating to initial submission):
  
  The conclusions of this paper are largely supported by the available data. However, a comparative analysis with more recent works in the field would be desirable. Clarifications:
  
  (1) Initial Conditions and Grassmann Data: The Grassmann data is used solely as a constraint, while the simulated values of CD27+/CD27− cells could have been directly fitted to the Grassmann data, which assumes a 1:1 ratio of CD27+/CD27− at t = 0. This would allow an alternative initial condition rather than starting from a single CD27+ cell.
  
  (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-stage model (Figure 2) demonstrates the potential for negative correlations between colony size and the fraction of CD27+ cells, the calculated correlation coefficients using the fitted parameter values are not shown. Including these would validate that the fitted parameters lie in the negative-correlation regime.
  
  (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/− dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27− cells to MCMV infection, particularly the observed drop in CD27− viability at 5 dpi and its rebound at 8 dpi. Demonstrating this would test whether the model can simultaneously explain viability dynamics and moment dynamics, and would enable sensitivity analysis of CD27− viability with respect to model parameters.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Minor points:
  
  (1) line 175 - Here I think you have only ruled out the two state model with no death, and not the two state model in general?
  
  Edited the sentence to address the comment.
  
  (2) Figures 2 and 5 - the phenotypes (CD27+ Ly6C-, etc.) should be clearly labeled above each cell type. Fig 1 could be improved in the same way.
  
  Done.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.07.556760v4
www.biorxiv.org

Decoding spine nanostructure in cultured neurons derived from mouse models of mental disorder reveals a schizophrenia-linked role for Ecrg4

1. Public_Reviews 19 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Kashiwagi et al. undertook a population analysis of dendritic spine nanostructure applied to the objective grouping of 8 mouse models of neuropsychiatric disorders. They report that spine morphology in cultured hippocampal neurons shows a higher similarity among schizophrenia mouse models (compared with autism spectrum disorder (ASD) mouse models), and identify an effect of Ecrg4 (encoding small secretory peptides) on spine dynamics and shape in these models.
  
  Strengths:
  
  The study developed a method for objectively comparing spine properties in primary hippocampal neuron cultures from 8 mouse models of psychiatric disorders at the population level using high-resolution structured illumination microscopy (SIM) imaging. This novel technique identified two distinct groups of mouse models according to the population-level spine properties: those with ASD-related gene mutations and those with schizophrenia-related gene mutations. Functional studies, including gene knockdown and overexpression experiments, identified an effect of Ecrg4 on the spine phenotype of the schizophrenia model mice.
  
  We thank the reviewer for finding our strategy novel and useful for identifying molecules associated with the spine phenotype in schizophrenia-related mouse models.
  
  Weaknesses:
  
  The main weakness is that the study is wholly in vitro, using cultured hippocampal neurons. The authors present this as an advantage, however, arguing that spine morphology as measured in a reduced culture system can demonstrate direct effects of gene mutations on neuronal phenotypes in the absence of indirect influences from non-neuronal cells or specific environments.
  
  We appreciate this reviewer's concern about the limitation of cultured hippocampal neurons in extracting disease-related spine phenotypes. While we fully recognize this limitation, we consider that this in vitro system has several advantages that contribute to translational research on mental disorders.
  
  First, our culture system has been shown to support the development of spine morphology similar to that of the hippocampal CA1 excitatory synapse in vivo. High-resolution imaging techniques confirmed that the in vitro spine structure was highly preserved compared with in vivo preparations (Kashiwagi et al., Nature Communications, 2019). The present study used the same culture system and SIM imaging. Therefore, the difference we detected in samples derived from disease models is likely to reflect impairment of molecular mechanisms underlying native structural development in vivo.
  
  Second, super-resolution imaging of thousands of spines in tissue preparations under precisely controlled conditions cannot be practically applied using currently available techniques. The advantage of our imaging and analytical pipeline is its reproducibility, which enabled us to compare the spine population data from eight different mouse models without normalization.
  
  Third, a reduced culture system can demonstrate the direct effects of gene mutations on synapse phenotypes, independent of environmental influences. This property is highly advantageous for screening chemical compounds that rescue spine phenotypes. Neuronal firing patterns and receptor functions can also be easily controlled in a culture system. The difference in spine structure between ASD- and schizophrenia-related mouse models is valuable information to establish a drug screening system.
  
  Fourth, establishing an in vitro system for evaluating synapse phenotypes could reduce the need for animal experiments. Researchers should be aware of the 3Rs principles. In the future, combined with differentiation techniques for human iPS cells, our in vitro approach will enable the evaluation of disease-related spine phenotypes without the need for animal experiments. The effort to establish a reliable culture system should not be eliminated.
  
  We modified our text to have a balanced discussion on both advantages and disadvantages of the in vitro culture system in the study of mental disorder mouse models, as follows:
  
  "Finally, while the spine phenotype identified in the human postmortem brain undoubtedly resulted from complex interactions among genetic background, environmental influences, and regulation by non-neuronal cells, data from pure neuronal cultures are more likely to reflect the direct effects of schizophrenia-related gene mutations on synaptic functions. This property may be advantageous for identifying synaptic molecules that regulate synapse phenotypes in schizophrenia-related mouse models. However, the phenotype observed in the culture system requires confirmation using in vivo experiments of mouse models or human tissue samples. Efficient in vitro screening combined with reliable in vivo evaluation of synapses will facilitate translational research on mental disorders."
  
  Another weakness is that CaMKIIαK42R/K42R mutant mice are presented as a schizophrenia model, the authors justifying this by saying that "CaMKII-related signaling pathway disruption has been implicated in the working memory deficits found in schizophrenia patients". Since mutations in CAMK2A cause autosomal dominant intellectual developmental disorder-53 (OMIM 617798) and autosomal recessive intellectual developmental disorder-63 (OMIM 618095), and mice carrying the CAMK2A E183V mutation exhibit ASD-related synaptic and behavioral phenotypes (PMID: 28130356), I think it's stretching credibility to refer to the CaMKIIαK42R/K42R mice as a schizophrenia model.
  
  We agree with this reviewer that CAMK2A mutations in humans are linked to multiple mental disorders, including developmental disorders, ASD, and schizophrenia. Association of gene mutations with the categories of mental disorders is not straightforward, as the symptoms of these disorders also overlap with each other. For the CaMKIIα K42R/K42R mutant, we considered the following points in its characterization as a model of mental disorder. Analysis of CaMKIIα +/- mice in Dr. Tsuyoshi Miyakawa's lab has provided evidence for the involvement of reduced CaMKIIα in schizophrenia-related phenotypes (Yamasaki et al., Mol Brain 2008; Frankland et al., Mol Brain Editorial 2008). It is also known that the CaMKIIα R8H mutation in the kinase domain is linked to schizophrenia (Brown et al., 2021). Both CaMKIIα R8H and CaMKIIα K42R mutations are located in the N-terminal domain and eliminate kinase activity. On the other hand, the representative CaMKIIα E183V mutation identified in ASD patients exhibits unique characteristics, including reduced kinase activity, decreased protein stability and expression levels, and disrupted interactions with ASD-associated proteins such as Shank3 (Stephenson et al., 2017). Importantly, neurons expressing CaMKIIα E183V show fewer dendritic spines, the opposite of the CaMKIIα K42R/K42R mutant, which showed increased spine density (Koeberle et al. 2017).
  
  References related to this discussion.
  
  (1) Yamasaki et al., Mol Brain. 2008 DOI: 10.1186/1756-6606-1-6
  
  (2) Frankland et al., Mol Brain. 2008 DOI: 10.1186/1756-6606-1-5
  
  (3) Stephenson et al., J Neurosci. 2017 DOI: 10.1523/JNEUROSCI.2068-16.2017
  
  (4) Koeberle et al., Sci Rep. 2017 DOI: 10.1038/s41598-017-13728-y
  
  (5) Brown et al., iScience. 2021 DOI: 10.1016/j.isci.2021.103184
  
  We fully agree with the reviewer that different CAMK2A mutations likely cause distinct phenotypes observed in the broad spectrum of mental disorders. In the revised manuscript, we include a discussion of the relevant literature to categorize this mouse model appropriately.
  
  "CaMKII-related signaling pathway disruption has been implicated in the working memory deficits found in schizophrenia patients [45,46]. CAMK2A mutations in humans are linked to multiple mental disorders, including developmental disorders, ASD, and schizophrenia [47]. The K42R mutation of CAMK2A does not correspond to any known human genetic variant, but the CAMK2A R8H mutation is linked to schizophrenia [48]. Both R8H and K42R mutations in the N-terminal domain of CaMKIIα eliminate kinase activity; these mutations may have a similar impact on human mental disorders."
  
  Although the manuscript is largely well written, there are some instances of ambiguous/unspecific language. This extends to the title (Decoding Spine Nanostructure in Mental Disorders Reveals a Schizophrenia-Linked Role for Ecrg4), which gives no indication that the work was in vitro on cultured neurons derived from mouse models.
  
  We thank the reviewer for pointing out the lack of information about the experimental system in the title of this manuscript. Following the reviewer's suggestion, we modified the title to "Decoding spine nanostructure in cultured neurons derived from mouse models of mental disorder reveals a schizophrenia-linked role for Ecrg4".
  
  Reviewer #2 (Public review):
  
  Okabe and colleagues build on a super-resolution-based technique that they have previously developed in cultured hippocampal neurons, improving the pipeline and using it to analyze spine nanostructure differences across 8 different mouse lines with mutations in autism or schizophrenia (Sz) risk genes/pathways. It is a worthy goal to try to use multiple models to examine potential convergent (or not) phenotypes, and the authors have made a good selection of models. They identify some key differences between the autism versus the Sz risk gene models, primarily that dendritic spines are smaller in Sz models and (mostly) larger in autism risk gene models. They then focus on three models (2 Sz - 22q11.2 deletion, Setd1a; 1 ASD - Nlgn3) for time-lapse imaging of spine dynamics, and together with computational modelling provide a mechanistic rationale for the smaller spines in Sz risk models. Bulk RNA sequencing of all 8 model cultures identifies several differentially expressed genes, which they go on to test in cultures, finding that Ecrg4 is upregulated in several Sz models and its misexpression recapitulates spine dynamics changes seen in the Sz mutants, while knockdown rescues spine dynamics changes in the Sz mutants. Overall, these have the potential to be very interesting findings and useful for the field. However, I do have a number of major concerns.
  
  We thank the reviewer for evaluating our findings as potentially very interesting and useful.
  
  (1) The main finding of spine nanostructure changes is done by carrying out a PCA on various structural parameters, creating spine density plots across PC1 and PC2, and then subtracting the WT density plot from the mutant. Then, spines in the areas with obvious differences only are analyzed, from which they derive the finding that, for example, spine sizes are smaller. However, this seems a circular approach. It is like first identifying where there might be a difference in the data, then only analyzing that part of the data. I welcome input from a statistician, but to me, this is at best unconventional and potentially misleading. I assume the overall means are not different (although this should be included), but could they look at the distribution of sizes and see if these are shifted?
  
  We appreciate the reviewer's concern regarding our analysis of spine population data. The intention of pre-selecting the areas showing differences between wild-type and mutant was to make a direct comparison between two subareas (one is enriched with wild-type spines and the other is enriched with mutant spines) and clarify that the spines of schizophrenia-related mouse models were smaller than wild-type spines. Conventional methods of comparing the total spine population using simple size parameters are not useful for this purpose, as shown in Supplementary Figure 2.
  
  To address the reviewer's concern, we revised the analysis of the spine population data for both Figure 3 and Figure 8.
  
  Figure 3: We first divided the feature space projected onto PC1 and PC2 into four areas with distinct structural properties: (1) small and short, (2) small and long, (3) large and short, and (4) large and long. Next, we calculated the normalized spine counts in the four areas for both wild-type and mutant spines and obtained the relative ratio (mutant/wild-type) for each area. As we performed three independent SIM imaging experiments (in one, we imaged both wild type and mutant culture dishes prepared from the same pregnant mouse), there are three independent datasets from 8 mouse models.
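The area-based comparison can be sketched as follows. Spines are assigned to four areas of the PC1-PC2 plane, the counts are normalized by the total number of spines per genotype, and the mutant/wild-type ratio is taken per area. Three details are hypothetical placeholders: the mapping of PC1 to spine size and PC2 to spine length, and the split points at zero. In practice, the boundaries would follow the structural interpretation of the PCs.

```python
import numpy as np


def area_ratios(wt_pcs: np.ndarray, mut_pcs: np.ndarray,
                pc1_split: float = 0.0, pc2_split: float = 0.0) -> np.ndarray:
    """Mutant/wild-type ratio of normalized spine counts in four PC1-PC2 areas.

    Hypothetical assumptions: PC1 grows with spine size, PC2 with spine length,
    and areas are split at pc1_split / pc2_split. Order of the returned ratios:
    (1) small-short, (2) small-long, (3) large-short, (4) large-long.
    An area with no wild-type spines yields inf or nan.
    """
    def fractions(pcs: np.ndarray) -> np.ndarray:
        large = pcs[:, 0] > pc1_split
        long_ = pcs[:, 1] > pc2_split
        counts = np.array([
            np.sum(~large & ~long_),
            np.sum(~large & long_),
            np.sum(large & ~long_),
            np.sum(large & long_),
        ], dtype=float)
        return counts / len(pcs)  # normalize by the total number of spines

    return fractions(mut_pcs) / fractions(wt_pcs)
```

Computing these ratios separately for each independent imaging experiment gives one value per area and experiment, which can then be compared across genotypes.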
  
  We found that the spine ratio (mutant/wild-type) only in area 2 (small and long spines) differed significantly between genotypes. This result is shown in Fig. 3 and explained in the text. The spine ratios in areas 1 and 3 did not show a clear relationship to the genotypes, while the ratio in area 4 showed the opposite trend to that in area 2. The opposite trend between areas 2 and 4 indicates enrichment of both small and long spines in schizophrenia-related mouse models, consistent with our previous analysis.
  
  Figure 8: In this analysis, we aimed to evaluate the rescue effect of Ecrg4 shRNA relative to that of control shRNA. If Ecrg4 shRNA is effective, the spine population enriched in the control shRNA condition should be reduced in the Ecrg4 shRNA condition. To confirm this point in the revised manuscript, we first defined areas in the projected PC1-PC2 plane showing either enrichment or depletion of spines in the control shRNA condition (spine numbers increasing or decreasing by more than 3 × SD). We next measured the difference in spine numbers between the control and Ecrg4 shRNA conditions in either enriched or depleted areas. The expectation is that Ecrg4 shRNA treatment reduces the extent of both enrichment and depletion. The effect was significant in both the 22qdel and Setd1a mouse models, as indicated by permutation tests. This analysis was explained in the revised manuscript.
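  
  The 3 × SD enrichment/depletion criterion can be sketched as follows, under explicit assumptions: spines are binned on a fixed square grid of the PC1-PC2 plane (a stand-in for whatever density estimate the authors used), each map is normalized to fractions, a reference map (for example, wild-type neurons with control shRNA) is subtracted from a test map (for example, mutant neurons with control shRNA), and "SD" is taken to be the standard deviation of that difference map across bins. Grid size, range, and function names are hypothetical.

```python
from statistics import pstdev


def density_grid(points, bins=5, lo=-2.5, hi=2.5):
    """Fraction of spines in each bin of a square PC1-PC2 grid.

    Points outside [lo, hi) are clipped into the edge bins.
    """
    width = (hi - lo) / bins
    grid = [[0.0] * bins for _ in range(bins)]
    for pc1, pc2 in points:
        i = min(max(int((pc1 - lo) // width), 0), bins - 1)
        j = min(max(int((pc2 - lo) // width), 0), bins - 1)
        grid[i][j] += 1
    return [[count / len(points) for count in row] for row in grid]


def enriched_depleted_bins(test_points, ref_points, k=3.0, bins=5, lo=-2.5, hi=2.5):
    """Subtract the reference map from the test map and flag bins whose
    difference is above +k*SD (enriched) or below -k*SD (depleted).

    Assumption: SD is the population standard deviation of the difference
    map across all bins.
    """
    test = density_grid(test_points, bins, lo, hi)
    ref = density_grid(ref_points, bins, lo, hi)
    diff = {(i, j): test[i][j] - ref[i][j] for i in range(bins) for j in range(bins)}
    sd = pstdev(list(diff.values()))
    enriched = {cell for cell, d in diff.items() if d > k * sd}
    depleted = {cell for cell, d in diff.items() if d < -k * sd}
    return enriched, depleted


# Toy example: all reference spines sit at the origin, all test spines at (2, 2).
wt_control = [(0.0, 0.0)] * 10
mut_control = [(2.0, 2.0)] * 10
print(enriched_depleted_bins(mut_control, wt_control))  # ({(4, 4)}, {(2, 2)})
```

  The spine counts of the control shRNA and Ecrg4 shRNA conditions would then be compared within the flagged enriched and depleted bins.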
  
  (2) Despite extracting 64 parameters describing spine structure, only 5 of these seemed to be used for the PCA. It should be possible to use all parameters and show the same results. More information on PC1 and PC2 would be helpful, given that the rest of the paper is based on these - what features are they related to?
  
  We thank the reviewer for the advice on providing the rationale for parameter selection in PCA. We divided spines into 160-nm segments along their long axis, and the spine segments were used to calculate the 64 parameters, which include the volume of each spine segment (20 segments), the convex hull volume of each spine segment (20 segments), and the convex hull ratio of each spine segment (20 segments). As most spines are shorter than 0.16 μm × 20 = 3.2 μm, these segment-related parameters contain a large fraction of zero values, which distort the calculation of principal components. Therefore, we selected two parameters that reflect the principal structural features (length and volume), together with three other parameters that were mutually independent and also independent of the first two parameters (pairwise correlation coefficients < 0.3). These selection criteria were described in the original manuscript. We also confirmed that PCA using all 64 parameters yields a cross-correlation map similar to that shown in Fig. 2B.
  
  Author response image 1.
  
  We provided additional information in the Materials and Methods section of the revised manuscript.
  
  As described previously, the pattern of four areas with distinct spine structures (1. small and short, 2. small and long, 3. large and short, 4. large and long) supports the idea that the PC1-PC2 plane reflects the relationship between spine volume and length (Fig. 3A and B).
  
  These specific features could then be analyzed in the full dataset, without doing the cherry picking above.
  
  We provided the dataset for the relative enrichment of spine counts across four areas of the PC1-PC2 plane in Fig. 3A and B. This analysis provides a comprehensive view of spine population properties related to spine volume and length, without relying on a pre-set region of interest.
  
  It would also be helpful to demonstrate whether PC1 and 2 differ across groups - for example, the authors could break their WT data into 2 subsets and repeat the analysis.
  
  We noticed differences in the pattern of spine distribution across the PC1-PC2 planes in each experiment. The subtraction of the distributional data between wild-type and mutant samples effectively cancels out such differences. In general, the difference between two wild-type samples is smaller than that between wild-type and mutant samples, as shown in Author response image 2.
  
  Author response image 2.
  
  We added a description of variation across groups to the revised manuscript.
  
  (3) Throughout the paper, the 'n' used for statistical analysis is often spine, which is not appropriate. At a minimum, cell should be used, but ideally a nested mixed model, which would take into account factors like cell, culture, and animal, would be preferable. Also, all of these factors should be listed, with sufficient independent cultures.
  
  We agree that nested mixed models are more appropriate for evaluating genotype effects in most of our datasets. We confirm that the results of statistical analysis using nested mixed models were consistent with our previous conclusions in most cases.
  
  Figure 3: We performed three independent primary cultures of embryonic hippocampal tissue with genotypes of both wild-type and mutant from the same pregnant mice for each mouse model. In our new Figure 3, each data point represents an independent culture experiment, and group comparisons were performed using one-way ANOVA followed by Tukey's post hoc test. In this analysis, statistical analysis using neurons as units of 'n' is not possible, as the number of spines measured from a single neuron is insufficient to generate the density map shown in Figure 3. The statistical analysis was described in the revised text. The details of experimental conditions related to Figure 3 are provided in Supplementary Table 1.
  
  Figure 5A-C: We analyzed spine turnover rate using a linear mixed-effects model with genotype as a fixed effect and plate, cell, and dendrite as nested random effects. In both the 22q deletion and Setd1a models, there was a significant effect of genotype (F(1,25) = 5.79, p = 0.024 for the 22q deletion model and F(1,22) = 7.33, p = 0.013 for the Setd1a model). In contrast, Nlgn3 mutant neurons did not show a significant difference (F(1,14) = 1.35, p = 0.26). This analysis was described in the revised text.
  
  Figure 5D-F: Spine lifetime was analyzed using a linear mixed-effects model accounting for the hierarchical structure of the data (spines nested within dendrites, cells, and culture plates). The analysis revealed a significant effect of genotype in both the 22q deletion and Setd1a mutants (22qdel mutant: F(1,336) = 5.33, p = 0.022; Setd1a mutant: F(1,282) = 6.38, p = 0.012). Neurons of both mutants exhibited significantly longer spine lifetimes than wild-type neurons (22qdel mutant: ratio = 1.28, 95% CI 1.04–1.58; Setd1a mutant: ratio = 1.35, 95% CI 1.07–1.70). In contrast, the Nlgn3 mutation did not significantly alter spine lifetime (ratio = 0.86, 95% CI 0.61–1.22; F(1,220) = 0.69, p = 0.41). This analysis was described in the revised text.
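  
  One way to write the nested random-effects model described above is shown below, as a sketch rather than the authors' exact specification. Here y is the lifetime of spine s on dendrite d of cell c in culture plate p, G_p is a genotype indicator (1 = mutant, 0 = wild type) assumed to be constant within a plate, and u, v, w are random intercepts for plate, cell within plate, and dendrite within cell. The response does not state whether lifetime was transformed; if y is log-lifetime, exp(β₁) is a mutant/wild-type ratio, which would be consistent with the ratios and confidence intervals reported above.

```latex
\begin{aligned}
y_{pcds} &= \beta_0 + \beta_1 G_p + u_p + v_{pc} + w_{pcd} + \varepsilon_{pcds},\\
u_p &\sim \mathcal{N}(0, \sigma^2_{\mathrm{plate}}), \quad
v_{pc} \sim \mathcal{N}(0, \sigma^2_{\mathrm{cell}}), \quad
w_{pcd} \sim \mathcal{N}(0, \sigma^2_{\mathrm{dendrite}}), \quad
\varepsilon_{pcds} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

  Under this form, the reported F tests for genotype correspond to testing the null hypothesis β₁ = 0.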
  
  Figure 5G-I: Spine volume trajectories were analyzed using linear mixed-effects models incorporating nested random effects (spine/dendrite/cell/culture plate) to account for the hierarchical structure of the data. In the 22q deletion model, newly formed spines were significantly smaller than those in wild-type neurons (genotype effect: p < 0.001). The spines in Setd1a mutant neurons also displayed significantly smaller volume than those in wild-type neurons (p < 10⁻⁷). There were also differences in the temporal profiles of spine growth in these two mutants (p < 0.001). In contrast, newly formed spines in the Nlgn3 mutant neurons were significantly larger than those in wild-type neurons (p < 10⁻⁴) with preserved time-course of spine growth. This analysis was described in the revised text.
  
  Figure 5J-L: Similar analyses using linear mixed-effects models incorporating nested random effects (spine within dendrite within cell within culture plate) identified a significantly smaller initial spine size in the 22q deletion model (p < 10⁻⁶), whereas no significant difference in initial spine volume was found for Setd1a mutants. The temporal trajectories of spine shrinkage before loss were not significantly altered in either the 22qdel or the Setd1a mutant. The Nlgn3 mutant showed a significantly different time-course of spine shrinkage (p < 0.05), while the initial spine size was not altered. This analysis was described in the revised text.
  
  Figure 7A overexpression dataset: We analyzed plate-averaged lifetime values using a linear mixed-effects model with treatment as a fixed effect. There was a significant main effect of treatment (F(3,8) = 4.59, p = 0.038), and post hoc examination showed a significant increase in lifetime with Ecrg4 overexpression (β = 0.49 ± 0.16 SE, t(8) = 3.16, p = 0.013).
  
  Figure 7A shRNA dataset: We also applied a linear mixed-effects model to plate-averaged lifetime values with treatment as a fixed effect. The analysis revealed no significant effect of treatment (F(2,6) = 0.29, p = 0.76).
  
  The analyses of overexpression and shRNA datasets were described in the revised text.
  
  Figure 8: As in Figure 3, we performed three independent primary cultures of embryonic hippocampal tissue with genotypes of both wild-type and mutant from the same pregnant mice for each mouse model. The culture plates were transfected with either a control shRNA or an Ecrg4 shRNA construct. Each data point represents an independent culture experiment, and the effect of Ecrg4 shRNA relative to that of control shRNA was evaluated using a permutation test. The data analysis was described in the revised text. The details of experimental conditions related to Figure 8 are provided in Supplementary Table 1.
  
  (4) The authors should confirm that all mutants are also on the C57BL/6J background, and clarify whether control cultures are from littermates (this would be important). Also, are control versus mutant cultures done simultaneously? There can be significant batch effects with cultures.
  
  The mutant mice we used in this study are on a C57BL/6J or C57BL/6N background. C57BL/6J and C57BL/6N mice are known to exhibit distinct phenotypes across a range of physiological, biochemical, and behavioral systems. However, our analysis is unlikely to be affected by differences between C57BL/6J and C57BL/6N, as we compared wild-type and mutant littermates on the same genetic background. This experimental design also reduces batch effects arising from different culture preparations. This point was described in the revised text.
  
  (5) The spine analysis uses cultures from 18-22 DIV - this is quite a large range. It would be worth checking whether age is a confounder or correlated with any parameters / principal components.
  
  We stated in the Methods section that culture samples were processed for imaging at 18-22 DIV. However, all the SIM imaging experiments for the eight mutant mouse models were performed on samples fixed at DIV 19. The broader range (DIV 18-22) included test samples that we used to optimize imaging conditions. In the revised manuscript, we specified the timing of SIM imaging.
  
  (6) The computational modelling is interesting, but again, I am concerned about some circularity. Parameter optimization was used to identify the best fit model that replicated the spine turnover rates, so it is somewhat circular to say that this matched the observations when one of these is the turnover rate.
  
  We appreciate the reviewer's comment on some circularity of the argument. We agree that the turnover rate is already incorporated into the simulation model and is not an appropriate criterion for the evaluation. We modified the text accordingly.
  
  It is more convincing for spine density and size, but why not go back and test whether parameter differences are actually seen - for example, it would be possible to extract the probability of nascent spine loss, etc.
  
  We thank the reviewer for giving this important suggestion. The probability of nascent spine loss is an important parameter, and we initially attempted to estimate it from the original data set. However, the upper limit of our time-lapse imaging is 24 h, which is insufficient to distinguish stable and nascent spines clearly. The difficulty of extracting all the necessary parameters for spine remodeling is our motivation for starting this computational modelling.
  
  More compelling would be to repeat the experiments and see if the model still fits the data. In the interpretation (line 314-318) it is stated that '... reduced spine maturation rate can account for the three key properties of schizophrenia-related spines...', which is interesting if true, but it has just been stated that the probability of spine destabilization is also higher in mutants (line 303) - the authors should test whether if the latter is set to be the same as controls whether all the findings are replicated.
  
  As suggested by the reviewer, we set the probability of spine destabilization equal across wild-type and mutant models and repeated the simulations. The results indicate that this modification has small effects on spine density (0.61 vs 0.62), spine turnover rate (0.22 vs 0.21), fraction of small spines (0.21 vs 0.20), and mean spine size (0.37 vs 0.36). We described this point in the revised manuscript.
  
  (7) No validation for overexpression or knockdown is shown, although it is mentioned in the methods - please include.
  
  As suggested by the reviewer, we validated overexpression and knockdown. The results are summarized in Supplementary Figure 8.
  
  Supplementary Figure 8A-C shows the immunocytochemistry of anti-Ecrg4, anti-Cip4, and anti-NPAS4 for the confirmation of overexpression of these molecules.
  
  Supplementary Figure 8D-E confirms by immunoblotting that exogenously expressed Ecrg4, Cip4, and NPAS4 have the expected sizes (previous Supplementary Figure 10F is now Supplementary Figure 8E).
  
  Supplementary Figure 8F-H indicates the efficient knockdown of exogenously expressed Met-GFP, ARHGAP15-GFP, and Ecrg4-HA by the respective shRNA constructs in COS-7 cells (previous Supplementary Figure 10G is now Supplementary Figure 8H).
  
  Also, for the knockdown, a scrambled shRNA control would be preferable.
  
  We used Stealth RNAi Negative Control Duplexes (Invitrogen) as the shRNA control in this study. To confirm that this RNAi sequence does not affect spine turnover, we performed time-lapse imaging of neurons transfected with GFP alone or with GFP and the Stealth RNAi Negative Control. No detectable change in spine turnover was observed (Supplementary Figure 8I), indicating that this RNAi control sequence is suitable for our study.
  
  (8) The finding regarding Ecrg4 is interesting, but showing that some Ecrg4 is expressed at boutons and spines and some in DCVs is not enough evidence to suggest that it is actively involved in the regulation of synapse formation and maturation (line 356).
  
  To reveal the active roles of Ecrg4 in spine regulation, we exogenously applied a synthetic Ecrg4 peptide to wild-type neurons and monitored both spine density and turnover rate after Ecrg4 application. The Ecrg4 application increased the spine turnover rate, whereas samples treated with the scrambled peptide did not. This result supports the active role of Ecrg4 in regulating spine turnover. The data were added as Supplementary Figures 9F and G.
  
  (9) The same caveats that apply to the analysis also apply to the Ecrg4 rescue. In addition, while for 22q the control shRNA mutant vs WT looks vaguely like Figure 2, Setd1a looks completely different.
  
  We thank the reviewer for pointing out the apparent difference in the pattern of spine population data between Figure 2 and Figure 8. We performed SIM analysis using DiI-labeled neurons in Figure 2, whereas the data in Figure 8 are derived from GFP-expressing neurons. Images of cell-surface labeling and cytoplasmic labeling cannot be analyzed in the same way, as it is necessary to adjust parameters in SIM image processing and PCA-based dimensionality reduction. Consequently, the distribution of the spine population projected onto the PC1-PC2 plane differs between DiI-labeled neurons and GFP-expressing neurons. To make the PCA of GFP-expressing neurons comparable with that of DiI-labeled neurons, we replaced the weight matrix for GFP-expressing neurons with that previously calculated for the DiI-labeled neurons. This adjustment increased the similarity of the data distributions shown in Figures 2 and 8. The explanation for the different patterns in the spine population map between Figure 2 and Figure 8 was added to the revised text. The related explanation for the data processing was described in the Materials and Methods.
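  
  Reusing a weight matrix across datasets amounts to projecting each new spine's feature vector onto fixed principal axes instead of refitting PCA. A minimal sketch, assuming features are centred and scaled with statistics taken from the reference (DiI-labeled) dataset; the response does not say which centring and scaling were used, and the function names and toy numbers are hypothetical.

```python
def project_with_fixed_weights(spine_features, ref_mean, ref_scale, ref_loadings):
    """Project one spine's feature vector onto fixed principal axes.

    ref_mean, ref_scale: per-feature centring and scaling values, assumed here
    to come from the reference (e.g. DiI-labeled) dataset.
    ref_loadings: one weight vector per principal component (PC1, PC2, ...),
    i.e. the reference weight matrix reused for the new dataset.
    """
    standardized = [(x - m) / s for x, m, s in zip(spine_features, ref_mean, ref_scale)]
    return [sum(w * z for w, z in zip(pc, standardized)) for pc in ref_loadings]


def project_dataset(rows, ref_mean, ref_scale, ref_loadings):
    """Apply the same reference weights to every spine in a (e.g. GFP) dataset."""
    return [project_with_fixed_weights(r, ref_mean, ref_scale, ref_loadings) for r in rows]


# Toy two-feature example with two components (hypothetical numbers).
mean, scale = [1.0, 1.0], [2.0, 4.0]
loadings = [[1.0, 0.0], [0.5, 0.5]]
print(project_dataset([[3.0, 5.0], [1.0, 1.0]], mean, scale, loadings))  # [[1.0, 1.0], [0.0, 0.0]]
```

  Because the axes are fixed, differences between the two projected distributions reflect the spines themselves rather than differences between separately fitted axes (for example, a sign flip of one component).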
  
  And if rescued, surely shRNA in the mutant should now resemble control in WT, so there shouldn't be big differences, but in fact, there are just as many differences as comparing mutant vs wild-type? Plus, for spine features, they only compare mutant rescue with mutant control, but this is not ideal - something more like a 2-way ANOVA is really needed. Maybe input from a statistician might be useful here?
  
  We appreciate the reviewer's important comment and agree that the analytical approach used in the original manuscript was not optimal. We therefore revised our analysis to examine whether the difference observed between wild-type and mutant neurons was reduced by suppression of Ecrg4 expression.
  
  To this end, we first identified two regions in the PC1–PC2 plane where mutant spines were either enriched or depleted relative to wild-type neurons (Areas A and B, respectively). We then counted the number of spines located in Areas A and B in control shRNA-treated mutant neurons (normalized spine counts XA and XB). Next, we quantified spine counts in the same areas using data from Ecrg4-suppressed mutant neurons (normalized spine counts YA and YB). If XA > YA and XB < YB, suppression of Ecrg4 shifts the spine distribution away from the phenotype observed in control shRNA-treated mutant neurons, that is, toward rescue. Indeed, the datasets were consistent with this shift in relative spine counts.
  
  To determine whether these differences exceeded those expected from random variation in spine counts, we performed a permutation test. Specifically, spine identities were randomly shuffled between the two conditions while preserving the total number of spines in each dataset. The observed differences were then compared with the distribution obtained from the permuted datasets to assess statistical significance.
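  
  The permutation procedure described above can be sketched for a single area and a single culture replicate. Each spine is represented only by the label of the area it falls in; labels are pooled, shuffled, and re-split into groups of the original sizes, and the observed difference in normalized counts is compared with the permuted differences. The one-sided direction (XA > YA for Area A, XB < YB for Area B), the number of permutations, and the add-one p-value correction are assumptions of this sketch, not details taken from the manuscript.

```python
import random


def normalized_count(labels, area):
    """Fraction of spines whose area label equals `area`."""
    return labels.count(area) / len(labels)


def permutation_test(control, treated, area, greater=True, n_perm=1000, seed=0):
    """One-sided permutation test on normalized spine counts in one area.

    control, treated: area labels of individual spines (e.g. control shRNA
    vs Ecrg4 shRNA in mutant neurons). greater=True tests control > treated
    (Area A: XA > YA); greater=False tests control < treated (Area B).
    Returns the observed statistic and a p-value.
    """
    sign = 1 if greater else -1
    observed = sign * (normalized_count(control, area) - normalized_count(treated, area))
    pooled = list(control) + list(treated)
    n_control = len(control)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign spines to conditions, keeping group sizes
        perm_control, perm_treated = pooled[:n_control], pooled[n_control:]
        stat = sign * (normalized_count(perm_control, area) - normalized_count(perm_treated, area))
        if stat >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy replicate: Area A is strongly depleted after Ecrg4 knockdown.
control_labels = ["A"] * 50 + ["other"] * 50
ecrg4_labels = ["A"] * 5 + ["other"] * 95
print(permutation_test(control_labels, ecrg4_labels, "A"))
```

  In the response, this kind of test is applied to each of the three culture replicates separately, for both areas and both mutations.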
  
  We found that all three culture replicates showed statistical significance in both Areas A and B for both the 22qdel and Setd1a mutations. This analysis is described in the Results section.
  
  (10) Although this is a study entirely focused on spine changes in mouse models for Sz, there is no discussion (or citation) of the various studies that have examined this in the literature. For example, for Setd1a, smaller spines or reduced spine densities have been described in various papers (Mukai et al, Neuron 2019; Chen et al, Sci Adv 2022; Nagahama et al, Cell Rep 2020).
  
  We appreciate the reviewer's suggestion to include a discussion of schizophrenia-related mouse models. We added more information related to the Setd1a mouse model to the Discussion section.
  
  "Population-level spine properties were more homogeneous in schizophrenia models (those with gene mutations implicated in schizophrenia) than in the other 4 models studied, in part due to a shared tendency for smaller spines. This observation is consistent with previous studies on Setd1a mutant mice, which showed reduced spine width, fewer mushroom-type spines, and lower spine density in the prefrontal cortex [43,56,57]. In contrast to these findings, several previous studies reported reduced numbers of small spines in the postmortem cortical tissues of schizophrenia patients [22,58]."
  
  (11) There is a conceptual problem with the models if being used to differentiate autism risk from Sz risk genes. It is difficult to find good mouse models for Sz, so the choice of 22q11.2del and Setd1a haploinsufficiency is completely reasonable. However, these are both syndromic. 22qdel syndrome involves multiple issues, including hearing loss, delayed development, and learning disabilities, and is associated with autism (20% have autism, as compared to 25% with Sz). Similarly, Setd1a is also strongly associated with autism as well as Sz (and also involves global developmental delay and intellectual disability). While I think this is still the best we can do, and it is reasonable to say that these models show biased risk for these developmental disorders, it definitely can't be used as an explanation for the higher variability seen in the autism risk models.
  
  We appreciate the reviewer's suggestion for more careful consideration of the interpretation of phenotypes in mouse models, with regard to their relation to clinical phenotypes in human patients. According to the suggestion of the reviewer, we modified the relevant text as follows:
  
  "The nanoscale features of dendritic spines in ASD-associated mouse models were more variable than those in schizophrenia-associated mouse models. This difference may be related to the broader clinical spectrum of ASD, which ranges from mild impairments in social skills to severe intellectual disability. The four ASD-associated mouse models examined in this study, Nlgn3R451C/(y or R451C), Syngap1+/-, POGZQ1038R/+, and 15q11-13dup/+, may represent subgroups with different levels of hippocampal dysfunction. Among the four ASD-associated mouse models, 15q11-13dup/+ showed population-level spine properties closer to those of the schizophrenia models. To understand this similarity, further analysis of neural circuit changes in both ASD- and schizophrenia-associated mouse models will be necessary. Analysis of the relationships between rare genetic variants and synapse phenotypes in mouse models may contribute to their eventual categorization. This information should be useful to understand the underlying mechanisms of the broader clinical spectrum of ASD."
  
  (12) I am not convinced that using dissociated cultures is 'more likely to reflect the direct impact of schizophrenia-related gene mutations on synaptic properties' - first, cultures do have non-neuronal cells, although here glial proliferation was arrested at 2 days, glia will be present with the protocol used (or if not, this needs demonstrating).
  
  In our culture system, the density of non-neuronal cells is low, and most neurons are not in direct contact with non-neuronal cells. We reported this method in Nat. Neurosci. 1999, where we utilized this culture system to visualize GFP-tagged PSD-95 in neurons using recombinant adenovirus. Because recombinant adenovirus shows higher infection efficiency in glial cells, it was essential for us to establish a culture condition that isolates neurons from glial cells.
  
  Second, activity levels will affect spine size, and activity patterns are very abnormal in dissociated cultures, so it is very possible that spine changes may not translate into in vivo scenarios. Overall, it is a weakness that the dissociated culture system has been used, which is not to say that it is not useful, and from a technical and practical perspective, there are good justifications.
  
  We appreciate the reviewer's comment on the advantages and disadvantages of using an in vitro culture system. This comment aligns with the first reviewer's. We modified our text to have a balanced discussion on the role of the in vitro culture system in the study of mental disorder mouse models as follows:
  
  "Finally, while the spine phenotype identified in the human postmortem brain undoubtedly resulted from complex interactions among genetic background, environmental influences, and regulation by non-neuronal cells, data from pure neuronal cultures are more likely to reflect the direct effects of schizophrenia-related gene mutations on synaptic functions. This property may be advantageous for identifying synaptic molecules that regulate synapse phenotypes in schizophrenia-related mouse models. However, the phenotype observed in the culture system requires confirmation using in vivo experiments of mouse models or human tissue samples. Efficient in vitro screening combined with reliable in vivo evaluation of synapses will facilitate translational research on mental disorders."
  
  (13) As a minor comment, the spine time-lapse imaging is a strength of the paper. I wonder about the interpretation of Figure 5. For example, the results in Figure 5G and J look as if they may be more that the spines grow to a smaller size and start from a smaller size, rather than necessarily the rate of growth.
  
  We thank the reviewer for the insightful comment. In the revised manuscript, we analyzed the time-lapse data using linear mixed-effects models incorporating nested random effects (spine/dendrite/cell/culture plate). This analysis suggested a difference in the initial size of spines. This point is described in the revised manuscript as follows:
  
  "Schizophrenia-associated mouse models showed higher similarity in spine morphology, driven by reduced size and growth of nascent spines."
  
  "We further compared the initial increase in spine volume between genotypes (Figure 5G-I). Linear mixed-effects models incorporating nested random effects revealed significantly smaller initial spine volumes in both 22q11.2del/+ and Setd1a+/- models (genotype effect: p < 0.001 for 22q11.2del/+ and p < 10⁻⁷ for Setd1a+/-). The spines in both mutants also displayed a significant reduction in spine volume increase (p < 0.001). In contrast, newly formed spines in the Nlgn3R451C/(y or R451C) neurons were significantly larger than those in wild-type neurons (p < 10⁻⁴) with preserved time-course of spine growth."
  
  We tested whether the initial size difference in spines can be incorporated into the computational simulation. However, due to the large variability in the initial spine size, it was difficult to perform parameter optimization in the model with additional factors. Therefore, we did not further pursue this possibility in this revision. This point is described in the revised text.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The manuscript would be strengthened if the following issues were adequately addressed:
  
  (1) It would be helpful to know more about the in/ex vivo dendritic spine phenotype of the mouse models of neuropsychiatric disorders, to allow readers to judge whether and how the in vitro spine phenotype in hippocampal neuronal cultures overlaps with/replicates the spine phenotype within the mouse brain.
  
  We appreciate this comment, but our currently available data are insufficient to specify the difference between in vitro and in vivo spine phenotypes. Our previous study, published in Nat. Commun. (2019), provided data showing that the overall distribution of spine size is similar between in vivo and in vitro conditions in the mouse hippocampus.
  
  (2) Although the manuscript is largely well written, there are instances of ambiguous language, particularly when describing the spine phenotypes. For example, we are told that "ASD mouse models showed a tendency of decreasing spine subpopulation with small volumes." This description and other examples should be expressed more clearly.
  
  Following the reviewer's suggestions, we revised the text to improve clarity. We modified the sentence "ASD mouse models showed a tendency of decreasing spine subpopulation with small volumes" to "ASD-related mouse models showed an opposite spine phenotype." To avoid possible confusion for readers, we have revised several sentences in the text to clarify the intended meaning.
  
  Also, I question whether the word "decoding", meaning to convert (a coded message) into intelligible language, is the most appropriate for the title and abstract.
  
  The original meaning of the word "decoding" is the conversion of a coded message into an intelligible form; however, in this study, we use the term in a broader sense, referring to the extraction of latent population-level properties of dendritic spines from multidimensional structural parameters. We believe this usage is consistent with its common use in neuroscience and systems biology, where "decoding" often refers to inferring underlying biological states or information from complex datasets.
  
  (3) The authors should reconsider whether CaMKIIαK42R/K42R mice should be described as a schizophrenia model, when mutations in CAMK2A are known to cause autosomal dominant intellectual developmental disorder-53 (OMIM 617798) and autosomal recessive intellectual developmental disorder-63 (OMIM 618095), and mice carrying the CAMK2A E183V mutation exhibit ASD-related synaptic and behavioral phenotypes (PMID: 28130356).
  
  We provided a detailed answer to this question in the previous part of the rebuttal.
  
  (4) The title doesn't adequately summarise the contents of the manuscript. It should mention mice/mouse models and cultured neurons.
  
  We also responded to this request in the previous part of the rebuttal.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Please provide a supplementary table with all DEGs. Also, DEGs are listed if present in 'more than 2' models - does this mean they had to be in 3 or more? Please clarify.
  
  According to the reviewer's suggestion, we added data on DEGs shared by >2 mouse models in Supplementary Figure 7. We also added Supplementary Tables 2 and 3 for all DEGs. The phrase "in more than 2 models" means "in 3 or 4 models".
  
  (2) There are several references to 'schizophrenia mouse models' - it is worth rephrasing this to make clear that these are not mice with schizophrenia.
  
  We replaced the expression "schizophrenia (or ASD) mouse models" with "schizophrenia (or ASD)-associated mouse models" or similar appropriate wording throughout the manuscript.
  
  (3) Line 66: 'a recent...' - 2014 is not really recent.
  
  We removed the word "recent" from the sentence.
  
  (4) Figure S1: The legend says A-D, but they are not on the figure. Also, make clear whether this data is only WT data - it seems to be from disorder models, with 4 colors for each model - please clarify.
  
  We changed the sentence from "shown as A to D" to "shown as A to C". The datasets in Supplementary Figure 1 are wild-type only. Each graph uses four colors to represent wild-type data from four imaging datasets obtained from different mouse models. Graphs A to C correspond to spine length, surface area, and volume, respectively.
  
  (5) Methods, line 680-4: More detail here would be helpful.
  
  We added more explanation for the generation of subtraction maps.
  
  (6) Line 193: Make it clear this is hippocampal in the main text.
  
  We added "cultures of embryonic hippocampi" to the text.
  
  (7) Figure 5, D-F: Make clear that these are transient spines (as per main text)
  
  We added "Lifetimes of transient spines" to both the main text and figure legend.
  
  (8) Figure 6B: More detail is needed; no idea what this is - no axis label. D - also not clear what numbers on the y-axis mean. E - color scale??
  
  We added details to the figure legend, the axis labels for Figures 6B and 6D, and the color scale for Figure 6E.
  
  (9) Supplementary Figure 9 - not clear what matrices are actually showing, nor what the scale refers to - is this the number of shared DEGs? If so, please make it clearer.
  
  The matrices show the shared DEG numbers, as shown in their titles. The scale indicates DEG numbers. We added the explanation of the color code to the figure legend.
  
  (10) Please make clear in the main text that Ecrg4 affected the turnover rate. It would be good to measure other parameters as well.
  
  We added the phrase "a significant increase in spine turnover rate by Ecrg4 overexpression" to the main text.
  
  (11) Figure 7: Suggest to label C on images as well, so obvious which is GFP/anti-HA overlay (and respective colors) and which is anti-HA staining.
  
  We added the labels with respective colors to Figure 7.
  
  (12) Ecrg4 is a precursor protein that is cleaved to produce several hormone-like peptides. Where is the HA tag, and therefore which cleavage products will it label? Are there any antibodies that work in immunocytochemistry?
  
  HA tag was attached to the C-terminal domain. We predict that anti-HA binds to four cleavage products (the full-length Ecrg4, Augurin, Argilin, and Δ16). Among several commercially available antibodies, only the SIGMA product could detect cells expressing Ecrg4-HA by immunocytochemistry.
  
  (13) Supplementary Figure 10: A synaptosome fraction would be a good addition.
  
  We isolated the synaptosome fraction for Supplementary Figure 9A using Syn-PER™ Synaptic Protein Extraction Reagent. We added this explanation to the Materials and Methods section.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.10.675343v2
www.biorxiv.org

Region-specific mechanosensation modulates Drosophila postural control behaviour

1
1. Public_Reviews 19 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Strengths:
  
  Strengths of this paper include the important question addressed and the elegant and innovative combination of methods, which led to clear insights into the sensory biology of self-righting, and that will be useful for others in the field. This is a substantial contribution to understanding how animals correct their body position. The manuscript is very clearly written and couched in interesting biology.
  
  Limitations:
  
  (1.1) The interpretation of functional experiments is complicated by the proposed excitatory and inhibitory roles of dorsal and ventral sensory neuron activity, respectively. So, while silencing of an excitatory (dorsal) element might slow righting, silencing of inputs that inhibit righting could speed the behavior. Silencing them together, as is done here, could nullify or mask important D-V-specific roles. Selective manipulation of cells along the D-V axis could help address this caveat.
  
  We greatly appreciate the thoughtful comments by Rev1 pointing out the relative simplicity of our current inferences regarding the role of dorsal vs. ventral substrate contact, and agree with the suggestion that cells along the DV axis could have diverse roles in their contribution to self-righting. In this context, we wish to point out two aspects, one theoretical and one practical. Regarding theory, our view is that this may not be a simple case of “excitation vs. inhibition”, but rather one in which the coordinated and dynamic activity of distributed sensory neurons promotes differential action selection in alignment with environmental conditions, a framework that could involve many different behaviours with a still uncertain level of granularity (e.g., is self-righting different if the larva is rotated to 160º instead of exactly 180º?). Regarding the practical aspect, while this area represents a fascinating point for future investigation, it is currently limited by technological development, particularly in the context of this study, where a relatively low-cost implementation has been used to probe the AP axis. Investigation of the DV axis would require further technological development, since optogenetic light would need to be delivered precisely from the side rather than from underneath, and with greater resolution than for the AP axis, given the much smaller width of the larva (~120-140µm) relative to its length (~550-600µm). Therefore, whilst we appreciate these comments and suggestions, we believe this line of experiments is better suited to a follow-up investigation than to the current study.
  
  (1.2) Prior studies from the authors implicated daIV neurons in the righting response. One of the main advances of the current manuscript is the clever demonstration of region-specific roles of sensory input. However, this is only confirmed with a general md driver, 109(2)80, and not with the subset-specific Gal4, so it is not clear if daIV sensory neurons are also acting in a regionally-specific manner along the A-P axis.
  
  To address this interesting and important comment by Rev1 we have carried out a new experiment using an alternative driver to 109(2)80-Gal4 and testing the impact of these manipulations on larval behaviour. The revised version of our MS includes a new figure Supp Fig S3 which shows self-righting times when using the ppk-Gal4 driver with the opto-axial technique. As observed with the 109(2)80-Gal4 driver, self-righting was delayed in anterior but not posterior inhibition conditions, suggesting the daIV neurons act in a region-specific manner to trigger postural control behaviour.
  
  We have also conducted a head casting analysis in the ppk domain; in another new figure, Supp Fig S7, we show that head casting behaviour is increased in the same manner as with the 109(2)80-Gal4 driver.
  
  These new panels and figures are cited within the subsections entitled “Optogenetic inhibition of anterior but not posterior multidendritic neurons delays self-righting” and “Inhibition of anterior multidendritic neurons is associated with increased head casting during self-righting”, on pages 25 and 28, respectively. We are grateful to Rev1 for this suggestion, which we consider has qualitatively improved our paper.
  
  (1.3) The manuscript is narrowly focused on sensory neurons that initiate righting, which limits the advance given the known roles for daIV neurons in righting. With the suite of innovative new tools, there is a missed opportunity to gain a more general understanding of how sensory neurons contribute to the righting response, including promoting and inhibiting righting in different regions of the larva, as well as aspects of proprioceptive sensing that could be necessary for righting and account for some of the observed effects of 109(2)80.
  
  Once again, we appreciate this interesting comment by Rev1. We feel our study provides novel insight into how sensory neurons in different body regions contribute to the induction of the behaviour. We developed new technology to show that the activity of anterior sensory neurons is essential for normal righting, and that inhibiting this activity leads to a switch to a different behavioural regime. We feel this represents a substantial, previously undescribed advance in our understanding of how this behaviour is initiated. Whilst we also appreciate that proprioception is likely to play a substantial role in self-righting behaviour, our work here focuses on the external stimuli that elicit self-righting, as a detailed understanding of proprioception would be out of scope and would require the development of further techniques to manipulate and measure larval posture. As detailed in the above comment, we feel that the more targeted investigation of daIV neurons also sheds some light on the cell-type specificity of, and inputs to, the self-righting induction process.
  
  (1.4) Although the authors observe an influence of Hox genes in righting, the possible mechanisms are not pursued, resulting in an unsatisfying conclusion that these genes are somehow involved in a certain region-specific behavior by their region-specific expression. Are the cells properly maintained upon knockdown? Are axon or dendrite morphologies of the cells disrupted upon knockdown?
  
  We agree that further investigating the effects of Hox expression on localised aspects of the sensory system is an interesting line of investigation. Indeed, we are currently conducting a full-scale analysis of Hox gene effects across the sensory field. As things stand, it is not clear how Hox gene expression could affect local sensory processes, a mechanism which could involve morphological changes, changes in neuronal excitability (e.g. due to changes in channel expression), synapse formation and/or efficiency, cell development and identity, and/or combinations of these effects, amongst other possibilities. A complete and satisfying investigation of this mechanism for each of the Hox genes would require a substantial amount of work so, while we acknowledge the merit of Rev1’s comment, we consider that a cellular-mechanistic analysis of Hox effects is out of scope for the present study and will be a central matter for a follow-up study emerging from current projects. We think that our data on Hox expression/function as reported here should serve to open up the analysis of genetic regulation of local sensory function, an area in which we are currently working very actively.
  
  (1.5) There could be many reasons for delays in righting behavior in the various manipulations, including ineffective sensory 'triggering', incoherent muscle contraction patterns, initiation of inappropriate behaviors that interfere with righting sequencing, and deficits in sensing body position. The authors show that delays in righting upon silencing of 109(2)80 are caused by a switch to head casting behavior. Is this also the case for silencing of daIV neurons, Hox RNAi experiments, and silencing of CO neurons? Does daIII silencing reduce head casting to lead to faster righting responses?
  
  This is an insightful comment. In the revised version of the manuscript, we show that anterior inhibition of daIV neurons leads to the same head casting behaviour as with the 109(2)80 domain, which we interpret as an inability of the larvae to sense the underlying substrate (see page 28). We hope the new data address this comment, at least to an extent. While we acknowledge it would also be insightful to run this behavioural analysis for other experimental conditions, such as the daIII inhibition and Hox RNAi lines, these experiments pose a specific technical difficulty: the behavioural analysis relies on a deep neural network (DNN) that was trained solely on recordings from the opto-axial technique, so it does not generalise well to other experimental situations. This problem is further compounded by the use of L1 larvae, whose small size means that the recording resolution is insufficient to accurately define the body landmarks used in posture tracking. Therefore, the only recourse for identifying behavioural changes is manual observation, which we feel is too inconsistent to address a quantitative question like this.
  
  (1.6) 109(2)80 is expressed in a number of central neurons, so at least some of the righting phenotype with this line could be due to silenced neurons in the CNS. This should at least be acknowledged in the manuscript and controlled for, if possible, with other Gal4 lines.
  
  We thank the reviewer for making this interesting comment. We have added a phrase to the section “Conditional inhibition of multidendritic neurons delays self-righting” (p21) which acknowledges the presence of 109(2)80 expression in the CNS (as reported by Hughes and Thomas). We agree that ideally, a variety of sensory Gal4 lines would be used to check for consistency of the effects. However, it is also important to note that 109(2)80 is one of the only available Gal4 lines with near sole md neuron expression, as other Gal4s also drive expression strongly in external sensory cells for example. Thus, re-running experiments with these other lines – which would involve a substantial investment of time and resources – would not be an ideal strategy. We feel that the new observation of (very) similar axial results using the ppk-Gal4, which does express solely in the daIV neurons, better helps to confirm the specificity of the findings to multidendritic neurons.
  
  Other points:
  
  (1.7) Interpretation of roles of Hox gene expression and function in righting response should consider previous data on Hox expression and function in multidendritic neurons reported by Parrish et al. Genes and Development, 2007.
  
  We thank Rev1 for pointing out this study, which is definitely important to discuss given our results on Hox genes. To address this gap, we have added a paragraph to the Discussion (p37) on the documented effects of Hox genes on da neuron dendritic morphology and how our results can be interpreted in light of this.
  
  (1.8) The daIII silencing phenotype could conceivably be explained if these neurons act as the ventral inhibitors. Do the authors have evidence for or against such roles?
  
  This is another interesting suggestion. If the daIII neurons were to fulfil this role, then in theory their inhibition would result in self-righting behaviour under conditions of combined dorsal and ventral substrate contact. This is not an experiment we performed, so we are currently unable to confirm or rule out this possibility. However, we note from casual observation that daIII inhibition does not cause larvae to spontaneously self-right. As mentioned above, our view is not one in which the system has “dorsal/ventral stimulators/inhibitors” for a given behaviour, but one in which action selection proceeds according to a coordination of many (dynamic) contextual cues. Given the new results with the axial inhibition of daIV neurons (see above), it might be more parsimonious to suggest that these “tiling” neurons are primarily responsible for detecting substrate contact around the full circumference of the animal, rather than this involving different cell types according to the different sides of the body.
  
  Reviewer #2 (Public review):
  
  Strengths:
  
  The work of Roseby et al. does what it says on the tin. The experimental design is elegant, introducing innovative methods that will likely benefit the fly behavior community, and the results are robustly supported, without overstatement.
  
  Weaknesses:
  
  The manuscript is clearly written, flows smoothly, and features well-designed experiments. Nevertheless, there are areas that could be improved. Below is a list of suggestions and questions that, if addressed, would strengthen this work:
  
  (2.1) Figure 1A illustrates the sequence of self-righting behavior in a first instar larva, while the experiments in the same figure are performed on third instar larvae. It would be helpful to clarify whether the sequence of self-righting movements differs between larval stages. Later on in the manuscript, experiments are conducted on first instar larvae without explanation for the choice of stage. Providing the rationale for using different larval stages would improve clarity.
  
  This is a very interesting point raised by Rev2. Most of our previous work on self-righting (e.g. Picao-Osorio et al. 2015 Science; Picao-Osorio, Baldaia et al. 2017 Genetics; Klann et al. 2021 Journal of Neuroscience) focused on the first instar larva (L1) because this early stage (i) represents the simplest form of all larval stages, (ii) allows meaningful comparisons with late embryonic processes guiding the development and physiology of the nervous system, and (iii) captures the system in a relatively naïve state, with limited, if any, exposure to external stimuli. Although these attributes remain valid for the investigation of the sensory stimuli that trigger self-righting, the regional physical measurements and manipulations used in this study (surface contact, opto-axial technique, deep neural network analysis) would be impossible to implement in the early forms of the larva simply due to their small size. We therefore employed L3s, whose larger dimensions enabled the development and use of the sophisticated regional stimulation techniques reported here. Yet, as Rev2 rightly points out, we return to the late embryo and early L1 for the gene expression analyses, as these are optimised for those early stages. The selection of larval stage for each experiment relies on the fact that all forms of the larva display self-righting (Issa, Picao-Osorio, et al. 2019 Current Biology), that SR does not differ according to larval stage, and that characterisation of the structure of the nervous system across larval stages has shown a high level of similarity and consistent, topographically arranged connectivity between identified neurons (Gerhard et al. 2017 eLife).
  
  (2.2) What was the genotype of the larvae used for the initial behavioral characterization (Figure 1)? It is assumed they were wild type or w1118, but this should be stated explicitly. This also raises the question of whether different wild-type strains exhibit this behavior consistently or if there is variability among them. Has this been tested?
  
  Thank you to Rev2 for pointing this out. The genotype for Figure 1 was w1118; this has now been added to the figure legend and the Results section. Although in this study we did not explicitly compare self-righting (SR) performance across wild-type/control genotypes (as we consistently used w1118), previous data collected in our lab show that self-righting times are similar and very consistent amongst inbred control lines such as w1118, yw, and Oregon Red. Furthermore, when comparing SR times between these inbred populations and a highly polymorphic outbred Drosophila population (Martins et al. 2013 PLoS Pathogens), we observed that the SR time of the outbred population (6.14s ± 1.06) was not significantly different from that of the inbred lines (p<0.05, U test) (Picao-Osorio, J. 2014 Doctoral Thesis, Chapter 4, p112).
  
  (2.3) Could the observed slight leftward bias in movement angles of the tail (Figure 1I and S1) be related to the experimental setup, for example, the way water is added during the unlocking procedure? It would be helpful to include some speculation on whether the authors believe this preference to be endogenous or potentially a technical artifact.
  
  This is an interesting comment, and we recognise that lateral biases in self-righting could reflect either experimental limitations or biological tendencies. At this point we cannot interpret these results as formal evidence of chirality, given that they may reflect subtle aspects of the micromanipulation of specimens. We are currently developing a motorised platform to conduct self-righting tests which, when fully developed, should help address the chirality question.
  
  (2.4) The genotype of the larvae used for Figure 2 experiments is missing.
  
  Thank you for pointing this out. These were again w1118 larvae; this detail has now been added to the figure legend and the main text.
  
  (2.5) The experiment shown in Figure 2E-G reports the proportion of larvae exhibiting self-righting behavior. Is the self-righting speed comparable to that measured using the setup in Figure 1?
  
  Thank you for pointing this out. We have now added average self-righting times to the legends of Figures 1 and 2. Self-righting times for the dorsal + ventral contact conditions were notably longer than for the dorsal-only cases, which were in turn slightly longer than the “standard” case. This is perhaps to be expected, as the larvae are encountering unusual and ambiguous situations. We suggest the extra time could reflect an additional decision-making step or action flip-flopping process, or simply physical constraints on the movement (for example, not being able to use some parts of the body).
  
  (2.6) Line 496 states: "However, the effect size was smaller than that for the entire multidendritic population, suggesting neurons other than the daIVs are important for self-righting". Although I agree that this is the more parsimonious hypothesis, an alternative interpretation of the observed phenomenon could be that the effect is not due to the involvement of other neuronal populations, but rather to stronger Gal4 expression in daIVs with the general driver compared to the specific one. Have the authors (or someone else) measured or compared the relative strengths of these two drivers?
  
  We agree with this suggestion and, to address this concern, have added to the new Supp. Fig. S3 a dedicated panel (S3C) showing fluorescence measurements from ddaC using the 109(2)80-Gal4 and ppk-Gal4 lines. We found no difference in tdTomato fluorescence intensity, suggesting equal expression strength across the two Gal4 drivers. Our new results for axial daIV inhibition are also consistent with this effect size difference, further suggesting that inhibition of all md neurons poses a stronger challenge for self-righting than inhibition of the daIV neurons alone.
  
  (2.7) Is there a way to quantify or semi-quantify the expression of the Hox genes shown in Figure 6A? Also, was this experiment performed more than once (are there any technical replicates?), or was the amount of RNA material insufficient to allow replication?
  
  Unfortunately, we only had limited amounts of mRNA extracted from FACS-sorted 109(2)80>GFP cells to feed our reverse transcriptase reactions, and we used much of these samples for the experiment reported. Following Rev2’s suggestion, we went back to our freezers, recovered traces of the samples used in the original experiment, and attempted a new amplification; despite this effort, this new experiment was unsuccessful. We feel that the main point deduced from the original experiment is valid, in that we obtained amplicons of the expected size for all the Hox transcripts analysed and, for those cases in which we observed biological effects (i.e. Antp and Abd-B), we corroborated protein expression in the 109(2)80 domain using immunohistochemistry. We are currently expanding this project to examine the roles of all Hox genes across the entire sensory system, and shall report the expression patterns of all Hox genes in each of the subcomponents of the sensory system in the future.
  
  (2.8) Since RNAi constructs can sometimes produce off-target effects, it is generally advisable to use more than one RNAi line per gene, targeting different regions. Given that Hox genes have been extensively studied, the RNAis used in Figure 6B are likely already characterized. If this were the case, it would strengthen the data to mention it explicitly and provide references documenting the specificity and knockdown efficiency of the Hox gene RNAis employed. For example, does Antp RNAi expression in the 109(2)80 domain decrease Antp protein levels in multidendritic anterior neurons in immunofluorescence assays?
  
  We used the TRiP RNAi lines, specifically the Valium10 selection available from the Bloomington Stock Centre. Unfortunately, there is not much information on how specific the Hox RNAi lines are or whether they might have off-target effects.
  
  (2.9) In addition to increasing self-righting time, does Antp downregulation also affect head casting behavior or head movement speed? A more detailed behavioral characterization of this genetic manipulation could help clarify how closely it relates to the behavioral phenotypes described in the previous experiments.
  
  This would be an interesting line of investigation. As described in a previous comment, this is currently unfeasible for us given some important differences between experiments, including larval stage and recording conditions. We have added some speculative comments to the manuscript describing the larval behaviour under Hox RNAi.
  
  (2.10) Does down-regulation of Antp in the daIV domain also increase self-righting time?
  
  Given the new results on the axial effects of daIV neurons, we also sought to address this point with a new series of experiments expressing Hox RNAi constructs in the ppk-Gal4 domain. The new data are shown in a new figure (Figure S8) displaying self-righting times for ppk-Gal4>Hox-RNAi larvae. Interestingly, we found no effect of any RNAi expression on self-righting times, suggesting that md types other than daIVs are under Hox regulation that is important for self-righting.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  The reviewers were enthusiastic about the value and quality of this study by Roseby and colleagues. There were two main issues that emerged from the reviews that we're highlighting for the authors to address, should they choose to:
  
  (1) A little more cell-type resolution of the anterior region
  
  The anterior region includes a lot of sensory neurons that may be contributing to the effect. Some sensory neurons (e.g., daIV) have been implicated in righting - are these the ones carrying the anterior signal? Are dorsal sensory neurons promoting righting and ventral ones stalling it?
  
  We are not suggesting a complete sensory-neuron mapping in the anterior region. Instead, we propose the authors conduct a focused check: repeat the axial inhibition with a daIV-specific driver (same photomask assay) to show the A-P effect within the implicated class, and, if possible, replicate one key result with an alternative broad md driver to address Gal4 strength/off-target expression.
  
  As mentioned above (see Rev1 comment) we have indeed carried out a new experiment using an alternative driver to 109(2)80-Gal4 and testing the impact of these manipulations on larval behaviour. The revised version of our MS includes a new figure Supp Fig S3 which shows self-righting times when using the ppk-Gal4 driver with the opto-axial technique. As with the 109(2)80-Gal4 driver, self-righting was delayed in anterior but not posterior inhibition conditions, suggesting the daIV neurons specifically act in a region-specific manner to trigger postural control behaviour.
  
  Furthermore, in another new figure, Supp Fig S7, we show that head casting behaviour is also increased in the same manner as with the 109(2)80-Gal4 driver. These new panels and figures are cited within the subsections entitled “Optogenetic inhibition of anterior but not posterior multidendritic neurons delays self-righting” and “Inhibition of anterior multidendritic neurons is associated with increased head casting during self-righting”, on pages 25 and 28, respectively. We are grateful to Rev1 for this suggestion, which we consider has qualitatively improved our paper.
  
  (2) The Hox section. To strengthen this section, we recommend:
  
  (a) Confirm specificity/efficacy of knockdown (e.g., Antp protein reduction in targeted md neurons and a second RNAi line if available).
  
  This is a reasonable comment. For our experiments, we selected a UAS-Antp^RNAi line (Bloomington #27675) given that this construct has been (i) used in several previous studies as the main and only line to interfere with Antp expression (e.g. Baek et al. 2013 Development; Paul et al. 2021 Nature Comms) and (ii) shown to produce a consistent reduction in Antp protein levels of approximately 50% (see Poliacikova et al. 2024 Science Adv.). Furthermore, previous work comparing #27675 with other UAS-Antp^RNAi lines has demonstrated that all available lines lead to a similar level of reduction in protein expression, although the #27675 line exhibits the most consistent effects (lower variability) (Poliacikova et al. 2024 Science Adv.). Unfortunately, at this point in time, we do not have the capacity to conduct new experiments with other RNAi lines, but we consider that the information and arguments above should be reassuring about our choice of a reasonable and previously validated method to interfere with Antp expression.
  
  (b) Perform one temporal control (GAL80^ts) or a simple rescue, to separate developmental vs acute roles.
  
  This is a good and interesting suggestion, but we consider that the discrimination between developmental and physiological effects falls outside the scope of this study. Indeed, experiments of this kind are currently being conducted in our lab as part of a wider examination of Hox gene roles in the sensory system.
  
  (c) Place the results clearly in the context of prior work (e.g., Parrish 2007), so the mechanism isn't left hanging.
  
  This is an important point, and we have now done this. Many thanks for pointing this out.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1.1) A Gal4 line for the pannier dorsal specification gene shows expression in dorsal sensory neurons, as described in Galindo et al., Development, 2023, and could help tease apart dorsal v. ventral contributions.
  
  This is an interesting suggestion. However, we understand that the pannier (pnr) Gal4 line mentioned in Galindo et al. 2023 is an enhancer trap inserted in the pnr locus which drives expression in neural as well as non-neural tissues such as the embryonic dorsal ectoderm (see: Calleja et al. 1996 Development; Stronach et al. 2014 Genetics). Although, as Rev1 rightly indicates, this line also labels dorsal cluster sensory neurons, including ddaC (cIV) and ddaF (cIII) neurons, its expression in non-neural tissues makes its use in behavioural experiments difficult, as non-neural effects might affect the behavioural patterns studied. One possible way to incorporate the pnr-Gal4 tool into behavioural analyses might involve creating the necessary variants to implement a split-Gal4 approach, but this, we believe, unfortunately falls outside the scope of this study.
  
  (1.2) Potential roles for daII neurons and daI neurons are not examined. Drivers have been described for daII neurons, and there are drivers that will target a majority of proprioceptive md neurons, so these could be examined to complete the analysis started here.
  
  This is another interesting suggestion by Rev1, but we consider that fine-grained mapping of effects mediated by sensory neuron subclasses falls outside the scope of this study, which aims to map regional sensory effects on self-righting. This does not detract from the merit of the suggestion and, indeed, experiments of this kind are currently being conducted in our lab as part of a comprehensive examination of Hox gene roles in the sensory system.
  
  (1.3) To account for 109(2)80 off-targets, the authors could consider other lines that silence most or all md neurons (clh201-Gal4; 5-40-Gal4; 21-7-Gal4), which could at least have different central off-targets. Some other lines are broad somatosensory system drivers but are sensory-specific (pebbled-Gal4).
  
  This is an interesting comment, and so are the suggestions made. Although including this kind of verification would be interesting, we did not observe any central expression at all when carrying out our experiments. Furthermore, repeating all the experiments in which we use the established and validated 109(2)80 line with these four alternative Gal4 lines is, unfortunately, out of scope for us at this point in time. We will nonetheless consider these comments by Rev1 in future extensions of our work.
  
  (1.4) There is a typo on line 481; it should be "other".
  
  We are grateful to Rev1 for pointing this out. This has now been amended.
  
  Reviewer #2 (Recommendations for the authors):
  
  (2.1) Lines 91-92 cite references describing self-righting behavior across different animal groups, which is illustrated in Figure 1B. It would be helpful to indicate these references directly in the figure. For example, instead of using dots to denote their presence (which are, in a way, redundant since the behavior is reported in all groups), numbers or letters could be used to refer to the specific papers describing them.
  
  Thank you for this suggestion. We have now replaced the original dots with an abridged citation of a key paper providing evidence in each specific animal group (e.g. Smith et al. 1997; Rogers et al. 2015).
  
  (2.2) In Figure 1A, the diagrams illustrate the two large dorsal tracheae, which nicely indicate the larva's orientation. However, since they are drawn in a very light gray, they can be difficult to distinguish without zooming in. It might improve clarity if the tracheae were made slightly more prominent.
  
  Thank you for this suggestion. We have now implemented this change.
  
  (2.3) In Figure 1E, the dotted line and green bar mark the segment of the recording corresponding to self-righting, which is then quantified in Figure 1G. Was the same procedure applied when analyzing tail speed, or was it limited to head speed? Figure 1F does not show a dotted line or green bar, which is confusing; it would be helpful to clarify the reason for this discrepancy. Also, in Figure 1G, there is an inset showing photos of the movement sequence with the green bar and the caption 'Trimmed to SR sequence,' which implies to me that for tail speed, the 0.75-1 segment of the recording was also used for quantification. I suggest adding the dotted line and green bar to Figure 1F and removing this inset from Figure 1G, as it appears quite small and disrupts the layout of the figure. If it is retained, the figure legend should explicitly refer to the inset.
  
  Thank you for pointing this out. We have amended these figures as suggested.
  
  (2.4) In Figures 1 and 2, the box plots include the individual data points, whereas Figures 3 and S2 do not. For data transparency, it would be important to show the individual measurements here as well. I strongly recommend adding them to the figure, or alternatively providing a clear rationale in the text for not doing so.
  
  Thank you for mentioning this. Data points are not shown in Figs 3 and S2 because their spread extends the scale and compresses the boxes, making them illegible. We now explain this in the figure legends.
  
  (2.5) In Figures 4 and 5, the distribution of self-righting times from the optogenetic inhibition experiments is shown using bar graphs rather than box plots, as in the previous figures. This choice obscures the data distribution, since all bars reach down to zero. Replacing the bar graphs in Figures 4 and 5 with box plots would more clearly convey the experimental results.
  
  We thank Rev2 for this comment, which gives us an opportunity to clarify the matter. Distributions of SR times are drawn with bars because the analysis compares means +/- variance, not medians +/- IQR as in the other experiments. The choice of visualisation reflects the analysis, as statisticians recommend. In addition, we show the individual observations, so the distribution can be seen directly. We hope it is now clear that we are not obscuring any distributions.
  
  (2.6) Figure 6 would benefit from some reorganization. Panel A is very small and dense with information, making it difficult to interpret without significant zooming. In particular, the FACS graph is nearly impossible to read, as the axes remain unclear even when enlarged. It might be best to replace this graph with a cartoon of the FACS-sorted populations and to reorganize the figure to ensure legibility. Additionally, the current layout progresses from the bottom up, which takes time to follow. Comprehension could be improved if the sequence began with the larva dissection placed in the top left area of the figure, where readers typically look first (I appreciate that this is mentioned in the figure legend; however, a different layout might present the information more effectively).
  
  We appreciate the constructive spirit of this comment and have indeed considered Rev2's suggestions, including drafting new layouts of this figure. After this experimentation, we remain of the view that the original presentation is probably the best trade-off between size and clarity, offering more space for the appreciation of confocal imaging and its interpretation.
  
  Minor corrections:
  
  (1) Throughout the text, the word Drosophila appears sometimes in italics and sometimes in regular font; please standardize its formatting for consistency.
  
  Amended
  
  (2) Line 179: the use of three hyphens in the sentence "minimum --- in all cases < 30 s --- to avoid larval desiccation" is unusual; exchanging them for commas or brackets is advised.
  
  Amended
  
  (3) Line 183: in w1118, the numbers are usually in superscript (not subscript), and the w should be italicized.
  
  Amended
  
  (4) In line 783, there is an incorrect space between "is" and the comma in "...repertoire, which is , in...".
  
  Amended
  
  (5) In Figure 2G, the left panel appears partially cut off, which makes the text at the edges difficult to read. It might help to adjust the panel so that all labels are fully visible.
  
  Done
  
  (6) In the current version of the manuscript, Figure 5 is presented before Figure 4, which is confusing.
  
  This has been amended.
  
  (7) Two videos are included in the supplementary material, but I could not find any reference to them in the main text of the manuscript.
  
  This has been amended.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.22.645758v3
www.biorxiv.org

Proteolytic remodeling by Yme1 enables mitochondrial-derived compartment formation

1. Public_Reviews 19 Jun 2026
  
  in eLife
  
  Author response:
  
  We thank the editors and reviewers for their thoughtful and constructive evaluation of our manuscript. We are pleased that the reviewers found the study valuable and the evidence supporting a role for Yme1 in MDC formation solid. As described below, we plan to modify the manuscript to clarify the lipid model, better explain the relationship between Ups-family proteins and MICOS, distinguish MDC formation from Atg32-dependent mitophagy, clarify metabolic conditions, add statistical analyses where missing, and strengthen Yme1 validation with immunoblotting.
  
  eLife Assessment
  
  This valuable study demonstrates that the inner membrane protease YME1 contributes to the formation of mitochondrial-derived compartments in yeast through the modulation of both the lipid transporter UPS2 and the MICOS complex. The evidence supporting this model is solid, although this manuscript could be improved by providing additional evidence supporting the independent roles for UPS2 and MICOS regulation in this process. This work will be of interest to cell biologists, biochemists, and geneticists interested in understanding the molecular basis of mitochondrial regulation and function.
  
  We appreciate this positive assessment and agree that the roles of Ups-family lipid transport and MICOS in MDC regulation could be expanded further. This will be an important topic for future studies, especially with regard to how MICOS contributes to MDC formation. In the current revision, we will add new genetic data focused on PA-linked lipid metabolism through the yeast Pah1/Lipin pathway, which we think will help strengthen and clarify the lipid arm of the model. Our current interpretation is that Yme1-regulated Ups-family lipid transport and MICOS may both influence a shared mitochondrial membrane state that permits MDC formation. This interpretation is consistent with our genetic data and with known connections between Ups proteins, MICOS, and mitochondrial membrane organization.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, Balasubramaniam and colleagues continue this group's efforts to understand mitochondrial-derived compartments (MDCs) that bud off from yeast mitochondria in response to metabolic stress. In a previous genetic screen, they identified Ups lipid transfer proteins and the AAA-protease Yme1 as components that modulate MDC formation. In this study, the authors link these observations by showing that Yme1 modulates levels of Ups1, Ups2, as well as MICOS complex members in the mitochondrial proteome. Using genetic approaches, they then show that Yme1's role on MDCs is dependent on its catalytic activity (via an inactive mutant) and that YME1 shows genetic interactions with UPS1/2 and MIC10/MIC60. The overall model is that Yme1 activity responds to metabolic cues and acts via proteolysis of these two distinct mitochondrial machineries to regulate MDC biogenesis.
  
  Strengths:
  
  The strengths of the study are its integration of mitochondrial proteomics with strong genetic approaches, as well as synergy with the authors' previous studies on the role of lipids in MDC biogenesis. The work is overall well carried out, and the experiments are thoughtfully discussed.
  
  Weaknesses:
  
  The major weaknesses are a lack of mechanistic resolution surrounding the model, e.g., proposed or tested mechanisms by which Yme1 activity is regulated by metabolic cues, or how Ups1/2 activity and the MICOS contribute to MDC generation. The authors acknowledge these as open questions, but addressing them would still enhance the significance of the study.
  
  We thank the reviewer for the positive assessment, and we agree that the upstream regulation of this response remains an important open question. Yme1-dependent MDC regulation could involve changes in Yme1 activity, substrate accessibility, or broader changes in mitochondrial lipid and protein organization. Fully resolving how metabolic state gates this response will require future work, likely outside the scope of the current study.
  
  We also agree that the manuscript would benefit from a more developed discussion of how lipid changes could contribute to MDC formation. Our prior work showed that reduced mitochondrial PE promotes MDC formation, whereas cardiolipin is required for MDC biogenesis (Xiao et al., 2024). We proposed that reduced PE changes the membrane environment of mitochondrial outer membrane proteins, potentially affecting their stability, abundance, insertion, or lateral organization within the membrane. Such changes could increase the pool of proteins available for sorting into MDCs or make the outer membrane more permissive for domain formation. In the revision, we will connect this model more directly to Yme1-dependent regulation of Ups-family lipid transport.
  
  We will also expand the model to incorporate PA-linked metabolism. We did not initially focus heavily on Ups1 because complete loss of UPS1, or loss of downstream cardiolipin synthesis through CRD1, blocks MDC formation, since cardiolipin is required. Thus, complete disruption of Ups1-dependent lipid transport may obscure the effects of more moderate changes in PA flux. To address this, we will include additional lipid measurements and new genetic data targeting PA metabolism through the yeast Pah1/Lipin pathway. Because Pah1 converts PA to DAG, this provides a way to alter PA-linked metabolism without simply eliminating cardiolipin synthesis. Our new data suggest that PA accumulation or altered PA-linked lipid flux may also promote MDC formation. Together, these findings support a broader model in which reduced PE and increased PA alter both the organization of OMM proteins and the physical properties of the membrane, including curvature and domain formation, thereby creating a membrane state that is more permissive for MDC biogenesis.
  
  Reviewer #2 (Public review):
  
  In this manuscript, the authors report a novel regulation of the outer mitochondrial membrane remodeling domains called mitochondria-derived compartments (MDCs). The team has previously established the main principles behind this recently identified quality control pathway, but the mechanisms that control MDC formation remain incompletely understood. Using the baker's yeast model, the authors identify the conserved mitochondrial protease Yme1 as a crucial factor that regulates MDC formation. Mechanistically, Yme1's proteolytic function controls the levels of Ups1 and Ups2 lipid transfer proteins and the components of the membrane organizing complex called MICOS, thus providing a plausible model as to how Yme1-dependent proteolysis permits MDC formation through the removal of lipid and MICOS-dependent constraints. Finally, the authors show that this Yme1-mediated activity is also defined by metabolic conditions. In principle, this study is interesting and novel, and holds potential to provide new insights into the regulation of the MDC pathway that emerged as a new fundamental mitochondrial quality control mechanism. However, the following points should be carefully addressed.
  
  Major points:
  
  (1) Yme1 has been previously shown to regulate mitochondria-specific autophagy through Atg32 processing. Given the high similarity of the MDC pathway to piecemeal autophagy and the fact that both pathways share some of the core components, the authors should address the involvement of Atg32 in their model. It would also be important to include a brief discussion addressing the differences between piecemeal autophagy and the MDC pathway.
  
  We agree that this is an important point. The reason we did not focus on Atg32 in the current manuscript is that we previously investigated the relationship between MDC formation and Atg32-dependent mitophagy and found that Atg32 is dispensable for MDC formation (Hughes et al., 2016). Based on that result, we do not anticipate that Atg32 is required for the Yme1-dependent MDC phenotypes described here. This is also consistent with the different growth conditions associated with these pathways: Atg32-dependent mitophagy is stimulated under respiratory or post-diauxic conditions, whereas MDCs do not form under the respiratory conditions that stimulate Atg32-dependent mitophagy (Hughes et al., 2016; Raghuram and Hughes, 2024).
  
  We will clarify this distinction in the revised manuscript. In addition, to be thorough, we plan to generate and test the Atg32-GFP variant previously shown to block Yme1-dependent Atg32 processing and mitophagy (Wang et al., 2013). This will allow us to test directly whether preventing Yme1-dependent Atg32 cleavage affects MDC formation. If successful and interpretable, we will include these data in the revised manuscript.
  
  (2) The Rpt3 (P215L) expression experiment is interesting, but appears to be somewhat superficial due to the unclear mechanism by which the mitochondrial network morphology is restored in these cells. Could this result be replicated in the dnm1∆ mgm1∆ double deletion mutant, which is a well-established model for mitochondrial network restoration?
  
  We agree that the Rpt3(P215L) experiment is best viewed as a morphology control. The purpose was to test whether abnormal mitochondrial morphology alone explains the MDC defect in yme1Δ cells. Because Rpt3(P215L) improved mitochondrial morphology but did not restore MDC formation, we interpret this as evidence that morphology alone is not sufficient.
  
  We attempted to generate the requested dnm1Δ mgm1Δ yme1Δ triple-mutant combination, but that strain combination has not been viable in our hands. However, we do have dnm1Δ data showing that altering mitochondrial structure can rescue some morphological features but does not restore MDC formation in yme1Δ cells. We will include these data where appropriate and clarify that this experiment is intended as a morphology control.
  
  (3) Figure 3E. The changes in PE levels appear to be minor. While statistically significant, the observed differences may not be physiologically relevant. More in-depth lipidomic analysis data should be presented to substantiate the authors' argument and better address the questions at hand. Related to that, could PE or PA supplementation stimulate MDC formation?
  
  We agree that additional lipid data would strengthen this part of the manuscript. We initially streamlined the lipid section because we had previously examined the lipid requirements for MDC formation in detail, showing that reduced mitochondrial PE can promote MDC formation, whereas cardiolipin is required (Xiao et al., 2024). However, the current study would benefit from a broader analysis of the lipid changes associated with Yme1-dependent regulation.
  
  In the revision, we will expand the lipid data to include additional lipid species and incorporate these results into the model. We will also add new genetic data targeting PA metabolism through the yeast Pah1/Lipin pathway. Together, these data suggest that PA accumulation or altered PA-linked lipid flux may also contribute to MDC formation. This supports a broader lipid-balance or lipid-shunting model in which reduced PE, increased PA, or altered lipid distribution between mitochondrial membranes could influence OMM remodeling through effects on membrane curvature, OMM protein organization, or mitochondrial membrane contacts.
  
  We agree that direct PE or PA supplementation would be a valuable experiment. We have attempted lipid supplementation but have not been able to deliver these lipids effectively to yeast cells in a way that produces interpretable results. We are therefore focusing on lipid profiling and genetic approaches that alter lipid metabolism inside the cell.
  
  (4) The connection between rapamycin treatment and Yme1-regulated MDC formation is unclear and puzzling and needs to be explained better.
  
  We agree that this connection is not fully clear. In this manuscript, rapamycin is used primarily as a robust MDC-inducing condition. Our data do not define the full pathway connecting TORC1 inhibition to Yme1-dependent mitochondrial remodeling.
  
  In the revision, we will either clarify this point or reduce the emphasis on rapamycin as a mechanistic entry point. Our current interpretation is that rapamycin creates a metabolic/mitochondrial state in which Yme1-dependent remodeling of lipid and membrane-organization pathways becomes important for MDC formation. Whether this involves direct regulation of Yme1, altered substrate availability, altered membrane composition, or a combination of these remains open.
  
  (5) The MICOS complex is clearly involved in the regulation of MDC, but the manuscript misses the mark on providing compelling evidence and a clear explanation as to how MICOS contributes to said regulation.
  
  We agree that the mechanism by which MICOS regulates MDC formation remains an important open question and will be a major focus of future work. Our current data show that MICOS perturbation can partially restore MDC formation in yme1Δ cells, supporting a role for MICOS in this pathway. This analysis was motivated in part by the incomplete genetic suppression achieved through the lipid pathway alone, which suggested that additional Yme1-regulated factors contribute to MDC formation.
  
  MICOS therefore represents a strong candidate for this additional regulatory input. However, defining whether MICOS acts through lipid distribution, OMM-IMM organization, membrane architecture, or another mechanism will require a deeper investigation than is possible within the scope of the current study. We will clarify this point in the revised manuscript and present the current findings as the beginning of a broader investigation into how MICOS contributes to MDC biogenesis.
  
  Minor points:
  
  (1) The authors should discuss potential reasons for the dramatically different rates of MDC formation in the S288C and W303 background cells. Does this have anything to do with generally more robust mitochondrial functions in the latter cells?
  
  We agree this is worth discussing. One likely explanation is that the difference reflects broader differences in mitochondrial activity and metabolic state between these strain backgrounds. We and others have shown that W303 cells have more robust respiratory mitochondrial function than BY/S288C-derived cells, and in our hands W303 also shows lower MDC formation. This fits our broader model that MDCs are favored in glucose-grown or metabolically perturbed cells and do not form under respiratory conditions (Raghuram and Hughes, 2024). We do not yet know the genetic basis for this difference, so we will present this as an interesting future direction.
  
  (2) Proper statistical analyses should be provided for all the graphs presented.
  
  We will add statistical analyses where missing.
  
  (3) The authors should include Yme1 immunoblots to confirm the identity of strains being studied and validate the presence or overexpression of Yme1 and its catalytic mutant in their experiments.
  
  We agree that direct validation of Yme1 protein levels will strengthen the manuscript. Our quantitative mitochondrial proteomics already confirms strong depletion of Yme1 in yme1Δ cells, and we will also include quantitative proteomics showing increased Yme1 abundance in the overexpression strain. In addition, we have now obtained a Yme1 antibody from a colleague and will include immunoblots validating Yme1 loss, re-expression, catalytic mutant expression, and overexpression where appropriate.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Since describing MDCs over a decade ago, the lab of the corresponding author, Hughes, has been at the forefront of further characterizing these structures. Here, they follow up on recent work (PMID: 38497895), where a screen identified Yme1 as a potential regulator of MDCs. After confirming that Yme1-ko prevents MDCs that are usually induced via various established treatments (Rapamycin, cycloheximide, Concanavalin A), the authors confirmed that the proteolytic activity of Yme1 is required. Next, using proteomics, they identified how loss of Yme1 impacts the mitochondrial proteome with and without Rapamycin treatment to induce MDCs. From this result and based on insight from other published data implicating lipids, they focused initially on the lipid transfer protein Ups2, a known target of Yme1. Here, they showed that loss of Ups2 could partially rescue MDC formation in Yme1-ko cells. To look for other Yme1 targets that might also be involved in MDC formation, they next investigated the MICOS complex, which was also notable in their proteomics data. They then showed that inhibiting MICOS also partially restored MDC formation in Yme1-ko cells. They then tested the combined effects of Ups2 and MICOS inhibition on MDCs, which was limited by the fact that the combination of full MICOS disruption, Ups2-KO, and Yme1-KO was not viable. To circumvent this limitation, they investigated the knockout of individual MICOS subunits in combination with Ups2 and/or Yme1. Finally, they showed that growth conditions also mediate MDC formation in the context of Yme1 overexpression. In rich media, Yme1 overexpression induces MDCs on its own. However, this induction is lost upon amino acid starvation, suggesting that there are still other as-yet-unidentified factors regulating the formation of MDCs.
  
  Strengths:
  
  The authors use unbiased approaches and genetic models to begin unraveling a novel regulatory role of Yme1 in the formation of MDCs.
  
  Weaknesses:
  
  (1) The authors find both Ups1 and Ups2 in their screens, but only focus on Ups2 in this paper. It would be good to know why they did not also investigate Ups1, and its other protease Atp23, which could potentially act similarly to Yme1, or even rescue the loss of Yme1.
  
  We agree that Ups1 and Atp23 are important to consider. We initially focused on Ups2 because its deletion partially restores MDC formation in yme1Δ cells and because of its connection to mitochondrial PE synthesis, which we had previously shown to regulate MDC formation (Xiao et al., 2024). Ups1 is more difficult to assess genetically because complete loss of UPS1, or of downstream cardiolipin synthesis through CRD1, blocks MDC formation due to the requirement for cardiolipin. Thus, an ups1Δ phenotype cannot readily reveal whether a more moderate reduction in Ups1 activity, and the resulting accumulation or redistribution of PA, might promote MDC formation.
  
  In the revision, we will explain this rationale and include new genetic data targeting PA metabolism through the yeast Pah1/Lipin pathway. This provides a way to test the contribution of PA accumulation without simultaneously eliminating cardiolipin synthesis, and our initial results support a role for PA-linked lipid remodeling in partially bypassing the requirement for Yme1. We will also discuss Atp23 as a potentially important regulator of Ups1 and PA metabolism. A full investigation of Atp23 will be an important direction for future work.
  
  (2) I'm not convinced that the data support the notion that Ups2 and MICOS have distinct effects on MDCs. In Figure S3C-D, there is no statistical analysis to indicate whether the small differences between the MICOS-ko and the double knockout are significant. If MICOS-ko and Ups2-ko were acting through different mechanisms, one would expect their combination to be additive; this does not appear to be the case, as both single deletions and the double deletion all cause similar levels of MDCs (~30-40%). Rather, this result is what you would expect if they were working through the same mechanism. There also does not appear to be an additive effect in Figure 4F-G, when using the mic60-ko rather than the complete MICOS-ko. In this regard, the authors note in their discussion that 'loss of MICOS may disrupt membrane associations or alter lipid distribution between mitochondrial subcompartments' (lines 390-392). The latter situation seems like it would be the same mechanism as Ups2 and would more accurately explain their findings.
  
  This is a very good point, and we agree with the reviewer’s interpretation. The lack of strong additivity is consistent with Ups2 and MICOS acting within the same pathway or converging on a shared mechanism, rather than representing two separate mechanisms of MDC regulation. We did not intend to imply that these must be independent pathways. In the revised manuscript, we will ensure that the text reflects this interpretation and will add statistical analyses to the relevant comparisons.
  
  (3) The manuscript is missing key data confirming the re-expression or overexpression of Yme1 protein (Figure 1 E/G and Figure 5A). It is important to know the relative levels of expression of the re-expressed proteins to each other and to endogenous Yme1.
  
  We agree that direct validation of Yme1 protein levels is important. Our quantitative mitochondrial proteomics already confirms strong depletion of Yme1 in yme1Δ cells, and we will also include quantitative proteomics showing increased Yme1 abundance in the overexpression strain. In addition, we have now obtained a Yme1 antibody from a colleague and will add immunoblots validating Yme1 loss, re-expression, catalytic mutant expression, and overexpression.
  
  (4) Some clarification of the details for metabolically restrictive conditions would be helpful.
  
  Thanks for this suggestion. We will clarify these conditions throughout the manuscript and figure legends and will define exactly what we mean by low-amino-acid, amino-acid-free, synthetic, and rich media conditions. More broadly, MDC formation is strongly influenced by media composition and mitochondrial metabolic state. MDCs form less efficiently in synthetic media and do not form under conditions that promote respiratory mitochondrial function (Raghuram and Hughes, 2024).
  
  (5) Beyond just the presence/absence of MDCs, does more detailed quantification of their size/shape reveal any subtle differences between conditions?
  
  This is an interesting question. In our hands, MDC size and shape are variable and appear strongly influenced by mitochondrial fission/fusion state. Conditions that favor more fused mitochondrial networks can produce larger MDC-like structures, whereas fragmented networks can produce smaller structures. So far, we have not found a simple size or shape metric that explains the Yme1/Ups2/MICOS phenotypes better than MDC frequency.
  
  We will clarify this point in the revised manuscript and avoid implying that MDC frequency captures every possible morphological difference. More detailed morphometric analysis of MDC size, topology, and maturation state will be an important future direction, especially as we connect lipid remodeling to membrane curvature and MDC biogenesis.
  
  References
  
  Hughes, A.L., Hughes, C.E., Henderson, K.A., Yazvenko, N., and Gottschling, D.E. 2016. Selective sorting and destruction of mitochondrial membrane proteins in aged yeast. eLife. 5. doi: 10.7554/eLife.13943.
  
  Raghuram, N., and Hughes, A.L. 2024. Amino acids trigger MDC-dependent mitochondrial remodeling by altering mitochondrial function. bioRxiv. 2024.07.09.602707. doi: 10.1101/2024.07.09.602707.
  
  Wang, K., Jin, M., Liu, X., and Klionsky, D.J. 2013. Proteolytic processing of Atg32 by the mitochondrial i-AAA protease Yme1 regulates mitophagy. Autophagy. 9(11):1828–1836. doi: 10.4161/auto.26281.
  
  Xiao, T., English, A.M., Wilson, Z.N., Maschek, J.A., Cox, J.E., and Hughes, A.L. 2024. The phospholipids cardiolipin and phosphatidylethanolamine differentially regulate MDC biogenesis. Journal of Cell Biology. 223(5). doi: 10.1083/jcb.202302069.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.04.03.716366v1
www.biorxiv.org

Essential function reflected in the phylodynamics of a multigene family: the pir genes of malaria parasites

1. Public_Reviews 19 Jun 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript entitled "Essential function reflected in the phylodynamics of a multigene family - the pir genes of malaria parasites" by Jackson and colleagues investigates the global phylogeny of pir genes across 14 Plasmodium species and one Hepatocystis species. The authors also focus on the functional characterization of the conserved ortholog pirC1 and claim that pirC1 is not the founder of the family and that it plays an essential role in blood-stage growth.
  
  Strengths:
  
  Overall, the manuscript is well written and interesting, as it combines comparative genomics and evolutionary analysis with functional experiments. The phylogenetic analysis is rigorous and represents a major strength of the manuscript.
  
  Weaknesses:
  
  The general conclusions regarding the potential function of this gene family are not fully supported by the data presented. The manuscript moves too quickly from growth phenotype and localization studies to a specific mechanistic model. The discussion argues that PIRC1 may be involved in nutrient acquisition, host sensing, or metabolic support, but the data provided do not directly support these functions, and the manuscript in its present form remains speculative. Although the manuscript includes some experimental results, it lacks direct mechanistic validation of the specific functions of the pir genes, including pirC1. In its current form, the study does not yet establish a definitive role for pirC1 in metabolic processes.
  
  The reviewer is correct that there is no definitive proof for the function of the PIRC1 protein. We speculate that this protein is involved in a metabolic process based on the mutant phenotype: small, poorly developed parasites that do not produce the same amount of DNA as wildtype parasites (and hence likely fewer merozoites). That this occurs in an in vitro culture of Plasmodium knowlesi rules out a role in the interaction with the host organism, such as sequestration or facilitating passage through the spleen. The localization of the protein outside of the parasite is consistent with a role in nutrient uptake, but we agree that additional experiments are required to determine the role of the protein definitively. We aim to look at the differences in the transcriptome and the metabolome to gain more insight into the pirC1 phenotype; this should reveal metabolic deficiencies in the mutant parasite.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is an extensive study using phylogenetic comparison across multiple Plasmodium species to gain new insights in relation to their evolutionary pathways and the potential function of pir. In addition to establishing a framework to identify related orthologues across species as well as expanding paralogous families within a species, the work also focuses on understanding loss and gain of different PIRs and how this indicates a relative lack of functional constraints and essentiality for most members of the gene family.
  
  The authors provide evidence that at least pirC has a conserved function and plays an important role in parasite growth in multiple species.
  
  While this study represents a significant effort and does provide interesting new insights that would help our understanding of this complex gene family in the future, it has a number of limitations.
  
  Strengths:
  
  Extensive and thorough phylogenetic analysis that is supported by some biological validation. Provides an indication that the PIR gene family has limited biological constraints and evolved independently across different species, leading to rapid expansion and deletion of orthologous groups. Identified pirC as a functional and important member of the family that is conserved across the species.
  
  Weaknesses:
  
  The phylogenetic tree is based on a truncated sequence that focuses on the more conserved parts of the pir sequence. This could potentially lead to missing the key functional drivers of evolution. The biological validation of the role of pirC has some inconsistencies that need to be addressed.
  
  The reviewer is correct. We do not use the repetitive parts of the pir gene sequences for the phylogeny. We define these as the ‘distal variable’ and ‘proximal’ domains of the protein in Fig. S1, results text and supplementary results. We remove these parts from the alignment because they are only nominally homologous (they cannot be aligned) and so break the basic assumption of phylogenetic analysis. Amino acid repeats evolve quickly and are homoplasic (their similarities do not reflect ancestry) so omitting them is correct and makes the phylogeny more reliable. While these features do not contribute to the phylogenetic estimate, we propose in the results text and Fig. S3, in agreement with the reviewer, that they are an important demonstration of how pirs have differentiated and what is different between the subfamilies. The reviewer is also correct that we have considered the whole gene sequence when comparing Alphafold predictions and in selection analyses of closely related sequences (in these cases, the repeat sequences can be aligned).
  
  A structural prediction for the sequence used in the alignment would mostly reflect the distal conserved domain but would be misleading because the alignment combines conserved regions that are not physically attached in reality. We will clarify these points.
  
  Reviewer #3 (Public review):
  
  This paper aims to classify, from an evolutionary perspective, the multigene family PIR found in malaria parasites infecting rodents and Old World monkeys, and to link this classification to functional diversification. The authors also hypothesize that PIR members conserved across species play important roles in parasite survival, and seek to clarify their functions.
  
  To achieve these aims, the authors comprehensively analyze the evolution of PIR genes using genomic and transcriptomic information from many malaria parasite species. They focus on PIRC1, a member conserved across species, and attempt to clarify its function in rodent and simian malaria parasites by examining the phenotypes of parasites in which the corresponding genetic locus has been disrupted. They also attempt to determine its localization using PIRC1 tagged with an epitope sequence. However, although the locus-disrupted parasites appear to show an approximately 50% reduction in growth rate, this effect seems to be overestimated. Another weakness is that the cause of the reduced growth rate has not been clarified. The localization analysis also remains insufficiently conclusive.
  
  Therefore, I consider that the first half of the paper, consisting of the bioinformatics analyses, achieves the objective of comprehensively summarizing PIR and may become a reference paper for discussing the evolution and function of the PIR gene family. On the other hand, regarding the function of PIRC1, no clear conclusion can be drawn from the results presented, and several additional experiments are necessary.
  
  My major comments are as follows.
  
  (1) The claim that the failure of eight disruption attempts indicates that pirC1 is essential is too strong.
  
  Lines 319-321: The authors argue that a total of eight failed attempts to disrupt the pirC1 locus using two different construct designs suggest that pirC1 is essential in P. berghei. However, the failure of these attempts could also reflect technical issues with the construct design itself, such as the length of the homologous regions used for recombination, which are approximately 650 bp. Therefore, it is an overstatement to conclude that "pirC1 is essential for P. berghei blood-stage growth." Given that parasites with disruption of the corresponding locus could be obtained in both P. chabaudi and P. knowlesi, a more appropriate statement would be that "pirC1 is important for P. berghei blood-stage growth."
  
  It is correct that we cannot rule out that the inability to delete the pirC1 gene in Plasmodium berghei is unrelated to an essential function. We are happy to change the text to the suggested description.
  
  (2) The data on the mCherry-expressing P. berghei line shown in Supplementary Figure 11 are insufficient.
  
  (a) Panel C: Southern blot analysis
  
  To conclusively identify the lower band in panel C as chromosome 1, additional probes specific to genes located on chromosomes 1 and 2 would be required. In addition, a parental parasite control should also be included. The Southern blot image of the parental parasite should show only a single band at the higher position, with no band at the lower position. Probes specific to chromosomes 1 and 2 would help demonstrate that the lower band corresponds to chromosome 1, rather than chromosome 2.
  
  To this end, the authors could describe the result as follows:
  
  "In the parental parasite, only a single band corresponding to chromosome 7 was detected, indicating that the smaller chromosome was genetically modified. The size of the lower band detected with the dhfr probe was identical to that of the band detected with the control chromosome 1 probe, but distinct from that detected with the chromosome 2 probe, indicating that chromosome 1 was modified."
  
  That said, this chromosome-level Southern blot analysis is not sufficient to demonstrate that the target PBANKA_0100500 locus was specifically modified. The authors should provide more direct evidence showing that the PBANKA_0100500 locus, rather than another genomic locus, was modified. For example, Southern blot analysis after restriction enzyme digestion would provide more definitive evidence. Diagnostic PCR may also provide more specific evidence.
  
  Although we are confident that the parasites have been modified in the expected way, we are planning to generate PCR data confirming that the mCherry tag is correctly integrated into PBANKA_0100500.
  
  (b) Panel D: Flow cytometry analysis
  
  To allow a more accurate interpretation of the percentage of mCherry-positive cells, flow cytometry data for the parental parasite line should also be presented.
  
  We will repeat the flow cytometry experiments and include a wildtype strain in the analysis.
  
  (3) There are unclear points in the PCR results shown in Supplementary Figure 12.
  
  Supplementary Figure 12: In panel B, a PCR product should also be amplified from dPCHAS_0101200 using the P1-P3 primer pair. Why is this band absent? The authors should provide the uncropped electrophoresis image so that the larger band can be seen. In addition, if labels 1 and 2 indicate independent clones, this should be stated in the figure legend.
  
  We will gladly supply the full, uncropped electrophoresis image and we will clarify what the numbers indicate in the legend.
  
  (4) The growth rates of P. chabaudi and P. knowlesi parasites with disruption of the PIRC1 gene locus should be quantitatively analyzed.
  
  The growth rates of P. chabaudi and P. knowlesi are described only qualitatively, but they should be evaluated quantitatively. In Figure 4A, the parasitemia of wild-type P. chabaudi increases from approximately 6.1% on day 6 to approximately 15.6% on day 8, corresponding to a 3.8-fold increase. However, because parasite growth may already be affected by immune-mediated suppression at this stage, this value should be regarded as a minimum estimate. In contrast, the mutant increases from approximately 3.2% on day 8 to approximately 6.8% on day 10, corresponding to a 2.1-fold increase. Based on these values, the daily growth rate of the mutant appears to be reduced to at least approximately 56% of that of the wild type. Similarly, from the growth curve of P. knowlesi in Fig. 5A, the DMSO-treated group appears to increase approximately two-fold per day, whereas the rapamycin-treated group increases only approximately one-fold per day. Thus, P. knowlesi also appears to show an approximately 50% reduction in growth rate. Taken together, both P. chabaudi and P. knowlesi appear to reproducibly show an approximately 50% reduction in growth capacity. A reduction of this magnitude is difficult to describe as a "severe growth defect"; a more appropriate wording would be simply that the parasites "showed a growth defect." In addition, the terms "a severe growth defect" and "essential" appear to be overstated throughout the manuscript, and the wording should be toned down. Finally, I recommend presenting Figure 4A and Figure 5A on a logarithmic scale so that the trend in growth rates can be more intuitively appreciated from the graphs.
  
  It should be possible to determine the growth rates of the wildtype and mutant P. knowlesi parasites. In addition, we can change the text to reflect that, although there is a growth phenotype in the two species in which we obtained mutants, the parasites do retain the capacity to replicate. Note that in the case of P. knowlesi, the parasite numbers in vitro do not increase; hence any additional factors that decrease the growth rate, such as the immune system and the spleen, will lower the reproductive rate further and render the mutant parasite unable to proliferate.
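  The reviewer's per-day comparison can be made explicit with a small calculation. The Python sketch below is an illustration only, not the authors' or the reviewer's analysis; it assumes unconstrained exponential growth between readings, and the function names are hypothetical. It also shows why a logarithmic axis helps: the per-day multiplication rate is the exponential of the slope of ln(parasitemia) against time.

```python
import math
from typing import Sequence


def daily_multiplication_rate(p_start: float, p_end: float, days: float) -> float:
    """Geometric-mean fold change per day between two parasitemia readings.

    Assumes exponential growth between the two time points, so the
    per-day rate is (p_end / p_start) ** (1 / days).
    """
    if p_start <= 0 or p_end <= 0 or days <= 0:
        raise ValueError("parasitemia values and interval must be positive")
    return (p_end / p_start) ** (1.0 / days)


def relative_growth(rate_mutant: float, rate_wildtype: float) -> float:
    """Mutant per-day rate expressed as a fraction of the wild-type rate."""
    return rate_mutant / rate_wildtype


def fitted_daily_rate(days: Sequence[float], parasitemia: Sequence[float]) -> float:
    """Per-day rate from a least-squares line through ln(parasitemia) vs. time.

    On a log-scale plot this is exp(slope), so several time points can be
    used instead of only two.
    """
    n = len(days)
    logs = [math.log(p) for p in parasitemia]
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    var = sum((t - mean_t) ** 2 for t in days)
    return math.exp(cov / var)
```

  For example, a rise from 1% to 4% parasitemia over two days corresponds to a per-day rate of 2.0; a mutant multiplying 1.5-fold per day would then grow at 75% of that rate.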
  
  (5) The evidence that disruption of the PIRC1 gene locus in P. knowlesi does not affect erythrocyte invasion is weak.
  
  The authors state that "the developmental cycle of the parasites lacking PIRC1 is slightly longer than that of parasites that produce PIRC1 (lines 383-384)," and appear to support this interpretation with data showing that "mutant parasites are significantly smaller than wild-type parasites (line 414)" and that "the DNA content in ML10-arrested parasites lacking PIRC1 is lower than that of DMSO-treated parasites (lines 417-418)" at 24 hours after invasion. However, a slightly longer developmental cycle alone does not seem sufficient to explain a 50% growth reduction.
  
  I think the erythrocyte invasion capacity has not been quantitatively evaluated, and therefore, the evidence supporting the conclusion that the phenotype of P. knowlesi parasites with disruption of the PIRC1 gene locus is unrelated to erythrocyte invasion is weak. The authors should assess invasion efficiency using purified merozoites. For P. chabaudi, it should also be possible to apply an in vitro or in vivo erythrocyte invasion assay similar to that used for other rodent malaria parasites, and this should be evaluated as well.
  
  We can further investigate the invasion phenotype of the mutant P. knowlesi parasites. The presence of a clear phenotype during the intraerythrocytic stage indicates that the protein also has a role after invasion, but we agree that determining the effect on invasion directly will be useful.
  
  Alternatively, the reduced DNA content in ML10-arrested parasites lacking PIRC1 (lines 416-417) could suggest that the number of merozoites formed per schizont may be reduced. To clarify this point, the authors should assess whether the number of merozoites per schizont is altered in P. knowlesi (and P. chabaudi parasites lacking PIRC1).
  
  We aim to count merozoites and measure the level of invasion, which will allow us to determine the reproductive rate of the mutant parasites.
  
  (7) The authors propose the possibility that PIRC1 expressed in merozoites is released after invasion; however, the evidence that PIRC1 localizes to intracellular organelles is weak.
  
  Line 333: "a peripheral pattern around the parasite" is indicative of parasite plasma membrane, PV, or PVM. ", indicative of a parasitophorous vacuole (PV) or parasitophorous vacuole membrane (PVM) location" should be amended to ", indicative of parasite plasma membrane, a parasitophorous vacuole (PV) or parasitophorous vacuole membrane (PVM) location". In the Figure S14 image, red signals are uniformly detected from the merozoites formed in the schizont stage parasite (not really microorganelle patterns), but not from the PVM surrounding the schizont, suggesting parasite plasma membrane localization, not PVM. I agree that the signal is detected from the compartments extending into the iRBC cytosol, which may be difficult to explain if it is located on the parasite plasma membrane, but how frequently were such images seen?
  
  To determine the localization of the protein in the merozoite, we will image P. knowlesi merozoites.
  
  Figure 4D. In the images of liver-stage schizonts, AMA1 does not appear to localize to the micronemes in mature merozoites, suggesting this image is an immature schizont. Although PIRC1 appears to be expressed in liver-stage schizonts, it is difficult to clearly determine whether it localizes to intracellular organelles or to the parasite plasma membrane.
  
  This is a valuable comment. It is difficult, if not impossible, to determine the exact localization of the protein at this stage, irrespective of the exact stage of the parasite. What is clear from the images is that the protein is not secreted at this stage. The main aim of the experiment was to determine whether the protein is produced by the parasite during the liver stage, which the results confirm.
  
  To clarify the above points, the authors should examine whether PIRC1 is detected in intracellular organelles or around the merozoites by analyzing its localization in purified merozoites.
  
  This we aim to do.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.23.697869v1
www.biorxiv.org www.biorxiv.org

The Crunchometer: A Low-Cost, Open-Source Acoustic Analysis of Feeding Microstructure

1. Public_Reviews 18 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This valuable manuscript presents an open-source and low-cost acoustic system for quantifying biting and chewing in mice. The approach is carefully validated against human observers, demonstrating strong methodological reliability and enabling high-resolution analysis of feeding microstructure. The tool has broad relevance for studies of appetite circuits and pharmacological interventions. An important contribution is the identification of previously unrecognized "meal-related" neurons in the lateral hypothalamus, providing novel biological insight into solid food consumption. While the support for the methodological advances is compelling and robust, some circuit-level conclusions are preliminary or incomplete, relying on small pilot samples and manual classification, and should be interpreted with caution. This paper will be of interest to those interested in ingestive behavior and/or the hypothalamus.
  
  We thank the reviewers for their careful reading and constructive comments, which have substantially strengthened the manuscript. In the revised version, we have addressed every suggestion and introduced the following major additions. New experiments: we added one additional Vglut2 mouse to the calcium imaging cohort, bringing the total to 386 neurons (Figure 8), and three naive Vgat mice with unilateral DREADD injections (Supplementary Fig. 5-1). New analyses: we performed ROC analyses on all feeding- and licking-related responses of n = 79 LH GABAergic and n = 386 LH glutamatergic neurons (Figures 7D-F and 8D-F). We also characterized the robustness of the Crunchometer to additive white-noise injection (Supplementary Fig. 1-2). New supplementary material: three new supplementary figures have been added in total (Supplementary Figs. 1-2, 5-1, and 6-1). Supplementary Fig. 6-1 provides instructions for building a 1-Hz pulse generator that blinks an LED in synchrony with the video. Software improvements: we upgraded the original MATLAB scripts to an App GUI version, migrated the full codebase from MATLAB to Python, and packaged it as fully standalone executables for macOS (Apple Silicon) and Windows, both of which run without a MATLAB license.
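  The ROC analyses are not described in detail in this response. One common formulation, assumed here purely for illustration, scores each neuron by the area under the ROC curve (AUC) that separates its activity in an event window (for example, around bites or licks) from a baseline window, and tests that AUC against chance with a label permutation. The sketch below follows that assumption; window definitions, function names, and the permutation count are hypothetical.

```python
import random
from typing import Sequence


def roc_auc(responses: Sequence[float], baseline: Sequence[float]) -> float:
    """AUC separating event-window values from baseline-window values.

    Equals the probability that a random event value exceeds a random
    baseline value (ties count 0.5), i.e. the Mann-Whitney U statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for r in responses:
        for b in baseline:
            if r > b:
                wins += 1.0
            elif r == b:
                wins += 0.5
    return wins / (len(responses) * len(baseline))


def permutation_p_value(responses: Sequence[float], baseline: Sequence[float],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided permutation test of the AUC against the chance level of 0.5."""
    rng = random.Random(seed)
    observed = abs(roc_auc(responses, baseline) - 0.5)
    pooled = list(responses) + list(baseline)
    n_r = len(responses)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break the event/baseline labelling
        null = abs(roc_auc(pooled[:n_r], pooled[n_r:]) - 0.5)
        if null >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

  Under this convention an AUC near 1 indicates activation during the event, near 0 indicates inhibition, and near 0.5 indicates no modulation.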
  
  Our point-by-point responses to the reviewers' comments are in red below. Deletions are omitted for brevity. We hope that the revisions fully address the points raised and render the manuscript suitable for publication.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This is an interesting and valuable paper by Gil-Lievana, Arroyo et al. that presents an open-source method (the "Crunchometer") for quantifying biting and chewing behavior in mice using audio detection. The work addresses an important and unmet need in the field: quantitative measures of feeding behavior with solid foods, since most prior approaches have been limited to liquids. The authors make a clear and compelling case for why this problem is important, and I fully agree with their motivation.
  
  The system is carefully validated against human-scored video data and is shown to be at least as accurate, and in some cases more accurate, than human observers. This is a major strength of the study. I also particularly appreciate the demonstration of the technology in the context of LHA circuitry, which nicely illustrates its utility and importance for mechanistic studies of feeding. I also appreciate the ability to readily time-lock neural data to individual crunches. Overall, the manuscript is well-executed and represents a useful contribution to the field.
  
  We thank the Reviewer for appreciating the Crunchometer and its alignment with electrophysiological recordings.
  
  To further facilitate alignment with neuronal activity, we have now also included a schematic diagram of the pulse generator used to blink an LED in synchronization with the video (see the new Supplementary Fig. 6-1).
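  With a 1-Hz LED pulse visible in the video and the same pulses logged by the neural acquisition system, video events can be mapped onto the neural clock. A minimal sketch, assuming LED onsets have already been detected as frame indices and paired one-to-one with the logged pulse times; a linear fit absorbs both the constant offset and any slow drift between the two clocks. All names are hypothetical and not part of the Crunchometer code.

```python
from typing import List, Sequence, Tuple


def fit_clock_map(frames: Sequence[float], times_s: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of times_s = slope * frame + offset from paired sync pulses."""
    n = len(frames)
    if n < 2 or n != len(times_s):
        raise ValueError("need at least two paired sync events")
    mean_f = sum(frames) / n
    mean_t = sum(times_s) / n
    sxx = sum((f - mean_f) ** 2 for f in frames)
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(frames, times_s))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_f


def frames_to_neural_time(event_frames: Sequence[float],
                          led_onset_frames: Sequence[float],
                          pulse_times_s: Sequence[float]) -> List[float]:
    """Convert video frame indices (e.g., detected bites) to the neural recording clock."""
    slope, offset = fit_clock_map(led_onset_frames, pulse_times_s)
    return [slope * f + offset for f in event_frames]
```

  For instance, LED onsets at frames 30, 60, and 90 paired with pulses at 10, 11, and 12 s give a slope of 1/30 s per frame, so a bite at frame 45 maps to 10.5 s on the neural clock.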
  
  The comments I have are largely minor and should be straightforward to address:
  
  (1) The authors should report sample sizes for all mouse cohorts, either alongside the statistics or in the figure legends for mean data.
  
  We apologize for this oversight. We have now included all sample sizes in the figure captions.
  
  (2) Clarification is needed as to whether crunch detection fidelity is influenced by the hardness or softness of the food. The focus here is on standard pellets, with some additional high-fat pellet data, but it would be useful to know how generalizable the method is across different textures.
  
  We thank the reviewer for this important observation. Because the Crunchometer depends on bites generating an audible acoustic signal, food hardness directly impacts detection fidelity. Hard, brittle foods are readily detected, whereas soft foods such as jelly, pudding, or peanut butter are unlikely to produce a reliably detectable signal. This is a genuine scope limitation of the method, and we now make it explicit in the manuscript (see below).
  
  Regarding the two diets used in our study, Chow and HFD pellets differ only slightly in consistency, with HFD being marginally softer. These differences proved too subtle to separate acoustically: the intensity (dB) and spectral content of bites on the two diets were closely overlapping. Accordingly, when we trained an SVM on audio features alone, it could not reliably discriminate Chow from HFD bites.
  
  Importantly, the Crunchometer does not need to resolve food identity from sound, because audio and video play complementary roles in the system: the acoustic channel confirms that a bite occurred, while the mouse's position within the food-specific ROI determines which food was consumed. This division of labor is what allows per-diet attribution despite acoustically similar pellets.
  
  We have added the following to the Results section:
  
  “The Crunchometer, therefore, does not need to infer food identity acoustically: audio confirms that a bite occurred, and the mouse's position within a food-specific ROI identifies which food was consumed. This design enables per-diet attribution even for pellets with indistinguishable crunch signatures.”
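  The division of labor described above (audio decides whether a bite occurred, position decides which food) can be sketched in a few lines. This sketch assumes rectangular ROIs in pixel coordinates, a per-frame mouse centroid from some tracker, and a constant frame rate; none of these details come from the Crunchometer code, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Roi:
    """Axis-aligned rectangular region around one food source (pixel coordinates)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def attribute_bites(bite_times_s: Sequence[float],
                    positions: Sequence[Tuple[float, float]],
                    fps: float,
                    rois: Sequence[Roi]) -> List[Optional[str]]:
    """Label each acoustically detected bite with the ROI the mouse occupies.

    Returns None for bites that occur while the mouse is outside every ROI.
    """
    labels: List[Optional[str]] = []
    for t in bite_times_s:
        # nearest video frame, clamped to the recorded range
        frame = min(max(int(round(t * fps)), 0), len(positions) - 1)
        x, y = positions[frame]
        labels.append(next((roi.name for roi in rois if roi.contains(x, y)), None))
    return labels
```

  Because food identity comes only from position, two pellets with indistinguishable sounds are still separated as long as they sit in non-overlapping ROIs.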
  
  We fully agree with the reviewer that the study of solid-food consumption should not be restricted to standard murine diets. Foods with naturalistic textures, for example, the Granny Smith apple, chocolate, and salted peanuts used by O'Connell et al. (2025), span a much wider range of hardness and elasticity than Chow vs. HFD, and would likely generate more clearly differentiated acoustic signatures. We hypothesize that the Crunchometer could generalize to such foods to the extent that each food produces a clear and distinct acoustic pattern, and even where acoustic signatures overlap, ROI-based spatial attribution would continue to resolve food identity as long as each food is presented at a separate, trackable location.
  
  To make this scope explicit for readers, we have added the following clarification to the Behavioral Protocol section:
  
  "Our study is limited to the acoustic detection of standard Chow and HFD pellets, both of which exhibit a firm, brittle consistency. Future work should evaluate the fidelity of the Crunchometer across a broader range of food textures, encompassing varying degrees of hardness and elasticity, as explored by O'Connell et al. (2025)."
  
  (3) The authors should comment on how susceptible the Crunchometer is to background noise. For example, how well does it perform in the presence of white noise, experimenter movement, or other task-related sounds?
  
  We thank the reviewer for this valuable comment. The Crunchometer performs reliably in controlled, low-noise environments, but like any acoustic detection system, it is vulnerable to interference from sounds whose spectral content overlaps with the bite-related frequency band (500–950 Hz). To quantify this vulnerability, we stress-tested both the threshold-based and SVM-based detection methods by adding white noise to the original audio recordings at a series of amplitudes and measuring how detection performance degraded as the signal-to-noise ratio (SNR) decreased. We found that the threshold-based method was more robust to white-noise contamination than the SVM-based method, maintaining acceptable detection performance down to lower SNR values before degrading [see the new Supplementary Fig. 1-2].
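  A threshold-based detector of the kind compared here might look like the following sketch, which flags short frames whose energy in the 500–950 Hz band exceeds a decibel threshold and reports the onset of each run of flagged frames. The frame length, Hann window, threshold, and function name are assumptions for illustration, not the Crunchometer's actual parameters.

```python
import numpy as np


def detect_bite_onsets(audio, fs, threshold_db, band=(500.0, 950.0), frame_s=0.01):
    """Return onset times (s) of runs of frames whose in-band energy exceeds threshold_db."""
    frame_len = int(round(frame_s * fs))
    n_frames = len(audio) // frame_len
    if n_frames == 0:
        return []
    frames = np.asarray(audio, dtype=float)[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Hann window reduces spectral leakage from frequencies outside the band
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_db = 10.0 * np.log10(spectrum[:, in_band].sum(axis=1) + 1e-12)
    above = band_db > threshold_db
    # an onset is an above-threshold frame whose predecessor was below threshold
    previous = np.concatenate(([False], above[:-1]))
    onsets = np.flatnonzero(above & ~previous)
    return (onsets * frame_len / fs).tolist()
```

  Restricting the energy to the bite band is what makes broadband white noise a meaningful stress test: only the noise power that falls inside 500–950 Hz can push a frame over threshold.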
  
  First, the white noise amplitude was generated as follows:
  
  where L_noise is the desired amplitude of the white noise in dB. Then, the audio signal was range-normalized to its absolute maximum value, and the white noise was added at the desired amplitude, as shown by the following formula:
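  The noise-injection step can be sketched as follows. The text does not state the dB reference or the noise distribution, so this sketch assumes Gaussian white noise whose standard deviation is 10^(L_noise/20) relative to the range-normalized full scale of 1; both choices are assumptions, and the function name is hypothetical.

```python
import numpy as np


def add_white_noise(audio: np.ndarray, level_db: float, seed: int = 0) -> np.ndarray:
    """Range-normalize audio to [-1, 1], then add Gaussian white noise.

    Assumes the noise standard deviation follows the amplitude convention
    A = 10 ** (level_db / 20), relative to a full scale of 1.
    """
    peak = np.max(np.abs(audio))
    if peak == 0:
        raise ValueError("audio is silent; cannot range-normalize")
    normalized = audio / peak
    amplitude = 10.0 ** (level_db / 20.0)  # e.g. -20 dB -> 0.1
    rng = np.random.default_rng(seed)
    return normalized + amplitude * rng.standard_normal(audio.shape)
```

  Sweeping L_noise upward, for example from -60 dB to 0 dB, progressively lowers the signal-to-noise ratio of the normalized recording.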
  
  (4) Chemogenetic activation of LHA GABAergic neurons is used. DREADD-based activation may strongly drive these neurons in a way that is not directly comparable to optogenetic or more physiological manipulations. While I do not think additional experiments are required, it would strengthen the discussion to briefly acknowledge this limitation.
  
  We thank the Reviewer for this thoughtful observation, with which we agree. Chemogenetic activation of LHA GABAergic neurons via DREADDs does not reproduce the physiological firing dynamics of these neurons along several dimensions: it imposes a sustained, tonic drive lasting hours after CNO administration; it likely produces firing rates above the endogenous range; and it lacks the fine temporal structure (phasic bursts and behaviorally phase-locked activity) that these neurons exhibit during natural feeding episodes.
  
  We recognize, however, that this limitation is not unique to chemogenetics. Optogenetic approaches likewise fail to reproduce endogenous activity, as they impose synchronous, high-frequency activation patterns on a single cell type that are unlikely to occur under physiological conditions. Moreover, as we previously described for a phenomenon our laboratory termed optoception (Luis-Islas et al., 2022), optogenetic stimulation can itself generate signals perceptible to the animal, adding a further interpretive caveat. Thus, both techniques depart from physiological activity.
  
  For these reasons, we interpret our findings as evidence that activation of LHA GABAergic neurons is sufficient to drive the observed behavioral effects, without claiming that the endogenous firing pattern encodes these behaviors in the same manner or with the same dynamics imposed by our manipulation. We have now added a brief statement to the Discussion acknowledging this limitation explicitly:
  
  “A methodological consideration is that chemogenetic activation via DREADDs imposes a sustained, supra-physiological drive that does not reproduce the temporal structure of endogenous LHA GABAergic activity during feeding; optogenetic manipulations share analogous limitations (see optoception; Luis-Islas et al., 2022). Our findings, therefore, establish that activation of this neuronal population is sufficient to produce uncontrolled feeding and gnawing, without implying that its endogenous firing encodes them in the same manner.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript introduces the Crunchometer, a low-cost, open-source acoustic platform for monitoring the microstructure of solid food intake in mice. The Crunchometer is designed to overcome the limitations of existing methods for studying feeding behavior in rodents. The goal was to provide a tool that could precisely capture the microstructure of solid food intake, something often overlooked in favor of liquid-based assays, while being affordable, scalable, and compatible with neural recording techniques. By doing so, the authors aimed to enable detailed analysis of how physiological states, drugs, and specific neural circuits shape naturalistic feeding behaviors.
  
  Strengths:
  
  The study's strengths lie in its clear innovation, methodological rigor in validation against human annotation, and demonstration of broad utility across behavioral and neuroscience paradigms. The approach addresses a significant methodological gap in the field by moving beyond liquid-based feeding assays and provides an accessible tool for precisely dissecting ingestive behavior. The system is validated across multiple contexts, including physiological state (fed vs. fasted), pharmacological manipulation (semaglutide), and circuit-level interventions (chemogenetic activation of LH neurons), and is further shown to integrate seamlessly with both electrophysiology and calcium imaging.
  
  (1) Introduces a low-cost, open-source acoustic tool for measuring solid food intake, filling a critical gap left by expensive and proprietary systems.
  
  (2) Makes the method easily adoptable across labs with detailed setup instructions and shared benchmark datasets.
  
  (3) Provides high temporal precision for detecting bite events compared to human observers.
  
  (4) Successfully distinguishes feeding microstructure (bites, bouts, IBIs, gnawing vs. consumption) with greater objectivity than manual annotation.
  
  (5) Demonstrates compatibility with electrophysiology and calcium imaging, enabling fine-scale alignment of neural activity with feeding behavior.
  
  (6) Effectively discriminates between fed vs. fasted states, validating physiological sensitivity.
  
  (7) Captures the pharmacological effects of semaglutide, although this is really just reduced feeding and associated readouts (bouts, latency, etc.).
  
  (8) Has potential to distinguish consummatory vs. non-consummatory behaviors (e.g., food spillage, gnawing); however, the current SVM model struggles to separate biting from gnawing due to similar acoustic profiles, and manual validation is still required.
  
  (9) Provides potential for closed-loop experiments.
  
  Weaknesses:
  
  Several limitations temper the strength of the conclusions: the supervised classifier still requires manual correction for gnawing, generalizability across different setups is limited, and the neuroscience findings, particularly calcium imaging of GABAergic and glutamatergic neurons, are based on small pilot samples. These issues do not undermine the value of the tool, but mean that the neural circuit findings should be interpreted as preliminary.
  
  We sincerely thank the Reviewer for the careful and generous reading of our manuscript, and particularly for recognizing the methodological gap that the Crunchometer seeks to fill. We appreciate the acknowledgment that the tool's validation spans physiological, pharmacological, and circuit-level contexts, and that its integration with electrophysiology and calcium imaging was considered seamless. The Reviewer has also accurately identified the three main limitations of the current version of the platform, which we address in turn below:
  
  (1) The supervised SVM classifier still requires manual correction for gnawing.
  
  We agree with the Reviewer. The acoustic signatures of biting (consummatory) and gnawing (non-consummatory manipulation of the pellet) share overlapping linear spectrotemporal features that our SVM exploits for discrimination. This overlap reflects a genuine biomechanical similarity (both involve incisor contact with the pellet surface) rather than a shortcoming of the classifier per se. In ongoing work toward Crunchometer 2.0, we are addressing these limitations. The Crunchometer 2.0 will incorporate more sophisticated deep learning algorithms, such as ResNet, to better exploit non-linear features. Also, we are currently collecting a larger database of bite, gnawing, and environmental noise sounds across different setups, microphones, and conditions to build a more robust dataset for training new AI algorithms that can discriminate between gnawing and biting and generalize more robustly across microphones and behavioral setups. This effort will also be important for developing a closed-loop version of the Crunchometer to detect bites in real time and trigger an actuator (e.g., a laser). But we agree that, for the present manuscript, gnawing classification remains the weakest link in the pipeline.
  
  Nevertheless, we think that having a human in the loop is an advantage (not a disadvantage) of the equipment, as it improves the quality of database curation. No matter how sophisticated future algorithms become, human intervention will remain essential. To this end, we have now developed a human-validation GUI that further facilitates human revision of snippets through an intuitive, easy workflow, reducing human effort (Author response image 1).
  
  Author response image 1.
  
  The visual validator GUI allows a human to verify and reclassify snippets into the correct category in a friendly interface.
  
  (2) Generalizability across different setups is limited.
  
  This is a fair concern, one we have taken seriously and, as noted above, already recognized. The acoustic signal captured by the Crunchometer is inherently sensitive to the geometry and material of the box, microphone placement, the ambient noise floor of the vivarium or experimental room, and the hardness of the specific pellet batch. To mitigate this, we have 1) released the full hardware specifications and bill of materials so that other laboratories can reproduce the acquisition geometry, and 2) provided the benchmark dataset and trained classifier weights so that groups using comparable setups can deploy the tool directly. We have already acknowledged that the SVM does not always generalize across setups. In this regard, we have now shown that the threshold method is more resistant to white-noise contamination (see new Supplementary Fig. 1–2) and, in our experience in the lab, it performs robustly across the multiple setups and conditions we have tested. More importantly, improved algorithms are currently under development in our laboratory.
  
  (1) Some neuroscience findings (calcium imaging of GABAergic vs. glutamatergic neurons) are based on small pilot samples (n=2 mice per condition), limiting generalizability.
  
  (3) The neuroscience findings (calcium imaging of GABAergic and glutamatergic LH neurons) are based on small pilot samples.
  
  The Reviewer is correct, and we appreciate the comment. As stated in the Results and Discussion, these findings are presented as preliminary, and, as the Reviewer noted, they do not undermine the value of the Crunchometer. The calcium imaging experiments were designed as a proof of concept to demonstrate that the temporal precision of the Crunchometer is sufficient to align neural activity with individual bite events, rather than as a definitive circuit-level characterization of LH GABAergic and glutamatergic populations during feeding. Nevertheless, we have now added one Vglut2 mouse, bringing the total number of glutamatergic neurons to 386. We have also performed a formal quantification of all experiments recorded from Vgat (n = 2 mice, 3 sessions, 79 neurons) and Vglut2 (n = 3 mice, 6 sessions, 386 neurons) animals. This new analysis uncovers neurons selectively tuned to liquid food, to solid food, or to both. A fully powered characterization of these two populations will be pursued in our laboratory once funding is secured, and will be reported in a dedicated follow-up study.
  
  (2) Chemogenetic and pharmacological experiments used small cohorts, raising statistical power concerns.
  
  The chemogenetic experiments were conducted with a modest sample size (n = 4 bilaterally infected mice). Nevertheless, the data revealed a robust, reproducible behavioral effect consistent across all four subjects. The primary aim of this study was to illustrate the potential utility of the Crunchometer using complementary experimental approaches, including chemogenetic activation of GABAergic neurons in the lateral hypothalamic area (LHA). To further address this concern, we have now included three additional transgenic mice with unilateral infections and obtained results comparable to those of the bilateral condition. These new data are presented in a new supplementary figure comparing unilateral and bilateral infections (Supplementary Fig. 5-1). Notably, chemogenetic activation of LHA GABAergic neurons promoted eating-related consummatory behaviors to a similar extent under both unilateral and bilateral DREADD activation. Accordingly, we have now added the following text to the Results section:
  
  “Notably, unilateral DREADD infections in an additional cohort of naïve Vgat-cre mice (n = 3) yielded results comparable to bilateral infections. While the effect size was slightly reduced with unilateral administration, the difference between the two delivery methods was not statistically significant (Supplementary Fig. 5-1).”
  
  (3) Correlation with actual food intake is modest and sometimes less accurate than human observers.
  
  We agree that this result highlights the complexity of feeding behavior, influenced by factors such as hoarding and spillage. The threshold method detects feeding behavior solely based on the magnitude of bite-related sounds (e.g., when the mouse bites the pellet close to the microphone), whereas human observers incorporate additional visual information to infer feeding behavior even in the absence of detectable chewing sounds, introducing variability in detection criteria. Although the number of bouts identified by the Threshold method was comparable to those annotated by human observers, the estimated duration (Bout Size) of those detections differed. This discrepancy likely reflects some inconsistency in the detection criteria among human observers and delays in identifying the onset. Moreover, instances of mice chewing pellets without consuming them (i.e., spillage) were observed. These events were often misclassified as feeding bouts, resulting in false positives for both the threshold method and human observers.
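  The threshold logic described above can be illustrated with a minimal, hypothetical sketch: the raw audio is rectified and smoothed into an amplitude envelope, samples above a fixed threshold are flagged, and suprathreshold runs separated by short gaps are merged into bouts. The function names, window length, threshold, and merge gap are illustrative assumptions, not the Crunchometer's published implementation or parameters. The sketch also makes the limitation discussed above explicit: a bite too faint to cross the threshold (e.g., far from the microphone) is missed, whereas loud chewing without consumption (spillage) is counted.

```python
def envelope(samples, window):
    """Moving average of the rectified signal: a crude amplitude envelope."""
    out = []
    acc = 0.0
    for i, x in enumerate(samples):
        acc += abs(x)
        if i >= window:
            acc -= abs(samples[i - window])  # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


def detect_bouts(env, fs, threshold, max_gap_s=1.0, min_dur_s=0.0):
    """Return (onset_s, offset_s) bouts where env exceeds threshold.

    Suprathreshold runs separated by at most max_gap_s seconds are merged;
    merged bouts shorter than min_dur_s seconds are discarded.
    """
    runs, start = [], None
    for i, v in enumerate(env):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i))  # half-open sample range [start, i)
            start = None
    if start is not None:
        runs.append((start, len(env)))

    merged = []
    for s, e in runs:
        if merged and (s - merged[-1][1]) <= max_gap_s * fs:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return [(s / fs, e / fs) for s, e in merged if (e - s) / fs >= min_dur_s]


if __name__ == "__main__":
    fs = 10  # hypothetical 10 Hz envelope rate, chosen only for readability
    env = [0.0] * 10 + [1.0] * 5 + [0.0] * 5 + [1.0] * 5 + [0.0] * 30
    print(detect_bouts(env, fs, threshold=0.5, max_gap_s=1.0))  # [(1.0, 2.5)]
    print(detect_bouts(env, fs, threshold=0.5, max_gap_s=0.3))  # [(1.0, 1.5), (2.0, 2.5)]
```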
  
  (4) Sensitive to hoarding behavior, which can reduce detection accuracy and requires manual correction for misclassifications (e.g., tail movements, non-food noises). However, these limitations are discussed and not ignored.
  
  We thank the reviewer for this constructive comment and for acknowledging that we explicitly discuss these limitations rather than overlook them. Indeed, gnawing and hoarding behaviors (together with tail movements and non-food noises) are factors that can reduce the accuracy of feeding detection. Even with the Crunchometer, accurate measurement of solid-food consumption therefore remains challenging, which further supports the inclusion of a human-in-the-loop step to ensure a high-quality, well-curated database. Accordingly, we have added the following sentence to the Results section:
  
  "This human validation was essential for ensuring the high fidelity of our behavioral database and mitigating the inherent limitations of automated classification."
  
  Conclusion:
  
  Overall, this is an exciting and impactful methodological advance that will likely be widely adopted in the field. I recommend minor revisions to clarify the limits of classifier generalizability, better contextualize the small-sample neuroscience findings as pilot data, and discuss future directions (e.g., real-time closed-loop applications).
  
  We thank you for your constructive comments.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The manuscript provides detailed information on the construction of open-source systems to monitor ingestive behavior with low-cost equipment. Overall, this is a welcome addition to the arsenal of equipment that could be used to make measurements. The authors show interesting applications with data that reveal important neurophysiological properties of neurons in the lateral hypothalamus. The identification of previously unknown "meal-related" neurons in the LH highlights the utility of the device and is a novel insight that should spark further investigation on the LH. This manuscript and videos provide a wealth of useful information that should be a must-read for anyone in the ingestive behavior or hypothalamus fields.
  
  A scholarly introduction to the history and utility of various ways feeding is measured in rodents is provided. One point - the microstructure of eating solid food - has been studied extensively (for one of many studies, see https://doi.org/10.1371/journal.pone.0246569 ). However, I agree that the crunchometer will allow for more people to access recordings during food intake and temporally lock consummatory behavior to neural activity.
  
  We apologize for this oversight. This is indeed an important reference for the microstructure of eating solid food in a social context. We have now cited it in the Introduction as follows: “Food intake in social contexts is a more ethologically valid model, in which radio-frequency identification (RFID) transponders enable the simultaneous assessment of feeding behavior across multiple mice in a single box (Rathod and Fulvio, 2021).”
  
  Questions on results:
  
  (1) It is unclear why 10% sucrose solution was used as a liquid instead of water, given that the study is focusing on the solid food source.
  
  One motivation for using sucrose rather than water alone was to create a highly palatable environment and to test whether mice would prefer palatable liquid sucrose over HFD. However, the choice of liquid stimulus will ultimately depend on the end user and the specific experimental conditions of each lab implementing the Crunchometer. Future versions of the apparatus could also incorporate multiple sippers to deliver several tastants alongside solid food.
  
  (2) It is unclear how essential the human verification is in the pipeline - results for Figure 1 keep referring to the verification as essential. Is that dispensable once the ML algorithms have been trained?
  
  Human validation, also referred to as a human-in-the-loop approach, is a deliberate design feature of the Crunchometer rather than a limitation (see also our response to Reviewer 2). The outputs of machine-learning algorithms, no matter how accurate, require expert corroboration to confirm or reject the specific behaviors under study, particularly when the behavioral repertoire is as heterogeneous as feeding (which encompasses sniffing, gnawing, biting, hoarding, and manipulating the food item). For this reason, we view human oversight as a safeguard for scientific rigor that remains valuable even as more advanced algorithms (e.g., deep learning and convolutional neural networks) are incorporated into future versions of the pipeline. As noted above, we have implemented a graphical user interface (GUI) that enables batch sorting and rapid inspection of multiple snippets (using a photographic montage view), substantially reducing manual curation time.
  
  (3) The ability to extrapolate food quantity consumed is limited, with high variability. This limitation does not undercut the utility of the crunchometer, but should be highlighted as one of the parameters that are not suitable for this system. This limitation should be added to the limitations section.
  
  We thank the reviewer for this constructive observation. We fully agree that, although the Crunchometer reliably detects feeding events and their temporal microstructure (bouts, meals, and latencies), extrapolating the absolute quantity of food consumed from acoustic signals is indirect, carries substantial variability, and should not be the primary readout for studies that require precise gravimetric measurements. As recommended, we have now explicitly listed this limitation in the Limitations section of the Discussion:
  
  "While the Crunchometer provides accurate temporal detection of bites and feeding microstructure, the estimation of absolute food mass consumed from bite-related acoustic signals shows considerable variability across trials and subjects. This limitation arises from individual differences in gnawing patterns, food fragmentation, and hoarding behavior. Accordingly, the Crunchometer is best suited for analyses of feeding dynamics and behavioral microstructure, whereas studies requiring precise quantification of ingested mass should complement the system with direct gravimetric measurements, for example, real-time weighing of feeders."
  
  (4) The ability to discriminate between gnawing and consummatory behavior is a strength (Figure 5), and these findings are important. However, it is unclear what can be made of mice that have 'gnawing' behavior in the fasted state (like in Figure 3). It seems they would need to be eliminated from the analysis with this tool?
  
  We apologize for this misunderstanding. We have now more clearly indicated in Figure 3A that the cumulative feeding time reflects only Chow and HFD feeding bouts, excluding gnawing.
  
  We now state: “The lower panel shows the cumulative feeding time (only for Chow and HFD pellets, gnawing is excluded) over a two-hour session for the fed (green) and fasted (purple) groups (n = 6 mice).”
  
  Under normal physiological conditions, gnawing is an infrequent behavior in rodents. In our study, however, its frequency increased in the fasted state, a change possibly attributable to heightened stress. This behavior was further exacerbated by chemogenetic manipulation, which drove it to non-physiological levels.
  
  (5) Why is there a post-semaglutide fed group and not a fasted group in Figure 4? It seems both would have been interesting, as one could expect an effect on feeding even 24h after semaglutide treatment. This would help parse the preference better because the animals eat so little on semaglutide that it is hard to compare to the fasted condition with saline treatment.
  
  We thank the reviewer for this insightful suggestion. It would have been interesting to include a fasted post-semaglutide group, as it could provide relevant information about the lasting effect of an acute administration of semaglutide. However, we decided not to include this additional experimental condition because the semaglutide-treated fasted mice displayed markedly reduced food intake during the experimental session. An additional post-semaglutide fasted session would have required prolonged food restriction (at least 24 hours), which we consider an unnecessarily stressful condition for the mice. Therefore, we fed the mice once the experiment was completed. Nevertheless, we believe that comparing food intake (grams) between the fed group shown in Figure 3C and the post-semaglutide fed group reported in Figure 4D provides insight into the lasting effect of semaglutide. The comparison reveals a marked reduction of food intake in the post-semaglutide fed mice relative to the fed group, suggesting that acute administration of semaglutide suppresses feeding for up to 24 hours.
  
  (6) The identification of 'meal-related' neurons in the LH is another strength of the manuscript. Although there is currently insufficient data, could similar recordings be used to give a neurophysiological definition of a 'meal' duration/size? Typically, these were somewhat arbitrarily defined behaviorally. Having a neural correlate to a 'meal' would be a powerful tool for understanding how meals are involved in overall caloric intake.
  
  We thank the reviewer for this insightful suggestion. We agree that the traditional behavioral criteria for defining meals, typically derived from log-survivor analyses of inter-pellet or inter-lick intervals, are operationally useful but ultimately arbitrary, and that a neurophysiologically grounded definition would be a valuable complement for the field.
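  The conventional behavioral criterion mentioned above can be sketched as follows, as a minimal illustration rather than the authors' pipeline: a log-survivor curve of inter-bite intervals is computed, and an interval criterion read from its change in slope is then used to group bites into meals. The bite times and the 60 s criterion are hypothetical, as are the function names. Choosing the breakpoint is precisely the operationally arbitrary step that motivates the search for an independent neural criterion.

```python
import bisect
import math


def log_survivor(intervals):
    """Return (t, ln S(t)) pairs, where S(t) is the fraction of intervals longer than t."""
    xs = sorted(intervals)
    n = len(xs)
    curve = []
    for t in sorted(set(xs)):
        remaining = n - bisect.bisect_right(xs, t)  # intervals strictly longer than t
        if remaining > 0:  # ln(0) is undefined, so the last point is omitted
            curve.append((t, math.log(remaining / n)))
    return curve


def segment_meals(bite_times, criterion_s):
    """Group bite timestamps into meals: a gap longer than criterion_s starts a new meal.

    Returns (meal_onset, meal_offset, n_bites) tuples.
    """
    meals = []
    for t in sorted(bite_times):
        if meals and t - meals[-1][-1] <= criterion_s:
            meals[-1].append(t)
        else:
            meals.append([t])
    return [(m[0], m[-1], len(m)) for m in meals]


if __name__ == "__main__":
    bites = [0, 1, 2, 3, 300, 301, 302, 900]  # hypothetical bite times (s)
    gaps = [b - a for a, b in zip(bites, bites[1:])]
    for t, ln_s in log_survivor(gaps):
        print(f"t = {t:>4} s   ln S(t) = {ln_s:.2f}")
    print(segment_meals(bites, criterion_s=60))  # [(0, 3, 4), (300, 302, 3), (900, 900, 1)]
```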
  
  Our current dataset was not designed to formally establish such a definition, and we want to be cautious about the logic of the problem: validating a neural criterion solely against the behavioral one it would replace is circular. A genuinely neural definition of a meal would need to be anchored to independent criteria, for example, its ability to predict the latency and size of the subsequent meal, its correspondence with post-prandial satiety markers, or its response to anorectic agents such as GLP-1 receptor agonists. This is a methodologically nontrivial undertaking that we believe deserves a dedicated follow-up study.
  
  As preliminary evidence that such a problem is tractable, we note that the meal-related LH neurons identified here display sustained activity with onset and offset dynamics that broadly parallel the behaviorally defined meal boundaries (Figure 6), suggesting that meal structure is reliably encoded at the population level. A related approach, using neural activity to segment ingestive behavior at finer temporal scales, has been successful in our previous work on licking microstructure in the nucleus accumbens (Tellez et al., 2012), and we consider the present findings a natural extension of that line of research to the larger meal timescale.
  
  (7) The conclusion in the title of Figure 8 is premature, given the pilot nature and small number of neurons and mice sampled.
  
  We appreciate this comment and agree with the reviewer. Accordingly, we have performed additional experiments on the Vglut2 glutamatergic population, in some cases using three-plane recordings, which substantially increased the yield to 386 glutamatergic neurons. As the reviewer anticipated, we observed a broad diversity of response profiles in this population, including neurons selective for liquid licking, for solid food intake, and for both food types. We also formally quantified these responses using ROC analysis, applying the same procedure to the Vgat GABAergic neurons (n = 79). These new findings have been incorporated into the Results and Discussion of the revised manuscript. We thank the reviewer for prompting this extension of the analysis.
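  A minimal sketch of an ROC-based selectivity analysis of the kind described above, under stated assumptions: each neuron's activity samples (e.g., ΔF/F) during licking or chewing epochs are compared with baseline samples, auROC is computed in its Mann-Whitney form, significance is assessed with a label-permutation test, and neurons are labeled liquid-, solid-, or both-selective. The epoch definitions, the permutation test, the 0.05 cutoff, and all function names are illustrative assumptions; the procedure actually used is described in the revised manuscript and may differ.

```python
import random


def auroc(target, baseline):
    """Area under the ROC curve separating two samples (Mann-Whitney form).

    Equals P(x > y) + 0.5 * P(x == y) for x from target and y from baseline:
    0.5 means no discrimination, > 0.5 activation, < 0.5 suppression.
    """
    if not target or not baseline:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    for x in target:
        for y in baseline:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(target) * len(baseline))


def permutation_p(target, baseline, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the deviation of auROC from 0.5."""
    rng = random.Random(seed)
    observed = abs(auroc(target, baseline) - 0.5)
    pooled = list(target) + list(baseline)
    k = len(target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign epoch labels
        if abs(auroc(pooled[:k], pooled[k:]) - 0.5) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def classify(liquid_significant, solid_significant):
    """Label a neuron by which consumption epochs it significantly modulates."""
    if liquid_significant and solid_significant:
        return "both"
    if liquid_significant:
        return "liquid"
    if solid_significant:
        return "solid"
    return "unmodulated"


if __name__ == "__main__":
    # Hypothetical ΔF/F samples for one neuron.
    baseline = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.05, 0.2, 0.3]
    licking = [0.9, 1.1, 0.8, 1.0, 0.95, 1.2, 0.85, 0.9, 1.05, 1.0]
    chewing = [0.2, 0.1, 0.25, 0.15, 0.3, 0.2, 0.1, 0.2, 0.25, 0.15]
    p_liq = permutation_p(licking, baseline)
    p_sol = permutation_p(chewing, baseline)
    print(auroc(licking, baseline), classify(p_liq < 0.05, p_sol < 0.05))
```

The pairwise loop costs O(nm) comparisons, which is adequate for illustration; a rank-based Mann-Whitney U computation gives the same auROC more efficiently for long recordings.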
  
  Conclusion:
  
  Overall, this report on the Crunchometer is well done and provides a valuable tool for all who study food intake and the behaviors around food intake. Clarification or answers to the points above will only further the utility and understanding of the tool for the research community. I am excited to see the future utility of this tool in emerging research.
  
  We sincerely thank the Reviewer for these kind and encouraging words, and for the constructive feedback provided throughout the review. The clarifications and additional analyses prompted by these comments have substantially improved the manuscript, and we share the Reviewer's enthusiasm about the potential of the Crunchometer to contribute to future research on feeding behavior.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The authors have done a phenomenal job with the Introduction, highlighting the need for this tool, citing the history of feeding measurement systems and their relative strengths and weaknesses.
  
  Thank you for your comment; we greatly appreciate your positive feedback.
  
  (2) A limitation of Automated Pellet Dispensers is the possibility that the animals fail to consume the pellet after it has been retrieved from and registered by the device, potentially constraining accuracy.
  
  We address this issue in the Introduction, specifically, we wrote:
  
  “Current methods to monitor feeding behavior could be classified into four different classes…3) Automated Pellet Dispensers: Often integrated into operant conditioning chambers, these devices provide a controlled way of delivering food pellets. While devices like the open-source Feeding Experimentation Device (FED3) (Ali and Kravitz, 2018; Matikainen-Ankney et al., 2021), a pellet dispenser, are useful for measuring reinforcement, they alter the natural feeding patterns of mice; for example, requiring a simple action such as a nose-poke can reduce overeating and weight gain in mice (Barrett et al., 2025). A further limitation is that FED3 may overestimate consumption if an animal retrieves and registers a pellet without actually consuming it. A significant strength of this method is its ability to enable closed-loop optogenetic stimulation concurrent with neuronal recordings.”
  
  (3) I really appreciate the data in Figure 2G, where they displayed the results of an "outlier" animal, as behavior is extremely variable, and it's useful to see how this system deals with the variability of the subjects. This is again highlighted by mouse number 5 in Figure 3A, which exhibited profound gnawing behavior.
  
  We thank the reviewer for this positive comment. Our decision to include the outlier animal in Fig. 2G and to report the atypical gnawing behavior of mouse 5 in Fig. 3A reflects a deliberate commitment to documenting inter-individual variability, which we consider a core strength rather than a limitation of behavioral work. We believe that such cases are particularly informative for evaluating the robustness of automated monitoring systems under behavioral-lab conditions.
  
  (4) It would be useful to know if the mice had prior exposure to HFD, as I found it surprising that many animals consumed the chow at all, sometimes completely ignoring the HFD (fasted mouse 3). I only ask because in our experience, mice with constant exposure to both HFD and chow predominantly, if not always, consume the HFD over chow. This could have something to do with the way the food substrates are presented in this chamber.
  
  We thank the reviewer for this point. Mice in this experiment did receive prior exposure to both Chow and HFD during the habituation phase, with at least two 30-min sessions in the experimental chamber with both diets available (no video was collected at this stage). The Chow and HFD feeders were identical in geometry, position, and accessibility, so we do not consider either environmental novelty or spatial bias to be the main driver of the pattern. Rather, we interpret the strong chow preference of fasted mouse 3 as a case of residual neophobia toward the HFD pellet. Since performing these experiments, we have refined our habituation protocol: pre-exposing animals to a single HFD pellet in their home cage, a familiar and safe environment, prior to any chamber session, greatly mitigates HFD neophobia in our hands. Familiarity with the novel food in a safe context thus appears to be the critical factor, rather than the duration of exposure in the experimental chamber. We have added this refinement to the Methods as a recommendation for future users of the Crunchometer.
  
  “Behavioral protocol. All mice were habituated to the Crunchometer for 2 days before the recording session. Each habituation session lasted 30 minutes, during which two food pellets were placed in the chamber: one standard Chow pellet (LabDiet 5008) and one highly palatable high-fat diet (HFD) pellet (Research Diets, D12451). As a practical note, we recommend allowing the HFD to equilibrate to room temperature before the experiment and pre-exposing mice to a single HFD pellet in their home cage to attenuate neophobia prior to testing.”
  
  (5) The authors claim saline or semaglutide was administered immediately before the start of the behavioral experiment, but given the time it takes for this drug to blunt appetite, I was somewhat surprised it led to such a rapid decrease in both chow and HFD intake. Could the authors comment on this? How quickly do these animals experience the malaise associated with these drugs? Also, this dose seems to be on the very high side, so I imagine it's making the animals feel quite sick and is probably a big reason why the effects last so long into the post-sem measurements. Was bodyweight tracked across this treatment? I'm not so convinced that sema treatment led to a loss of strong HFD preference, as the chow intake was already very low to begin with, and as mentioned above, it looks like the drug just led to a cessation of all intake. I'd just tamp down this claim of preference switch. It clearly reduced intake of both substrates, it's just harder to detect for the chow because it was already so low to begin with.
  
  Thank you for these comments. We agree with the Reviewer and have toned down the claim regarding a switch in HFD/chow preference. In the revised Results section, we now explicitly acknowledge that further characterization is needed using chronic semaglutide treatment. Specifically, we added the following sentence:
  
  "Future studies should use the Crunchometer to characterize changes in HFD/chow preference during 24-h monitoring under chronic semaglutide treatment."
  
  In addition, we administered a single subcutaneous dose of semaglutide at 30 nmol/kg (0.123 mg/kg), following the protocol described by Zhang et al. (2023). In their study, pharmacokinetic analyses showed that plasma concentrations, measured by an ELISA assay that immunoreacts with both growth differentiation factor 15 (GDF15) and the intact N-terminal region of glucagon-like peptide-1 (GLP-1), increased shortly after administration of the 30 nmol/kg dose in C57BL/6 mice. Peak plasma concentration (Cmax = 43.1 nmol/L) was reached at 6.7 hours (Tmax), and levels returned to baseline by 24 hours post-administration, indicating complete drug clearance. Although this dose is relatively high, it was intentionally selected to produce a robust acute response from a single administration, as our objective was to assess the drug’s effects within a short, 2-hour observational window. Under these conditions, we observed a rapid reduction in food intake immediately following the onset of Crunchometer recording. While we do not exclude the possibility that these effects could be more pronounced over longer observation periods or with chronic dosing regimens, our study was strictly limited to a single acute exposure.
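  The mass dose quoted above follows from the molar dose and the molecular weight of semaglutide (approximately 4113.6 g/mol; since 1 g/mol equals 1 ng/nmol, 30 nmol/kg corresponds to about 123,400 ng/kg). The short check below reproduces that conversion; the function names are illustrative, and the 25 g body weight is a hypothetical value used only to show the per-animal amount, not a figure from the study.

```python
# Approximate molecular weight of semaglutide; 1 g/mol equals 1 ng/nmol.
SEMAGLUTIDE_MW_G_PER_MOL = 4113.6


def nmol_per_kg_to_mg_per_kg(dose_nmol_per_kg, mw_g_per_mol=SEMAGLUTIDE_MW_G_PER_MOL):
    """Convert a molar dose (nmol/kg) into a mass dose (mg/kg)."""
    ng_per_kg = dose_nmol_per_kg * mw_g_per_mol  # nmol/kg * ng/nmol
    return ng_per_kg / 1e6                        # 1 mg = 1e6 ng


def dose_per_animal_ug(dose_mg_per_kg, body_weight_g):
    """Mass injected per animal in micrograms (mg/kg is numerically equal to ug/g)."""
    return dose_mg_per_kg * body_weight_g


if __name__ == "__main__":
    mg_per_kg = nmol_per_kg_to_mg_per_kg(30)
    print(f"{mg_per_kg:.3f} mg/kg")                                     # 0.123 mg/kg
    print(f"{dose_per_animal_ug(mg_per_kg, 25):.2f} ug per 25 g mouse")  # 3.09 ug
```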
  
  Although semaglutide is known to suppress food intake through multiple mechanisms, including stress and malaise, as measured by conditioned taste aversion and the release of stress hormones (Teixidor-Deulofeu et al., 2025), we do not believe that discomfort or malaise played a significant role in our study. While the mice did reduce their food intake after semaglutide administration, this reduction persisted for at least 24 hours after the single dose, at which point the drug was no longer present, suggesting a satiety-driven effect rather than one mediated by aversion. Consistent with this, previous studies have shown that semaglutide continues to suppress food intake even when the aversive pathway mediated by area postrema GLP1R neurons is inhibited: although blocking this pathway reduces flavor aversion, the anorexic effect remains, indicating that intake suppression can be driven by satiety independently of nausea or malaise (Huang et al., 2024). In summary, although we selected a relatively high dose to ensure a detectable acute effect within our experimental window, this choice was grounded in previously published data, and our findings are consistent with established mechanisms of action of semaglutide.
  
  Additionally, body weight data have now been included in Figure 4D. We observed a body weight loss of approximately 5% on the first day of drug administration, similar to that reported by Zhang et al. (2023).
  
  (6) The authors demonstrate that CNO administration prompted a significant increase in liquid sugar intake in the last panel of Figure 5F as a confirmation that LH GABAergic neurons are implicated in processing reward; however, given the above results, it seems likely that these mice will drink anything, including water (when not thirsty, thus in a non-rewarding scenario) or possibly aversive agents like quinine.
  
  This is an interesting question, and we agree with the Reviewer. The original discovery by Jennings and Stuber showed that optogenetic activation of these GABAergic neurons induces voracious feeding and that Vgat mice kept licking for liquid rewards in an appetitive task (Jennings et al., 2015). We also acknowledge that prior work has shown LH GABAergic neuron activation can drive consumption of non-caloric and biologically irrelevant stimuli, including wood gnawing, water, or saccharin (Navarro et al., 2016). However, several lines of evidence support a role in reward/palatability processing rather than purely indiscriminate consumption. Our own lab (Garcia et al., 2021) showed that activation of LH Vgat+ neurons increased quinine intake only during water deprivation; in sated animals, activation failed to promote quinine intake. Instead, these neurons promoted overconsumption of sucrose when available, leading us to conclude that LH Vgat+ neurons increase the drive to consume the nearest food, but this drive is potentiated by the palatability of the tastant. In non-human primates, LH GABA activation drives goal-directed eating predominantly for palatable food (Ha et al., 2024), supporting a reward-related function across species. Together, these findings indicate that while LH GABAergic activation does broadly promote consumption, the selectivity toward palatable stimuli observed in Figure 5F is consistent with a reward-related function.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.25.666891v3
www.biorxiv.org www.biorxiv.org

BetaII-Spectrin Gaps and Patches Emerge from the Patterned Assembly of the Actin/Spectrin Membrane Skeleton in Human Motor Neuron Axons

1. Public_Reviews 18 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Statement
  
  This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data argue against gap formation by cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.
  
  (R1) We thank the reviewers and editor for their constructive and thoughtful feedback. We are pleased the reviewers found our evidence to be convincing and that our study provides a valuable framework for understanding the complex dynamics of MPS assembly.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.
  
  (R2) We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.
  
  Strengths:
  
  The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.
  
  (R3) We thank the reviewer for the positive comments on the manuscript and on the convenience of the experimental system developed to further study the dynamics of MPS assembly. We hope others will turn to motor neurons to explore cortical cytoskeleton biology and shed light on their susceptibility in various degenerative diseases.
  
  Weaknesses:
  
  Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.
  
  (R4) Because staurosporine may inhibit various kinases relevant to the phenomenon under study, we did not speculate at length on the likely targets in the Discussion. We have, however, included the possibility that the relevant kinase could be PKC, in light of a new study published while our manuscript was under revision (Heller et al., 2025) (see the second-to-last paragraph of the Discussion). Staurosporine represented a convenient initial approach that allowed us to uncover the phenomenon, and we are now conducting new studies dissecting the molecular pathways involved. However, such studies lie beyond the scope of the present report.
  
  See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.
  
  Some more comments:
  
  (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.
  
  (R5) While we acknowledge this possibility, we believe that, even if such events occurred, they could not account for our observations of the gaps-and-patches phenomenon, since the latter persisted long (hours to days) after any gross manipulation of the media. Moreover, fixed samples observed under DIC, confocal or STED microscopy showed no evidence of beading. We do refer to a characteristic local enlargement that was highly localized and infrequent (see Fig. 1C and E, and Suppl. Fig. 1C and E); we do not believe these enlargements are transient, and they do not resemble the structures referred to as beading. Structurally, beading is essentially different: it appears as rows of consecutive "beads" along long stretches of axon, where small, round enlargements of axonal caliber are arranged one after another, resembling pearls on a string. As mentioned by the reviewer, beading can occur transiently when media osmolarity is changed drastically (rarely done in cell culture manipulations) or non-transiently when axons are undergoing degeneration. Indeed, to prevent gross changes in osmolarity, our routine fixation uses 4% PFA and 4% sucrose in PBS. In any case, we did not observe signs of beading in the cultures used for this study.
  
  (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.
  
  (R6) Our stainings target the tubulin isoforms βIII and αII; that is, they label microtubules but also free tubulin. Hence, we do not think this is evidence for "patchy microtubules". The slight decrease in tubulin intensity within gaps is indeed worth investigating, and may indicate that tubulin preferentially accumulates within patches.
  
  (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?
  
  (R7) We agree that there is an apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture and beyond. Hence, at any time point there is a recently added axonal compartment with low βII-spectrin levels and no organized MPS. Also, the dynamic evolution of the gaps-and-patches structure depends on the rate of βII-spectrin supply and transport. If supply falls below a given threshold, more gaps are expected, since the newer, more distal parts of the axon receive less βII-spectrin. To explore this formally, we are developing simulations of this multifactorial dynamic system which, together with key experimental observations, should improve our understanding of MPS assembly in growing axons. However, the findings of this project will be the subject of another manuscript.
  
  (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?
  
  (R8) The results of the co-treatment with Latrunculin A and staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, which requires newly formed actin filaments. On the other hand, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails), each with a different rate of monomer turnover. Latrunculin A affects filaments indirectly: its target is not actin filaments but free monomers. Monomer sequestration ultimately affects actin filaments, because filaments constantly exchange monomers and, deprived of free monomers, they shorten and eventually disappear. The drastic decrease in global F-actin in LatA-treated axons reflects this. The preservation of F-actin in the MPS shows that these filaments are stable: if they do not lose monomers within the time frame of the treatment, the filaments remain unaffected. This subject is covered extensively in the 8th paragraph of the Discussion section.
  
  We have not used Jasplakinolide. Its effects would not be expected to mimic those of Latrunculin A, since Jasplakinolide has a different mechanism of action (i.e. it binds and stabilizes actin filaments).
  
  (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?
  
  (R9) We agree with the reviewer's interpretation. A virtue of our experimental model, and of our interpretation of the observations in fixed cells, is that they give rise to informative questions such as those posed by the reviewer. See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, Gazal et al. describe the presence of unique gaps and patches of βII-spectrin in medial sections of long human motor neuron axons. βII-spectrin, along with α-spectrin, forms horizontal linkers between F-actin rings spaced 180 nm apart in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS), which are critical for maintaining the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of βII-spectrin, ultimately leading to degradation of these neurons.
  
  (R10) We thank the reviewer for the detailed and accurate description of the data shown.
  
  Strengths:
  
  The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.
  
  (R11) We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.
  
  Weaknesses:
  
  The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.
  
  (R12) See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.
  
  We do not rule out that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). We therefore continue to work with these cells, but the goal of such a project lies well beyond the primary message of the present manuscript, as we discuss in the second paragraph of the Discussion section.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.
  
  Strengths:
  
  (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.
  
  (2) Knowledge of MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al. use human iPSC-derived motor neurons, a highly relevant neuron type, to further the current knowledge of MPS biology.
  
  (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.
  
  (R13) We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.
  
  Weaknesses:
  
  (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of βII-spectrin.
  
  (R14) Throughout the project, various gaps-and-patches parameters were measured under different conditions and with different stainings. In all these examinations, the only parameter that changed considerably was their abundance. While this suggests that the features of gaps and patches are comparable between control and staurosporine-treated cells, we acknowledge, as a general caution regarding negative data, that subtle qualitative differences cannot be entirely ruled out. We have now emphasized this possibility in the 9th paragraph of the Discussion section.
  
  (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.
  
  (R15) See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  The reviewers all agree that the work would strongly benefit from live imaging to assess the maturation dynamics of the gap/patch pattern.
  
  (R16) Reviewers agreed that some of the conclusions of our manuscript would benefit from validation by live imaging. Several anticipated technical and biological challenges led us not to conduct these experiments in this initial study of human motor neurons. To mention only the most important: from previous work in our labs, these cells are difficult to transfect at 2 weeks in culture. Also, ectopic expression of tagged βII-spectrin escapes normal expression control, and it has been noted that ectopic expression can yield a protein localization that does not necessarily reflect the endogenous distribution, or can produce cellular responses that preclude observation of the phenomena under study. These difficulties in studying overexpressed tagged βII-spectrin have been reported in the field, with studies noting that the analysed axons were those expressing "low levels of the construct" (Boyer et al., 2026; Zhong et al., 2014; Zhou et al., 2022). Taking this into account, we had not planned to include live imaging for the goals of the present project. However, given the positive comments and reception of our conclusions, we attempted this challenging and risky approach. To that end, we transfected our cells with a plasmid encoding mouse βII-spectrin tagged at its C-terminus with GreenLantern (a kind gift from Dr. Subhojit Roy, UCSD, USA). After 3 rounds of differentiating cells and testing various combinations of plasmid quantity, Lipofectamine-to-DNA ratio and transfection time (amongst other parameters), we obtained an extremely low transfection efficiency, and the few expressing neurons showed a distribution of βII-spectrin-GreenLantern that did not match our immunolocalization of endogenous βII-spectrin. For these reasons, the present version of the manuscript does not include live-cell imaging of expressed tagged βII-spectrin.
  
  Given that reviewers found that some statements in the initial submission would have been better supported by live imaging, we modified the manuscript to acknowledge the limitations of inferring dynamic mechanisms from fixed samples (see, for example, the last sentences of the 5th paragraph of the Discussion section). That said, we hope to overcome these experimental challenges in the future and establish live imaging of βII-spectrin in neurons. For example, to avoid unregulated transgene expression, Heller and colleagues recently generated βII-spectrin-mNeonGreen conditional knock-in (cKI) mice, carrying a LoxP-flanked alternative final exon of endogenous βII-spectrin with a C-terminal mNeonGreen fusion that is expressed upon Cre expression (Heller et al., 2025). The implementation and further development of such approaches will be very helpful in new studies on the dynamics of βII-spectrin and of the MPS as a whole. However, the scale of work needed to accomplish them amounts to stand-alone projects.
  
  Reviewer #1 (Recommendations for the authors):
  
  In the section "The MPS is absent in beta-II spectrin gaps", the authors mention that the presence of MPS in patches suggests that the axons are not undergoing degeneration. I don't think this is a good criterion to use, despite the citations they take support from.
  
  (R17) We agree with the reviewer's suggestion. Given the unlikely connection between the cited developmental axon degeneration process in sensory neurons and the possible axon degeneration in long-term cultures of the human iPSC-derived motor neurons studied here, we have removed the sentence in question.
  
  The authors show that degradation by proteases does not happen in their case. In this regard, they may want to discuss the recent article by Heller et al, Science 2025 (https://doi.org/10.1126/science.adn6712) and Hofmann et al, Sci. Rep., 2022 (https://doi.org/10.1038/s41598-022-18562-5)
  
  (R18) By western blot analysis, we did not observe evident changes in proteolysis-derived fragments. However, it is likely that, even when protease inhibitors produce phenotypes, the accumulation of protein fragments is below the sensitivity of western blotting. We expected gross changes observable by western blot if proteolysis explained gap formation.
  
  Calpain and caspase activities have been shown to be relevant to different aspects of MPS biology. To the works cited by the reviewer, one must now add the very recent work by Fei and colleagues (Fei et al., 2026). We have modified part of the Discussion section to analyse our results in this broader context.
  
  Briefly, Hofmann and colleagues found that acute treatment with calpain inhibitors right before axotomy led to an increase in the percentage of periodic βII-spectrin (referred to by the authors as "periodicity") in regenerated axons over a 2-hour period. Interestingly, the βII-spectrin patches they describe at distal portions did not increase in number, but they increased in size. This indicates that, in the particular situation of axonal regeneration, calpain activity puts a brake on MPS formation within patches. This prompted us to re-examine our own protease inhibition experiments and to measure patch length in them. The new results are shown in Supplementary Fig. 6 and further analysed in the Discussion section. In summary, the changes we observed were much less pronounced than those found in regenerating axons, but they follow the same trend: protease inhibitors made patches longer.
  
  On the other hand, Heller and colleagues found in live-imaging studies that calpain activity contributes to the steady-state dynamics of βII-spectrin exchange in a mature MPS lattice. More recently, Fei and colleagues found that caspase or calpain inhibition does not change the steady-state organization of a mature MPS lattice in fixed samples of treated axons. Fei and colleagues also found a relevant role for calpains whenever massive endocytosis (of any kind) is induced experimentally. Interestingly, all these studies, including ours, examined the roles of calpains in the MPS under different scenarios. Looking at them in detail, we do not believe these results contradict one another, and a complete picture of the roles of calpains (and caspases) in MPS assembly, growth, maintenance and remodeling will have to take into account all the results mentioned above, including ours. All these analyses are now included in the Discussion section.
  
  Minor comments:
  
  (1) "Recently, it was proposed that this continuous MPS organization arises from the coalescence of discontinuous "patches" of incomplete MPS units that originate in the distal axon and migrate proximally (Zhong et al. 2014)." Please check the citation. Should it be Hofmann et al. 2022?
  
  (R19) The reviewer is correct. The proper citation has now been included.
  
  (2) Is there an established link between ALS and spectrin? I would suggest decreasing the emphasis on this as no clear conclusions are achieved.
  
  (R20) As stated in the text, the study of ALS mutations is justified on two grounds. First, mutations in several tubulins and other cytoskeletal proteins are linked to ALS (Castellanos-Montiel et al., 2020), and microtubule dynamics have been shown to affect the cortical skeleton (Qu et al., 2017). Second, since human motor neurons are affected in ALS, we thought that a complete characterization of the βII-spectrin cortical cytoskeleton in these cells should include ALS-related mutations. We have now included a basic description of the MPS in neurons carrying TDP43 and SOD1 mutations (Suppl. Fig. 5).
  
  The ALS-related mutations occupy only two short paragraphs in the main text and some panels in the Supplementary information. Following the Reviewer's suggestion, we have downplayed the relative relevance of these results in the text, without reducing the amount of data we show.
  
  (3) There is a typo in the approximate symbol used for 150 kDa in the section where calpain and caspase activity is reported.
  
  (R21) Typo corrected.
  
  (4) Please add the Latrunculin concentration used in the main text, as it makes it easier for the reader.
  
  (R22) Done.
  
  (5) In the Discussion, paragraph starting with "We further showed ...", there is a typo where Zhong et al is cited.
  
  (R23) Corrected.
  
  (6) Supplementary Figure 1B: attachment instead of 'atachment'.
  
  (R24) Corrected.
  
  (7) Include DIVs or time in the schematic. It is easier for the reader to understand.
  
  (R25) We have now included time references in the schematics of Suppl. Fig. 1B.
  
  (8) Supplementary Figure 1C
  
  Unable to distinguish βII-spectrin and βIII-tubulin in the merged image. Separate figure panels will help.
  
  (R26) The merged images in the reconstructions are intended merely to better show the tracing of individual axons at such low magnification. Relevant portions showing only the βII-spectrin channel are shown in C1 and C2. Individual channels are shown separately elsewhere in the manuscript.
  
  (9) Supplementary Figure 4D
  
  Why is there so much cleavage product for αII-spectrin across DMSO and treatment? It varied over batches as well. Doesn't this mean that αII-spectrin is going through more proteolytic cleavage? Why?
  
  (R27) The amount of cleavage product for αII-spectrin does not surprise us. For instance, although calpains and caspases can potentially process both α- and β-spectrin, in in vivo scenarios where calpain activity is triggered, many more α-spectrin fragments are produced (Czogalla & Sikorski, 2005). Moreover, our immunofluorescence staining of cleaved αII-spectrin with the SNTF antibody (Fig. 4C) parallels the western blot findings: high levels of cleaved αII-spectrin across treatments. A similarly strong staining with this antibody has recently been shown in the intact axon (Heller et al., 2025). It will be interesting in the future to address whether these fragments have any biological significance beyond being mere byproducts of αII-spectrin processing.
  
  Reviewer #2 (Recommendations for the authors):
  
  Suggestions for improving the quality of the manuscript:
  
  (1) Live imaging in combination with FRAP assays will help define whether the capture of spectrin from gaps into patches is true. Fixed neurons only provide static information and may not reflect real-time physiological effects.
  
  (R28) See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.
  
  (2) Could the presence of F-actin trails in axons facilitate the formation of patches? Will the use of formin/Arp2/3 inhibitors rescue the effect of staurosporine, similar to Latrunculin A?
  
  (R29) Very interesting suggestion. It is likely that different pools of F-actin contribute to the dynamics of MPS formation, and actin trails are definitely worth investigating in this context.
  
  (3) Figure 8 lacks a Latrunculin A-treated condition. Why is this not present?
  
  (R30) The quantification of that treatment was excluded for space and readability. We have now included the values of the LatA + DMSO group in Fig. 8C and D and rearranged the whole figure.
  
  (4) Does neuronal stimulation (KCl treatment) have any effect on gaps and patches?
  
  (R31) Very interesting suggestion. Unfortunately, we have not examined whether neuronal stimulation affects any parameter of the gaps-and-patches structure.
  
  (5) Please check the manuscript for typos and reference insertion points in the text. More than a couple were noted.
  
  (R32) We have corrected typos.
  
  Reviewer #3 (Recommendations for the authors):
  
  This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.
  
  (1) One major concern is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of βII-spectrin, based solely on their morphological similarity. This should be further discussed in the manuscript. Further analysis of additional cytoskeletal components, including microtubules, in staurosporine-treated neurons could also be provided.
  
  (R33) See R14.
  
  (2) In Figure 1E, betaIII-tubulin and NF-H seem to accumulate in betaII-spectrin-rich axonal enlargements. If these are patches, how do you reconcile this finding with Figure 2C-D, where NF-M and alphaII-tubulin are not specifically enriched in betaII-spectrin patches?
  
  (R34) We actually show that axonal enlargements and patches are structurally unrelated in many respects. We mention these axonal enlargements in order to provide an exhaustive characterization of all βII-spectrin features found in these axons.
  
  (3) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence. This should be further discussed in the revised version of the manuscript.
  
  (R35) The limitations of the experimental approach are now discussed further (see, for example, the last sentences of the 5th paragraph of the Discussion section).
  
  (4) On a more general note, the titles of some of the Results subsections could be revised to convey the findings of those subsections rather than the methods used (example: "Quantitative and Qualitative analyses of betaII-spectrin distribution....").
  
  (R36) Following this suggestion, we have changed the title of this subsection.
  
  References
  
  Boyer, N. P., Sharma, R., Wiesner, T., Parperis, C., Delamare, A., Pelletier, F., Jullien, N., Bhatt, A. M., Parra-Rivas, L. A., Kearney, P. J., Shavarebi, F., Leterrier, C., & Roy, S. (2026). Spectrin condensates provide a nidus for assembling the axonal membrane-associated periodic skeleton. iScience, 29(1), 114454. https://doi.org/10.1016/j.isci.2025.114454
  
  Castellanos-Montiel, M. J., Chaineau, M., & Durcan, T. M. (2020). The Neglected Genes of ALS: Cytoskeletal Dynamics Impact Synaptic Degeneration in ALS. Frontiers in Cellular Neuroscience, 14, 594975. https://doi.org/10.3389/fncel.2020.594975
  
  Czogalla, A., & Sikorski, A. F. (2005). Spectrin and calpain: A “target” and a “sniper” in the pathology of neuronal cells. Cellular and Molecular Life Sciences: CMLS, 62(17), 1913–1924. https://doi.org/10.1007/s00018-005-5097-0
  
  Guix, F. X., Capitán, A. M., Casadomé-Perales, Á., Palomares-Pérez, I., López Del Castillo, I., Miguel, V., Goedeke, L., Martín, M. G., Lamas, S., Peinado, H., Fernández-Hernando, C., & Dotti, C. G. (2021). Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Science Alliance, 4(8), e202101055. https://doi.org/10.26508/lsa.202101055
  
  Heller, E., Kurup, N., & Zhuang, X. (2025). The membrane skeleton is constitutively remodeled in neurons by calcium signaling. Science (New York, N.Y.), 389(6760), eadn6712. https://doi.org/10.1126/science.adn6712
  
  Qu, Y., Hahn, I., Webb, S. E. D., Pearce, S. P., & Prokop, A. (2017). Periodic actin structures in neuronal axons are required to maintain microtubules. Molecular Biology of the Cell, 28(2), 296–308. https://doi.org/10.1091/mbc.E16-10-0727
  
  Zhong, G., He, J., Zhou, R., Lorenzo, D., Babcock, H. P., Bennett, V., & Zhuang, X. (2014). Developmental mechanism of the periodic membrane skeleton in axons. eLife, 3, e04581. https://doi.org/10.7554/eLife.04581
  
  Zhou, R., Han, B., Nowak, R., Lu, Y., Heller, E., Xia, C., Chishti, A. H., Fowler, V. M., & Zhuang, X. (2022). Proteomic and functional analyses of the periodic membrane skeleton in neurons. Nature Communications, 13(1), 3196. https://doi.org/10.1038/s41467-022-30720-x
  

biorxiv.org/content/10.1101/2025.05.09.653215v4
www.biorxiv.org

Cell-to-cell signalling mediated via CO2: activity dependent axonal CO2 production opens Cx32 in the Schwann cell paranode

1. Public_Reviews 18 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO2-sensitive gating of connexins, this study proposes that mitochondrial CO2 production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves.
  
  Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.
  
  In the current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude the anatomical distinction between nodes and paranodes. FITC uptake, while convincing to test Cx32 hemichannel gating, lacks spatial-temporal information and validation of distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions.
  
  Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation.
  
  We thank the reviewer for their comments and agree that the evidence for involvement of Cx32 is indirect. We have now used viral expression of Cx32DN in SCs to remove CO2 sensitivity from the endogenous Cx32 to strengthen this link. We have reviewed our presentation of the morphology in terms of the node/paranode/juxtaparanode distribution and adjusted accordingly. We have added new data using GCaMP transduced into Schwann cells that provides the live-tissue imaging that the reviewer requests.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This article aims to demonstrate that local production of CO2 at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO2 diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO2-dependent Cx32 activation mediates activity-dependent Ca2+ influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity.
  
  The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions.
  
  Strengths:
  
  The article is solid in terms of the novelty of the findings and relevance for the physiology of myelinated axons. In addition, it is of major interest for the Connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well elaborated, and most of them are sufficient for the main points described by the authors. The finding that nervous activity will trigger the mechanism of hemichannel opening by CO2 is probably the most relevant biological mechanism derived from this article.
  
  Weaknesses:
  
  Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO2 production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO2 production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the participation of aquaporin AQP1 as the main conduit for CO2 diffusion through the plasma membrane could have another interpretation.
  
  We thank the reviewer for their comments and agree that we do not have direct evidence for the site of CO<sub>2</sub> production or the site of activation of Cx32 hemichannels. This direct evidence is extremely difficult to obtain, and we therefore depend on indirect arguments. Mitochondria represent the major source of CO<sub>2</sub>, and their distribution will therefore indicate where CO<sub>2</sub> is likely to be produced. We agree that this is not essential to the interpretation of the data and have adjusted the text as recommended. We have added a section to the Discussion to consider this point in more detail. The reviewer alludes to a reported interaction between AQP1 and NaV1.8 as a possible alternative interpretation. We can confidently rule this out as the AQP1 blocker has no effect on the compound action potential.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Main comments:
  
  (1) While the imaging system used in this study is technically capable of resolving nodes and paranodes, interpretation depends critically on marker specificity and tissue orientation. In some figures, markers such as Caspr or KCNA2 appear to partially overlap with KCNQ2 or the putative axonal node, which could reflect biological proximity but may also result from incomplete spatial separation in the z-dimension or the curvature of teased fibers. Similarly, Cx32 immunoreactivity or FITC signal is occasionally seen within nodal gaps, raising questions about how accurately these data support the authors' hypothesis. Additionally, while the authors claim that AQP1 is localized in nodes, the data suggest the opposite. Clarifying these patterns using fluorescence intensity line scans or additional nodal markers such as Nav1.6 or Ankyrin G would help distinguish overlapping signals from true domain-specific localization and reinforce the spatial conclusions of the study.
  
  We have changed our presentation of the localisation studies. We have concentrated on colocalization of Cx32 and AQP1 (now Fig 2) and moved the other studies to supplements to this figure. While we have retained the same images of Cx32 and AQP1 localisation, we have emphasized that these are SIM images and thus higher resolution than conventional LSM images, and also from a single optical plane. We have also clarified that the colocalization studies are restricted to analysis of the node/paranode regions.
  
  (2) To strengthen the conclusion that Cx32 specifically mediates the observed dye uptake, additional data or an alternative approach would be valuable. One feasible, though technically demanding, strategy would be the use of AAV-mediated delivery of Cx32-targeting shRNA directly into the sciatic nerve, ideally under a Schwann cell-specific promoter. This approach could achieve localized, cell-type-specific knockdown of Cx32 within a relevant time frame. Alternatively, the authors are encouraged to consider using additional pharmacological inhibitors to exclude the contribution of other conduction pathways, such as pannexin channels. These complementary strategies would reduce the interpretive ambiguity associated with non-specific blockade.
  
  We agree that this is desirable and have used Cx32<sup>DN</sup> under the control of the Mpz promoter (delivered by AAV via intranerval injection). This approach has several advantages: the Cx32<sup>DN</sup> subunit coassembles with endogenous Cx32<sup>WT</sup>, and the heteromeric assemblies lack CO<sub>2</sub> sensitivity (first shown in Butler & Dale, 2023; the same strategy was used with Cx26 to demonstrate its role in the control of breathing, van de Wiel, 2020). This is a new figure (Fig 9). We have included supplemental figures with Fig 9 to document the coassembly of Cx32<sup>DN</sup> with Cx32<sup>WT</sup> by FRET.
  
  These new data test a very specific hypothesis: that CO<sub>2</sub> binding to Cx32 is responsible for the CO<sub>2</sub> sensitivity of the nerve. We find by comparing transduced and non-transduced fibres in the same nerve that Cx32<sup>DN</sup> essentially abolishes activity-dependent loading of FITC into the Schwann cells.
  
  (3) Related to FITC experiments: Assuming the hypothesis of the authors is correct and CO2 release is restricted to the node, one should expect that if the major source of CO2 is in the nodal mitochondria, the hemichannels adjacent to the node will open first, assuming the spatial-temporal diffusion of CO2. To demonstrate this point, I would strongly suggest performing tissue imaging with real-time dye uptake. This approach should capture the FITC wave starting from the Cx32 channel opening in the paranode, as expected. Visualization of uptake in fixed and sectioned tissue is not the ideal approach to detect functional hemichannel opening in intact, viable cells, and at this point, they do not demonstrate that the uptake occurs in the node. From my perspective, if real-time experiments using isolated axons are feasible, it would make this paper more solid.
  
  The suggested method is not practical as the FITC in solution will be fluorescent and thus obscure the entry of FITC into the paranode. We have however expressed GCaMP8 under the control of the Mpz promoter, and this is expressed at paranodes and gives a CO<sub>2</sub> and activity-dependent Ca<sup>2+</sup> signal at the paranode. This gives a real time measure of the effect of CO<sub>2</sub> on the nerve. The GCaMP8 signal is enhanced by AZ and blocked by TC AQP1-1 (see below).
  
  (4) In Figure 5, Supplement 1, the authors present data using GRAB-ATP to suggest that Cx31.3 hemichannels do not release ATP under CO<sub>2</sub> stimulation. However, control experiments with GRAB-ATP alone (without Cx31.3 expression) are not shown, and parallel conditions with Cx32-expressing cells are lacking. Including these controls would strengthen the manuscript. Finally, testing the permeability of Cx31.3 to FITC directly, using the same conditions as in the main experiments, would clarify whether the discrepancy reflects differences in molecular permselectivity or CO<sub>2</sub> sensitivity.
  
  Figure 5 supplement 1 does show GRAB<sub>ATP</sub> alone without Cx31.3 expression (in the box plot). However, we have now added raw traces for this to the figure in panel B. CO<sub>2</sub>-dependent and voltage-dependent ATP release via Cx32 has been previously shown in two papers (Butler & Dale 2023, Frontiers Cell Neurosci; Lovatt et al 2025, J Biol Chem). The Cx32<sup>DN</sup> result (above) further eliminates any contribution of Cx31.3.
  
  (5) Suggestion: It would be valuable to explore whether the proposed mechanism is conserved across both motor and sensory neurons, as this would broaden its physiological relevance. Since the sciatic nerve contains both fiber types, selective analysis or comparative data could clarify whether hemichannel activity is differentially regulated or restricted to a specific neuronal subtype.
  
  This is a great idea, but well beyond the scope of this paper. In an ex vivo preparation it would be very difficult to selectively stimulate the sensory vs motor fibres.
  
  Suggestions to improve data presentation and other minor comments:
  
  (1) Reduce/reorganize the figures to make the paper more straightforward. For example, (a) the immunofluorescence data showing the CO2 signaling machinery could be presented in a single figure; (b) Figure 1 could be expanded to include all the findings and placed at the end as a summary of the authors' claims.
  
  We thank the reviewer for these suggestions. We prefer to keep Fig 1 up front to have our hypothesis clear for the reader to assist their interpretation as they go through the paper. We have altered the balance of figure supplements and main figures that document the immunolocalisation studies to concentrate on the main areas of novelty (AQP1 and Cx32 colocalisation and CA localisation).
  
  (2) The following phrase in the Results section is incomplete: "There was colocalization between Cx32 and CytC in the Schwann cell paranode, and (Fig 2, mean; 95% confidence interval, M1: 0.314; 0.198, 0.431 and M2: 0.261; 0.165, 0.357)."
  
  We have corrected this.
  
  Additionally, the three values for M1 and M2 should be clearly defined and contextualized. In the current state, I couldn't understand them.
  
  The three values are the mean and the lower and upper 95% confidence limits:
  
  M1: mean 0.314; 95% CI, 0.198 to 0.431
  
  M2: mean 0.261; 95% CI, 0.165 to 0.357
  
  We have now made this clearer in the text.
  
  (3) It is unclear whether the authors calculate Manders' coefficients across the whole image or selectively at the node/paranode. Clarifying this would help interpret the specificity of co-localization claims.
  
  The Manders’ coefficients were selectively calculated at the node/paranode and we have amended the text to clarify this.
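For readers less familiar with Manders' coefficients, the effect of restricting the calculation to the node/paranode can be sketched as below. This is a minimal illustration, not the authors' analysis pipeline: pixels are flattened into plain lists, `roi` is a hypothetical boolean node/paranode mask, and the thresholds `thr1`/`thr2` are assumptions (the response does not say how thresholds were chosen). M1 is the fraction of channel-1 intensity inside the ROI that falls on channel-2-positive pixels; M2 is the converse.

```python
from typing import Sequence, Tuple


def manders_in_roi(
    ch1: Sequence[float],
    ch2: Sequence[float],
    roi: Sequence[bool],
    thr1: float = 0.0,
    thr2: float = 0.0,
) -> Tuple[float, float]:
    """Manders' M1 and M2 computed only over pixels inside the ROI mask.

    thr1/thr2 are hypothetical intensity thresholds deciding whether a pixel
    counts as positive in each channel.
    """
    num1 = den1 = num2 = den2 = 0.0
    for a, b, inside in zip(ch1, ch2, roi):
        if not inside:
            continue  # pixels outside the node/paranode ROI are ignored
        den1 += a
        den2 += b
        if b > thr2:
            num1 += a  # channel-1 signal on channel-2-positive pixels
        if a > thr1:
            num2 += b  # channel-2 signal on channel-1-positive pixels
    m1 = num1 / den1 if den1 else 0.0
    m2 = num2 / den2 if den2 else 0.0
    return m1, m2


# Toy 4-pixel example: the last pixel lies outside the ROI.
m1, m2 = manders_in_roi([10, 0, 5, 5], [0, 4, 6, 100], [True, True, True, False])
print(round(m1, 3), round(m2, 3))  # 0.333 0.6
```

With the same toy pixels but no ROI restriction (all pixels included), M1 becomes 0.5 and M2 about 0.964, which is why stating where the coefficients were calculated matters for interpreting them.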
  
  (4) The apparent mislocalization of CytC and SFXN1 could reflect antibody non-specificity or post-isolation alterations in protein distribution (e.g., apoptosis or stress). The authors briefly discuss this observation, but it would be a good idea to use an additional antibody to validate mitochondrial localization.
  
  Apoptosis or stress is unlikely as the isolated nerves were fixed immediately after isolation with little dissection prior to fixation.
  
  The SFXN1 antibody was validated by Fowler et al 2013, and IP-HTMS confirmed SFXN1 as an interacting partner with Cx32. In this paper they also described SFXN1 as being present at the plasma membrane, the speculation being that it was taken there by Cx32.
  
  We think this is probably a valid result and we have further cited the Fowler et al 2013 paper in our discussion of this point.
  
  (5) Figure 4: The legend states: "Arrow heads indicate the node, and arrows depict the outer myelin." However, no arrows are visible in the figure. Please check.
  
  Corrected.
  
  (6) Figure 5: For consistency, indicate in panel N that the TRPA1 inhibitor was applied in the presence of 70 mmHg PCO2, as indicated for cbx in the same panel.
  
  Done
  
  (7) Figure 5 Supplement 1: Normalization using a single ATP concentration may not be appropriate if the sensor signal is not linear. If possible, the authors should construct a concentration-response curve and fit the data with the appropriate equation.
  
  Over the range in which we are measuring ATP (low µM), the GRAB<sub>ATP</sub> response is approximately linear, which allows a single-point calibration; we documented this in Butler and Dale 2023. This is also shown in the original paper describing GRAB<sub>ATP</sub> (Wu et al 2022 Neuron). We have clarified this point in the methods by referring to these papers.
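The logic of a single-point calibration can be made explicit with a small sketch. Under the stated assumption that the GRAB<sub>ATP</sub> ΔF/F<sub>0</sub> response is approximately proportional to [ATP] over the low-µM range, one standard of known concentration fixes the slope, and a sample response converts by simple proportion. The function names and the 3 µM standard in the usage line are hypothetical; if the sensor were non-linear over the measured range, a full concentration-response fit would be needed instead, as the reviewer notes.

```python
def delta_f_over_f0(f: float, f0: float) -> float:
    """Fractional fluorescence change relative to the baseline F0."""
    return (f - f0) / f0


def atp_from_single_point(sample_dff: float, standard_dff: float, standard_um: float) -> float:
    """Estimate [ATP] in µM, assuming a linear sensor response through the origin.

    The standard of known concentration defines the slope
    standard_dff / standard_um; the sample response is divided by that slope.
    """
    if standard_dff <= 0:
        raise ValueError("standard response must be positive")
    return sample_dff * standard_um / standard_dff


# Hypothetical numbers: baseline 100, peak 140 for a 3 µM standard, peak 120 for the sample.
std = delta_f_over_f0(140.0, 100.0)
smp = delta_f_over_f0(120.0, 100.0)
print(round(atp_from_single_point(smp, std, 3.0), 2))  # 1.5
```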
  
  (8) Figure 6: The increase in FITC signal could represent basal uptake over time. The authors should clarify the magnitude/rate of the basal uptake. Another option is to show an image of the uptake under the control frequency at 10 min. Legend: It is not clear whether the image in panel C was acquired during stimulation. If so, it would be beneficial to specify the time.
  
  Could dye loading in this Fig simply be time dependent rather than stimulation dependent? Our data show that this is not the case: the dye loading controls of Fig 5A were exposed to FITC for 10 mins at 35 mmHg PCO<sub>2</sub>, and very little loading is apparent. We now explicitly make this point in the text. Our use of Cx32<sup>DN</sup> also eliminates this explanation by demonstrating that CO<sub>2</sub> binding to Cx32 is necessary for dye loading to occur.
  
  As there is no panel C in this figure, we assume the referee means panel B and have added the frequency of stimulation and time duration used to achieve the loading.
  
  (9) Please revise the legend of Figure 7. It seems to refer to a previous version of the manuscript's figure.
  
  Thanks for pointing this out. We omitted giving a letter to one of the panels and we have corrected this so that legend and figure now correspond.
  
  (10) Figures 10 and 11. Please consider including a bright field image or indicating with an arrow where the node and/or paranode is located.
  
  The old Fig 11 has been omitted. The old Fig 11 is now Fig 10. Unfortunately, we cannot add a bright field image as we did not save these in this experiment.
  
  (11) Figure 11. The authors could consider doing this experiment in the presence of Cx32 blockers to strengthen their conclusion.
  
  We have decided to remove this figure as the information it contains is shown in the new GCaMP8 figure (Fig 12).
  
  (12) Figure 12: The calcium signal increases in different areas beyond the ROI. It is not clear that the calcium signal is restricted to the node, as shown in previous figures. Please clarify whether the preparation is different.
  
  We agree that this is a limitation – there is a lot of out-of-focus light because Fluo4 is membrane permeable and loads many fibres within the nerve (potentially both axon and Schwann cell). Importantly, this phenomenon occurs in the in-focus ROI (for which we show the BF image).
  
  As we think this is basically a limitation of using Fluo4-AM, we have now produced better data using GCaMP8 under the Mpz promoter (new Fig 12). This expresses at the paranode and in far fewer fibres so the resolution of the recordings is better. We have added these new data into the main body of the paper and relegated the Fluo4 data as a figure supplement to Fig 12 that provides independent supporting information.
  
  (13) Figure 13: Please indicate the stimulation frequency. The authors could consider attaching Figure 7 Supplement 1 to this figure to make the manuscript straightforward.
  
  Frequency now indicated.
  
  With regard to the original Figure 7 supplement 1, thanks for this suggestion. After consideration, we have split this up and attached it as figure supplements to the relevant figures (Figure 6 and Figure 8). We have added equivalent data to Fig 7 (effect of H<sub>2</sub>O<sub>2</sub>). We think this simplifies presentation for the readers.
  
  (14) Figure 7 Supplement 1 and Figure 8 Supplements: Please indicate trace colors in panel A of these figures. Also, correct the spelling issue in the legend of Figure 8 Supplement 1 (for panel B).
  
  Corrected
  
  (15) Statistical clarifications: For some statistical analyses, p-values are reported without stating which experimental groups are being compared. The authors should specify the groups included in each comparison.
  
  Corrected
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Localization of CO<sub>2</sub> production and Cx32 activation
  
  Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO<sub>2</sub> production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO<sub>2</sub> production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data.
  
  We agree that we have not shown this, and we now exercise more caution in the description of the results and discuss this point.
  
  (2) Figures 2 and 3 - Cx32, mitochondria, and AQP1 localization
  
  In Figures 2 and 3, it is difficult to clearly discern the localization of Cx32, mitochondria, and AQP1 in the nodal and paranodal regions. The addition of zoomed-in images and 3D reconstructions (or at least orthogonal views) would greatly help clarify whether these components are indeed localized to the axon or Schwann cell, and whether they are specifically enriched in nodal or paranodal domains. As currently presented, the images suggest that all components of this "triad" are broadly distributed within the cells, not restricted to, nor particularly enriched in, nodal or paranodal areas. This observation further supports the concern raised in point 1.
  
  We have revised our presentation of the localisation data to make it clearer and added a section to the discussion to consider this point more fully. We now explicitly mention that these are SIM images in a single optical plane; therefore the colocalization is genuine. We have also clarified that Manders’ coefficients were calculated only at the node/paranode regions. However, we accept that these components are distributed more widely than the node/paranode.
  
  (3) Figure 5 - Clarify legend labels
  
  In the graph shown in Figure 5, the legend would benefit from more descriptive labeling of the experimental groups. For clarity, indicate that FCCP was applied alone and that HC030031 was co-applied with high PCO<sub>2</sub>, to simplify interpretation for the reader.
  
  Corrected
  
  (4) Additional experiment to block mitochondrial CO<sub>2</sub> production
  
  An experiment should be added to completely or significantly inhibit mitochondrial CO<sub>2</sub> production, for example, by combining FCCP treatment with a TCA cycle inhibitor such as fluoroacetate. This would more directly demonstrate that CO<sub>2</sub> generation is required for hemichannel opening during FCCP treatment. It is important to control for this because FCCP can increase ROS production as a result of compensatory metabolic activity (i.e., increased NADH/FADH<sub>2</sub> generation). Since Cx32 hemichannels are known to be modulated by ROS, and can also regulate mitochondrial ROS production, it is crucial to distinguish the role of CO<sub>2</sub> from that of ROS in these experiments.
  
  Thanks for this great comment, as it gave us the idea of linking activity-dependent (rather than FCCP-evoked) gating of Cx32 to the TCA cycle and, as the reviewer says, CO<sub>2</sub> generation more directly. As fluoroacetate is only effective at inhibiting the TCA cycle in glial cells, we used H<sub>2</sub>O<sub>2</sub> at 50 µM, which is highly effective at blocking aconitase in neurons (Tretter & Adam-Vizi, 2000). This greatly reduced FITC dye loading in response to activity. We now include these data in the paper (Fig 7).
  
  We note that our new data with Cx32<sup>DN</sup> further establishes the link to CO<sub>2</sub> as opposed to ROS.
  
  Furthermore, to complement the experiments involving carbonic anhydrase (CA) manipulation, additional controls or mechanistic validation may be necessary to support the conclusions drawn.
  
  We think that our use of Cx32<sup>DN</sup> greatly strengthens our conclusions that CO<sub>2</sub> is the messenger from the axon that gates Cx32 in the paranode.
  
  (5) AQP1 and Na<sup>+</sup> channel interaction - alternative interpretation
  
  It has been reported that AQP1 interacts with voltage-gated Na<sup>+</sup> channels, influencing action potential generation. For example, in AQP1 knockout mice, current injection-evoked action potentials show a reduced peak inward current, suggesting impaired Nav1.8 function (Zhang et al., J. Biol. Chem., 2010; doi: 10.1074/jbc.M109.090233). This raises the possibility that the observed effects of AQP1 inhibition (e.g., with TC AQP1-1) could also result from altered Na<sup>+</sup> channel activity, not just impaired CO<sub>2</sub> transport. I suggest that this alternative interpretation be acknowledged and discussed, as the current data do not rule it out.
  
  While constitutive KO of AQP1 does alter action potential generation in DRGs and an interaction between AQP1 and Nav1.8 has been documented, we do not think that this is a viable alternative interpretation of our data. We have measured the CAP during all our manipulations, including the use of TC AQP1-1, and its amplitude is unaltered (see Fig 8 fig supplement 1 and Fig 13D). Our data therefore show that, in the context of our experiments, application of the AQP1 blocker TC AQP1-1 does not alter Na<sup>+</sup> channel activity. The difference between our data and the evidence from AQP1 knockout may arise from the difference between acute application of an antagonist (a short-term effect without changes in protein expression) and constitutive knockout, which is likely to have longer-term effects. We have added some discussion to address this point (last few lines, Page 9).
  
  (6) Figures 11A and 12C - Add heat map calibration
  
  In Figures 11A and 12C, the changes in Ca<sup>2+</sup> signals are difficult to interpret. In some areas, color changes appear to occur outside of cellular structures. I recommend including a heat map calibration scale for both figures to facilitate the interpretation of the signal intensity and localization.
  
  We agree that these data are limited by the technique used, and as mentioned above we now have GCaMP8 data that have better resolution and strengthen our conclusions.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.04.647196v2
ecampusontario.pressbooks.pub

What is Intrapersonal Communication?

2
1. sanikasabnis 18 Jun 2026
  
  in Public
  
  Intrapersonal communication can be defined as communication with one’s self, and that may include self-talk, acts of imagination and visualization, and even recall and memory (McLean, 2005).
  
  This definition explains that intrapersonal communication is the communication we have with ourselves. It includes our inner thoughts, self-talk, memories, and mental visualization. This type of communication is important because it influences how we think, make decisions, solve problems, and understand our experiences before communicating with others.
2. telmore6 10 Jun 2026
  
  in Public
  
  Intrapersonal communication can be defined as communication with one’s self, and that may include self-talk, acts of imagination and visualization, and even recall and memory (McLean, 2005).
  
  It is the communication we have with ourselves, and it takes place in our own minds. It includes the thoughts that run through our minds, the way we talk to ourselves, how we imagine different situations, and how we think back on past experiences.

Annotators

telmore6

sanikasabnis

URL

ecampusontario.pressbooks.pub/commbusprofcdn/chapter/what-is-intrapersonal-communication/
www.biorxiv.org

Improved cryo-EM reconstruction of sub-50 kDa complexes using 2D template matching

1
1. Public_Reviews 17 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This important study builds on previous work from the same authors to present a conceptually distinct workflow for cryo-EM reconstruction that uses 2D template matching to enable high-resolution structure determination of small (sub-50 kDa) protein targets. The paper describes how density for small-molecule ligands bound to such targets can be reconstructed without these ligands being present in the template. However, the evidence described for the claim that this technique “significantly” improves the alignment of the reconstruction of small complexes is incomplete. The authors could better evaluate the effects of model bias on the reconstructed densities.
  
  We have addressed both concerns. Regarding the claim that 2DTM “significantly” improves alignment, the most direct evidence is the controlled comparison in Fig. 3: using the same particle stack and the same reconstruction software (RELION), 2DTM-derived orientations yield a 3.1 Å reconstruction whereas RELION auto-refinement of the same particles yields 3.7 Å. Because the orientations are the only variable, this comparison directly demonstrates that 2DTM produces more accurate alignments.
  
  We further evaluated RELION auto-refinement with initial low-pass filters of 3, 5, 10, and 15 Å (Fig. 3c); the final resolution remained between 3.7 and 4.0 Å across all conditions, indicating that the achievable resolution difference reflects a fundamental distinction between the two approaches. 2DTM directly leverages high-resolution signal in the template during alignment, which is particularly advantageous for small particles.
  
  To assess whether this improvement extends beyond the ligand pocket, we constructed a composite omit map (Fig. 5) assembled from 36 reconstructions, each generated using a template with a different subset of residues deleted. The composite shows that density can be recovered at distributed locations across the kinase, including peripheral and surface-exposed regions further away from the alignment center. Recovery varies across sites, with some regions exhibiting weaker or fragmented density, consistent with local differences in structural heterogeneity and residual alignment error. Together, these results indicate that the orientation estimates support global density recovery rather than being confined to the ligand-binding region.
  
  Regarding model bias, we have strengthened both the quantitative and visual analyses. Specifically, we have (i) updated the template-bias metric Ω in Fig. 4, (ii) added grouped occupancy refinement showing that omitted residues 222–227 refine to 0.55–0.80 (mean 0.72), ATP to 0.61, and Mn to 0.28, while template-included control residues 150–155 remain near 1.0 (0.88–1.00; mean 0.96), and (iii) completed the composite omit map described above. Together, these results provide consistent evidence that densities corresponding to omitted regions are not driven by the template and can be recovered from the data, while template-included regions show some, albeit limited, evidence of overfitting, as expected.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper describes an application of the high-resolution cryo-EM 2D template matching technique to sub-50kDa complexes. The paper describes how density for ligands can be reconstructed without having to process cryo-EM data through the conventional single particle analysis pipelines.
  
  Strengths:
  
  This paper contributes additional data (alongside other papers by the same authors) to convey the message that high-resolution 2D template matching is a powerful alternative for cryo-EM structure determination. The described application to ligand density reconstruction, without the need for extensive refinements, will be of interest to the pharmaceutical industry, where often multiple structures of the same protein in complex with different ligands are solved as part of their drug development pipelines. Improved insights into which particles contribute to the best ligand density are also highly valuable and transferable to other applications of the same technique.
  
  Weaknesses:
  
  Although the convenient visualisation of small molecules bound to protein targets of a known structure would be relevant for the pharmaceutical industry, the evidence described for the claim that this technique “significantly” improves alignment of reconstruction of small complexes is incomplete. The authors are encouraged to better evaluate the effects of model bias on the reconstructed densities in a revised paper.
  
  We thank the reviewer for these constructive comments. We have updated the template-bias metric Ω in Fig. 4 and added two further quantitative controls: grouped occupancy refinement of omitted residues and a composite omit map spanning the entire protein. Full details are provided in our responses to Comments 1 and 2 below.
  
  Reviewer #1 (Recommendations for the authors):
  
  Main Comments
  
  (1) For the 1ATP structure: Q-scores for deleted residues/ligands are worse than the Q-scores for residues in the template. This means that the reconstructed map must suffer from template bias. Another indication of this bias is that the density for the ATP (and the omitted residues) appears to be weaker than the density for the residues in the template (although this is not easy to assess from the figures). The authors should perform additional experiments to quantify this bias.
  
  (a) One option could be to do what the X-ray crystallographers call an OMIT map, and omit all residues, a few at a time, from the template in multiple 2DTM runs. They could then assemble a density map from all the omitted residues together and measure the resolution of the omit map against the known template by FSC.
  
  (b) Another insightful experiment would be to take the various 2DTM reconstructed maps described in the paper and perform a refinement of the atom occupancies of all residues in the structure. Residues included in the template should refine to values close to 1. In the absence of bias, the occupancies of the omitted residues should be 1 too; if the reconstructed map were completely biased, those occupancies would refine to 0. Therefore, the refined occupancies of omitted residues could perhaps serve as a measure for the amount of bias in the reconstructed map.
  
  We thank the reviewer for these detailed and constructive suggestions. We agree that the lower Q-scores for omitted regions indicate weaker density and that template bias exists at residues that are included in the template. To quantify this more directly, we corrected the template-bias metrics at the omitted region (mask from the full–omit template difference) in Fig. 4.
  
  Following the reviewer’s suggestion, we performed Phenix real-space grouped occupancy refinement against the omit reconstruction using the docked full model. The results are shown in Table S2. We refined occupancies for the omitted residues (chain E 222–227), ATP, Mn, and template-included control residues (chain E 150–155), while excluding waters. The omitted residues refined to occupancies of 0.55–0.80 (mean 0.72), ATP to 0.61, and Mn to 0.28, whereas the control residues remained near 1.0 (0.88–1.00; mean 0.96). These results indicate substantial recovery of density in the omitted regions, but also some degree of bias.
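The reviewer's reasoning (an occupancy near 1 for omitted residues means little bias; near 0 means the density came from the template) can be illustrated with a deliberately simplified stand-in for occupancy refinement. The sketch below is not Phenix's real-space grouped occupancy refinement: it fits a single least-squares scale factor q between observed map values and model-calculated density inside a mask, q = Σ ρ<sub>obs</sub>ρ<sub>model</sub> / Σ ρ<sub>model</sub><sup>2</sup>, clipped to [0, 1]. The function name and voxel values are hypothetical.

```python
from typing import Sequence


def toy_group_occupancy(rho_obs: Sequence[float], rho_model: Sequence[float]) -> float:
    """Least-squares scale q minimising sum((rho_obs - q * rho_model)**2), clipped to [0, 1].

    A crude analogue of grouped occupancy refinement: q near 1 means the
    observed map fully supports the group; q near 0 means little support.
    """
    num = sum(o * m for o, m in zip(rho_obs, rho_model))
    den = sum(m * m for m in rho_model)
    if den == 0:
        raise ValueError("model density is zero inside the mask")
    return min(1.0, max(0.0, num / den))


model = [1.0, 2.0, 3.0]  # model-calculated density inside the mask (hypothetical)
weak = [0.5, 1.0, 1.5]   # observed density at half strength (hypothetical)
print(toy_group_occupancy(weak, model))   # 0.5
print(toy_group_occupancy(model, model))  # 1.0
```

Real occupancy refinement also accounts for B-factors, map scaling and neighbouring atoms, so values from this toy are only meant to show why an intermediate occupancy is read as partial recovery of density.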
  
  The substantially lower refined occupancy of Mn<sup>2+</sup> may reflect genuine partial occupancy in the dataset. While compact features can be especially sensitive to residual alignment error, we cannot conclude from the present analysis that alignment effects alone account for the weak Mn<sup>2+</sup> density.
  
  Finally, we have constructed a composite omit map to assess density recovery across the protein. We generated 36 omit templates, each deleting ∼10 non-overlapping residues scattered across the structure (including peripheral and surface-exposed regions). For each template, an independent 2DTM search and reconstruction was performed. Local density patches were extracted within 3 Å of the omitted atoms (with neighboring residues excluded as described in Methods) and assembled into a composite map (Fig. 5). The composite map shows that density can be recovered at distributed locations across the protein and is not restricted to the central binding pocket. Recovery is variable across sites, with some regions exhibiting weaker or fragmented density, consistent with local differences in signal-to-noise, structural heterogeneity, and residual alignment error.
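The assembly step of the composite omit map can be sketched as follows. This is a simplified illustration under several assumptions not stated in the response: each run's map is a hypothetical dictionary from integer voxel indices to density values on a shared grid of cubic voxels of edge `voxel_size` Å (index × size gives the voxel position in Å); only the 3 Å radius criterion is applied, with the neighbouring-residue exclusion described in Methods left out; and voxels claimed by more than one run are averaged.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


def composite_omit_map(
    runs: Sequence[Tuple[Dict[Voxel, float], List[Point]]],
    voxel_size: float,
    radius: float = 3.0,
) -> Dict[Voxel, float]:
    """Paste density from each omit reconstruction within `radius` Å of that run's omitted atoms."""
    sums: Dict[Voxel, float] = {}
    counts: Dict[Voxel, int] = {}
    for density, omitted_atoms in runs:
        for voxel, value in density.items():
            pos = tuple(i * voxel_size for i in voxel)  # voxel position in Å
            if any(math.dist(pos, atom) <= radius for atom in omitted_atoms):
                sums[voxel] = sums.get(voxel, 0.0) + value
                counts[voxel] = counts.get(voxel, 0) + 1
    # Average wherever patches from different runs overlap.
    return {v: sums[v] / counts[v] for v in sums}


# Two hypothetical runs, each omitting one "atom" at a different location.
run_a = ({(0, 0, 0): 1.0, (5, 0, 0): 2.0}, [(0.0, 0.0, 0.0)])
run_b = ({(0, 0, 0): 9.0, (5, 0, 0): 4.0}, [(5.0, 0.0, 0.0)])
print(composite_omit_map([run_a, run_b], voxel_size=1.0))  # {(0, 0, 0): 1.0, (5, 0, 0): 4.0}
```

Each voxel in the composite comes only from a reconstruction whose template lacked the nearby atoms, which is what allows the composite to be read as template-independent density.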
  
  (2) The claim that 2DTM leads to “Improved” reconstruction (title) and “alignment and reconstruction [...] can be significantly improved” (abstract) is not supported by the data presented in the paper. The smallest single particle structure to resolutions sufficient for de novo atomic modelling is currently the ACA2 complex, with an ordered mass of less than 40 kDa, which was reconstructed using Blush regularisation in RELION. This paper should be referenced, and statements about single particle analysis (SPA) not working for sub-50 kDa complexes should be toned down. In general, I would say that 2DTM and SPA are not competing techniques, and the paper would be better if it focused on the intrinsic advantages of 2DTM (like ease-of-use for screening of pharmaceutical compounds) and useful findings described that make 2DTM better, e.g., excluding thick ice.
  
  We thank the reviewer for this important perspective and have added the Blush regularization reference Kimanius et al. (2024) to the revised manuscript, noting that the 40 kDa Aca2–RNA complex was reconstructed to 2.5 Å resolution using this approach (at L451). Furthermore, Blush regularization could be applied to reconstructions derived from 2DTM-based particle stacks, and a combination of both approaches may yield further improvements.
  
  We agree that 2DTM and SPA are complementary rather than competing techniques and have revised the manuscript to reflect this. We have also toned down claims in the abstract, which now states that 2DTM “reconstructed a previously intractable ∼43 kDa kinase complex and improved the density of its ligand-binding site” rather than making broad claims about SPA limitations. In the discussion, we now describe 2DTM as broadening possibilities for structural studies of targets “that have remained difficult to reconstruct” rather than implying they are impossible by SPA.
  
  Regarding the intrinsic advantages of 2DTM: beyond ligand screening, the composite omit map (Fig. 5, described in Comment 1) demonstrates that 2DTM-derived orientations support density recovery throughout the entire protein, including peripheral and surface-exposed residues, using roughly an order of magnitude fewer particles than conventional SPA workflows.
  
  (3) Given the uncertainties about the amount of template bias in the reconstructed 2DTM densities, I have trouble interpreting the predictions in Table 1. Where would the 1ATP structure lie in Figure 8? How much bias would there be in a 2DTM reconstruction at SNR<sub>n</sub> = SNR<sub>s</sub>? Could the authors perform tests on simulated data to confirm these predictions? At the point of SNR<sub>n</sub> = SNR<sub>s</sub>, how would a 2DTM reconstruction look, and what would refined occupancies for deleted residues be?
  
  (This may reflect a misunderstanding on my part, but I don’t really see how the SNR<sub>n</sub> = SNR<sub>s</sub> condition is completely dependent on the number of orientations searched (through Equation 1). In Figure 8, is the full search in a 4k × 4k micrograph, or inside a particle box? And what are the relevant search ranges? Perhaps as a consequence of this misunderstanding, I do not understand how one would decide on the amount of noise in the simulated data for these tests.)
  
  We thank the reviewer for this important question and agree that this point needed clearer explanation. In our framework, SNR<sub>n</sub> (Equation 1) is the expected alignment-noise level that arises from maximizing over many cross-correlations, where N<sub>s</sub> is the total number of sampled hypotheses in the 5D search (in-plane angle, out-of-plane angles, and x, y shifts), not only the number of orientations. Thus, the relevant search is the per-particle alignment search window (full or constrained), not the full 4k×4k micrograph area.
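
  The sketch below makes the dependence on N<sub>s</sub> concrete. It treats the N<sub>s</sub> scores as independent, unit-variance Gaussian noise. This is a simplifying assumption: neighboring poses in a real search are correlated, so the effective number of independent hypotheses is smaller. It is also not necessarily the exact form of Equation 1. The sketch compares two quantities: an analytic threshold, taken as the (1 − 1/N<sub>s</sub>) quantile of the standard normal, and a Monte Carlo estimate of the expected noise maximum. Both grow only slowly with N<sub>s</sub>, roughly as √(2 ln N<sub>s</sub>) for large N<sub>s</sub>. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def noise_threshold(n_hypotheses):
    """Score exceeded on average once by n independent N(0, 1) noise scores.

    This is the (1 - 1/n) quantile of the standard normal; n must be > 1.
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / n_hypotheses)


def simulated_noise_max(n_hypotheses, trials=200, seed=0):
    """Monte Carlo mean of the largest of n independent N(0, 1) scores."""
    rng = np.random.default_rng(seed)
    return float(np.mean([rng.standard_normal(n_hypotheses).max() for _ in range(trials)]))


# A constrained local search samples far fewer hypotheses than a global 5D search,
# so its noise maximum (and hence the SNR_s needed to beat it) is lower.
for n_s in (10**2, 10**4):
    print(n_s, round(noise_threshold(n_s), 2), round(simulated_noise_max(n_s), 2))
```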
  
  At SNR<sub>n</sub> = SNR<sub>s</sub>, the true-match score equals the expected noise maximum, so the search sits exactly at its detection threshold. If SNR<sub>s</sub> is only slightly larger than SNR<sub>n</sub>, the correct pose is favored on average. With sufficiently large particle numbers, genuine density in the omitted region should therefore accumulate, but residual pose errors attenuate high-frequency amplitudes (effectively a large positive B-factor). In that regime, sharpening (negative-B correction) can improve visibility once enough signal has accumulated. We therefore expect partial rather than fully unbiased recovery at this threshold. Omitted-region occupancies should lie between 0 and 1 and below those of template-included controls, which is consistent with our measured values, and they should improve as SNR<sub>s</sub> − SNR<sub>n</sub> and the particle number increase. Simulations at this exact threshold would require a very large number of particles to achieve sufficient statistics, and we leave them to future work. We have added this clarification to the Supporting Information.
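
  The threshold argument can be illustrated with a toy simulation under the same independent-Gaussian assumption. In this model, the true-pose score is SNR<sub>s</sub> plus unit-variance noise, and it must exceed the largest of the remaining N<sub>s</sub> − 1 noise scores. The fraction of particles assigned the correct pose rises smoothly through the threshold rather than jumping from 0 to 1. This smooth rise is why partial recovery is expected near SNR<sub>s</sub> ≈ SNR<sub>n</sub>. The names and parameter values are illustrative only.

```python
import numpy as np

def fraction_correct(snr_s, n_hypotheses, trials=2000, seed=0):
    """Fraction of simulated particles whose true pose outscores all noise poses.

    Toy model: true-pose score = snr_s + N(0, 1); the other n_hypotheses - 1
    poses score independent N(0, 1) noise. A fixed seed reuses the same noise
    draws for every snr_s, so the curve is monotonic by construction.
    """
    rng = np.random.default_rng(seed)
    true_scores = snr_s + rng.standard_normal(trials)
    noise_max = np.array([rng.standard_normal(n_hypotheses - 1).max() for _ in range(trials)])
    return float(np.mean(true_scores > noise_max))


# For N_s = 1000 the expected noise maximum is about 3.2.
for snr_s in (2.0, 3.2, 4.5):
    print(snr_s, fraction_correct(snr_s, n_hypotheses=1000))
```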
  
  (4) The strong (> 5 sigma!!) and ubiquitous difference densities in Figure 9A imply that the authors have a serious problem with their forward model, which could explain some of the effects of model bias discussed above. I recommend they investigate these differences in detail. It would be good to see negative and positive densities in different colours to understand these differences better. The text speaks about incomplete capture of the solvent background, but the difference densities appear to be of much higher spatial frequencies than those typical for background/solvent effects (e.g., 15–20 Å). It may thus also be helpful to analyse these differences in Fourier space.
  
  We thank the reviewer for this important point. In our previous analysis, we did not incorporate an appropriate protein mask when generating the difference map, which contributed to widespread residual densities. We have now regenerated the map using the program diffmap.exe (https://grigoriefflab.umassmed.edu/diffmap) with a protein soft mask and moved it to the Supplementary Information (Figure 1—figure supplement 4, contour SD = 20). With this controlled setup, the strongest coherent residual densities localize to the omitted ATP pocket and residues 222–227, consistent with recovery of omitted features. We have revised the figure/text accordingly and clarified that remaining diffuse residuals are likely due to forward-model mismatch (including solvent/background representation). We also added to the manuscript that improved template generation may be achieved by incorporating recent methods that learn environment-aware scattering factors directly from experimental cryo-EM maps.
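
  The reviewer's suggestion to examine the residuals in Fourier space can be sketched as follows. This is an illustrative sketch and not the algorithm implemented in diffmap.exe. It assumes two maps on the same cubic grid and a soft mask with values in [0, 1]. The model-derived map is scaled to the experimental map by least squares inside the mask. The sketch then reports the mean difference power in each Fourier shell, which separates low-frequency solvent/background mismatch (for example, 15–20 Å) from high-frequency residuals. The function names are hypothetical.

```python
import numpy as np

def masked_difference(exp_map, model_map, mask):
    """Least-squares scale model_map to exp_map inside the mask; return (difference, scale)."""
    scale = np.sum(mask * exp_map * model_map) / np.sum(mask * model_map**2)
    return mask * (exp_map - scale * model_map), scale


def shell_power(volume, voxel_size):
    """Mean |F|^2 per one-voxel-wide Fourier shell; returns (frequency in 1/Å, power)."""
    n = volume.shape[0]
    k = np.fft.fftfreq(n)  # cycles per voxel
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.round(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int).ravel()
    power = np.abs(np.fft.fftn(volume)).ravel() ** 2
    total = np.bincount(shell, weights=power)
    counts = np.bincount(shell)
    n_shells = n // 2  # shells below Nyquist
    return np.arange(n_shells) / (n * voxel_size), total[:n_shells] / counts[:n_shells]


# Toy usage: an "experimental" map that is a scaled model plus weak noise.
rng = np.random.default_rng(0)
model = rng.standard_normal((24, 24, 24))
experimental = 2.0 * model + 0.1 * rng.standard_normal((24, 24, 24))
diff, scale = masked_difference(experimental, model, np.ones_like(model))
freqs, power = shell_power(diff, voxel_size=1.0)
```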
  
  Other Comments
  
  (1) P.1: Alongside reference 2, a reference to the 1.2 Å apoferritin structure from the Stark group should be included.
  
  We have added the reference at L30.
  
  (2) P.2: “commond line tool”
  
  We have corrected the typo.
  
  (3) P.2-3: Robust reconstruction of the ATP binding pocket: Auto-refinements in RELION without alignments do not exist, and corresponding statements need to be removed from the manuscript. If one wants to skip alignments, then there is no refinement left to be done. In that case, one should just perform a reconstruction of the 2 halves (e.g., using relion_reconstruct) and then run a standard RELION postprocessing.
  
  We agree with the reviewer and have revised the manuscript accordingly. Technically, RELION’s relion_refine with the --skip_align flag runs an iterative loop that re-estimates the per-particle noise model (spectral noise σ<sup>2</sup>) and computes the gold-standard FSC between half-maps, but it does not modify the particle orientations or translations. As the reviewer correctly points out, this is effectively a 3D reconstruction followed by postprocessing, not a refinement. We have updated the text to replace “skip-alignment auto-refinement” with “3D reconstruction without angular refinement” to accurately reflect what was performed.
  
  (4) P.3: What are “first-quadrant p-values” and “three-quadrant p-values”?
  
  We apologize for the ambiguity and now define these terms explicitly in the revised text (with citation to the p-value paper). After z-score and SNR are transformed to probit coordinates, “first-quadrant” (1Q) p-values use only candidate points with both coordinates > 0 (i.e., both the probit-transformed z-score and the probit-transformed SNR are positive). “Three-quadrant” (3Q) p-values include candidates where at least one coordinate is > 0 (equivalently, all points except the quadrant where both are < 0).
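
  The two candidate sets can be selected as follows, assuming the probit-transformed z-scores and SNRs are already available. The transform and the p-value calculation itself follow the cited p-value paper and are not reproduced here. The function name is hypothetical.

```python
import numpy as np

def quadrant_sets(probit_z, probit_snr):
    """Boolean masks selecting candidates for 1Q and 3Q p-values."""
    probit_z = np.asarray(probit_z)
    probit_snr = np.asarray(probit_snr)
    one_q = (probit_z > 0) & (probit_snr > 0)    # both coordinates positive
    three_q = (probit_z > 0) | (probit_snr > 0)  # at least one coordinate positive
    return one_q, three_q


one_q, three_q = quadrant_sets([1.2, -0.5, 0.3, -1.0], [0.8, 0.4, -0.2, -0.7])
print(one_q)    # [ True False False False]
print(three_q)  # [ True  True  True False]
```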
  
  (5) P.5: In Equation (2), it is unclear what Q means from the main text. Would it be better to leave Equation (2) for the Appendix, and only show Equation (3) in the main text?
  
  Thank you for this suggestion. We kept Equation (2) in the main text to preserve the continuity of the derivation, but we now define Q(k,N<sub>i</sub>) explicitly at first use as the normalized exposure-weighting transfer function (following Grant 2015). The detailed derivation and assumptions remain in the Supporting Information.
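
  The per-frequency amplitude attenuation that underlies exposure weighting can be sketched with the empirical critical-exposure fit of Grant & Grigorieff (2015). In that fit, N<sub>e</sub>(k) = 0.245 k<sup>−1.665</sup> + 2.81 (e⁻/Å², with k in Å⁻¹), and the attenuation is exp(−N/(2N<sub>e</sub>(k))). The manuscript defines how Q(k,N<sub>i</sub>) combines and normalizes these factors over frames, and that step is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def critical_exposure(k):
    """Critical exposure N_e(k) in e-/Å^2 at spatial frequency k (1/Å); Grant & Grigorieff (2015) fit."""
    return 0.245 * np.power(k, -1.665) + 2.81


def exposure_attenuation(k, exposure):
    """Amplitude remaining after accumulated exposure N (e-/Å^2): exp(-N / (2 N_e(k)))."""
    return np.exp(-exposure / (2.0 * critical_exposure(k)))


# High-resolution amplitudes fade much faster than low-resolution ones.
for resolution in (10.0, 3.0):
    print(resolution, exposure_attenuation(1.0 / resolution, exposure=30.0))
```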
  
  (6) P.6: “Remaining gaps”: this section considers differences between 200 keV and 300 keV electron beam energies. The main practical effect for cryo-EM data sets is that current detectors are designed for detecting 300 keV electrons, and their DQE is thus a lot worse at 200 keV. The paper does not mention detectors at all, perhaps because they are assumed to be perfect, but that is still far from the case.
  
  Also, why were defocus searches not performed if the thickness of micrographs was up to 1500 Å?
  
  The conclusion of this section states “Considering all these factors...”, but it then claims standard single particle analysis still remains an outstanding challenge. This concluding statement makes no sense, as this whole section was about 2DTM.
  
  Thank you for this comment. We agree and have revised the text to make these points explicit. First, we now state clearly that detector response (DQE) is generally more favorable at 300 keV than at 200 keV, which contributes to the experimental–theoretical gap. Second, we clarify why we did not perform a defocus search in 2DTM: after CTF/thickness filtering, the retained micrographs are predominantly in the thin-ice regime, so expected defocus spread is smaller, while adding a defocus dimension substantially increases computational cost. We also tested downstream refinement (including CTF/beam-tilt related refinement in cisTEM) and did not observe measurable improvement for this dataset (data not included in the manuscript). Finally, we revised the concluding sentence in this subsection to refer specifically to 2DTM-based alignment limits rather than standard SPA, so the section scope is now consistent.
  
  (7) P.7: Data-driven refinement of AlphaFold3 models: it might be worth pointing out that removing residues a few at a time from AF3 models and checking their reconstructed density by 2DTM would come at a considerable computational cost.
  
  We agree. We have demonstrated residue-level omission validation using the X-ray template via a composite omit map (Fig. 5), confirming that the approach is feasible. We have updated the Discussion to reflect this: extending the composite omit approach to AlphaFold3-based templates remains computationally expensive — each omission design requires an independent 2DTM search and downstream reconstruction — and we present this as an important direction for future work.
  
  (8) Figure 1: What is “full FSC” and what is “particle FSC”?
  
  Thank you for pointing this out. We have clarified the terminology in the figure legend and text using cisTEM and Frealign definitions (Grant et al., 2018). What was previously labeled “Full FSC” is now referred to as the uncorrected FSC (FSC<sub>uncor</sub>), computed within a generous mask. “Particle FSC” denotes the solvent-corrected FSC, obtained from FSC<sub>uncor</sub> using the mask-volume correction factor f as described in the cisTEM/Frealign framework (Grant et al., 2018).
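
  The solvent correction can be illustrated with a volume-ratio argument. Suppose noise fills the whole mask but signal occupies only the particle. Rescaling the noise to the particle volume then gives FSC<sub>part</sub> = f·FSC / (1 + (f − 1)·FSC), with f = V<sub>mask</sub> / V<sub>particle</sub> ≥ 1. This form is assumed to match the Frealign/cisTEM correction; Grant et al. (2018) should be consulted for the exact definition. The 0.143-crossing helper and the synthetic FSC curve below are purely illustrative, and the function names are hypothetical.

```python
import numpy as np

def particle_fsc(fsc_uncor, f):
    """Solvent-corrected FSC from the uncorrected FSC and volume ratio f = V_mask / V_particle."""
    fsc_uncor = np.asarray(fsc_uncor, dtype=float)
    return f * fsc_uncor / (1.0 + (f - 1.0) * fsc_uncor)


def resolution_at(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC first drops below threshold, interpolated linearly in frequency."""
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            return 1.0 / (freqs[i - 1] + t * (freqs[i] - freqs[i - 1]))
    return None  # no crossing in the sampled range


freqs = np.linspace(0.01, 0.45, 45)                      # spatial frequency, 1/Å
fsc_uncor = 1.0 / (1.0 + np.exp((freqs - 0.30) / 0.02))  # synthetic FSC curve
print(resolution_at(freqs, fsc_uncor), resolution_at(freqs, particle_fsc(fsc_uncor, f=3.0)))
```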
  
  (9) Figure 3: Why were particles in class 5 discarded? The 2DTM approaches described in this paper are all about carefully selecting good particles, yet now the authors use standard 3D classification to throw away another 156 particles. This seems to be an arbitrary choice. How different would the results have been if these had been included in the reconstruction? Alternatively, did these few particles have any 2DTM metrics that would justify their exclusion?
  
  We thank the reviewer for raising this point. Class 5 contained only 156 particles (∼2% of the dataset). While the 2DTM p-value and SNR metrics provide principled criteria for particle selection, they are not perfect, and a small number of suboptimal particles may still pass these filters. To address the reviewer’s concern, we repeated the reconstruction including all five classes. The resulting map achieved a resolution of 3.7 Å, identical to the reconstruction without class 5, confirming that including these particles does not affect the results. We have clarified this point in the manuscript.
  
  (10) Figure 4C: What are the negative sample thicknesses here? Why use an inset?
  
  The negative sample thickness values are artifacts of the CTF-based thickness estimation algorithm in ctffind5. This algorithm fits oscillations in the 1-D power spectrum arising from the interaction between the CTF and the specimen’s finite thickness (a sinc-modulated envelope). When the ice is very thin or the power spectrum is noisy, the optimizer can converge to a physically meaningless negative value. Of the 2,488 total micrographs across both sessions, 2,314 were retained after CTF-score filtering, and 136 of these (∼5.9%) returned negative thickness estimates. We have revised Figure 1—figure supplement 1c (previously Figure 4c) to show only the physically meaningful positive thickness values without the inset, which gives a clearer view of the unimodal distribution peaked near 350–400 Å.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, Zhang et al describe a method for cryo-EM reconstruction of small (sub-50 kDa) complexes using 2D template matching. This presents an alternative, complementary path for high-resolution structure determination when there is a prior atomic model for alignment. Importantly, regions of the atomic model can be deleted to avoid bias in reconstructing the structure of these regions, serving as an important mechanism of validation.
  
  The manuscript focuses its analysis on a recently published dataset of the 40 kDa kinase complex deposited to EMPIAR. The original processing workflow produced a medium resolution structure of the kinase (GSFSC ∼4.3 Å, though features of the map indicate ∼6-7 Å resolution); at this resolution, the binding pocket and ligand were not resolved in the original published map. With 2DTM, the authors produce a much higher resolution structure, showing clear density for the ATP binding pocket and the bound ATP molecule. With careful curation of the particle images using statistically derived 2DTM p-values, a high-resolution 2DTM structure was reconstructed from just 8k particles (2.6 Å non-gold standard FSC; ligand Q-score of 0.6), in contrast to the 74k particles from the original publication. This aligns with recent trends that fewer, higher-quality particles can produce a higher-quality structure. The authors perform a detailed analysis of some of the design choices of the method (e.g., p-value cutoff for particle filtering; how large a region of the template to delete).
  
  Overall, the workflow is a conceptually elegant alternative to the traditional bottom-up reconstruction pipeline. The authors demonstrate that the p-values from 2DTM correlations provide a principled way to filter/curate which particle images to extract, and the results are impressive. There are only a few minor recommendations that I could make for improvement.
  
  We appreciate the positive assessment. In response to the bias-related concerns raised elsewhere, we have: (i) updated the template-bias metric Ω reported in Fig. 4, (ii) added grouped occupancy refinement showing that omitted residues 222–227 refine to a mean occupancy of 0.72 while template-included control residues remain near 1.0, and (iii) assembled a composite omit map (Fig. 5) from 36 partial-deletion reconstructions spanning the entire protein. These additions are described in the revised Results and in the rebuttal below.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) On page 3, “Finally, by comparing Figure 2a and b, we observed that deleting IP20 strongly reduced signal at several residues.” Looking at Figure 2a and 2b, it was unclear which residues they were referring to.
  
  We have revised the text to explicitly list the affected residues. In the updated Figure 2, we now label the omitted residues with the lowest backbone Q-scores in the structural views (column 2) and include per-residue backbone Q-score plots (column 4), making the comparison between panels (a) and (b) quantitative. For example, when IP20 is additionally deleted (Fig. 2b), residues Phe54, Gly55, Lys72, Glu127, Glu170, and Asp184 all fall below a backbone Q-score of 0.5, compared with only Ser53 and Glu127 in the within-3 Å deletion alone (Fig. 2a).
  
  (2) Figure 1a. Both the published density map and the text “Template” are gray, but the 2DTM template density map is yellow.
  
  Thank you for catching this inconsistency. We have updated Figure 1a so that the 2DTM template density is now rendered in gray, consistent with the X-ray crystal structure (PDB) coloring. The published single-particle map is shown in wheat and the 2DTM reconstruction in blue, providing a clear three-way color distinction.
  
  (3) Figure 1b. I would recommend the x-axis label of “spatial frequency” instead of “resolution” (which is overloaded). Furthermore, the fact that this is not a GSFSC should be clearly labeled in the figure to prevent confusion with a standard GSFSC.
  
  We agree with both suggestions. The x-axis has been relabeled “Spatial Frequency (1/Å)” in the revised figure. We have also added a note in the figure caption stating that these FSC curves are not gold-standard FSCs, as the reconstruction uses orientations determined by template matching rather than independent half-set refinement.
  
  (4) Figure 2: The usage of the negative sign in the labels “-3 Å”, “-5 Å” to indicate within a given radius is a bit confusing. “Within 3 Å”, perhaps?
  
  Thank you for this suggestion. We have changed the labels in Figure 2 from “−3 Å” and “−5.5 Å” to “Within 3 Å” and “Within 5.5 Å.” We have also added a fourth column to Figure 2 showing per-residue backbone Q-scores for each deletion experiment, with omitted residues distinguished by color and marker shape. The residues with the lowest backbone Q-scores among the omitted set are circled in red and correspond to the labeled residues in the structural views.
  
  (5) Figure 4c: Why does the sample thickness histogram go to negative values (−20,000 Å)?
  
  As noted in our response to Reviewer 1, the negative thickness values are artifacts of the ctffind5 thickness estimation, which fits a sinc-modulated envelope to the 1-D power spectrum. For micrographs with very thin ice or noisy power spectra, the fit can converge to unphysical negative values. These account for ∼5.9% of micrographs. We have revised Figure 1—figure supplement 1 (originally Fig. 4c) to display only positive thickness values, removing the inset and providing a clearer histogram.
  
  (6) Figure 4d: Should the label be “(Before Filtering)” instead of After?
  
  Yes, thank you for catching this. The original Figure 4d was mislabeled—it showed particle counts before filtering but was titled “After Filtering.” We have corrected the labels: Figure 1—figure supplement 1d (originally Fig. 4d) now reads “Before Filtering” and Figure 1—figure supplement 1e (originally Fig. 4e) reads “After Filtering.”
  
  (7) Supplementary Note 1: Please provide units for d, p, D, and k<sub>max</sub> in equation S4 and the preceding text.
  
  We have added units to the text preceding Eq. S4: d = 1/k<sub>max</sub> is the high-resolution alignment limit (Å), k<sub>max</sub> is the maximum spatial frequency (Å<sup>−1</sup>), p = d/2 is the ideal pixel size (Å/pixel), and D is the particle diameter (Å).
  
  (8) What does the map-model FSC look like with the template as the model vs. the AF3 structure as the model?
  
  We have computed the map–model FSC for both the X-ray crystallographic template (PDB 1ATP) and the AlphaFold3-predicted template against their respective 2DTM reconstructions (Figure 6—figure supplement 1). Both curves cross the FSC = 0.143 threshold at ∼2.3 Å. We note that the map–model FSC in this context should be interpreted with caution, because the vast majority of the structure lies outside the omitted region and is present in the template, so template bias in those regions will dominate the map–model FSC and obscure differences in the small omitted region.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Due to the low SNR of cryo-EM micrographs necessitated by radiation damage, determining the structure of proteins smaller than 50 kDa is exceedingly challenging, such that only a handful have been solved to date. This work aims to improve the reconstruction of small proteins in single-particle cryo-EM by using high-resolution 2D template matching, an algorithm previously used to locate and align macromolecules in situ, to align and reconstruct small proteins. This approach uses an existing macromolecular structure, either experimentally determined or predicted by AlphaFold, to simulate a noise-free 3D reference and generates whitened projections, crucially including high-spatial-frequency information, to align particles by the orientation with maximal cross-correlation. They demonstrate the success of this approach by generating a 3D reconstruction from an existing dataset of a 41.3 kDa protein kinase that had previously evaded attempts at high-resolution structure determination. To alleviate concerns that this is purely from template bias, they demonstrate clear density at two regions that were not present in the template: 6 residues in an alpha helix and an ATP in the ligand binding pocket. The latter is particularly important for its implications in determining structures of ligand-bound proteins for drug discovery. Additionally, the authors provide an update to the classic calculation in Henderson 1995 to predict the minimum molecular mass of a protein that can be solved by single-particle cryo-EM.
  
  Strengths:
  
  I am in no doubt that this technique can be used to gain valuable insights into the structures of small proteins, and this is an important advancement for the field. The ability to determine the structure of ligands in a binding site is particularly important, and this paper provides a method of doing that which outperforms traditional single-particle cryo-EM processing workflows.
  
  The claim that using high-spatial frequency information is essential for aligning small proteins is a valuable insight. A recent pre-print published at a similar time to this manuscript used high-resolution information in standard ab-initio reconstruction to generate a high-resolution reconstruction from the same dataset, supporting the claims made in the manuscript.
  
  The theoretical analysis outlined in the appendix is also sound. It uses the same logic as Henderson, but applies more up-to-date knowledge, such as incorporating dose-weighting and altering the cross-correlation-based noise estimation. This update is valuable for understanding factors preventing us from reaching the theoretical limit.
  
  Weaknesses:
  
  Given that this technique creates template bias, only parts of the reconstruction not in the template can be trusted, unlike standard single-particle processing, where the independent half-maps from separate, ab initio templates are used to generate a 3D reconstruction. Although, in principle, one could perform the search many times such that every residue has been omitted in at least one search, this will be extremely computationally intensive and was not demonstrated in this manuscript. It is therefore currently only realistically applicable when only a small portion of the sub-50 kDa protein is of interest.
  
  The applicability of this technique to more than a single target was also not demonstrated, and there are concerns that it may not work effectively in many cases. The authors note in the results that “the ATP density was consistently recovered more robustly than nearby residues” and speculate that this may be because misalignments disproportionately blur peripheral residues. Since the region of interest in a structure is not necessarily in the center, this may need further investigation. The implications of this statement may also be unclear to the reader. For example, can this issue be minimized by having the region of interest centered in the simulated volume?
  
  In Figure 3, the authors demonstrate that it is not solely improved particle filtering and a noise-free reference that improves alignment, but that the high spatial frequency information is important. This information is very valuable since it can be applied to other, more standard methods. However, this key figure is not as clear or convincing as it could be. The FSC curves are possibly misleading, since the reduced resolution could be explained by reduced template bias when auto-refining with a map initially low-pass filtered to 10 Å. Moreover, although the helix reconstruction does look slightly better using the 2DTM angles, the improvement in density for ATP in the binding pocket is not clear. A qualitative argument only clear in one out of two cases is not as convincing as a quantitative metric across more examples.
  
  We address these concerns in three ways: (i) we quantify template bias using Phenix real-space grouped occupancy refinement: omitted residues 222–227 refine to occupancies of 0.55–0.80 (mean 0.72) and ATP to 0.61, while template-included control residues 150–155 remain near 1.0 (mean 0.96), confirming that recovered density is genuine rather than a template artifact; (ii) we have now completed a composite omit-map experiment (Fig. 5), in which 36 partial-deletion templates, each omitting ∼10 non-overlapping residues, were used to perform independent 2DTM searches and reconstructions; local density patches from all 36 reconstructions were assembled into a composite map showing density recovery at distributed locations across the protein, including peripheral and surface-exposed regions, although recovery is variable across sites; and (iii) we have expanded the discussion to clarify that, while the primary scope of this work is omitted-region validation for the ligand-binding site, the composite omit-map result demonstrates that the approach generalizes beyond the central pocket.
  
  Reviewer #3 (Recommendations for the authors):
  
  In addition to the comments on the public review, I have some more specific suggestions that could improve the manuscript.
  
  (1) Another recent pre-print posted on BioRxiv shortly before this manuscript (Kim et al. High-resolution ab initio reconstruction enables cryo-EM structure determination of small particles) determined a high-resolution structure of the same protein from the same dataset, as well as determining the structures of other small proteins. Since both manuscripts rely on high-spatial frequency information, I think that the paper strengthens the claims in this manuscript and should be cited.
  
  We thank the reviewer for this suggestion. We agree that the recent preprint by Kim et al. strengthens the relevance of high-spatial-frequency information for small-particle cryo-EM reconstruction. We have now added this work to the revised manuscript and included a brief discussion comparing its ab initio strategy with our 2DTM-based approach.
  
  (2) The claim in the abstract that “we were able to reconstruct previously intractable targets under 50 kDa and improve the density of the ligand-binding sites in the reconstructions” should be altered to make it clear that this is only a single previously intractable target.
  
  We agree. The revised abstract now reads “…we reconstructed a previously intractable ∼43 kDa kinase complex and improved the density of its ligand-binding site”, making clear that a single target is demonstrated in this work.
  
  (3) Q-scores in the manuscript were sometimes used to quantify the improvement in map to model fit for the ATP binding pocket, but never for the 6 residues of the alpha helix. They were also not reported in every case for the ATP-binding pocket. This could lead a reader to think it is only being reported when the Q-score matches the expectation. For transparency, I would suggest either using Q-scores in every comparison or in no cases and simply relying on the qualitative result.
  
  We agree with the reviewer. In the revised manuscript, we now report Q-scores consistently for both ATP and residues 222–227 across all conditions: individual residue Q-scores for the omitted residues 222–227 in Fig. 1 are reported in the main text and figure caption; per-residue backbone Q-score plots for all deletion experiments in Fig. 2 are shown as the fourth column of each panel; Fig. 3 (RELION reconstruction) does not include Q-scores as the focus is on orientation accuracy rather than map-model fit; and average Q-scores for all four particle selection conditions in Fig. 4 are listed in Figure 4—source data 1.
  
  (4) The sigma values used for viewing the maps should also be stated in several figures, particularly Figure 3 and Figure 6.
  
  We have added contour levels (σ) to the captions of Fig. 3 and Fig. 4 (originally Fig. 6) in the revised manuscript.
  
  (5) I have a slight concern about how well this method applies away from the region centered in the alignment. If parts on the periphery of the structure are removed, do these also reconstruct? Is it required that the omitted region be centered in the simulation of the 3D volume for each alignment? If so, this should be clearly stated.
  
  2DTM determines particle orientations by matching the full projected template to the image, so alignment is driven by the global structure rather than a localized region. As a result, the recovered orientations define the reconstruction throughout the entire particle, not only near the center. The omitted region does not need to be centered in the template volume. Any region of the protein can be omitted and its density evaluated after reconstruction.
  
  To directly test whether peripheral regions are recovered in the same manner as central ones, we performed a composite omit-map experiment. We generated 36 omit templates, each deleting ∼10 non-overlapping residues distributed across the entire protein, including peripheral and surface-exposed regions. For each template, an independent 2DTM search and reconstruction was performed. Local density patches corresponding to the omitted regions were then extracted and assembled into a composite map (Fig. 5). The resulting map shows density at distributed locations across the protein, indicating that density recovery is not restricted to regions near the alignment center and that peripheral regions can be reconstructed under the same alignment framework, although the quality of recovery varies across sites.
  
  (6) I was confused by the difference between the FSCs in Figure 1 and Figure 3. I understand Figure 1 is from cisTEM and Figure 3 from RELION, but I expected the unmasked FSC and full FSC to be similar. Do the authors have any insights into why there is such a large difference? I would also consider removing the FSCs in Figure 3, since the reduced resolution may only be due to reduced template bias, meaning including this may be misleading.
  
  Thank you for raising this point. The apparent discrepancy arises from multiple differences between the two figures: different FSC definitions, different half-maps (reconstructed with different software and slightly different particle sets), and different masks.
  
  In cisTEM (Fig. 1), two FSC curves are reported: the uncorrected FSC (FSC<sub>uncor</sub>), measured within a spherical mask, and the “Particle FSC”, which applies an analytical solvent-fraction correction (Grant et al., 2018) to account for solvent dilution within the mask. The Particle FSC crossed the 0.143 threshold at ∼2.6 Å, whereas FSC<sub>uncor</sub> crossed at ∼3.0 Å. In Fig. 3, RELION postprocess applied phase-randomization correction with a soft mask, yielding ∼3.1 Å. However, the Fig. 3 FSC was computed on different half-maps (RELION skip-alignment reconstruction of 7,197 particles after 3D classification) with a different mask.
  
  To directly compare the two packages, we computed the FSC on the same cisTEM half-maps using both methods (Figure 3—figure supplement 1). The cisTEM Particle FSC (spherical mask + solvent correction) gave ∼2.6 Å, while relion_image_handler with a tight 3D protein mask gave ∼2.7 Å. These two approaches converge to a similar resolution through different mechanisms: cisTEM compensates for a generous spherical mask using the solvent-fraction correction, while RELION uses a tight mask that excludes most solvent directly. This confirms that when the same half-maps are used, the two packages give consistent results and the apparent discrepancy between Figs. 1 and 3 is primarily due to differences in the reconstruction and particle set, not the FSC calculation.
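
  For reference, an FSC between two half-maps with an optional real-space mask can be sketched as below. The sketch assumes cubic maps on the same grid and uses simple one-voxel-wide shells. It applies neither the cisTEM solvent-fraction correction nor RELION's phase-randomization correction, so it corresponds most closely to an uncorrected (or tightly masked) FSC. The function name is hypothetical.

```python
import numpy as np

def half_map_fsc(half1, half2, voxel_size, mask=None):
    """FSC between two cubic half-maps; returns (spatial frequency in 1/Å, FSC) per shell."""
    if mask is not None:  # e.g. a soft protein mask with values in [0, 1]
        half1, half2 = half1 * mask, half2 * mask
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1).ravel(), np.fft.fftn(half2).ravel()
    k = np.fft.fftfreq(n)  # cycles per voxel
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.round(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int).ravel()
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real)
    p1 = np.bincount(shell, weights=np.abs(f1) ** 2)
    p2 = np.bincount(shell, weights=np.abs(f2) ** 2)
    n_shells = n // 2  # shells below Nyquist
    fsc = cross[:n_shells] / np.sqrt(p1[:n_shells] * p2[:n_shells])
    return np.arange(n_shells) / (n * voxel_size), fsc


# Toy usage: equal-power white signal and independent noise give FSC near 0.5.
rng = np.random.default_rng(0)
signal = rng.standard_normal((32, 32, 32))
h1 = signal + rng.standard_normal(signal.shape)
h2 = signal + rng.standard_normal(signal.shape)
freqs, curve = half_map_fsc(h1, h2, voxel_size=1.0)
```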
  
  We agree with the reviewer that the FSC values in Figure 3 should be interpreted with caution. In this case, the particle orientations are not independently refined but are instead inherited from the 2DTM alignment, so the two half-maps are not strictly independent. We have added clarifying language in the revised manuscript to make this point explicit (Fig. 1 caption).
  
  (7) I would also like to see how RELION auto-refinement performs with different low-pass filtering. This could strengthen the argument that high-resolution information is necessary from the start to successfully align small particles.
  
  We thank the reviewer for this constructive suggestion. We performed RELION auto-refinement on the same 7,197-particle stack using initial low-pass filter resolutions (--ini_high) of 3, 5, 10, and 15 Å. The resulting post-processed resolutions were:
  
  Author response table 1.
  
  The results show that varying the initial low-pass filter has minimal effect on the final resolution. This is expected because RELION uses a gold-standard, maximum-likelihood framework in which the resolution used for alignment is determined iteratively from the data via a probability distribution, rather than being fixed by the initial reference. After the first iteration, the reference is updated from the data, and higher-resolution information is incorporated only to the extent that the current reconstruction supports it. Consequently, differences in the initial low-pass filter have limited impact on the final refinement outcome.
  
  This behavior contrasts with 2DTM, where alignment is performed by direct cross-correlation against a fixed template. In this case, high-resolution features in the template contribute directly to the scoring function and can improve alignment accuracy.
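  To make the contrast concrete, the sketch below scores a single fixed template against an image by FFT-based cross-correlation. 2DTM repeats this operation for every searched orientation. The sketch is a simplified illustration: one orientation, no CTF, no whitening, and no per-pixel normalization. `cross_correlation_map` and `lowpass` are hypothetical helpers. `lowpass` can be used to strip the template's high-frequency terms before correlating, so that the resulting peaks can be compared.

```python
import numpy as np


def cross_correlation_map(image, template):
    """Circular cross-correlation of a 2D image with a smaller fixed template.

    The template is mean-subtracted and zero-padded to the image size; the
    value at (r, c) scores the template placed with its corner at (r, c).
    """
    padded = np.zeros_like(image, dtype=float)
    h, w = template.shape
    padded[:h, :w] = template - template.mean()
    spectrum = np.fft.rfft2(image) * np.conj(np.fft.rfft2(padded))
    return np.fft.irfft2(spectrum, s=image.shape)


def lowpass(template, cutoff_fraction):
    """Zero all Fourier components beyond cutoff_fraction of Nyquist."""
    fy = np.fft.fftfreq(template.shape[0])[:, None]
    fx = np.fft.fftfreq(template.shape[1])[None, :]
    keep = np.sqrt(fx**2 + fy**2) <= 0.5 * cutoff_fraction
    return np.fft.ifft2(np.fft.fft2(template) * keep).real


rng = np.random.default_rng(1)
template = rng.standard_normal((16, 16))
image = 0.1 * rng.standard_normal((64, 64))
image[20:36, 30:46] += template  # hide the template at offset (20, 30)
cc = cross_correlation_map(image, template)
peak = np.unravel_index(np.argmax(cc), cc.shape)
```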
  
  To directly test the importance of high-resolution information for 2DTM alignment, we performed an additional experiment in which 2DTM was run on bin4x images (2.234 Å/pixel), and the detected particle coordinates were used to extract particles from the corresponding bin2x images (1.117 Å/pixel) for reconstruction. Despite using the same bin2x images for reconstruction, the bin4x-aligned particles yielded a map in which ATP density was lost and backbone density for residues 222–227 was visibly degraded compared to the bin2x-aligned reconstruction (Figure 1—figure supplement 3). This demonstrates that access to high-spatial-frequency information during template matching is critical for accurate alignment of small particles.
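  The arithmetic behind this experiment is the Nyquist limit, which is twice the pixel size. A search on bin4x images cannot use any signal finer than about 4.5 Å. Bin2x images retain information down to about 2.2 Å. A short check:

```python
def nyquist_limit(pixel_size_angstrom):
    """Finest resolvable spacing (Å) for a given pixel size: 2 x pixel size."""
    return 2.0 * pixel_size_angstrom


for label, px in [("bin2x", 1.117), ("bin4x", 2.234)]:
    print(f"{label}: {px} Å/pixel -> Nyquist {nyquist_limit(px):.3f} Å")
```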
  
  (8) The caption in Figure 3 should be more descriptive about what is being shown in each panel.
  
  We have substantially expanded the Figure 3 caption. It now describes each panel explicitly: (a) 3D classification results with particle counts, percentages, and per-class resolutions; (b) side-by-side comparison of reconstructions using 2DTM orientations versus RELION auto-refine, including full maps, zoomed binding-pocket views with the atomic model overlaid, orientation distributions, and FSC curves with reported resolutions; and (c) a table of RELION auto-refinement resolution as a function of the initial low-pass filter setting. We also added a new panel (c) showing that including all five classes yields the same 3.7 Å resolution, addressing the concern about Class 5 exclusion.
  
  (9) Figures 4 and 5 may be better suited as supplementary figures.
  
  We agree. Figures 4 and 5 have been moved to the Supplementary Information in the revised manuscript.
  
  (10) In Figure 4c, it is difficult to understand why the thickness distribution plot goes negative, especially to such a high magnitude as 1.5 microns.
  
  We agree this was confusing. The negative values are fitting artifacts from ctffind5’s thickness estimation, which fits a sinc-modulated envelope to the power spectrum. When the ice is very thin or the spectrum is noisy, the optimizer can converge to unphysical negative values (affecting ∼5.9% of micrographs). We have revised Figure 1—figure supplement 1c (previously Figure 4c) to show only positive thickness values, which now clearly displays the unimodal distribution peaked at 350–400 Å.
  
  (11) In Figure 5d, the micrograph looks a lot like a cross-grating grid used for calibration instead of crystalline ice or a fractured film.
  
  We agree. We have updated the caption for Figure 1—figure supplement 2d (originally Figure 5d) to read “Cross-grating calibration grid”.
  
  (12) Figure 6 was very surprising to me if I am interpreting it correctly. It is not stated in the caption what omega is, but I am assuming it is a measurement of template bias. It is very surprising that the template bias drops when using more particles by reducing the p-value from 8.0 to 7.0. This goes against what I understood from Lucas et al. 2023, so I am curious as to why this is the case.
  
  We thank the reviewer for this question and apologize for the unclear presentation. We have revised Fig. 4 (previously Figure 6) and its caption to define Ω explicitly and updated the Ω values. We also identified that the mask used in the original computation was too loose. The revised mask is now constrained to the omitted region only (ATP, Mn<sup>2+</sup>, and residues 222–227); it is derived from the difference between the full and omit templates and shown in Figure 4—figure supplement 1. Ω is adapted from the template-bias metric introduced by Lucas et al. (2023). It measures how much of the density in the omitted region is attributable to using the full template rather than the omit template. Specifically, for each particle selection condition we reconstruct two maps using orientations and particles derived from independent 2DTM searches with the full and omit templates (V<sub>full</sub> and V<sub>omit</sub>, respectively). Ω is the fractional reduction in density within the omission mask when going from V<sub>full</sub> to V<sub>omit</sub>. In the revised Fig. 4, Ω increases from 46% (p-value = 8.0) to 48% (p-value = 7.0). This is consistent with the expectation that including more, lower-quality particles increases the relative contribution of the template to the reconstruction. For the SNR = 7.5 and tilt conditions, Ω is 48% and 53%, respectively.
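  A minimal sketch of an Ω-style calculation follows. It assumes that Ω is the fractional drop in summed density inside the omission mask from V<sub>full</sub> to V<sub>omit</sub>. It also assumes that the mask is obtained by thresholding the difference between the full and omit templates. Both the formula and the helper names are assumptions for illustration; the authors' measure-template-bias tool is the reference implementation.

```python
import numpy as np


def omission_mask(full_template, omit_template, threshold):
    """Voxels where the full and omit templates differ (hypothetical helper)."""
    return np.abs(full_template - omit_template) > threshold


def template_bias_omega(v_full, v_omit, mask):
    """Assumed Omega: fractional drop in summed density inside `mask`.

    v_full, v_omit: reconstructions from the full- and omit-template searches,
    assumed to be on a common density scale.
    """
    full_density = v_full[mask].sum()
    omit_density = v_omit[mask].sum()
    return (full_density - omit_density) / full_density
```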
  
  (13) It would be useful if the in-house Python script used to calculate template bias could be made publicly available.
  
  We agree. The template-bias calculation (measure-template-bias) is now included in the publicly available Python package at https://github.com/kekexinz/2DTM_postprocess_tool, and can also be accessed in the official cisTEM repository at https://github.com/timothygrant80/cisTEM. The package also contains the extract-particles and filter-particles tools described in the Methods section.
  
  (14) The p-value used is said to be a three-quadrant p-value instead of a one-quadrant p-value. Although I assume this is simply replacing an ‘and’ statement with an ‘or’ statement, the exact difference could be made clearer to the reader.
  
  We have now defined these terms explicitly in the revised Methods. After probit transformation of z-score and SNR, the first-quadrant (1Q) p-value requires both values to be > 0 (logical AND), whereas the three-quadrant (3Q) p-value requires at least one to be > 0 (logical OR). The 3Q criterion is therefore looser, retaining more candidates—which is beneficial for small targets that may score well on one metric but not both.
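  The difference between the two criteria reduces to a logical AND versus a logical OR on the two probit-transformed scores. The sketch below assumes that the scores have already been probit-transformed. It shows only the quadrant membership test, not the full p-value calculation. `quadrant_masks` is a hypothetical name.

```python
import numpy as np


def quadrant_masks(probit_z, probit_snr):
    """Boolean masks for the 1Q (both > 0) and 3Q (at least one > 0) regions."""
    probit_z = np.asarray(probit_z)
    probit_snr = np.asarray(probit_snr)
    one_quadrant = (probit_z > 0) & (probit_snr > 0)    # logical AND
    three_quadrant = (probit_z > 0) | (probit_snr > 0)  # logical OR
    return one_quadrant, three_quadrant


# Hypothetical probit-transformed scores for four candidates.
z = [1.2, -0.4, 0.8, -1.0]
snr = [0.9, 1.5, -0.3, -0.2]
q1, q3 = quadrant_masks(z, snr)
```

  Here only the first candidate passes the 1Q test, while the first three pass the 3Q test.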
  
  (15) I was, perhaps naively, surprised that z-scores could not be used. It was my understanding that by removing the rotationally invariant component from the cross-correlation, the z-score would down-weight low-resolution information compared to the cross-correlation. Given that the manuscript suggests low-resolution alignment can cause getting stuck in local minima, this is surprising to me. The authors note it led to the rejection of most particles; were there simply too many false positives when a lower threshold was used?
  
  The reviewer is correct that subtracting the angular mean removes the rotationally invariant component of the cross-correlation. However, the resulting z-score primarily measures how strongly a specific orientation stands out relative to other orientations. In other words, it reflects the orientation discriminability (closely related to Fisher information) rather than the absolute correlation strength. For small particles, the cross-correlation often varies only weakly across orientations, so CC<sub>max</sub> − CC<sub>avg</sub> remains small even when the absolute correlation is significant. As a result, using the z-score alone as a selection criterion led to the rejection of many true particles.
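  The toy example below illustrates this point with synthetic numbers, not data from this study. A pixel whose correlation is raised at every orientation has a high CC<sub>max</sub> but a modest z-score. A pixel with one sharp orientation peak has a large z-score. The scores follow the description above, z = (CC<sub>max</sub> − CC<sub>avg</sub>)/σ over orientations. The function name and all numbers are hypothetical.

```python
import numpy as np


def orientation_scores(cc_over_orientations):
    """Return (cc_max, z) for one pixel from its CCs at all searched orientations.

    z = (cc_max - mean) / std over orientations: how strongly the best
    orientation stands out from the others, not how large the CC is.
    """
    cc = np.asarray(cc_over_orientations, dtype=float)
    cc_max = cc.max()
    z = (cc_max - cc.mean()) / cc.std()
    return cc_max, z


rng = np.random.default_rng(0)
n_orient = 2000
# Hypothetical small-particle pixel: CC raised at all orientations,
# with only weak dependence on orientation.
cc_small = 6.0 + 0.5 * rng.standard_normal(n_orient)
# Hypothetical large-particle pixel: near-zero background, one sharp peak.
cc_large = rng.standard_normal(n_orient)
cc_large[0] += 10.0

print(orientation_scores(cc_small))  # high cc_max, modest z
print(orientation_scores(cc_large))  # high cc_max, much larger z
```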
  
  Theoretical Section Improvements
  
  (a) The discussion on beam-induced motion could be improved by separating it into initial motion (e.g., cryo-crinkling, buckling) that can be eliminated through grid design, and pseudo-Brownian motion, which cannot. Pseudo-Brownian motion will become much more significant for small proteins (based on reference 5, for a 10 kDa protein, this would be an MSD of ∼0.1 Å<sup>2</sup>/(e<sup>−</sup>/Å<sup>2</sup>), or a B-factor of over 2 Å<sup>2</sup>/(e<sup>−</sup>/Å<sup>2</sup>)), and Bayesian Polishing is unlikely to correct this perfectly, given that it imposes a smoothness of motion between nearby particles. The impact of not correcting for this could be quantified more explicitly.
  
  We thank the reviewer for this helpful suggestion. As noted, pseudo-Brownian motion of particles within irradiated ice introduces stochastic displacements that accumulate with dose and are expected to be more significant for small particles. Based on the analysis in McMullan et al. (2015), and scaling with particle size, this effect can be approximated as a dose-dependent mean-squared displacement (MSD) of ∼0.1 Å<sup>2</sup> per (e<sup>−</sup>/Å<sup>2</sup>) for a ∼10 kDa particle. Over a typical total exposure of 40–60 e<sup>−</sup>/Å<sup>2</sup>, this corresponds to an accumulated MSD of ∼4–6 Å<sup>2</sup>, or an RMS displacement of ∼2–2.5 Å, which is sufficient to attenuate high-resolution signal.
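  The arithmetic in the previous paragraph can be checked as follows, assuming that the MSD grows linearly with exposure. The B-factor conversion makes two further assumptions. First, the motion is treated as isotropic in-plane Gaussian motion, so the per-axis mean-square displacement is half the 2D MSD. Second, the accumulated displacement is treated as a static blur, which simplifies motion that actually occurs during the exposure. Under these assumptions, 0.1 Å<sup>2</sup> per e<sup>−</sup>/Å<sup>2</sup> corresponds to roughly 3.9 Å<sup>2</sup> of B-factor per e<sup>−</sup>/Å<sup>2</sup>, in line with the reviewer's estimate of over 2.

```python
import math

# Approximate pseudo-Brownian MSD quoted above for a ~10 kDa particle,
# in Å^2 per (e-/Å^2); assumed to grow linearly with exposure.
MSD_PER_EXPOSURE = 0.1


def rms_displacement(total_exposure, msd_per_exposure=MSD_PER_EXPOSURE):
    """Accumulated RMS displacement (Å) after `total_exposure` e-/Å^2."""
    return math.sqrt(msd_per_exposure * total_exposure)


def b_factor_from_msd(msd_2d):
    """B-factor (Å^2) for isotropic in-plane Gaussian motion.

    Assumes the per-axis mean-square displacement is msd_2d / 2 and uses
    B = 8 * pi^2 * <u^2>.
    """
    return 8.0 * math.pi ** 2 * (msd_2d / 2.0)


for exposure in (40, 60):
    msd = MSD_PER_EXPOSURE * exposure
    print(f"{exposure} e-/Å^2: MSD {msd:.1f} Å^2, "
          f"RMS {rms_displacement(exposure):.2f} Å, "
          f"B about {b_factor_from_msd(msd):.0f} Å^2")
```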
  
  In practice, such motion acts as an additional high-frequency attenuation in Fourier space, analogous to an envelope function, reducing the coherent signal available for template matching. While Bayesian polishing can partially correct beam-induced motion, it assumes spatially smooth trajectories between nearby particles and therefore may not fully compensate for stochastic, particle-specific motion.
  
  Within the theoretical framework presented here, this effect can be interpreted as an additional frequency-dependent damping of the signal (B-factor). Its primary consequence would be to reduce the effective signal-to-noise ratio at high spatial frequencies and therefore shift the detectable molecular-weight limit somewhat upward, without altering the structure of the derivation. We have added text in the manuscript to clarify this point and to indicate the expected magnitude of this effect.
  
  (b) The inclusion of inelastic scattering assumes an energy filter is being used, and this should be clearly stated.
  
  We have added this clarification in the inelastic scattering paragraph of the Supplementary Information.
  
  (c) The reasons for not including other factors, such as DQE and the temporal and spatial coherence envelope functions, could be stated.
  
  We have added a note in the dose-weighting section clarifying that these instrument-dependent attenuation factors were not explicitly included, and that they could be incorporated as additional frequency-dependent weighting terms without changing the structure of the derivation.
  
  (d) The flexibility and heterogeneity in protein structures, especially at high spatial frequencies, must also be a reason for a gap from experiment to theory, but this is not clearly stated.
  
  We agree. We have added a statement in the “Remaining gaps” section noting that structural flexibility and conformational heterogeneity act as an additional envelope that attenuates high-resolution signal relative to the rigid-particle model assumed in our derivation.
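  One way to picture these additional terms is as multiplicative envelopes on the signal amplitude at spatial frequency k. The sketch below is an illustration under stated assumptions, not the derivation used in the manuscript. It combines two terms:

  - a radiation-damage term based on the critical-exposure fit of Grant & Grigorieff (2015), N<sub>e</sub>(k) = 0.245 k<sup>−1.665</sup> + 2.81, giving an amplitude of exp(−N/(2N<sub>e</sub>)) for a single frame at accumulated exposure N;
  - a B-factor term exp(−Bk<sup>2</sup>/4) standing in for flexibility, heterogeneity, or particle motion.

  Coherence or DQE-related terms have instrument-specific forms, so they are left as optional user-supplied callables.

```python
import numpy as np


def critical_exposure(k):
    """Critical exposure (e-/Å^2) at spatial frequency k (1/Å), k > 0,
    using the empirical fit N_e = 0.245 * k**-1.665 + 2.81."""
    return 0.245 * k ** -1.665 + 2.81


def combined_envelope(k, exposure, b_factor=0.0, extra_envelopes=()):
    """Product of frequency-dependent amplitude attenuation terms.

    k: spatial frequencies (1/Å), all > 0
    exposure: accumulated exposure (e-/Å^2) of a single frame
    b_factor: extra B (Å^2), applied as exp(-B * k^2 / 4)
    extra_envelopes: optional callables k -> factor (e.g. coherence terms)
    """
    k = np.asarray(k, dtype=float)
    dose_term = np.exp(-exposure / (2.0 * critical_exposure(k)))
    b_term = np.exp(-b_factor * k ** 2 / 4.0)
    env = dose_term * b_term
    for term in extra_envelopes:
        env = env * term(k)
    return env
```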
  
  Additional Minor Comments
  
  (15) It is noted in the discussion that 2DTM-based single-particle alignment simplifies the processing pipeline. Although true, I think stating the computation time would be useful for the reader.
  
  We have added computation times to the Discussion. For a typical single-particle dataset of ∼2,000 micrographs (5k × 4k pixels), a 2DTM search without defocus refinement completes in approximately one day on 64 NVIDIA A6000 GPUs. Once particles are located with their orientations and positions, a single 3D reconstruction is sufficient without further refinement, eliminating the iterative 2D classification, ab initio modeling, 3D classification and refinement steps of a conventional pipeline.
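  For scale, the quoted figures correspond to roughly 46 GPU-minutes per micrograph. This assumes that the 2,000 micrographs are spread evenly across the 64 GPUs over 24 hours with no idle time, which is an idealization:

```python
def gpu_minutes_per_micrograph(n_micrographs, n_gpus, wall_clock_hours):
    """Average GPU time per micrograph, assuming perfectly even load balancing."""
    return n_gpus * wall_clock_hours * 60.0 / n_micrographs


print(gpu_minutes_per_micrograph(2000, 64, 24))  # 46.08
```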
  
  (16) There are some formatting issues with e<sup>−</sup>/Å<sup>2</sup>, sometimes losing the minus sign.
  
  Thank you for catching this. We have corrected all instances to consistently use e<sup>−</sup>/Å<sup>2</sup> throughout the manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.11.675606v2
www.biorxiv.org

Serotonergic modulation of motor subspace dynamics drives a sleep-independent quiescent state

1
1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  In response to the reviewers’ comments, we have made revisions to the manuscript. Specifically, we have:
  
  (1) Increased the sample size in the whole-brain imaging and demixed principal component analysis (dPCA) analyses presented in Figures 1 and 3, strengthening the statistical support for our conclusions;
  
  (2) Revised the presentation of Figure 3B to clarify that the displayed dPC1 traces were scaled for visualization purposes only (dPC1 / max(dPC1)), rather than normalized for quantitative comparison across animals;
  
  (3) Expanded the main text and supplementary figures to provide more intuitive explanations and geometric illustrations of dPCA and hyperbolic space analysis, and clarified the interpretation of correlation matrices and principal-angle analyses to improve readability;
  
  (4) Substantially expanded the sections on Bayesian multidimensional scaling and hyperbolic embedding, including additional methodological details and validation analyses to strengthen the computational framework and its interpretation;
  
  (5) Expanded the Discussion to incorporate recent studies and discuss potential mechanisms underlying DRN 5-HT-mediated motor suppression.
  
  We believe that these revisions have substantially strengthened the manuscript and addressed the major concerns raised during peer review.
  
  Reviewer #1 (Public review):
  
  The wide-ranging serotonergic projections emerging from the Dorsal Raphe nucleus (DRN) are suggestive of a central role in regulating brain-wide activity and behavioural states. DRN activity has been associated with diverse functions, ranging from mood, motivation and pain regulation to sleep and cognitive flexibility. Its far-reaching connectivity made it challenging to assess the brain-wide effect of its activation, especially during behaviour.
  
  The present study by Qi et al. addresses these challenges by combining state-of-the-art tracking microscopy with the whole-brain accessibility of the larval zebrafish model. To investigate the effect of DRN activation, the authors leveraged the Tg(tph2:ChrimsonR) line to optogenetically activate tph2-positive neurons in the DRN, while monitoring changes in brain-wide activity, locomotion and auditory-stimuli evoked responses.
  
  Optogenetic activation had a suppressing effect on locomotion. The authors distinguished this from sleep induction by the maintenance of posture and by the sleep-disrupting effect of nighttime stimulation. Further, the authors report a distinct effect of DRN activation on motor-related, but not auditory-related, neuronal subspaces, identified by demixed principal component analysis.
  
  In addition, rather than affecting all motor-correlated neurons similarly, tph2+ DRN-mediated suppression focused on neurons encoding high-amplitude or turning motion.
  
  In summary, the work of Qi et al. provides solid evidence for a predominant role of the DRN in wake-state motor suppression by aptly combining the vast data-acquisition possibilities of the larval zebrafish model with computational methods to extract relevant information.
  
  The brain-wide scope of the analysis is a key strength, reducing bias, confirming the involvement of known motor and auditory regions, and providing a valuable dataset for future analyses.
  
  While the results well support the conclusion of the authors, certain biological and technical aspects demand discussion.
  
  We thank you for the positive and thoughtful evaluation of our work. We also appreciate your constructive comments on the biological and technical aspects of the study. We have carefully considered these concerns and addressed them point-by-point below, with corresponding revisions to the manuscript.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Further samples required:
  
  Figure 1D relies on n=3 with lots of variability; the authors should add more animals to illustrate their point (typically 10–15 fish are used per study to show reliability across fish).
  
  Figure 3 also relies only on 5 fish in each condition; the authors should increase to 10-15 to show variability.
  
  Thank you for this valuable suggestion. To address this concern, we have increased the sample size in the revised manuscript. Specifically, the number of animals in Figure 1D has been increased from n = 3 to n = 5, and additional statistical analyses have been included to strengthen the quantitative support for our conclusions. Note that the error bars are plotted as standard deviation (SD), which may make the variability appear larger. In Figure 3, the number of animals was also increased from n = 5 to n = 8.
  
  In addition, our findings are consistent with previous work showing a strong association between elevated dorsal raphe nucleus (DRN) activity and reduced locomotion in zebrafish [1, 2, 3]. Importantly, across animals, the variance explained by the dPCA components and the rapid modulation of whole-brain state remain highly consistent, supporting the robustness and reproducibility of our observations.
  
  Given this increased sample size together with consistency across animals and convergence with prior studies, we believe the current dataset provides sufficient statistical and biological support for our conclusions.
  
  (2) Further steps to be added to the analysis to fully support the claim:
  
  It appears that the individual brains are registered and individually clustered into areas by combining highly-correlated nearby neurons.
  
  dPCA is then computed for individual brains. Evidence for our interpretation of individual dPCA spaces:
  
  (1) Figure 3A depicts separate dPCs for different fish.
  
  (2) Line 488–489 describes normalization of the value range of dPCs to compare across fish, which implies separate dPCs.
  
  While the authors normalize the projections onto the principal components, the dPCA spaces remain individual, as does the meaning of their components. It is thus questionable how to conclude from data across fish in a rigorous manner.
  
  Instead, we recommend that the authors build voxels for each individual’s brain and calculate dPCA across all brains, not individual ones, so that components could become truly comparable across the brains of given individuals.
  
  We thank the reviewer for this important comment. We would like to clarify that our analysis does not aim to construct a shared dPCA space across animals or to quantitatively compare dPC scores between individuals. In this analysis, dPCA was performed separately for each fish to capture the dominant low-dimensional population dynamics within each individual brain.
  
  The purpose of Figure 2 is to demonstrate that DRN activation induces a rapid and robust transition in whole-brain activity, rather than to define a common population subspace across animals.
  
  We also attempted to register and pool data across animals for a joint analysis, as suggested by the reviewer. However, our dataset includes zebrafish at slightly different developmental stages (6–12 dpf). Although the behavioral effects of DRN activation (including motor suppression and global brain-state modulation) were robust across this age range, developmental differences introduced substantial anatomical variability in brain size and morphology, which reduced registration accuracy and made voxel-wise correspondence across animals unreliable.
  
  We realize that our previous description of “normalization” may have caused confusion. To clarify, the dPC1 traces shown in Figure 2 were only scaled for visualization by dividing each fish’s projection by its maximum value (dPC1 / max(dPC1)), so that trajectories from different fish could be displayed on the same axis. This scaling does not alter the underlying dPCA space, does not constitute normalization for cross-animal comparison, and was not used for any quantitative analysis.
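  A minimal sketch of the display-only scaling described above follows; the function name is hypothetical. Dividing a trace by its own positive maximum changes only its amplitude. The shape and timing of the transition are unchanged, and absolute dPC1 values remain incomparable across fish.

```python
import numpy as np


def scale_for_display(dpc1_trace):
    """Divide one fish's dPC1 projection by its own maximum (display only).

    This does not normalize traces for comparison across animals; it only
    puts traces from different fish on a common plotting axis.
    """
    trace = np.asarray(dpc1_trace, dtype=float)
    peak = trace.max()
    if peak <= 0:
        raise ValueError("trace has no positive maximum; max-scaling is undefined")
    return trace / peak
```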
  
  Importantly, despite being computed independently for each fish, we observed a consistent temporal pattern across animals: DRN activation was reliably accompanied by a rapid transition captured by dPC1 in each individual fish. We have revised the Methods and corresponding text in the manuscript to make this distinction explicit and avoid ambiguity.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors examine the effects of activating the dorsal raphe nucleus serotonergic system using a combination of calcium imaging and optogenetics in freely moving larval zebrafish. Their findings show that optogenetic stimulation induces a state of behavioral quiescence.
  
  They further investigate whether this state corresponds to sleep or reduced motor activity. Analyses of posture and sleep-related paradigms indicate that serotonergic activation primarily suppresses motor output rather than promoting sleep. Notably, this suppression appears to be bout-type-dependent, with stronger effects on neurons associated with larger tail amplitudes and turning angles.
  
  In addition, auditory stimulation experiments reveal no significant impact of serotonin on sound encoding.
  
  We thank the reviewer for the careful and thoughtful summary of our work.
  
  Strengths:
  
  The study combines advanced experimental techniques with state-of-the-art analytical methods, enabling precise and compelling insights into the role of serotonergic modulation. The experiments and analyses are well aligned with the questions being addressed, and the results appear robust and reliable.
  
  Moreover, the implementation of experiments that combine calcium imaging and optogenetics in freely moving animals is technically challenging and appears well justified in the context of the research questions.
  
  We thank you for the positive assessment of our work and for recognizing the technical and analytical strengths of our experimental approach.
  
  We address the reviewer’s specific comments in detail below.
  
  Weaknesses:
  
  While the analytical techniques employed are sophisticated and appear to be appropriately applied, their presentation makes the manuscript difficult to follow. Although the explanations are provided in the Methods section, including more guidance in the main text, such as how to interpret each analytical approach and what outcomes would be expected under different scenarios, would help readers who are less familiar with these techniques.
  
  Providing this context would better guide the reader in navigating the figures, broaden the accessibility of the work, and ultimately increase its impact.
  
  We thank you for this important suggestion. To improve clarity and accessibility, we have revised the main text to provide more intuitive explanations of both demixed principal component analysis (dPCA) and hyperbolic space analysis, with additional emphasis on how to interpret their outputs and what different outcomes imply biologically.
  
  Additionally, we have included new supplementary figures (Figure S2 and Figure S6) with geometric illustrations and simplified examples to provide a more visual and conceptual understanding of these methods. We hope these revisions make the analytical framework easier to follow and improve the accessibility and impact of the manuscript.
  
  While the authors discuss different quiescent states mediated by serotonin reported in previous studies, their interpretation is limited to stating that “a common feature shared by these distinct behavioral states is a pronounced reduction in movement,” and consequently proposing that activation of dorsal raphe nucleus is not sufficient to specify a particular behavioral state, but rather plays a primary role in driving motor suppression.
  
  In my view, a more thorough attempt to determine whether the observed state corresponds to any of the previously described forms of quiescence, or represents a subset or variant of them, would strengthen the manuscript. This would help better integrate the findings with the existing literature.
  
  For example, given that the authors have access to whole-brain activity data, it would be valuable to examine and discuss whether there are shared patterns of activation with previously reported quiescent states.
  
  Thank you for the insightful suggestion. To address this, we compared our whole-brain activity patterns with key neural signatures reported in previously characterized zebrafish quiescent states.
  
  A recent study reported that exposure to conspecific alarm substance (CAS) induces a quiescent but vigilant state associated with elevated DRN 5-HT activity and low-frequency synchronized forebrain activity [3]. In our dataset, although DRN 5-HT activation similarly induced robust locomotor suppression, we did not detect comparable low-frequency synchronized forebrain dynamics during the stimulation period. These results suggest that while DRN 5-HT activation is sufficient to induce motor suppression, it does not recapitulate the full neural signature of CAS-induced vigilant quiescence. We have incorporated this comparison and its interpretation into the Discussion section of the revised manuscript.
  
  Following the termination of optogenetic stimulation, we observed a gradual recovery of locomotor speed. This is consistent with the behavior reported in an earlier study [3], although the recovery in our experiments was much faster. Interestingly, whole-brain imaging also revealed a transient increase in forebrain activity that gradually returned to baseline as locomotor activity recovered. In accordance with the reviewer’s suggestion, we propose that these forebrain dynamics represent a common motif that facilitates the transition out of the DRN-induced quiescent state (Author response image 1).
  
  Author response image 1.
  
  Forebrain activity increases following termination of DRN optogenetic stimulation. (A) Following the termination of optogenetic stimulation of DRN 5-HT neurons, locomotor speed in Tg(tph2:ChrimsonR) zebrafish gradually recovered and returned to control levels. (B) Neural activity in forebrain regions showed a transient increase immediately after stimulation offset and gradually returned to baseline as locomotor activity recovered.
  
  The manuscript largely avoids discussing the mechanisms underlying the observed motor suppression. For instance, is this effect driven directly by serotonin release onto target neurons? Is it mediated by glial activity, as suggested in other studies? Are additional neuromodulatory systems being recruited?
  
  While addressing these questions may require substantial further work, potentially beyond the scope of the present study, the availability of whole-brain data provides an opportunity to at least explore or discuss these possibilities. In particular, it would be interesting to examine the recruitment of regions not directly stimulated but known to be associated with other neuromodulatory systems or with promoting glial activation (e.g., the locus coeruleus).
  
  We thank you for this important suggestion. In the revised Discussion, we now frame our findings in relation to several candidate mechanisms.
  
  Our results are most consistent with a direct neuromodulatory action of serotonin on downstream motor-related circuits. This is supported by the known projection patterns of DRN 5-HT neurons [4], which target midbrain and hindbrain regions involved in motor control, as well as by prior serotonin imaging studies showing elevated 5-HT levels in hindbrain regions during low-motor states, where inhibitory HTR1-family receptors are enriched [5]. In addition, recent voltage imaging studies have shown that DRN serotonergic neurons are embedded within a broader motor-state-dependent circuit, in which they are dynamically regulated by local GABAergic inputs [6]. We have incorporated a discussion of these potential mechanisms into the revised Discussion.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Lines 91-97 page 2.
  
  “dPCA separates neural population activity into components tied to specific experimental variables, allowing us to isolate DRN-dependent changes (Methods). Components associated with DRN activation explained significantly more variance in Tg(tph2:ChrimsonR) zebrafish than in controls (Fig. 3A), indicating a strong serotonergic impact on brain-wide neural activity. The small stimulation-related variance in controls likely reflected visual responses to laser.”
  
  Directly stimulated neurons are not included, as stated in the Methods, but I think it would be better to mention this explicitly in the main text.
  
  We thank you for this helpful suggestion. We agree that explicitly stating this point in the main text improves clarity. In our analysis, neurons directly stimulated by the laser were excluded (as described in the Methods). This ensures that the identified components reflect whole-brain responses rather than direct optogenetic activation. We have now added a clarifying sentence in the Results section to make this explicit.
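  For readers less familiar with dPCA, its first step (Kobak et al., 2016) splits each neuron’s centred activity into three parts. One depends only on time, one only on the experimental condition (for example, stimulation versus control), and one on their interaction. The demixed components are then fitted separately for each part. The sketch below shows only that marginalization step and the resulting variance fractions, assuming a neurons × conditions × time array. The `exclude` mask illustrates dropping directly stimulated neurons; like the function names, it is hypothetical.

```python
import numpy as np


def marginalize(X, exclude=None):
    """Split X (neurons x conditions x time) into dPCA-style marginals.

    exclude: optional boolean array over neurons (hypothetical), e.g. marking
    directly stimulated cells to drop before the analysis.
    Returns the centred data and a dict of same-shaped marginals summing to it.
    """
    X = np.asarray(X, dtype=float)
    if exclude is not None:
        X = X[~np.asarray(exclude, dtype=bool)]
    Xc = X - X.mean(axis=(1, 2), keepdims=True)   # centre each neuron
    time_part = Xc.mean(axis=1, keepdims=True)    # condition-independent
    cond_part = Xc.mean(axis=2, keepdims=True)    # time-independent
    interaction = Xc - time_part - cond_part      # condition x time
    return Xc, {
        "time": np.broadcast_to(time_part, Xc.shape),
        "condition": np.broadcast_to(cond_part, Xc.shape),
        "condition_x_time": interaction,
    }


def variance_fractions(marginals):
    """Fraction of total variance carried by each marginalization."""
    sq = {name: float(np.sum(part ** 2)) for name, part in marginals.items()}
    total = sum(sq.values())
    return {name: value / total for name, value in sq.items()}
```

  Because the marginals are mutually orthogonal for a balanced design, their variances add up to the total variance of the centred data.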
  
  (2) Lines 113 - 115 page 3.
  
  “To examine how DRN 5-HT neuron activation affects sensorimotor processing (Fig. 4C), we next recorded whole-brain neural activity in head-fixed, tail-free larvae embedded in agarose to capture transient calcium signals with minimal motion artifacts.”
  
  Lines 117-119 page 3.
  
  “Because head-fixed larvae rarely enter natural sleep, we applied 1 mM mepyramine, a sleep-promoting antihistamine, to induce a sleep-like state (41), which markedly changed auditory responses (Fig. 4E, Fig. S2C)”
  
  Why not perform these experiments in freely moving fish instead? To what extent do movements in freely moving animals affect segmentation? Is it actually problematic to apply dPCA in that case? You used it in the previous section.
  
  We thank the reviewer for raising this important point. In principle, freely moving preparations would provide a more natural behavioral context. However, reliable application of dPCA requires stable neuron identification and accurate trial alignment across time, both of which are substantially compromised in freely moving larvae due to motion-induced imaging noise and segmentation errors.
  
  In our hands, whole-brain calcium imaging in freely moving fish introduces significant variability in segmentation and signal extraction, which in turn leads to unstable and noisy low-dimensional decompositions, preventing robust estimation of task-related components. By contrast, the head-fixed preparation enables consistent neuron tracking and precise alignment to sensory stimuli, which are critical for dPCA.
  
  We have now clarified in the manuscript that all dPCA analyses were performed on head-fixed animals.
  
  (3) Line 117 page 3.
  
  Why do you use cosine similarity? Are the results different when using other metrics?
  
  I can see the matrix, but what exactly are you looking for in it to support the claim “DRN activation preserved the structure of the auditory population code”? I think explaining some of these concepts more clearly, or at least providing expectations or interpretations for the different metrics and analyses, would make the manuscript easier to follow.
  
  We thank you for this question. Cosine similarity is widely used to quantify similarity between population activity patterns because it captures relative activity across neurons while ignoring overall gain.
  
  In our analysis, each trial is a population activity vector, and the cosine similarity matrix encodes pairwise relationships between these vectors. We assess preservation of the auditory population code by testing whether this similarity structure (i.e., the geometry of population responses) remains consistent across conditions. We have expanded the text to clarify how these matrices are constructed and interpreted.
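  To make the construction concrete, here is a minimal NumPy sketch of a trial-by-trial cosine similarity matrix. It assumes each trial has already been reduced to one population vector (for example, mean activity per neuron in a response window; that reduction is an assumption, not a procedure stated here), and it illustrates the gain invariance mentioned above. `similarity_structure_agreement` is a hypothetical helper showing one possible way to compare two similarity matrices; it is not the test used in the manuscript.

```python
import numpy as np

def cosine_similarity_matrix(trials: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between trial population vectors.

    trials: shape (n_trials, n_neurons); each row is one trial's population
    activity vector. Rows must be non-zero.
    """
    unit = trials / np.linalg.norm(trials, axis=1, keepdims=True)  # project onto the unit sphere
    return unit @ unit.T  # C[i, j] = cosine of the angle between trials i and j

def similarity_structure_agreement(c_a: np.ndarray, c_b: np.ndarray) -> float:
    """Hypothetical summary: correlation of the off-diagonal entries of two matrices."""
    iu = np.triu_indices_from(c_a, k=1)
    return float(np.corrcoef(c_a[iu], c_b[iu])[0, 1])

rng = np.random.default_rng(0)
trials = rng.random((5, 200))  # hypothetical data: 5 trials x 200 neurons

C = cosine_similarity_matrix(trials)

# Gain invariance: tripling one trial's overall activity leaves its similarities unchanged.
scaled = trials.copy()
scaled[0] *= 3.0
C_scaled = cosine_similarity_matrix(scaled)
print(np.allclose(C, C_scaled))                     # True
print(similarity_structure_agreement(C, C_scaled))  # close to 1.0
```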
  
  In addition, we computed alternative similarity measures based on Pearson correlation, which is equivalent to the cosine similarity of two vectors after each has been centered by subtracting its mean (Author response image 2A). We further quantified pairwise trial distances using the Euclidean chord distance on the unit hypersphere, defined as
  
  $D_{ij} = \sqrt{2(1 - C_{ij})}$,
  
  where $C_{ij}$ is the Pearson correlation between trials $i$ and $j$; smaller distances indicate higher similarity (Author response image 2B). Both alternative measures yielded qualitatively consistent results, showing that DRN 5-HT neuron activation preserves the similarity structure across trials.
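  The equivalence between Pearson correlation and the cosine similarity of centered vectors, and the chord-distance formula, can be checked numerically. This sketch uses random placeholder data and hypothetical function names; the chord distance is simply the Euclidean distance between the two normalized, centered trial vectors.

```python
import numpy as np

def pearson_matrix(trials: np.ndarray) -> np.ndarray:
    """Pearson correlation between trials, computed as cosine similarity of centered vectors."""
    centered = trials - trials.mean(axis=1, keepdims=True)  # subtract each trial's mean
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return unit @ unit.T

def chord_distance(corr: np.ndarray) -> np.ndarray:
    """Euclidean distance between unit vectors with cosine corr: sqrt(2 * (1 - corr))."""
    # clip guards against tiny negative values from floating-point round-off on the diagonal
    return np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))

rng = np.random.default_rng(1)
trials = rng.normal(size=(6, 300))  # hypothetical data: 6 trials x 300 neurons

R = pearson_matrix(trials)
D = chord_distance(R)
print(np.allclose(R, np.corrcoef(trials)))  # True: agrees with NumPy's row-wise Pearson correlation
```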
  
  (4) Figure 4D.
  
  If “significant alignment between DRN activation and motor-related neural subspaces, with the sound related subspace being nearly orthogonal” is correct, shouldn’t there be some visible overlap between blue and red, and little to no overlap with yellow? This is not easy to see. Perhaps plotting all three in a single panel would help.
  
  We thank you for this helpful suggestion. We would like to clarify that the “alignment” we refer to is defined in terms of the angle between neural subspaces, rather than the spatial overlap of neurons. In other words, significant alignment indicates that the corresponding population activity patterns occupy similar directions in a high-dimensional activity space.
  
  As a result, even statistically significantly aligned subspaces (see further exposition below) do not necessarily involve overlapping sets of neurons with large PC weights. This distinction is important because subspace geometry is defined at the population level and cannot be directly inferred from spatial overlap in low-dimensional visualizations. In addition, the visualization shown in Fig. 4D highlights only brain regions containing neurons with relatively high weights, for illustrative purposes.
  
  We also note that the current visualization is based on a maximum intensity projection of a 3D volume, which can create the appearance of overlap in two dimensions even when the underlying neurons are spatially segregated in three dimensions. To provide a clearer spatial reference, we have re-plotted the three subspaces in a three-dimensional representation.
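  The distinction between subspace angle and neuron overlap can be illustrated with a toy example. The sketch below assumes, for illustration only, that subspace alignment is measured as the principal angle between loading vectors (SciPy's `scipy.linalg.subspace_angles` offers an equivalent routine returning radians); the loadings are invented. Two vectors whose five highest-weight neurons are completely disjoint still form an angle well below 90°, because many weakly weighted neurons contribute to their dot product.

```python
import numpy as np

def principal_angles_deg(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Principal angles (degrees, ascending) between the column spaces of a and b.

    a: (n_neurons, k_a), b: (n_neurons, k_b); columns are loading vectors (e.g., PC weights).
    """
    qa, _ = np.linalg.qr(a)  # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b)  # orthonormal basis of span(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)  # singular values, descending
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

n_neurons = 100
# Hypothetical loadings: both share a weak weight of 0.3 on every neuron,
# but their five strongest neurons are completely different.
u = np.full(n_neurons, 0.3)
u[0:5] += 2.0
v = np.full(n_neurons, 0.3)
v[5:10] += 2.0

top_u = set(np.argsort(u)[-5:])
top_v = set(np.argsort(v)[-5:])
angle = principal_angles_deg(u[:, None], v[:, None])[0]
print(top_u & top_v)    # set(): no shared high-weight neurons
print(round(angle, 1))  # 64.6: yet far from orthogonal
```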
  
  (5) Figure 4F.
  
  Do the arrows represent the values for each combination? This is not clear to me. Perhaps it could be clarified in the paragraph. Most of the values, including those being compared, are around 87 ± 2 degrees, i.e., mostly orthogonal. Does this imply no overlap between patterns (again, this is hard to see in Figure 4D)? The values are different from the null model but still close to orthogonal. The phrase “significant alignment between DRN activation and motor-related neural subspaces” could be interpreted as strong alignment, but the values do not seem to support that, do they?
  
  Author response image 2.
  
  Alternative similarity measures reveal preserved trial-to-trial similarity structure. (A) Trial-by-trial similarity matrix quantified using Pearson correlation; higher correlation indicates greater similarity between trials. (B) Pairwise trial distances quantified using the Euclidean chord distance on the unit hypersphere, $D_{ij} = \sqrt{2(1 - C_{ij})}$, where $C_{ij}$ is the Pearson correlation from (A); smaller distances indicate greater similarity between trials.
  
  Author response image 3.
  
  Three-dimensional visualization of DRN activation-, motor-, and sound-related subspaces. Three-dimensional rendering of the high-weight neurons in the DRN 5-HT activation, motor-related, and sound-related subspaces. Colors are consistent with Figure 4D.
  
  We thank the reviewer for this important clarification.
  
  We agree that the phrase “alignment” could be interpreted as implying strong spatial overlap in the anatomical space, which is not what we intend to convey. In our analysis, “alignment” refers to a statistically significant deviation from a null distribution.
  
  In high-dimensional spaces, random vectors are expected to be nearly orthogonal, with angles tightly concentrated around 90°. To demonstrate this phenomenon, we simulated pairs of random vectors over a range of dimensionalities (100–10,000 dimensions) and observed that the distribution of angles across 1,000 repetitions becomes progressively more concentrated around 90° as the dimensionality increases (Author response image 4). Therefore, even modest deviations from 90° reflect a systematic bias and indicate structured overlap beyond chance. Thus, “significantly aligned” means that the motor–DRN angle is significantly smaller than the random baseline, whereas “significantly orthogonal” for sound–DRN means that the angle is significantly closer to 90° than the random baseline. We will revise the text to clarify this point and avoid potential misinterpretation.
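  A minimal sketch of this kind of simulation, assuming the random vectors are drawn from an isotropic standard normal distribution (the distribution used for Author response image 4 is not specified here). The standard deviation of the angle shrinks roughly as 1/√d radians, so a deviation of a few degrees is small in absolute terms but can lie far outside the null distribution in thousands of dimensions.

```python
import numpy as np

def random_angles_deg(dim: int, n_reps: int = 1000, seed: int = 0) -> np.ndarray:
    """Angles (degrees) between pairs of independent standard-normal vectors in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    angles = np.empty(n_reps)
    for i in range(n_reps):
        a, b = rng.standard_normal((2, dim))  # two independent random vectors
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        angles[i] = np.degrees(np.arccos(cos))
    return angles

for dim in (100, 1_000, 10_000):
    angles = random_angles_deg(dim)
    # the spread shrinks roughly as 57.3 / sqrt(dim) degrees (1 / sqrt(dim) radians)
    print(f"dim={dim:>6}: mean={angles.mean():.2f} deg, sd={angles.std():.2f} deg")
```

  An observed angle could then be compared against such a null, for example as the fraction of simulated angles at least as far from 90°; whether the manuscript's null was built this way or from shuffled data is not stated in this response.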
  
  Regarding Figure 4F, we agree that the meaning of the arrows was not sufficiently clear. The arrows represent the mean angle, computed across all fish, between the DRN 5-HT activation subspace and the motor-related subspace (left), and between the DRN 5-HT activation subspace and the sound-related subspace (right). We will update the figure legend to explicitly define these elements.
  
  Author response image 4.
  
  Random vectors become increasingly orthogonal in high-dimensional spaces. Simulated distributions of pairwise angles between random vectors across different dimensionalities (100–10,000 dimensions; 1000 repetitions per dimensionality). As dimensionality increases, the angle distribution becomes increasingly concentrated around 90°.
  
  (6) Lines 125 - 126 page 5.
  
  “After detecting bouts, we computed each bout’s direction and amplitude and classified them into 12 types.”
  
  It would be interesting to see how the distribution of bouts looks in the direction-amplitude space, in order to better visualize the 12 bout types (perhaps using different colors). It might also be useful to include examples of the 12 bout types in the supplementary material.
  
  We thank you for this helpful suggestion. To better visualize the distribution of bouts and the definition of the 12 bout types, we have added a new supplementary figure showing the distribution of all bouts in the direction–amplitude space, with each bout color-coded according to its assigned category, consistent with the scheme used in the main text.
  
  We further quantified the frequency of each bout type across the dataset, which comprises 1,493 bouts from 7 animals. Among these, 4 animals exhibited all 12 bout types and were therefore included in subsequent regression analyses that require complete coverage of all categories.
  
  In addition, we have included examples of representative bout types in the supplementary material. These additions improve the clarity and interpretability of the bout classification scheme.
  
  (7) Lines 131 - 133 page 5.
  
  “Some neurons exhibited activity related to all bout types with similar amplitudes, yielding low coefficient variability, whereas others responded selectively to specific bout types - typically those with larger tail amplitudes and turning angles - exhibiting higher variability in regression coefficients (Fig. 5B).”
  
  I would appreciate some quantification of “typically.”
  
  We thank you for this suggestion. Fig. 5B (bottom) shows a neuron with large variability in regression coefficients across bout types, quantified by the coefficient of variation (CV). Bout types with large amplitudes and turning angles (e.g., type 12) have larger regression coefficients than others. We will remove “typically” from the text.
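  A sketch of how the coefficient of variation across bout-type regression coefficients separates broadly tuned from selective neurons. It assumes an ordinary least-squares model with one regressor per bout type; the regressors and both example neurons are synthetic, and the regression actually used for Fig. 5B may differ.

```python
import numpy as np

N_BOUT_TYPES = 12

def fit_bout_type_coefficients(activity: np.ndarray, regressors: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients of one neuron's activity on per-bout-type regressors.

    activity: (n_timepoints,); regressors: (n_timepoints, N_BOUT_TYPES).
    """
    beta, *_ = np.linalg.lstsq(regressors, activity, rcond=None)
    return beta

def coefficient_of_variation(beta: np.ndarray) -> float:
    """CV = SD / |mean| across bout types; undefined when the mean is near zero."""
    return float(np.std(beta) / abs(np.mean(beta)))

rng = np.random.default_rng(2)
X = rng.random((500, N_BOUT_TYPES))  # hypothetical regressors, one column per bout type

broad = X @ np.ones(N_BOUT_TYPES)                            # responds equally to all bout types
selective = X @ np.r_[np.full(N_BOUT_TYPES - 1, 0.1), 3.0]   # prefers the last type ("type 12")

cv_broad = coefficient_of_variation(fit_bout_type_coefficients(broad, X))
cv_selective = coefficient_of_variation(fit_bout_type_coefficients(selective, X))
print(round(cv_broad, 3), round(cv_selective, 2))
```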
  
  (8) Lines 546 - 547 page 15.
  
  “Fish whose baseline tail movements were insufficient to cover all 12 bout types were excluded from further analysis.”
  
  It would be useful to report the number or proportion of animals that did not exhibit all 12 bout types. Which types of bouts are less frequently observed?
  
  Thank you for this helpful suggestion. In the full dataset (n = 7 fish), 4 animals exhibited all 12 bout types. We have now added a supplementary figure showing the occurrence probability of each bout type across all animals.
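  The occurrence probability of each bout type and the complete-coverage inclusion criterion can be computed from per-fish bout labels. Below is a small sketch with hypothetical fish IDs and labels (1 to 12); the real labels would come from the direction-amplitude classification described above.

```python
from collections import Counter
from itertools import chain

BOUT_TYPES = range(1, 13)  # bout types labelled 1..12

def occurrence_probability(bouts_by_fish: dict[str, list[int]]) -> dict[int, float]:
    """Fraction of all pooled bouts assigned to each bout type."""
    pooled = list(chain.from_iterable(bouts_by_fish.values()))
    counts = Counter(pooled)  # missing types count as 0
    return {t: counts[t] / len(pooled) for t in BOUT_TYPES}

def fish_with_complete_coverage(bouts_by_fish: dict[str, list[int]]) -> list[str]:
    """Fish whose bouts include every one of the 12 types (the inclusion criterion in the Methods)."""
    required = set(BOUT_TYPES)
    return sorted(fish for fish, types in bouts_by_fish.items() if required <= set(types))

# Hypothetical labels for two fish
bouts = {
    "fish_A": list(range(1, 13)) + [1, 1, 2],
    "fish_B": [1, 2, 2, 3, 5, 12],
}
print(fish_with_complete_coverage(bouts))  # ['fish_A']
probs = occurrence_probability(bouts)
```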
  
  (9) Line 147 page 5.
  
  Honestly, the Bayesian multi-dimensional scaling is difficult to follow, and it is not clear what new insight it provides. I assume that “hyperbolic geometry indicates complex hierarchical organization” is the main point, but its meaning in this context is not sufficiently explained. This paragraph would benefit from being rewritten for clarity or potentially removed if it does not contribute essential information.
  
  We appreciate your insightful comments. In response, we have substantially expanded the section on Bayesian multidimensional scaling. First, we now provide an intuitive exposition (see Figure S6) of hyperbolic geometry and multidimensional scaling, clarifying why this framework constitutes a powerful approach for uncovering the geometric and functional organization of neuronal populations. Second, we show that multidimensional scaling in a curved hyperbolic space more accurately captures the correlation structure among neurons than embeddings in a flat Euclidean space. Third, and most notably, we find that the inferred curvature of the hyperbolic embedding space tightly scales with the degree of quiescence: fish in which dorsal raphe nucleus (DRN) stimulation nearly abolished locomotor activity exhibit the largest curvatures (new Figure 5F). Collectively, these computational analyses indicate that the curvature of the embedding space serves as a quantitative signature of the quiescent state.
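  The Bayesian inference itself is beyond a short example, but the geometric ingredient can be sketched. As a plainly simpler stand-in, the code below computes geodesic distances in the Poincaré ball model of hyperbolic space with curvature −c, and a raw stress score that an MDS-style fit would minimize against target dissimilarities (for example, distances derived from neuronal correlations). The choice of the Poincaré ball, the stress objective, and all names are assumptions; the manuscript's model and priors are not reproduced here. The example shows that equal Euclidean steps cover more hyperbolic distance near the boundary of the ball, which is what gives hyperbolic embeddings room for tree-like, hierarchical structure.

```python
import numpy as np

def poincare_distance(x: np.ndarray, y: np.ndarray, c: float = 1.0) -> float:
    """Geodesic distance in the Poincare ball with constant curvature -c (c > 0).

    Points must lie inside the ball, i.e. c * ||x||^2 < 1.
    """
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - c * np.sum(x ** 2)) * (1.0 - c * np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c))

def embedding_stress(points: np.ndarray, target: np.ndarray, c: float = 1.0) -> float:
    """Sum of squared mismatches between embedded and target pairwise dissimilarities."""
    n = len(points)
    return sum(
        (poincare_distance(points[i], points[j], c) - target[i, j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

x, y = np.array([0.1, 0.0]), np.array([0.0, 0.2])
# As c -> 0 the ball flattens and the distance tends to 2 * Euclidean distance
# (the factor 2 comes from this model's conformal scaling).
print(poincare_distance(x, y, c=1e-6), 2 * np.linalg.norm(x - y))

# The same Euclidean step of 0.1 is much longer near the boundary than near the origin.
d_near = poincare_distance(np.array([0.0, 0.0]), np.array([0.1, 0.0]))
d_far = poincare_distance(np.array([0.8, 0.0]), np.array([0.9, 0.0]))
print(d_near, d_far)
```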
  
  References
  
  (1) J. C. Marques, M. Li, D. Schaak, D. N. Robson, J. M. Li, Internal state dynamics shape brainwide activity and foraging behaviour. Nature 577, 239–243 (2020).
  
  (2) V. Choudhary, C. R. Heller, S. Aimon, L. de Sardenberg Schmid, D. N. Robson, J. M. Li, Neural and behavioral organization of rapid eye movement sleep in zebrafish. bioRxiv pp. 2023–08 (2023).
  
  (3) Y. Zhao, C.-X. Huang, Y. Gu, Y. Zhao, W. Ren, Y. Wang, J. Chen, N. N. Guan, J. Song, Serotonergic modulation of vigilance states in zebrafish and mice. Nature Communications 15, 2596 (2024).
  
  (4) Z. Song, C.-X. Huang, H. Zhang, C. Ye, N. Guan, J. Song, Integrated single-cell atlases unveil the operation principles of whole-brain 5-ht neuronal subsystems. Science Advances 11, eadv8128 (2025).
  
  (5) R. Haruvi, R. Barbara, I. Shainer, A. Rosenberg, L. Moshe, D. Malamud, J. Toledano, D. Braun, H. Baier, T. Kawashima, Global and compartmentalized serotonergic control of sensorimotor integration underlying motor adaptation. BioRxiv pp. 2024–09 (2024).
  
  (6) T. Kawashima, Z. Wei, R. Haruvi, I. Shainer, S. Narayan, H. Baier, M. B. Ahrens, Voltage imaging reveals circuit computations in the raphe underlying serotonin-mediated motor vigor learning. Neuron (2025).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.28.702359v2
www.biorxiv.org

Species biology and demographic history determine species vulnerability to climate change in tropical island endemic birds

1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  Tropical single-island endemic bird populations are particularly vulnerable to climate change. The authors investigate genetic evidence of how such species dealt with climate changes in the past as a possible predictor for how they will respond to change in the future, which could provide an important example for the fields of conservation genetics and island biogeography. The authors' integration of genomics and habitat modeling is commendable, but we find that the support for their conclusions is incomplete: at times, the results presented appear to contradict each other, the authors do not fully account for key variables, and the limited taxonomic scope may cause problematic biases for the conclusion.
  
  We thank the editors for supporting the premise of this study and highlighting the importance of the study approach. Based on the gaps identified by the editors and the reviewers, we have modified the manuscript; details are given below. We believe that these revisions have substantially improved the flow and scope of the manuscript and address the concerns raised by the reviewers.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors combine PSMC and habitat modeling to try to connect habitat change during the Last Glacial Period to changes in Ne.
  
  Strengths:
  
  Observing how tropical single-island endemic bird species responded to habitat change in the past may help inform conservation interventions for these particularly vulnerable species. The combination of genomics and habitat modeling is a good idea - this sort of interdisciplinary thinking is what is needed to tackle these complex questions. Additionally, the use of PSMC makes it possible to perform this analysis on poorly-studied species with only a single genome available.
  
  Room for Improvement:
  
  Why coalescent Ne is a better predictor of extinction risk than current genomic diversity, or current Ne, isn't explicitly explained. PSMC in particular has many caveats, and some are not acknowledged or adequately addressed by the authors. For example, the authors note that population structure is a confounding factor with PSMC, but that it is not a problem in this instance. They do not provide compelling evidence for why this would be the case; they simply state that the species studied are all single-island endemics. However, single-island endemic species are not necessarily panmictic; this is even less likely to be true for species studied here that inhabit a large geographic area (i.e., Australian species). Differing PSMC parameters may also impact results: the differences between passerines and non-passerines were one of their main results, but they do not provide any analysis to show that this difference was not driven by the different mutation rates used for the two groups.
  
  Parameters for many steps are not described, and choices that are described (such as the PSMC parameters) are not always fully explained. It is unclear why all data was mapped to the autosomes rather than removing reads that map to the sex chromosomes first. Using all the data, the reads belonging to the sex chromosomes could potentially map to other areas of the genome. It does not seem like a mapping quality filter was used, so these potential spurious alignments would not have been removed prior to analysis.
  
  There are points where the results are described in ways that appear to potentially differ from the supplementary figures. The authors state that even for species where PSMC results differed between models, "trends of Ne increase or decrease from the LIG to LGM were robust across all three PSMC models considered." The figures in the supplement for Pachycephala philippinensis, Rhynochetos jubatus, and Zosterops hypoxanthus appear to potentially contradict this statement, but it is difficult to tell, as the time period observed is not clearly marked on the graphs. How this robustness of trends was determined is not explained, leaving the precision of the analysis unclear.
  
  Table 1 also includes some information that contradicts what is in the Supplementary Tables, leading to a lack of clarity. Centropus unirufus, Chaetorhynchus papuensis, and Cnemophilus loriae are not included in Supplementary Table 4. Table 1 says Eulacestoma nigropectus, Paradisaea rubra, and Parotia lawesii did not undergo PSMC analysis, but Supplementary Table 4 says PSMC and modeling trends matched for these species. Table 1 says Rhagologus leucostigma underwent both PSMC and climate modeling, but Supplementary Table 4 says "NA" as if it was missing one of these analyses.
  
  Additionally, some of the results appear to contradict each other. For example, they show that there is no impact of habitat change in larger-bodied species, but also that larger-bodied species saw a decrease in Ne during the LGP. In another example, they state that when a species saw an increase in habitat during the LGP, they also had an increase in Ne. However, they also state that this was not the case for non-passerines.
  
  Ecosystems are highly complex; there may also be other variables influencing past demographic change other than those explored here. Results should be interpreted with caution.
  
  We thank the reviewer for their comments, which have helped us improve the scope of the manuscript and remove errors in the supporting information. In the revised version, we have improved the section of the manuscript that addresses the drawbacks of PSMC. Details of, and the rationale for, our parameter choices are now included in the revised manuscript.
  
  We performed additional PSMC analyses for a subset of the samples (n = 5), in which scaffolds mapping to the sex chromosomes were removed only after the reads had been mapped. We compared this approach, suggested by the reviewer, with our original approach and observed no differences in the PSMC pattern, highlighting the robustness of the results (Supplementary Information Fig. S3).
  
  Additionally, we have included multiple box plots and tables in the revised manuscript that help with interpreting the changes in effective population size. The details of the revisions are presented below in the “Recommendations for the authors” section. We believe that these changes have improved the scope of the manuscript and removed any redundancies and conflicts.
  
  Reviewer #2 (Public review):
  
  Summary and strengths:
  
  In this manuscript, Karjee and colleagues used coalescent-based effective population size reconstruction (PSMC) from single genomes to understand past population trends in island birds and related this to life history traits and glacial patterns. This concept is fairly new, as there are still relatively few multiple PSMC synthesis studies. I also thought that the focus on island endemics was unique and adds value to this paper. I enjoyed seeing a paper focused on South East Asia and think that this could help contribute to our knowledge of the important biodiversity within this region.
  
  Major weaknesses:
  
  My biggest concern with this paper is that the analyses are limited to 20-30 species, and significant taxonomic bias is present (there are multiple species of passerine but only 1-2 representatives of other groups). While this is not an issue alone, many of the life history traits or geographical traits are conflated with phylogenetic diversity (e.g., there are no large-bodied passerines). Thus, it is my opinion that the impact of these drivers of past population size is conflated and cannot be disentangled with the current data. The authors themselves state that the core hypothesis surrounding Ne and habitat availability is not supported by their entire dataset (only seen in Passerines). This was not clear enough in the abstract, and conclusions cannot be drawn here as the impact of taxonomy cannot be separated from data richness, traits, etc. The PSMC analysis was done according to the most recent recommendations, and this part of the manuscript is fairly robust. However, in several places, it is incorrectly stated that the PSMC measures or can infer genetic diversity; PSMC only infers past effective population size. It cannot measure genetic diversity in the past. I cannot review the habitat reconstruction modelling as I am a conservation genomics specialist.
  
  Appraisal:
  
  I am not convinced about the findings within the paper. I do not think that the results are sufficiently supported at this time, largely due to the conflation of taxonomy with other variables. As this type of comparison is new, I do think that there is a chance for reasonable impact on the field of genomics and island biogeography if the manuscript's constraints are addressed. I do not see scope for impact on conservation at this time and find the conclusions in the abstract regarding conservation relevance to be unfounded.
  
  We thank the reviewer for highlighting the unique and robust analytical approaches we have taken in this study. We agree with the reviewer that our sample size is currently small. However, we do observe a robust correlation between habitat fluctuation and change in effective population size. Further, the study highlights the predicament of tropical island endemics, which are currently understudied; future studies are necessary to safeguard this biodiversity. We have highlighted this while also addressing the concerns in the revised version of the manuscript.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Overall:
  
  This starts with a great premise - looking at how tropical single-island endemic bird species dealt with climate changes in the past may be a predictor of how they will respond to change in the future. Since these species are at high risk of extinction in the face of climate change, tailored approaches to conservation are a good idea. While the premise is solid, I have some questions and recommendations. At times while reading, I did feel a bit confused, which may be due to the fact that this isn't my exact area of expertise. However, if I'm confused, that means a reader from a general audience is also likely to be confused. Some results appear to be conflicting, some claims about data seem possibly inaccurate, and some major limitations are not acknowledged or fully addressed.
  
  Below I've noted areas that I feel could benefit from revisions. That being said, I liked the integration of habitat modeling and genomics! These sorts of multifaceted approaches are necessary when it comes to unraveling the complex dynamics involved in ecology and evolution.
  
  Crucial Issues to Address:
  
  (1) Line 75: With the lower sea levels and habitat change, you say animals can disperse across barriers of land and sea. When it comes to these single-island endemics, were they always confined to a single island? Is there no possibility of introgression with ancient populations of birds on other islands during these periods?
  
  We thank the reviewers for identifying the potential artifact in effective population size estimates that may arise from hybridization/introgression. Most of our species belong to small, oligotypic families, as already addressed in the Discussion, making them likely to be newly arisen lineages rather than refugial ones. There is scant information in the literature on where the species in our dataset originated, and further species-specific studies are required to identify signatures of hybridization/introgression. However, we have included this caveat in the revised version of the manuscript (line numbers: 73-78 and 303–305).
  
  (2) Lines 149-151: "However, in these species as well, trends of Ne increase or decrease from the LIG to LGM were robust across all three PSMC models considered." Please double-check this claim. Some of your figures in the supplement appear to contradict this. In particular, Pachycephala philippinensis, Rhynochetos jubatus, and Zosterops hypoxanthus appear to differ a bit in the time frame described, but it is difficult to tell; I would recommend adding some shading on the graphs to indicate the time period observed. If there was a way you determined this that is more precise than eyeballing the figures like I did, this should also be explained.
  
  We thank the reviewer for this comment and have reworded the sentence after cross-verifying it against the PSMC graphs. In addition, we have calculated the precise values of effective population size at the Last Interglacial (LIG) and Last Glacial Maximum (LGM) for each species using custom scripts and used these to evaluate whether the change in Ne during the Last Glacial Period (LGP) was significantly different across the three PSMC settings used. A table of these changes in effective population size from the LIG to the LGM is also included in the revised version of the manuscript (Supplementary Table S4; line numbers: 145-156 and 345-357).
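  The custom scripts are not shown in this response; the following is only a sketch of the kind of lookup involved, assuming each PSMC run has already been scaled to a piecewise-constant trajectory of (interval start in years, Ne). The LIG and LGM reference ages, the three trajectories, and the sign-agreement check are hypothetical; the significance test mentioned above is not reproduced.

```python
import numpy as np

# Hypothetical reference ages in years before present; the manuscript's exact
# LIG and LGM time points are not restated here and may differ.
LIG_YEARS = 125_000
LGM_YEARS = 21_000

def ne_at(t_years: float, start_years: np.ndarray, ne: np.ndarray) -> float:
    """Ne of the piecewise-constant PSMC interval containing t_years.

    start_years: ascending left edges of the scaled PSMC intervals (first is 0).
    """
    idx = np.searchsorted(start_years, t_years, side="right") - 1
    return float(ne[idx])

def relative_change_lig_to_lgm(start_years: np.ndarray, ne: np.ndarray) -> float:
    """(Ne_LGM - Ne_LIG) / Ne_LIG; negative values indicate a decline across the LGP."""
    ne_lig = ne_at(LIG_YEARS, start_years, ne)
    ne_lgm = ne_at(LGM_YEARS, start_years, ne)
    return (ne_lgm - ne_lig) / ne_lig

def trend_is_consistent(changes) -> bool:
    """True if every PSMC setting agrees on the direction of change."""
    return all(c > 0 for c in changes) or all(c < 0 for c in changes)

# Hypothetical scaled trajectories for one species under three PSMC settings.
start = np.array([0, 10_000, 20_000, 50_000, 100_000, 200_000])
settings = {
    "setting_1": np.array([5e3, 8e3, 12e3, 20e3, 15e3, 10e3]),
    "setting_2": np.array([6e3, 9e3, 11e3, 18e3, 16e3, 9e3]),
    "setting_3": np.array([5e3, 7e3, 13e3, 21e3, 14e3, 11e3]),
}
changes = {name: relative_change_lig_to_lgm(start, ne) for name, ne in settings.items()}
print(changes, trend_is_consistent(changes.values()))
```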
  
  (3) Lines 280-292: Issues with PSMC that are not acknowledged here are my largest concern. The situation being investigated does not necessarily meet all the assumptions PSMC makes (i.e., neutral evolution and panmixia), which should be explained in this section. I'll point out the two issues I think should be acknowledged and addressed: First, selection is a confounding factor with PSMC, which is not mentioned here. While that's likely not an issue due to the size of the genome, this is still something that should be stated and explained. Second, the following statement is what I take the most issue with: "Population structure is thus a confounding factor. However, this is unlikely to be a problem given that all our species are single-island endemics". This needs justification. You state that in the past, islands could be connected (see my first comment regarding line 75), so it seems unlikely that 1) migration between past populations on other islands never happened, and 2) there is no population structure *on* the island.
  
  We thank the reviewer and have modified the PSMC caveats section of the revised version of the manuscript (line numbers: 289-307).
  
  (4) Line 310: Mapping all the data to the autosomes seems inappropriate to me. The sex chromosome reads could potentially map to other areas of the genome. Unless this information was accidentally left out of the methods section, it doesn't seem like any mapping quality filter was used, so spurious alignments aren't being removed. To remove sex chromosome data, I would instead align data to the whole genome, remove all reads that map to the sex chromosomes, and then map the remaining reads to the autosomes.
  
  As mentioned earlier, for a subset of the species (n = 5), we directly mapped raw read files onto the genome, called SNPs only on autosomal regions using the SAMtools mpileup-bcftools pipeline, and then performed PSMC as above (Supplementary Information Fig. S3). We did not observe any significant difference between the two approaches. Further, only high-quality mapped reads were used for SNP calling, as stated in the previous version of the manuscript (line numbers: 338-343; Supplementary Information Fig. S3).
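  For readers reproducing the whole-genome mapping approach, an autosome-only region list can be derived from the reference index once scaffolds have been assigned to sex chromosomes, and then supplied to the pileup step. A minimal sketch, assuming a standard samtools `.fai` index and a hypothetical set of sex-linked scaffold names; the scaffold assignment step and the mapping-quality filter are not shown.

```python
def autosomal_bed(fai_lines, sex_linked):
    """BED lines (0-based, half-open) covering every scaffold not assigned to a sex chromosome.

    fai_lines: lines of a samtools .fai index (name, length, offset, linebases, linewidth).
    sex_linked: scaffold names assigned to the sex chromosomes (how this assignment is
    made, e.g. by synteny with a reference Z/W, is an assumption here).
    """
    bed = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        name, length = fields[0], int(fields[1])
        if name not in sex_linked:
            bed.append(f"{name}\t0\t{length}")  # whole scaffold as one region
    return bed

# Hypothetical index entries
fai = [
    "scaffold_1\t5000000\t12\t60\t61\n",
    "scaffold_2\t3200000\t5083346\t60\t61\n",
    "scaffold_Z1\t2100000\t8336680\t60\t61\n",
]
print("\n".join(autosomal_bed(fai, sex_linked={"scaffold_Z1"})))
```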
  
  (5) Table 1 includes some information that contradicts what is in the Supplementary Tables: Centropus unirufus, Chaetorhynchus papuensis and Cnemophilus loriae are not included in Supplementary Table 4. Table 1 says Eulacestoma nigropectus, Paradisaea rubra, and Parotia lawesii did not undergo PSMC analysis, but Supplementary Table 4 says PSMC and modeling trends matched for these species. "Pseudorectes ferrugineus" and "Rhynochetos jubatus" are spelled differently in Supplementary Table 4. Table 1 says Rhagologus leucostigma underwent both PSMC and climate modeling, but Supplementary Table 4 says "NA" as if it was missing one of these analyses.
  
  We thank the reviewer for identifying the errors, and we have corrected these in the revised version of the manuscript. Please see the detailed changes for these comments outlined below.
  
  Centropus unirufus, Chaetorhynchus papuensis and Cnemophilus loriae are not included in Supplementary Table S4 (now Supplementary Table S2): we have added these species to the revised Table S2.
  
  Table 1 says Eulacestoma nigropectus, Paradisaea rubra, and Parotia lawesii did not undergo PSMC analysis, but Supplementary Table 4 says PSMC and modeling trends matched for these species: The genomes for these samples were obtained from museums and exhibited high error rates. Hence, we excluded these samples from further analysis. However, Supplementary Table S2 was not updated accordingly; we have corrected this error in the revised version of the manuscript.
  
  "Pseudorectes ferrugineus" and "Rhynochetos jubatus" are spelled differently in Supplementary Table 4 (now Table S2): we have corrected the typographical error in the revised manuscript.
  
  Table 1 says Rhagologus leucostigma underwent both PSMC and climate modeling, but Supplementary Table 4 (now Table S2) says "NA" as if it was missing one of these analyses: This was a typographical error, and we have updated it to “mismatch”.
  
  Major Issues to Address:
  
  (1) Lines 97-99: "Information on tropical, single-island endemics' demographic responses to past climate change can inform conservation efforts, owing to the genomic signatures that predispose a species to extinction". This needs more explanation. For example, why couldn't we just look at these genomic signatures instead of recreating demographic responses? I'm not sure I fully understand what you mean here.
  
  We thank the reviewer for this comment and have modified the introduction to highlight the importance of demographic history in predicting species extinction. A comparison of genomic diversity and demographic history across more than 200 mammalian genomes highlights the importance of demographic history in predicting species endangerment and extinction risk (Wilder et al., 2023) (line numbers: 99-104).
  
  (2) Lines 181-182: "Whether or not a species was a passerine was an important predictor of Ne only in combination with the change in habitat from LIG to LGM". This is a major finding, but "respond positively to habitat change" (line 183) is a bit ambiguous. Were they responding to habitat expansion? Habitat contraction? Increase in rainfall? What is the change happening? Not all habitat changes are equal.
  
  We thank the reviewer for this comment and have modified this section for clarity in the revised results and discussion section of the manuscript. We observed a positive correlation between effective population size and availability of suitable habitat. Further, we observed precipitation of the warmest quarter to be the largest contributing bioclimatic variable for all but one Caribbean species (line numbers: 172-191; 196-211).
  
  (3) Lines 184-185: "The interaction between habitat change and body mass (β = 10.05, 95% CI: [-0.3, 24.41]) suggests that there is no impact of habitat change in larger species." Doesn't this contradict the earlier finding of larger-bodied species seeing a decrease in Ne? Or do you mean the decrease in Ne was not due to habitat change?
  
  We have edited this section for clarity. With the inclusion of additional species, we observed a significant positive relationship between body size and effective population size (line numbers: 191-193).
  
  (4) Lines 206-207: "Our results also reveal that both passerine and non-passerine island endemics have entered the Holocene with low genetic diversity." How does this align with the statement that passerines responded positively to habitat change?
  
  The observation that passerines respond positively to habitat change is based on a systematic analysis of the last glacial period. However, a close look at each species' entire demographic history reveals that Ne is often at its lowest following the LGM, coinciding with the advent of the Holocene, the current interglacial. We have therefore modified the sentence in the revised version of the manuscript (line numbers: 213-214).
  
  (5) Line 215: If we already know flightless birds and endemics are particularly prone to extinction, what is the benefit of this study? Be clear about how your method can be used in a way that is better than what people are already doing. It would be good to explicitly explain why coalescent Ne is a better predictor of extinction risk than current genomic diversity or current Ne.
  
  We thank the reviewer for this comment and have modified this section in the revised version of the manuscript (line numbers: 221-224).
  
  (6) Lines 259-261: "Habitat change in the LGP was positively associated with Ne fluctuations (Figure 3, β = 9.45), that is, species which showed an increase in habitat in the LGP also showed a concurrent increase in Ne." Is this true in all instances? I thought you found it had no effect for some, or did I misunderstand?
  
  We thank the reviewer for pointing this out. Species that showed an increase in habitat in the LGP did not always show a concurrent increase in Ne. Our results instead reflect an overall trend, and this is clarified in the revised version of the manuscript (line numbers: 268-269).
  
  Lines 328-330: Could the different mutation rates used for passerines and non-passerines be driving the differences found between the two groups?
  
  The difference between the two mutation rates is small, and using the passerine-specific mutation rate for non-passerines shifts the PSMC curve only slightly. Because our analysis considers the change in Ne across the LGP, this shift is minimal and does not affect the overall results.
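The authors' answer can be made concrete with the standard rescaling used to plot PSMC output: the reference size is N0 = θ0 / (4μs), where s is the number of bases per bin (100 by default), Ne in interval k is N0·λk, and the start of interval k in years is 2·N0·tk·g for generation time g. The minimal Python sketch below uses made-up θ0, λ, t, μ and generation-time values, not the study's, and shows that changing μ rescales Ne and time by the same factor, which moves the whole curve diagonally on log-log axes.

```python
import math


def scale_psmc(theta0, lambdas, times, mu, gen_time, bin_size=100):
    """Convert scaled PSMC output to (years, Ne).

    theta0: scaled mutation rate per bin reported by PSMC
    lambdas, times: relative population sizes and scaled interval start times
    mu: per-site, per-generation mutation rate
    gen_time: years per generation
    bin_size: bases per psmcfa bin
    """
    n0 = theta0 / (4 * mu * bin_size)  # reference effective population size
    years = [2 * n0 * t * gen_time for t in times]
    ne = [n0 * lam for lam in lambdas]
    return years, ne


# Hypothetical inputs, not values from this study
theta0, lambdas, times = 0.01, [1.0, 0.5, 2.0], [0.0, 0.1, 0.5]
years_a, ne_a = scale_psmc(theta0, lambdas, times, mu=4.0e-9, gen_time=2.0)
years_b, ne_b = scale_psmc(theta0, lambdas, times, mu=2.0e-9, gen_time=2.0)

# Halving mu doubles every Ne value and every time point
print(all(math.isclose(b, 2 * a) for a, b in zip(ne_a, ne_b)))
print(all(math.isclose(b, 2 * a) for a, b in zip(years_a, years_b)))
```

Because the LIG and LGM are fixed calendar windows, whether such a shift matters depends on how far the rescaled interval boundaries move relative to those windows; rerunning the scaling with both rates, as the response describes, is the direct check.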
  
  How are you connecting the demographic changes to species traits? I'm a bit confused about that, so I think some further explanation would be beneficial.
  
  We have modified the discussion to highlight the role of species traits in shaping the species response to habitat modification and ultimately the change in effective population size. We have included this in the revised version of the manuscript (line numbers: 437-439).
  
  Minor Issues to Address:
  
  (1) Lines 165-168: "Habitat change was poorly associated with change in Ne for the 20 species for which both PSMC and ENM analyses were possible (Cramer's V = 0.15). However, passerine species only showed a strong association (Cramer's V = 0.96), while non-passerines showed a weak negative association (Cramer's V = -0.15)." This is phrased in a way that is a bit confusing. I'd consider rephrasing for clarity.
  
  We have modified this section in the revised version of the manuscript (line numbers: 167-170).
  
  (2) Line 177: The confidence interval says "16.27, -2.61". I think it's supposed to be -16.27?
  
  We have corrected the typographical error in the revised version of the manuscript.
  
  (3) Lines 185-187: "Finally, the random intercept for Country (sd (Intercept)) showed a marginal positive influence (β = 0.85, 95% CI: [0.04, 2.24])." What does this mean? This needs further explanation.
  
  We have modified this sentence in the revised version of the manuscript (line numbers: 189-191).
  
  (4) Line 204: landbridge is misspelled as "landbride".
  
  We have fixed the typographical error.
  
  (5) Line 310: What were your Trimmomatic parameters?
  
  We have included the parameters used for Trimmomatic in the revised version of the manuscript (line numbers: 324-326).
  
  (6) Line 311: What were your bwa parameters?
  
  We used default parameters for bwa alignment and this is included in the revised version of the manuscript (line numbers: 328-329).
  
  (7) Line 322-324: Why did you choose those specific parameters for PSMC? Splitting up the first time window makes sense (as shown in Hilgers 2025), but why did you choose t=5, r=1, and 84 atomic time intervals? Did you choose these parameters independently, or did you decide to use them because they were used by Nadachowska-Brzyska et al? Either way, that information is important to state.
  
  The parameter selection followed the suggestions based on both Hilgers et al. 2025 and Nadachowska-Brzyska et al. 2016. The information is included in the revised version of the manuscript (line numbers: 345-350).
  
  (8) Lines 325-326: What did you use for bootstrapping? If not Psmcfa, why?
  
  We have used “splitfa” to generate files for bootstrap analysis and have included this information in the revised version of the manuscript (line numbers: 350-351).
  
  (9) Lines 350-354: Please explain the reasoning behind using a different resolution and different WorldClim data for Amazona guildingii.
  
  Based on the reviewer’s comment, we have re-run the habitat model at the same resolution for Amazona guildingii and included this in the revised version of the manuscript.
  
  (10) Line 412-413: "For the response variable i.e., the change in Ne, a Bernoulli distribution with a logit link because it is a binary response variable." I think this sentence might be missing some words.
  
  We have fixed the typographical error in the revised version of the manuscript (line numbers: 444-445).
  
  (11) Figure 1 is difficult to read, especially the top left panel. I would consider presenting this differently.
  
  We have supplemented Figure 1 with boxplots of effective population size values estimated during the Last Interglacial and the Last Glacial Maximum which should aid in clarity.
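Minor issue (1) above quotes Cramér's V values, including a negative one. Cramér's V is computed from the chi-squared statistic and therefore lies between 0 and 1; for a 2×2 table its value equals the magnitude of the phi coefficient, which does carry a sign. The Python sketch below uses a hypothetical 2×2 table of habitat change versus Ne change (not the study's counts) and computes both statistics so the distinction is visible; which statistic the manuscript actually reports is not stated here.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c table of counts (no empty rows or columns)."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


def phi(table):
    """Signed phi coefficient for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical table: rows = habitat (gain, loss), columns = Ne (increase, decrease)
negative = [[2, 8], [9, 1]]
print(round(phi(negative), 3))        # -0.704
print(round(cramers_v(negative), 3))  # 0.704
```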
  
  Reviewer #2 (Recommendations for the authors):
  
  The authors state that they intentionally chose to remove several avian species that would be suitable for this analysis, because they were subject to larger studies elsewhere. This seems like an unnecessary constraint, and it is my opinion that the authors need to add this data in. I am not aware of what species were excluded, but I hope this will increase the non-passerine proportion of their dataset to help them robustly address their questions. An alternative solution would be for the authors to only include passerines, but this will come at the expense of statistical power with the current dataset and so would also require an increase in sample size. Overall, I recommend including more non-passerine species with traits similar to your passerine species.
  
  This was a typographical error from the previous versions of the manuscript arising from the fact that we excluded museum species from our samples. We have modified this sentence in the revised version of the manuscript and have included one new species (Melanocharis versteri) in our study panel (line numbers: 311-314).
  
  It was not clear how or if PSMC bootstrapping was included in the comparisons across species, i.e. how did you include bootstrapping when you turned PSMC into a response variable within your statistical analysis? Failing to account for it would introduce measurement error into the data, and I would suggest that the authors explore how to incorporate this.
  
  We thank the reviewer for this comment and have calculated the precise values of effective population size during the LIG and the LGM using custom scripts to generate boxplots. These boxplots were used to investigate if effective population size values were significantly different during the LGP for all three PSMC parameter settings. Non-significant results were treated as “no change” in effective population size for further statistical analyses. The bootstrap values were used for this analysis, in addition to circumventing the issue of selection on the genome.
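The response above does not specify the test used to compare LIG and LGM Ne across bootstrap replicates. One possible approach, sketched below in Python purely as an assumption, pairs the LIG and LGM estimates from each bootstrap replicate, takes a 95% percentile interval of their difference, and returns "no change" when that interval includes zero. The function name classify_ne_change and the interval choice are hypothetical.

```python
import statistics


def classify_ne_change(ne_lig, ne_lgm, n_quantiles=40):
    """Classify the LIG-to-LGM change in Ne from paired bootstrap replicates.

    ne_lig[i] and ne_lgm[i] must come from the same bootstrap replicate i.
    With n_quantiles=40 the outermost cut points are the 2.5th and 97.5th
    percentiles, i.e. a 95% percentile interval for the difference.
    """
    diffs = [lgm - lig for lig, lgm in zip(ne_lig, ne_lgm)]
    cuts = statistics.quantiles(diffs, n=n_quantiles, method="inclusive")
    low, high = cuts[0], cuts[-1]
    if low > 0:
        return "increase"
    if high < 0:
        return "decrease"
    return "no change"


# Hypothetical bootstrap values for 100 replicates
lig = [10000 + i for i in range(100)]
lgm = [5000 + i for i in range(100)]
print(classify_ne_change(lig, lgm))
```

Pairing by replicate keeps the two time points from the same resampled genome together, which a comparison of two independent boxplots would not.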
  
  I would also like to see a greater discussion on what aspects of the PSMC curve were used for comparisons and the limitations therein. These cross-species comparisons are still relatively new, and I think they will add value to this paper.
  
  In our study, the change in Ne from the LIG to the LGM is considered. We have elaborated on this in the revised version of the manuscript. Additional analyses, depicting the changes in Ne as boxplots, were also included to help understand the fluctuations in Ne.
  
  Lines 164-168, which refer to your core hypothesis, are really unclear. What was actually found here? Please rephrase.
  
  We have rephrased the sentence for clarity in the revised version of the manuscript (line numbers: 169-172).
  
  PSMC measures effective population size, not genetic diversity. Please change throughout.
  
  Based on the reviewer’s comment we have changed this in the revised version of the manuscript.
  
  I was surprised to see some references to conservation within the abstract of the paper. It is important that this is also included in the discussion so that the authors ensure their logic is accessible to managers. It would also be good to discuss the risks of using PSMC to inform conservation from just one genome, as I see these being quite high.
  
  We thank the reviewer for this comment and have included both pros and cons of using PSMC in the revised version of the manuscript (line numbers: 229-237).
  
  As this paper is based on public reference genomes, it is best practice that the original notes or reference genome papers are cited to acknowledge the data holders.
  
  We thank the reviewer for this comment and have included a supplementary table (Supplementary Table S7) acknowledging all the data holders.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.11.14.623644v4
www.biorxiv.org

Profiling of terminating ribosomes reveals translational control at stop codons

2
1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper presents results interpreted to indicate that sequences upstream of stop codons capable of base-pairing with the 3' end of 18S rRNA prolong the dwell time of 80S ribosomes at stop codons in a manner impeded by Rps26 in the 40S subunit exit channel, which leads to the proper completion of termination and ribosome recycling and prevents spurious translation of 3'UTR sequences by one or more unconventional mechanisms.
  
  Strengths:
  
  The standard 80S and selective eRF1 80S ribosome profiling data obtained using EZRA-Seq are of high quality, allowing the authors to detect an enrichment for purine-rich sequences upstream of stop codons at sites where termination is relatively slow and ribosomal complexes are paused with eRF1 still engaged in the A site.
  
  Weaknesses:
  
  There are many weaknesses in the experimental design and interpretation of results that undermine several of the final conclusions of the study described in the abstract, as described in detail below.
  
  (1) It's not indicated how far upstream of the stop codon the sequences were searched to find the enriched motifs in Figs. 1C and 2D. If the search extends further upstream than -15, then the sequence would generally not be found in the exit channel of a terminating ribosome positioned with the stop codon in the A site, in the manner expected from their final model of mRNA:18S rRNA pairing. (This would be analogous to the occurrence of the Shine-Dalgarno sequence within 15 nt of the initiation codon for most mRNAs in E. coli.) They could have depicted nucleotide percentages at each position from -1 to -15 for the high- and low-pause stop codons to better facilitate consideration of their proposed mechanism of termination pausing involving the 3' end of 18S rRNA.
  
  (2) lines 234-242: Their reporter data in Fig. 4B suggest that only the presence of GGG triplets at any location in the 9 nt substantially prevents downstream translation. If their interpretation about these G-rich sequences promoting termination by forming G-quadruplexes is correct, then this would have little to do with the purine-rich motifs identified by the profiling experiments (and their proposed function in base-pairing with rRNA), as the purine-rich motifs do not feature GG bases (as shown in Fig. 2D in particular). The authors point out that the MPRA can sample sequence space not represented in living cells. While true, this doesn't change the fact that it failed to identify sequences conforming to the purine-rich motifs found by the profiling experiments and instead identified sequences capable of forming G-quadruplexes that may well function by a different mechanism than that employed in cells. The authors cannot persist in claiming that the MPRA results confirm the findings of the profiling experiments regarding the purine-rich motif. Also, the claim of enrichment for C-rich sequences in the MPRA results is not compelling, as only 3 of the 11 triplets showing the smallest M/P ratios contain more than one C and three of them contain no Cs. Also, there was no evidence for depletion of Cs upstream of the stop codons with low pause scores in the ribosome profiling data in Fig. 1, so it's inaccurate to claim "mirroring" of results from the ribosome profiling and MPRA data on this point as well.
  
  (3) lines 256-260: I still contend that the different results shown in Fig. 4E for the C-rich and GA-rich sequences are not compelling, as results for only a single sequence of each type are shown, which might not be typical of the entire class. In fact, the GA-rich sequence has two GGs and could form a G-quadruplex, whereas the GA-rich motifs identified by ribosome profiling and eRF1-seq do not exhibit consecutive GGs, such that the single G-rich sequence chosen for analysis might function by G-quadruplex-mediated stalling rather than base-pairing with the 3' end of 18S rRNA, as they actually suggested in their rebuttal. Even the second GA-rich sequence analyzed in Fig. S3G has two GGs. Thus, while the results in Fig. 4 provide support for the notion that C-rich sequences preceding the stop codon promote stop codon read-through, it's important to note that no evidence was obtained by ribosome profiling in Fig. 1 that the increased 3'UTR translation seen for low-pause stop codons is associated with C-rich sequences. It's unclear why they would be unable to observe this in the manner they document for the eRF1-Seq data in Fig. 2D for the three C-rich triplets enriched at stop codons lacking eRF1 peaks.
  
  Lines 278-282: These differences are quite small and could arise from the different sequences of the GFP-HiBit fusion proteins, as observed in Fig. 4C (top two control constructs), precluding mechanistic interpretations.
  
  (4) Notwithstanding their claim in the rebuttal, I still find no definition of the GA-rich and C-rich mRNAs described in Fig. 5C in the Methods or legends, nor whether the compilation is restricted to -15 from the stop codons. In addition, if expression of the mutant 18S rRNA is sufficient to alter the height of the termination peaks as shown in Fig. 5C and to alter reporter expression in Fig. 5D, I see no reason why they cannot carry out the pause score/motif enrichment of Fig. 1C to determine if they see the expected diminished enrichment for the GA-motif shown there on expressing the mutant 18S vs. the WT 18S control strain. If not, this would undermine their interpretation of the results in Figs. 5C-D as favoring base-pairing between the 3' end of 18S rRNA and sequences upstream of the stop codon.
  
  (5) I still find a significant shortcoming in their failure to analyze the 18S rRNA 3' end biochemically to show that the expected ~15% of 18S rRNA molecules carry the mutant sequence. Stating simply that they followed a previous protocol is not sufficient to document their success in this notoriously challenging experimental approach.
  
  (6) lines 382-384: The level of the control protein RACK1 is diminished in testis polysomes, and it's unclear that the ratio of Rps26:RACK1 is actually lower in testis polysomes in the manner claimed.
  
  (7) lines 414-427: I still contend that the authors should have quantified the ratio of the stop codon peak to the adjacent coding sequences in Figure 7E to establish that Rps26 OE decreased the stop codon peaks selectively on the GA-rich cohort of mRNAs. In addition, they still have not explained why the C-rich reporter behaves like the GA-rich reporter in Fig. 7F in showing reduced HiBiT expression on Rps26 OE when it should be unaffected. As such, the reporter data do not support the conclusion reached from the data in Fig. 7E.
  
  (8) Notwithstanding their rebuttal, I still contend that the failure to measure Rps26 association with 80S ribosomes or polysomes and show that it is depleted by the shRNA knockdown and increased by Rps26 OE is a significant shortcoming, especially since their interpretation of the OE data depends on the occurrence of 40S subunits lacking Rps26 in unstressed WT cells, which seems improbable based on the prior work on yeast.
  
  (9) Overall, examining the claims in the revised Abstract, I feel that I am in agreement with the claim "We identify a sequence motif upstream of the stop codon that promotes termination pausing,.." but disagree that the function of this motif was "validated by massively paralleled reporter assays", for the reasons stated above in point 2. Regarding the statement "Unexpectedly, reduced termination pausing increases the likelihood of stop codon slippage, giving rise to proteins with heterogenous C-terminal extensions.", I believe it would be more cautious to say that "reduced pausing is associated with stop codon read-through accompanied by frameshifting", since the MPRA did not provide compelling evidence for causality for the reasons described in point 3 above. Regarding the statement "Mechanistically, we show that sequence-dependent termination pausing arises from post-decoding mRNA scanning by the 3' end of 18S rRNA", I find this statement too strong in view of the shortcomings described above in points 4-5 and think it would be more correct to say that their findings are consistent with (rather than showing) this point; they should also add qualifying statements to the manuscript acknowledging the limitations of these experiments. I further contend that there are shortcomings in the experiments leading to the conclusion that the stoichiometry of Rps26... modulates mRNA:rRNA interactions, described above in points 6-9. Finally, in the last sentence, the claims that termination pausing is shaped by ribosome heterogeneity and cell type-specific translational control are too strong.
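Point (1) above suggests plotting nucleotide percentages at each position from -1 to -15 upstream of the stop codon for high- and low-pause stop codons. A minimal Python sketch of that per-position tally follows; it assumes each input string is the mRNA segment ending immediately 5' of the stop codon, and the example sequences are made up for illustration.

```python
from collections import Counter


def positional_composition(upstream_seqs, width=15):
    """Percentage of A, C, G, U at positions -width..-1 relative to the stop codon.

    Each input sequence ends at the nucleotide immediately 5' of the stop codon
    (position -1). Sequences shorter than a given offset are skipped there.
    """
    seqs = [s.upper().replace("T", "U") for s in upstream_seqs]
    profile = {}
    for offset in range(width, 0, -1):
        counts = Counter(s[-offset] for s in seqs if len(s) >= offset)
        total = sum(counts.values())
        profile[-offset] = (
            {nt: 100.0 * counts[nt] / total for nt in "ACGU"} if total else {}
        )
    return profile


# Made-up 15-nt upstream sequences for two groups of stop codons
high_pause = ["GAAGAAGAGGAAGAA", "AAGGAGAAGAAGGAG"]
low_pause = ["UCAGCUUACGAUCGU", "ACUGUCAGUCCAUGC"]
high, low = positional_composition(high_pause), positional_composition(low_pause)
for pos in sorted(high):
    # Difference in purine (A + G) percentage between the two groups
    purine_gap = (high[pos]["A"] + high[pos]["G"]) - (low[pos]["A"] + low[pos]["G"])
    print(pos, round(purine_gap, 1))
```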
  
  Review 1
2. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank the Editor and Reviewers for their careful evaluation of our manuscript and for the constructive feedback. We agree with eLife’s overall assessment that, while profiling terminating ribosomes provides important insights into termination dynamics, additional clarification of the underlying mechanisms was needed. In response, we have focused our revision on three major conceptual points:
  
  (1) We have moderated our interpretation regarding the contribution of putative mRNA:rRNA interactions to sequence-specific termination pausing and clarified the limitations of the current evidence.
  
  (2) We have refined and clarified our model for the role of Rps26 in regulating translation termination.
  
  (3) We have expanded and strengthened the discussion of tissue-specific termination pausing, including its potential implications and current uncertainties.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors use high-resolution ribosome profiling (Ezra-seq) and eRF1 pulldown-based ribosome profiling (eRF1-seq) developed in their lab to identify a GA-rich sequence motif located upstream of the stop codon responsible for translation termination pausing. They then perform a massively parallel assay with randomly generated sequences to further characterize this motif. Using mouse tissues, they show that termination pausing signatures can be tissue-specific. They use a series of published ribosome structures, 18S rRNA mutants, and eS26 knockdown experiments to propose that the GA-rich sequence interacts with the 3′-end of the 18S rRNA.
  
  Strengths:
  
  (1) Robust ribosome profiling data and clear analyses clarify the subtle behavior of terminating ribosomes near the stop codon.
  
  (2) Novel termination or "false termination" sites revealed by eRF1-seq in the 5′-UTR, 3′-UTR, and CDS highlight a previously underappreciated facet of translation dynamics.
  
  Weakness:
  
  (1) Modest effects seen in ABCE1 knockdown do not seem to add up to the level of regulation. The authors state "ABCE1 regulates terminating ribosomes independent of the sequence context" on pg 9, and "ABCE1 modulates termination pausing independent of the mRNA sequence context" in the figure caption for Figure S4. Given the modest effect of the knockdown, such phrasing is most likely not supported. Further clarification of "ABCE1 plays a generic role in translation termination" is necessary.
  
  We acknowledge that the modest effects observed for ABCE1 are likely influenced by incomplete knockdown in HEK293 cells. Importantly, the increased ribosome density occurred at all stop codons rather than in a sequence-dependent manner, supporting the conclusion that ABCE1 functions broadly in termination rather than acting in a sequence-specific context. We have revised the manuscript to clarify this point and to temper our interpretation accordingly.
  
  (2) The authors propose that the GA-rich sequence element upstream of the stop codon on the mRNA could potentially base pair with the 3′-end of the 18S rRNA. In the PDBs the authors reference in their paper and also in 3JAG, 3JAH, 3JAI (structures of terminating ribosomes with the stop codon in the A-site and eRF1), the mRNA exiting the ribosome and the 3′-end of the 18S rRNA are about 25-30 Å apart. In addition, a segment of eS26 is wedged in between these two RNA segments. This reviewer noted this arrangement in a random sampling of 5 other PDBs of mammalian and human 80S ribosome structures. How do the authors anticipate the base pairing they have proposed to occur in light of these steric hindrances? Rps26 is known to be released by Tsr2 in yeast during very specific stresses. Is it their expectation that termination pausing in human/mammalian cells happens during stressful conditions only?
  
  We agree that structural rearrangements in the absence of Rps26 remain speculative. In the revised manuscript, we have removed overly definitive language and clarified that, while Rps26 dissociation has been reported under stress conditions, its stoichiometry is unlikely to be exclusively stress-dependent. We now present this aspect as a working model supported by indirect evidence rather than a demonstrated structural mechanism.
  
  (3) The authors say, "It is thus likely that mRNA undergoes post-decoding scanning by 18S rRNA." (pg. 10). It is unclear what the authors mean by "scanning." Do they mean that the mRNA gets scanned in a manner similar to scanning during initiation? There is no evidence presented to support that particular conclusion.
  
  We appreciate the comment regarding the term “18S rRNA scanning.” We recognize that this wording may have been misleading and have revised the relevant text to more accurately describe post-decoding mRNA–rRNA interactions without implying an active scanning mechanism.
  
  (4) Role of termination pausing in the testis is highly speculative. The authors state: "It is thus conceivable that the wide range of ribosome density at stop codons in testis facilitates functional division of ribosome occupancy beyond the coding region." It is unclear what type of functional division they are referring to.
  
  We agree that the functional significance of testis-specific termination dynamics remains unclear. As multiple reviewers raised this concern, we have substantially expanded the discussion of tissue-specific termination pausing, explicitly outlining current limitations and framing this as an important direction for future investigation.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper presents results interpreted to indicate that sequences upstream of stop codons capable of base-pairing with the 3' end of 18S rRNA prolong the dwell time of 80S ribosomes at stop codons in a manner impeded by Rps26 in the 40S subunit exit channel, which leads to the proper completion of termination and ribosome recycling and prevents spurious translation of 3'UTR sequences by one or more unconventional mechanisms.
  
  Strengths:
  
  The standard 80S and selective eRF1 80S ribosome profiling data obtained using EZRA-Seq are of high quality, allowing the authors to detect an enrichment for purine-rich sequences upstream of stop codons at sites where termination is relatively slow and ribosomal complexes are paused with eRF1 still engaged in the A site.
  
  Weaknesses:
  
  There are many weaknesses in the experimental design, interpretation of results, and description of assay design and assumptions, the data obtained, and the interpretation of results, all of which detract from the scientific quality and significance of this work. In fact, a large proportion of paragraphs in the text and figure panels present some difficulty either in understanding how the experiment or data analysis was conducted or what the authors wish to conclude from the results, or that stem from an overinterpretation of findings or failure to consider other equally likely explanations.
  
  We appreciate the reviewer’s thoughtful evaluation and constructive suggestions. We recognize that our original description of the MPRA and reporter assay results may have lacked sufficient clarity, particularly regarding the sequence motifs associated with termination pausing. In the revised manuscript, we have carefully rewritten these sections to clarify the experimental design, data interpretation, and relationship between sequence context and termination dynamics. We believe these revisions address the reviewer’s concerns and improve the overall clarity of the manuscript.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study from Jia et al carried out a variety of analyses of terminating ribosomes, including the development of eRF1-seq to map termination sites, identification of a GA-rich motif that promotes ribosome pausing, characterization of tissue-specific termination dynamics, and elucidation of the regulatory roles of 18S rRNA and RPS26. Overall, the study is thoughtfully designed, and its biological conclusions are well supported by complementary experiments. The tools and datasets generated provide valuable resources for researchers investigating the mechanisms of RNA translation.
  
  Strengths:
  
  (1) The study introduces eRF1-seq, a novel approach for mapping translation termination sites, providing a methodological advance for studying ribosome termination.
  
  (2) Through integrative bioinformatic analyses and complementary MPRA experiments, the authors demonstrate that GA-rich motifs promote ribosome pausing at termination sites and reveal possible regulatory roles of 18S rRNA in this process.
  
  (3) The study characterizes tissue-specific ribosome termination dynamics, showing that the testis exhibits stronger ribosome pausing at stop codons compared to other tissues. Follow-up experiments suggest that RPS26 may contribute to this tissue specificity.
  
  Weaknesses:
  
  The biological significance of ribosome pausing regulation at translation termination sites or of translational readthrough, for example, across different tissue types, remains unclear. Nevertheless, this question lies beyond the primary scope of the current study.
  
  We thank the reviewer for the positive assessment of our work. We agree that tissue-specific differences in termination pausing were insufficiently described in the original submission. In response, and in light of similar concerns from other reviewers, we have expanded the relevant sections in the main text and Discussion. We now more clearly articulate both the biological context and the current limitations, identifying tissue-specific regulation of termination as an open question and future research direction.
  
  Reviewer #4 (Public review):
  
  Summary:
  
  This manuscript by Qian and colleagues utilizes ribosome profiling, and reporter assays to dissect translation termination. Unfortunately, the data do not support the conclusions of the paper, controls are missing and several assays are not well validated and do not reproduce previous findings from others.
  
  Specific comments:
  
  Translation termination has been studied in several organisms including mammalian cells and yeast. In those cases what is analyzed is not the peak height at the stop codon, but rather the difference in the ribosome density before and after the stop. Thus, analyzing peak height is not validated. I understand that this is relevant only for the ribosome profiling experiments (and Ezra-seq), not the eRF1 profiling. But much of the data was acquired that way.
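The paragraph above contrasts peak height at the stop codon with the change in ribosome density across the stop; point (7) of the second-round review likewise asks for the ratio of the stop codon peak to the adjacent coding sequence. A hedged Python sketch of both metrics for a single transcript follows. It assumes per-nucleotide footprint counts already assigned to a reference position (e.g., the A site), and the window sizes (cds_window, gap, exclude) are arbitrary illustrative choices, not values from the manuscript or from previous studies.

```python
from statistics import mean


def stop_peak_ratio(coverage, stop_index, cds_window=60, gap=15):
    """Mean density over the 3-nt stop codon divided by mean density over an
    upstream CDS window ending `gap` nt before the stop codon.

    stop_index: 0-based position of the first nucleotide of the stop codon.
    Raises statistics.StatisticsError if a window is empty.
    """
    peak = mean(coverage[stop_index:stop_index + 3])
    start = max(0, stop_index - gap - cds_window)
    cds_density = mean(coverage[start:stop_index - gap])
    return peak / cds_density if cds_density > 0 else float("nan")


def utr3_to_cds_density(coverage, cds_start, stop_index, utr3_end, exclude=15):
    """Length-normalised 3'UTR density divided by CDS density, excluding
    `exclude` nt after the start codon and on either side of the stop codon."""
    cds = coverage[cds_start + exclude:stop_index - exclude]
    utr = coverage[stop_index + 3 + exclude:utr3_end]
    cds_density = mean(cds)
    return mean(utr) / cds_density if cds_density > 0 else float("nan")


# Synthetic transcript: CDS at 2 reads/nt, a stop-codon peak, 3'UTR at 1 read/nt
coverage = [2] * 200 + [20, 20, 20] + [1] * 100
print(stop_peak_ratio(coverage, stop_index=200))
print(utr3_to_cds_density(coverage, cds_start=0, stop_index=200, utr3_end=303))
```

The first metric normalises peak height to local translation; the second is the before/after density comparison the reviewer describes, and neither depends on the other.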
  
  Moreover, the data do not reproduce previous findings and no effort is made to connect them to previous data. Previous data has shown that stop codon efficacy varies. This is not reproduced (S1C). Similarly, an effect from the +1 residue is not reproduced. The data isn't even stratified by different stop codons as previous work has shown that different surrounding residues have different effects in the context of different stop codons. Thus, none of the sequencing data is validated or trusted and does not reproduce previous findings.
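Stratifying by stop codon and the +1 nucleotide, as suggested above, amounts to grouping per-stop-codon scores by that context before comparing them. The Python sketch below assumes records of (stop codon, +1 nucleotide, pause score) and reports the count and median per group; the record format and example values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def stratify_by_stop_context(records):
    """Group pause scores by (stop codon, +1 nucleotide); DNA letters are
    converted to RNA so 'TGA' and 'UGA' fall in the same group."""
    groups = defaultdict(list)
    for stop_codon, plus1_nt, score in records:
        key = (
            stop_codon.upper().replace("T", "U"),
            plus1_nt.upper().replace("T", "U"),
        )
        groups[key].append(score)
    return {key: {"n": len(s), "median": median(s)} for key, s in sorted(groups.items())}


# Hypothetical records
records = [("UAA", "G", 1.0), ("UAA", "G", 3.0), ("UGA", "C", 0.5), ("UGA", "C", 1.5)]
for context, summary in stratify_by_stop_context(records).items():
    print(context, summary)
```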
  
  The GA-rich sequences identified by Ezra-seq and eRF1-seq are not the same, and they differ from previously reported sequences (Wangen & Green).
  
  The authors claim that the majority of eRF1 peaks are at stop codons, but that is not true. It is only about 30% of the peaks. Also, not all mRNAs have peaks at the stop codons. That is at best problematic. Finally, there are mRNAs that are known to "suffer" from NMD; what do these look like in the Ezra-Seq and eRF1-Seq data? How about mRNAs that have programmed frameshifts? This raises questions on the validity of the eRF1 data.
  
  Figure 4: First, instead of M/P ratio, one should analyze M/M+P, to normalize out differences in the loading and effects from collisions, which are guaranteed to occur here, but not considered or analyzed. Second, the data are analyzed as if what matters are codons in the P and E site (and beyond, where there are definitely NOT recognized codons). While there is evidence for some interactions, one would think that an additional analysis based on sequence would be helpful. Also, the supplemental data indicates that very rarely are there reciprocal changes (as should be the case), and as seen for stop codons.
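The reviewer's first suggestion can be seen with two numbers. M and P stand for the two read counts whose ratio is plotted in Figure 4 (their definitions are not restated here). The Python sketch below, with hypothetical counts, shows that M/P halves when P moves from 1 to 2 reads, whereas M/(M+P) is bounded in [0, 1] and barely moves.

```python
def reporter_ratios(m, p):
    """Return (M/P, M/(M+P)) for two non-negative counts that are not both zero."""
    ratio = m / p if p > 0 else float("inf")
    fraction = m / (m + p)
    return ratio, fraction


# Hypothetical counts
for m, p in [(50, 1), (50, 2)]:
    ratio, fraction = reporter_ratios(m, p)
    print(m, p, ratio, round(fraction, 3))
```

For M = 50, M/P drops from 50 to 25 while M/(M+P) goes from about 0.980 to about 0.962.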
  
  Regarding the HiBiT reporter assay: The two sequences clearly have effects on translation independent of the stop codon context (Figure 4C), which need to be taken into account. Also, the effect of the sequences differs between the assays in 4C and 4D (2-fold vs. 0.5-fold), further calling the assay into question. Moreover, the authors claim that re-initiation cannot account for HiBiT levels, but that is clearly incorrect. The western blot in Figure 4E does not reproduce the data in 4D: while HiBiT goes up (as in 4D), the putative GFP fusion goes down. Finally, why the second reading frame should be more efficient is not explained, which further argues for an artifact. Previous work (and work herein) suggests that read-through occurs equally in each reading frame. No controls for these assays are presented, e.g., stimulation by antibiotics, ABCE1 depletion, etc.
  
  Figure 5 has similar problems. I don't understand how Figure 5A was made, but when you overlay the cited structures on Rps26, the molecules are identical. I guess the authors used some fantasy to build non-existing sequences differently into the structure. There is no basis for that. In panel C, and likewise in Figure 7, the number of analyzed mRNAs varies. This could influence the outcome, and the EXACT same set of mRNAs should be analyzed. But the main problem here is that the authors need to analyze readthrough and not peak height, as detailed above. Essential controls showing what fraction of the 18S rRNA is mutated are missing. Previous work has shown that 18S rRNA truncated by 2 nt is actively degraded. It is hard to believe that 15% of altered ribosomes can abolish 100% of the effect of the C-rich sequences. Important validation is missing: the authors should analyze rRNA sequences in their Ribo-seq dataset to demonstrate that the mutated rRNAs are present, and that these are enriched and depleted as predicted.
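  The validation requested here (showing that mutated 18S rRNA is actually present) could be sketched as a count of reads ending at the rRNA 3' terminus. This is a hypothetical illustration, assuming reads that are already adapter-trimmed, in sense orientation, and ending exactly at the 3' terminus; the wild-type tail and the UU-to-GG mutant tail below are placeholders to be replaced with the reference sequence and the actual construct.

```python
import math
from typing import Iterable, Tuple

# Placeholder 3'-terminal sequences in the DNA alphabet. These are assumptions
# for illustration; verify both against the 18S rRNA reference and the mutant construct.
WT_TAIL = "GATCATTA"
MUT_TAIL = "GATCAGGA"


def mutant_rrna_fraction(reads: Iterable[str],
                         wt_tail: str = WT_TAIL,
                         mut_tail: str = MUT_TAIL) -> Tuple[float, int, int]:
    """Return (mutant fraction, wild-type count, mutant count) among reads ending in either tail."""
    wt = mut = 0
    for read in reads:
        seq = read.upper().replace("U", "T")
        if seq.endswith(wt_tail):
            wt += 1
        elif seq.endswith(mut_tail):
            mut += 1
    total = wt + mut
    fraction = mut / total if total else math.nan
    return fraction, wt, mut


# Hypothetical reads: two wild-type ends, one mutant end.
print(mutant_rrna_fraction(["AAGGATCATTA", "CCGATCAGGA", "GATCATTA"]))
```

  Running the same count separately on libraries from different fractions would show whether mutant-bearing ribosomes are enriched or depleted where the model predicts.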
  
  In Figures 5-7, the authors develop a model in which sequence selectivity arises from base pairing between 18S rRNA and the mRNA. If so, they should stratify the data by the number of Watson-Crick (WC) pairs that can be formed, and only WC pairs, as GU pairs have a totally different geometry that will likely be discriminated against in this context. Also, the mutation is in a part of the helix that has no effect (Figure S3G). Thus, the data within the manuscript are inconsistent.
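  One way to perform the stratification suggested here is to score each upstream sequence by the largest number of Watson-Crick pairs it can form with the rRNA 3' end in any ungapped antiparallel register, counting G-U as unpaired. This is a simplified sketch (no bulges, no stacking energies), and the rRNA tail is a parameter the user must supply; it is not taken from the manuscript.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Watson-Crick pairs only; G-U wobble pairs are deliberately excluded.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_wc_pairs(mrna: str, rrna_tail: str) -> int:
    """Largest number of WC pairs over all ungapped antiparallel registers.

    Both sequences are given 5'->3'. The rRNA is reversed so that position i of
    the mRNA is compared with the rRNA base it would face in an antiparallel duplex.
    """
    m = mrna.upper().replace("T", "U")
    r = rrna_tail.upper().replace("T", "U")[::-1]
    best = 0
    for shift in range(-(len(r) - 1), len(m)):
        pairs = sum(
            1
            for i, base in enumerate(m)
            if 0 <= i - shift < len(r) and (base, r[i - shift]) in WATSON_CRICK
        )
        best = max(best, pairs)
    return best


def stratify_by_pairs(seqs: Iterable[str], rrna_tail: str) -> Dict[int, List[str]]:
    """Group sequences by their maximum WC pair count."""
    strata: Dict[int, List[str]] = defaultdict(list)
    for s in seqs:
        strata[max_wc_pairs(s, rrna_tail)].append(s)
    return dict(strata)
```

  Because every register is scanned, the same function also addresses the concern that a sequence may pair equally well with the rRNA in an alternative register.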
  
  Figure 6 does not agree with published data (Li et al., Nature 2022). Previous work did not show testis depletion of Rps26 in purified ribosomes. This is the critical difference, as the authors here did not purify ribosomes. Also, another Rps protein is an essential control, even if purified ribosomes are used. The validity of this dataset is thus questionable. Depletion from polysomes is hard to believe, as there is less signal overall in the polysomes.
  
  Figure 7 has similar problems to Figure 5. Different pools of mRNAs are analyzed, and peak height is not validated. Overexpression of Rps26 is not shown, as only Myc is shown, not Rps26. Beyond that, increased Rps26 occupancy in ribosomes needs to be shown for the effect to come from ribosomes. Given how sick the cells are, it is most likely that all effects are secondary and arise from whatever else is going on upon overexpression or depletion of Rps26. No controls are presented to show specific effects from Rps26.
  
  The authors need to check Rli1/ABCE1 levels in their cells. Their data have features indicative of low ABCE1 levels, including a very small effect of ABCE1 depletion. Low ABCE1 levels could be responsible for some of the effects they observe.
  
  We appreciate the reviewer’s engagement with our study and the opportunity to clarify several points.
  
  With respect to perceived inconsistencies with prior literature, we emphasize that our findings do not contradict established principles of translation termination. Rather, enabled by the development of eRF1-seq, we provide higher-resolution insight into termination dynamics that extends existing models. We have revised the manuscript to better contextualize our findings within prior studies and to avoid overstating novelty where continuity exists.
  
  Regarding the analysis of ribosome profiling data, we note that peak height and read density are widely used metrics for inferring ribosome dwell time and pausing. Nevertheless, we recognize that our original presentation may not have sufficiently explained this analytical framework. In the revised manuscript, we have clarified the rationale and interpretation of peak-based analyses, particularly in Figures 5 and 7 involving 18S rRNA mutants and Rps26 perturbation.
  
  Finally, we appreciate the reviewer’s comments concerning base pairing. We have carefully revised both the Results and Discussion sections to present mRNA–rRNA interactions as a supported but not definitively proven mechanistic model, clearly distinguishing experimental evidence from inference.
  
  We are grateful for the reviewers’ thoughtful feedback. We believe the revisions have strengthened the manuscript by clarifying interpretations, moderating mechanistic claims, and expanding discussion of tissue-specific regulation, while preserving the central contributions of the study.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Some minor typos are present in the main text and methods section.
  
  We thank the Reviewer for their attention to detail in reviewing our manuscript. We have now thoroughly revised the main text and the Methods section.
  
  (2) S1I is missing or unlabelled.
  
  We are glad to have this opportunity to fix this mistake. Both S1I and S5D have now been added to the revised figures.
  
  (3) Could the authors clarify in the main text whether crosslinking was a step in the eRF1-seq protocol? Pg 5: "Without crosslinking, ribosomal proteins were minimally pulled down by the eRF1 antibody, confirming the transient nature of eRF1 binding."
  
  Yes, crosslinking is required for eRF1-seq. We tried omitting crosslinking, but very little was pulled down, as stated in the sentence on page 5.
  
  (4) Are termination events in the 5′-UTR or the CDS, as seen in the eRF1-seq data, also influenced by the GA-rich sequence? If the data is disaggregated into those two buckets, can you still pull out the motif?
  
  Yes, stop codons in the 5’UTR and CDS share the same feature. However, too few 5’UTR stop codons are captured by eRF1-seq to support a reliable sequence motif analysis.
  
  (5) Could the authors please clarify what peaks/fractions they are using as the monosome in Figure 4A? From the manner in which the red boxes are drawn on the sucrose gradient profile traces, it seems that the 40S, 60S, 80S monosome and half of the disome peak are included in the monosome fraction.
  
  The red box shown in Figure 4A is somewhat misleading. For the massively parallel reporter assay, we selected the ribosome fractions corresponding to monosomes and polysomes, respectively, based on the sucrose gradient trace. However, fractionation is not absolute: the fraction tube corresponding to monosomes could contain traces of subunits as well as disomes. In practice, the 40S and 60S subunits are less of a concern than disomes, and the primary component is the 80S ribosome.
  
  (6) On page 13, please cite references for Normal mode analysis.
  
  Normal Mode Analysis (NMA) using the Anisotropic Network Model (ANM) is a computationally efficient method for predicting large-scale, functional, and directional protein motions near equilibrium. We have followed the Reviewer’s suggestion by citing a review paper in the field of structural biology (Bahar, I. et al. 2005).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The authors interpret the height of RPF peaks at stop codons in their Ribo-Seq data as an indication of pausing by ribosomes during termination, resulting from slow or inefficient decoding of the stop codon and peptide release; although it could equally result from slow recycling of the 60S subunit by ABCE1 following peptide release. Arguing against the latter possibility, they show later in the study that shRNA knockdown of ABCE1 has little effect on the stop codon RPF peaks; however, because the ABCE1 depletion does not elicit collisions near the stop codon in the manner observed in other studies, it appears that the ABCE1 depletion was insufficient to impair recycling substantially. The authors also don't attempt to support their interpretation by showing that depletion of eRF1 increases the stop codon peaks and produces collisions just upstream of the stop codon. They never specify with any precision whether it is stop codon recognition by eRF1, peptide hydrolysis, or recycling of the 60S subunit from the post-termination complex that is delayed, which is very unsatisfying.
  
  We agree with the Reviewer that the RPF density at stop codons reflects only the dwell time of terminating ribosomes. In fact, it is not possible to dissect molecular details from Ribo-seq data sets, as is also the case when interpreting other pausing events. Regarding ABCE1, we did observe an increased termination peak in cells with ABCE1 knockdown (Figure S4C). The lack of collisions is perhaps due to incomplete depletion of ABCE1. Notably, ABCE1 depletion selectively increased ribosome density at the –15 nt position, whereas the forward-shifted –12 nt peak was largely unaffected (Figure S4D). These results suggest that ABCE1 primarily facilitates late-stage termination or ribosome splitting, and its absence delays pre-termination progression. Nevertheless, the main focus of the study is to decipher the sequence context of termination pausing, which appears to be independent of ABCE1. We thank the Reviewer for understanding.
  
  (2) They found enrichment for a GA-rich motif in the mRNAs with the largest stop codon peaks, which they attribute to its effect in slowing down some aspect of termination or ribosome recycling to increase the dwell time of the terminating ribosomes. They found no motif, however, in mRNAs containing the smallest RPF peaks at stop codon peaks, which presumably terminate more rapidly; even though they conclude later in the study from their massively parallel reporter assays (MPRA) that "C-richness" in the 9 nt 5' of stop codons enables rapid termination. The mRNAs with high pause scores at the stop codon that are enriched for the GA motif also show lower RPFs in 3'UTRs compared to the low pause score mRNAs, which they interpret to mean that long-lived termination complexes produce more efficient peptide termination and ribosome recycling, while short-lived complexes fail to be recycled and continue translation into the 3'UTR. However, because the 3'UTR reads are in all three frames, this could not occur simply by stop codon readthrough but would also require a frameshift upstream or at the stop codon itself to prevent termination and continued translation into the 3'UTR; and it could also arise from unconventional reinitiation by unrecycled post-termination complexes, which has been seen by others on inhibition of 60S recycling. The authors' interpretation is too simplistic.
  
  We thank the Reviewer for summarizing the sequence features, uncovered by eRF1-seq, that control ribosome dwell time at stop codons. We are fully aware of the complex scenarios underlying 3’UTR translation. However, unconventional reinitiation cannot explain the results of the reporter assay shown in Figure 4D. Unlike frameshifting, which generates extended products with mixed C-termini, reinitiation begins a new polypeptide at a new start codon. In Figure 4D, we observed products with C-terminally fused HiBiT, which cannot be explained by reinitiation. We thank the Reviewer for understanding.
  
  (3) They obtain support for the role of a GA-motif in pausing at stop codons from their selective ribosome profiling of eRF1-bound 80S ribosomes present at stop codons, finding a related GA-motif enriched at stop codons with high occupancies of eRF1-bound RPFs. However, once again, there is no C-rich motif enriched upstream of stop codons with low eRF1-bound RPF occupancies, at odds with later claims for such a motif. They ultimately propose that the GA motif pauses terminating ribosomes by base-pairing with the 3' end of 18S rRNA in the ribosome mRNA exit channel, principally utilizing two UU residues at the penultimate bases in the 18S rRNA that presumably base-pair with either A or G residues in the GA motif.
  
  The Reviewer might be confused by the results from Ribo-seq and the massively parallel reporter assay (MPRA). Ribo-seq data sets are limited to endogenous sequences that were shaped during evolution. In contrast, MPRA uses completely randomized sequences that offer an unbiased analysis of sequence elements. The lack of a C-rich motif in the eRF1-seq data sets is due to the under-representation of such sequence elements in the human genome. Perhaps this sequence bias benefits termination fidelity by minimizing 3’UTR translation. We have further clarified this point in the revised manuscript.
  
  (4) They claim to have obtained independent confirmation of this last idea from their massively parallel reporter analysis (MPRA), in which sequences upstream of the stop codon of a uORF were randomized to distinguish those that appear to prevent translation downstream of the uORF, and thereby place the mRNA in the monosome fraction, from those that allow downstream translation by any mechanism, including leaky scanning of the uORF start codon, stop codon readthrough, or reinitiation (the assay doesn't distinguish between these mechanisms), and thereby place the mRNA in the polysome fraction. In actuality, their results showed that only the presence of GGG triplets at any location in the 9 nt substantially prevents downstream translation, whereas only the CCG and CCC proline codons enable downstream translation by one or more mechanisms. In view of their final model, it's very difficult to understand why GGG at any position would be able to base-pair with the U-U residues in the 18S rRNA when the stop codon is in the A site, and also why the many other triplets with two G's, two A's, or an A and a G base (all consistent with the GA-rich motif identified earlier) would not act similarly. Similarly, it's puzzling that CCG and CCC can exert their effects at multiple positions upstream of the stop codon, and that the 7 other codons with two C's do not act similarly. Thus, it's unconvincing that a specific C-rich motif (which they refer to repeatedly but never identify), or even C-richness upstream of the stop codon, confers elevated downstream translation. It's also important to note that the MPRA does not report on pausing at stop codons explicitly, only on whether ribosomes can be found downstream of the uORF stop codon, and assigning this outcome to the presence or absence of pausing during termination requires an ad hoc assumption that the authors have not identified as such.
  
  The Reviewer brought up excellent points in this comment regarding the MPRA result. Indeed, as the Reviewer points out, MPRA does not report ribosome pausing events. Additionally, MPRA is not designed to distinguish the mechanisms underlying translational readthrough. As mentioned above, MPRA and Ribo-seq have different experimental features that partly explain the similar, but not identical, sequence motifs uncovered by the two assays. The prominent GGG motif identified by MPRA is intriguing and reminiscent of our prior study focusing on translation initiation (Jia et al. NSMB 2020). We propose that G-rich sequences upstream of stop codons form G-quadruplexes that block ribosome movement, resulting in monosome enrichment. Supporting this notion, the GGG motif was not identified by eRF1-seq, echoing the importance of using complementary experimental procedures in drawing conclusions.
  
  (5) They claim to confirm their conclusions from the profiling and MPRA data by measuring translation of the HiBiT sequence inserted downstream of the stop codon of the uORF in two reporters in which the upstream 9 nt contain either a single C-rich sequence or a single G-rich sequence. It's unclear how or why these two particular sequences were chosen. The G-rich sequence does not conform closely to either of the GA-motifs captured in the sequence LOGOs of Figures 1-2, and as noted above, no C-rich motif was ever identified in these analyses. Thus, it's unclear whether the different effects of these two sequences are representative of the sequences that the genome-wide analyses identified as pausing or not pausing terminating ribosomes. In addition, given that the exact position of the GG or CC sequences relative to the stop codon doesn't seem to matter based on the MPRA data, it is actually possible to find the same number of base pairs with the 3' end of 18S rRNA for both the GA-rich and the C-rich sequences analyzed in these reporter assays by sampling different registers of pairing between the mRNA and 18S rRNA. What is needed instead is a systematic analysis, using both the polysome:monosome assay and the HiBiT translation assay, of sequences that can pair perfectly with the 18S rRNA or that contain increasing numbers of mismatches predicted to destabilize the putative helix, to determine whether the stability of the helices thus formed is highly correlated with the presence of the reporter mRNA in monosomes and with low HiBiT translation.
  
  We appreciate the Reviewer’s effort to improve our manuscript. The sequences inserted into the reporters were chosen based on several considerations. First, we chose the GA motif rather than the G-rich sequences because the former represents a physiological sequence element uncovered by eRF1-seq. As mentioned above, the G-rich sequences could form G-quadruplex artifacts. Second, the C-rich sequences were uncovered by both eRF1-seq (Figure 2D) and MPRA (Figure 4B). Third, only top-ranked sequences were selected for the reporter assay. Regarding the positional effects of the inserted sequence elements, it is important to note that the proposed mRNA:rRNA interaction is not static, because the mRNA moves continuously through the channel. Instead of using sequences with perfect pairing, we conducted experiments placing the C-rich sequences at different positions within the insert. As shown in Figure S3H, the position relative to the stop codon does not seem to matter. In the revised manuscript, we have rephrased several sentences in the main text to avoid confusion.
  
  (6) They attempt to support their model by overexpressing a mutant 18S rRNA with mutations of the penultimate U-U residues to G-G, and present evidence that this decreases the stop codon RPF peaks on mRNAs rich in GA sequences upstream of the stop codons, and has the opposite effect on mRNAs that are C-rich; however, they never indicate the criteria used to assign mRNAs to these two bins, and whether it is based on the GA-rich motifs/LOGOs identified by genome-wide analysis or on the few triplets turned up by the MPRA. Clearly, it would be far better to conduct the same analysis of motif enrichment for high and low pause scores that produced the motif in Figure 1C and determine if the motif for high pausing switches from the GA-rich motif for WT 18S rRNA to a C-rich motif for the mutant, and vice versa for the low pause score mRNAs. It should also be noted that the C-rich sequence used in the reporter can form only 2 base pairs with the mutant 18S rRNA when the mRNA's C-C dinucleotide base pairs with the new G-G dinucleotide in rRNA, but it can actually form 4 base pairs with the WT 18S rRNA sequence in a different pairing register, undermining their interpretation of these data. Note also that there was no analysis done to determine what proportion of 40S subunits actually contain the mutant 18S rRNA, which is expected to be only a minor fraction under the best circumstances, and cannot simply be taken for granted, requiring a direct analysis of the sequences of the 3' ends of 18S rRNA in the cells expressing the mutant 18S.
  
  The Reviewer’s comments on 18S rRNA mutants are insightful. Given the low percentage of ribosomes incorporating the rRNA mutants, it is not feasible to conduct motif analysis based on ribosome pausing at stop codons. As shown in Figure 5C, stop codon peaks are still evident after 18S mutant transfection, albeit less prominent than in the wild type. Notably, introducing 18S rRNA mutants into cells is not an easy task, and we closely followed a previously published protocol (Burman and Mauro. NAR 2012) to obtain meaningful data. We believe (and hope the Reviewer will concur) that the experiment using the 18S rRNA mutants offers critical evidence in support of the mechanism.
  
  (7) They attempt to implicate Rps26 in the pausing by depleting or overexpressing (OE) the protein and comparing pausing at stop codons between the same two ill-defined GA-rich and C-rich bins of mRNAs mentioned above, and by assaying the HiBiT reporters. Again, they haven't determined whether the amount of Rps26 in mature 40S subunits is reduced or elevated compared to WT cells, and their interpretation of the OE data actually depends on the occurrence of 40S subunits lacking Rps26 in unstressed WT cells, which seems improbable and requires direct confirmation. Also, they haven't quantified the 80S peaks at the stop codons relative to the CDS reads immediately 5' of the stop codons, which vary with Rps26 OE versus the WT control, and doing so might well contradict their conclusion. Moreover, the C-rich and GA-rich HiBiT reporters behave identically rather than oppositely in response to Rps26 OE, which the authors fail to acknowledge or comment on.
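  The quantification requested here (80S density at the stop codon relative to CDS reads immediately upstream) could look like the following minimal sketch. It assumes a per-nucleotide array of footprint counts already assigned to P-site positions; the 3-nt stop window and the upstream window from 90 to 30 nt before the stop are arbitrary, hypothetical choices, not parameters from the manuscript.

```python
import math
from statistics import mean
from typing import Sequence


def stop_pause_score(density: Sequence[float], stop_pos: int,
                     upstream_start: int = 90, upstream_end: int = 30) -> float:
    """Mean density over the stop codon divided by mean density in an upstream CDS window.

    density: per-nucleotide P-site counts along the transcript (assumed input).
    stop_pos: 0-based index of the first nucleotide of the stop codon.
    The upstream window covers [stop_pos - upstream_start, stop_pos - upstream_end).
    """
    if upstream_end <= 0 or upstream_start <= upstream_end:
        raise ValueError("upstream window must end before the stop codon")
    if stop_pos - upstream_start < 0 or stop_pos + 3 > len(density):
        raise ValueError("windows fall outside the transcript")
    stop_signal = mean(density[stop_pos:stop_pos + 3])
    baseline = mean(density[stop_pos - upstream_start:stop_pos - upstream_end])
    return stop_signal / baseline if baseline > 0 else math.nan
```

  Computing this score for the identical set of mRNAs under Rps26 OE and in the control would show whether apparent peak changes are driven by changes in upstream CDS density.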
  
  The Reviewer might be confused by the role of Rps26, partly due to the lack of clarity in our original description of the results. In yeast, Rps26 can dissociate from fully assembled 80S ribosomes under stress (Yang, et al. Sci Adv 2022). Therefore, although quantifying Rps26 in mature 40S subunits is informative, it does not reveal the composition of 80S ribosomes in cells with Rps26 depletion or overexpression. As pointed out by the Reviewer, we also noticed that, in cells with Rps26 depletion or overexpression, mRNAs with C-rich sequences showed no difference in ribosome density at stop codons. This is expected because C-rich sequences have minimal interaction with the 3’ end of rRNA. As a result, Rps26 depletion or overexpression should not affect ribosome dwell time at stop codons preceded by C-rich sequences. In contrast, only stop codons preceded by GA-rich sequences are influenced by Rps26 heterogeneity. In the revised manuscript, we have clarified this point in the main text.
  
  Additional specific comments
  
  (8) In the Summary statement: "We identify a sequence motif upstream of the stop codon that contributes to termination pausing, which was confirmed by massively paralleled": This is unjustified, as the MPRA showed only that a GGG triplet inserted anywhere in the 9 nt 5' of the stop codon prevents ribosomes from traversing a stop codon, either by blocking leaky scanning or reinitiation after an upstream uORF, and it is unclear why the position of this triplet does not matter, or why other GA-rich sequences capable of base pairing with the 3' end of 18S rRNA were not identified in the MPRA.
  
  As mentioned above, eRF1-seq and MPRA assays are complementary with advantages and disadvantages. Nevertheless, the Reviewer’s comments are well-taken and we have rephrased the Abstract of the revised manuscript.
  
  (9) A supplementary figure explaining EZRA-Seq would be very helpful.
  
  Since the EZRA-seq methodology has been published (Mao, et al. NSMB 2023), we think a citation is more appropriate. We thank the Reviewer for understanding.
  
  (10) The bottom plots/histograms of Figure 1A are very unclear. What is the y-axis of the bottom histogram, and relative to what elongating ribosomes have been analyzed?
  
  We apologize for the confusion in the histograms of Figure 1A. We stratified all mappable reads into footprints of initiating, elongating, and terminating ribosomes. As in many Ribo-seq data sets, most footprints are 29 nt long. If all three ribosome groups adopt the same conformation, they are expected to have the same footprint length distribution, with the same bar heights. This holds for initiating ribosomes (left) but not for terminating ribosomes (right). We have now rephrased the figure legend in the revised manuscript.
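  The comparison described above (footprint length distributions of initiating, elongating, and terminating ribosomes) can be expressed as normalized length histograms, so that groups with very different read numbers are directly comparable. The sketch below is illustrative only; total variation distance is our choice of summary statistic, not one used in the manuscript.

```python
from collections import Counter
from typing import Dict, Iterable


def length_distribution(lengths: Iterable[int]) -> Dict[int, float]:
    """Fraction of footprints at each read length, sorted by length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    """0 when two length distributions are identical, 1 when they do not overlap."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

  Under the assumption stated in the response, initiating and elongating ribosomes would give a distance near 0, while terminating ribosomes would stand out.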
  
  (11) Page 5: "A close inspection of stop codon footprints revealed an additional peak at -12 nt, which becomes more prominent when the reads are shorter (Figure 1B)." No explanation is offered for this finding. Do forward-shifted termination complexes have an empty A site owing to dissociation of eRF1? If so, they would be undetectable in eRF1-Seq data.
  
  Previous toe-printing assays have shown that eRF1 induces a forward movement of terminating ribosomes, shifting the leading edge from +13 nt to +15 nt (Pisarev, et al. Cell 2007). Moreover, single-molecule analyses have identified distinct pre- and post-termination phases catalyzed by eRF1 (Lawson, et al. Science 2023). Together, these observations suggest that the two 5’ end peaks correspond to pre- and post-terminating ribosome states, with the latter likely adopting a rotated conformation. We have revised the relevant paragraph in the main text.
  
  (12) Page 5: "It is possible that the two distinct 5' end peaks represent pre- and post-terminating ribosomes, with the latter assuming the rotated conformation. We could not rule out the possibility that these terminating ribosomes have the stop codons at the P-site prior to disassembly." The logic here is difficult to follow.
  
  We have revised the relevant paragraph in the main text.
  
  (13) Figure 1C: provide coordinates relative to the stop codon on this motif.
  
  The motif analysis is position-independent, so there are no coordinates on the logo plot.
  
  (14) Page 6: "This was not due to biased downstream sequences as the +4 nucleotide minimally affected the 3'UTR translation (Figure S1C)." The logic here is unclear.
  
  We have rephrased this sentence to “This effect could not be explained by downstream sequence bias, as the identity of the +4 nt had minimal impact on 3’UTR translation (Figure S1C).”
  
  (15) Page 6: "Like Ribo-seq, we also observed a forward shifting of post-terminating ribosomes from eRF1-seq (Figure 2C). " But by definition, they will have eRF1 in the A site, so why are they 26nt vs 29nt?
  
  As in many Ribo-seq data sets, most footprints are 29 nt long. However, ribosome populations with shorter footprints are physiologically meaningful, likely reflecting conformational changes.
  
  (16) Page 6 "In agreement with the Ribo-seq data sets, eRF1-seq revealed that not all the mRNAs exhibited eRF1 peaks at the annotated stop codons (Figure 2B), echoing the wide range of termination pausing." It should be determined whether eRF1 occupancy is correlated with 80S occupancy at stop codons in the standard Ribo-Seq. And if not, why?
  
  As shown in Figure 2B, there is a strong correlation between eRF1-seq and Ribo-seq in terms of termination pausing. However, the pausing index will be different between these two data sets due to distinct normalization. We thank the Reviewer for understanding.
  
  (17) Figure 2D: The plot on the left doesn't specify how far upstream the triplets can be from the stop codon. Is the LOGO significantly more similar to that shown in Fig. 1C than expected by chance alone?
  
  In Figure 2D, the codon frequency analysis is position independent. Similarly, the sequence logos in Figures 1C and 2D are position independent.
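  A position-independent codon frequency analysis of the kind described here can be sketched as pooled counts of every overlapping triplet in the sequences upstream of the stop codon, compared between high- and low-pause groups as a log2 ratio. This is a hypothetical sketch: the use of overlapping triplets, the input window, and the pseudocount are our assumptions rather than the manuscript's exact procedure.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

TRIPLETS = ["".join(p) for p in product("ACGU", repeat=3)]


def triplet_frequencies(seqs: Iterable[str]) -> Dict[str, float]:
    """Frequency of each of the 64 triplets, pooled over all positions and sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - 2):
            counts[s[i:i + 3]] += 1
    total = sum(counts[t] for t in TRIPLETS)  # triplets containing N etc. are ignored
    return {t: (counts[t] / total if total else 0.0) for t in TRIPLETS}


def log2_enrichment(high: Iterable[str], low: Iterable[str],
                    pseudo: float = 1e-6) -> Dict[str, float]:
    """log2 of triplet frequency in high-pause vs. low-pause upstream sequences."""
    fh, fl = triplet_frequencies(high), triplet_frequencies(low)
    return {t: math.log2((fh[t] + pseudo) / (fl[t] + pseudo)) for t in TRIPLETS}
```

  Because counts are pooled across positions, the output carries no coordinates relative to the stop codon, matching the position-independent design described in the response.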
  
  (18) Page 7: "Notably, three different stop codons show similar pausing features and sequence motifs (Figure S1G and S1I)." The figure citations here are incorrect.
  
  We apologize for the missing Figure S1I, which was also pointed out by Reviewer #1. We have now updated Figure S1 in the revised manuscript.
  
  (19) Page 7: The term "false termination" is a poor descriptor if termination doesn't occur.
  
  We have followed the Reviewer’s suggestion by replacing “false termination” with “failed termination”.
  
  (20) Page 8: "Consistent with previous reports 27, mutating the stop codon UAG abolished the reinitiation event that drives out-of-frame HiBiT translation (Figure 3E)." How is HiBiT assayed? No details are given in the legend. This result doesn't confirm any of the actual eRF1 peaks upstream of stop codons, just that REI can occur at some level 5' of stop codons; and the eRF1 peak at the HiBiT stop codon would be 3' of the peak at the main stop codon.
  
  The HiBiT assay is a standard reporter assay, similar to luciferase, for which Promega offers a detection kit, as described in the Methods section. The result shown in Figure 3E confirms stop codon-associated reinitiation, suggesting that ribosomes migrating from the stop codon could carry eRF1 before reaching a start codon for reinitiation. We have revised this paragraph to avoid confusion.
  
  (21) Figure 4A: Unclear what position 0 to 6 in the bottom heat map corresponds to in the inserted 9 nt sequences. Are these codon positions vs. nucleotide positions? The legend lacks explanatory information.
  
  Figure 4A shows nucleotide positions (x axis) grouped into 3-nt windows to reflect codon identity (y axis). For the inserted 9-nt random sequences, the last two nucleotides cannot start a full triplet because the nucleotides downstream of the insert are fixed, which leaves start positions 0 to 6. The same analysis has been reported in our prior study (Jia, et al. NSMB 2020).
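  The mapping from a 9-nt insert to heat-map positions 0 to 6 can be sketched as a sliding 3-nt window that stays inside the random insert. The tally function and its input format (insert sequence paired with a read count from one fraction) are hypothetical illustrations, not the published analysis code.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def positional_triplets(insert: str, window: int = 3) -> List[Tuple[int, str]]:
    """(start position, triplet) for every 3-nt window lying fully inside the insert.

    A 9-nt insert yields start positions 0-6: windows starting at the last two
    nucleotides would extend into the fixed downstream sequence and are skipped.
    """
    return [(i, insert[i:i + window]) for i in range(len(insert) - window + 1)]


def tally(inserts_with_counts: Iterable[Tuple[str, int]]) -> Counter:
    """Sum read counts per (position, triplet) cell of the heat map for one fraction."""
    table: Counter = Counter()
    for insert, reads in inserts_with_counts:
        for pos, trip in positional_triplets(insert):
            table[(pos, trip)] += reads
    return table


print(positional_triplets("GGGACUCCC"))
```

  Running `tally` separately on the monosome and polysome libraries and taking a per-cell ratio would give one heat-map value per position and codon.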
  
  (22) Page 8: "For instance, codons enriched in frame 2 belong to NUA and NUG, another indication of in-frame stop codons (Figure S3B, bottom panel). " Need more or better explanation here.
  
  We have rephrased this sentence in the main text. “Codons enriched in alternative reading frames were also informative; for example, codons enriched in frame 2 predominantly belong to NUA and NUG, consistent with frameshifted presentations of in-frame stop codons (Figure S3B, bottom panel).”
  
  (23) "This is likely due to the faster turnover of these mRNAs because of 3'UTR translation". Need more or better explanation here.
  
  MPRA in Figure S3C showed that mRNA variants containing C-rich downstream sequence were depleted from both monosome and polysome fractions. Since 3’UTR translation is well-established to induce mRNA decay, it is possible that these sequences are under-represented due to mRNA turnover. We have added more explanations in this paragraph in the revised manuscript.
  
  (24) Figure 4B: The logic and assumptions of this assay are not explained. How do ribosomes traverse the uORF: by leaky scanning, or by stop codon readthrough that is impeded by a ribosome stalled at the uORF stop codon? Presumably, it can't be readthrough, as the uORF is out of frame and translation would likely terminate quickly.
  
  The rationale of Figure 4B is very similar to that of Figure 4A, except for the presence of the stop codon UAG. Under efficient termination, monosome enrichment is expected, which could be promoted by termination pausing or by structural hindrance from G-rich sequences. In contrast, stop codon readthrough or reinitiation would lead to polysome enrichment. We have thoroughly revised this paragraph in the main text.
  
  (25) Figure 4B results: It's unclear why M/P ratios are so low in Figure 4B vs Figure 4A as all constructs in 4B contain a stop codon and should have the high M/P ratios seen for the constructs in panel (A) with stop codons inserted. It's also unclear why the high M/P ratio should be so limited to GGG triplets vs. other triplets that conform to the GA-rich motifs identified above, and also why this triplet would not function at codon position 6. Similarly, it's unclear why only CCG and CCC and not CCU and CCA have an effect, and why only 3 of 9 codons with 2 or more C's have the effect, all suggesting that specific sequences and not just C-rich sequences are promoting read-through. Yet, no C-rich motif was discernible in the profiling experiments above.
  
  We appreciate the Reviewer’s careful reading of our manuscript. In the profiling experiments shown in Figure 2, we did observe C-rich codons, albeit with variations. Possible reasons include sequence differences between the human genome and randomized sequence combinations. In addressing the Reviewer's question 23, we have thoroughly revised this paragraph in the main text.
  
  (26) Page 9: "These results are in line with the sequence specificity in termination pausing revealed by Ribo-seq and eRF1-seq." This is unjustified as the results in 4B are restricted to only GGG triplets rather than numerous triplets that equally conform to the AAGAAGA motif defined above.
  
  We apologize for the overstatement in this sentence. In addressing the Reviewer's questions 23 and 24, we have thoroughly revised this paragraph in the main text.
  
  (27) Page 9: "This result is congruent with the MPRA assay, suggesting that the C-rich coding sequence preceding the stop codon not only reduces termination pausing, but also promotes downstream translation." This is unjustified as the single C-rich sequence chosen for the analysis in Figure 4C is not representative of the two C-rich triplets identified in Figure 4B, showing strong evidence of read-through.
  
  In Figure 4C, both C-rich and GA-rich sequences were chosen from shared elements between eRF1-seq and MPRA as they represent physiological sequences associated with termination pausing. The reporter assay is crucial in linking the lack of termination pausing with 3’UTR translation. We thank the Reviewer for understanding.
  
  (28) The analyses in Figures 4C-D suffer from a lack of the no-stop codon controls to allow the standard quantification of read-through as a percentage of continuous translation in the zero frame in the absence of a stop codon.
  
  The Reviewer might have missed the no-stop codon control in Figure 4C, which contains reporters with (bottom) and without (top) the UAG stop codon. In Figure 4D, it is not feasible to include no-stop codon controls for the frameshifting reporters, as their HiBiT values would exceed the chart range by several orders of magnitude.
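  Point 28 and the reply refer to the usual way of quantifying readthrough: the signal from a stop-codon reporter expressed as a percentage of the matching no-stop reporter, which stands for continuous zero-frame translation. A minimal Python sketch, assuming each HiBiT reading is first divided by a co-transfected control to correct for transfection efficiency; that control step and the function name `readthrough_percent` are illustrative assumptions, not the authors' published pipeline:

```python
def readthrough_percent(stop_signal, stop_control, nostop_signal, nostop_control):
    """Readthrough as a percentage of continuous zero-frame translation.

    Each reporter signal (e.g. HiBiT luminescence) is divided by a
    co-transfected control (hypothetical normalization step), and the
    stop-codon reporter is then expressed relative to its no-stop counterpart.
    """
    if stop_control <= 0 or nostop_control <= 0:
        raise ValueError("control signals must be positive")
    stop_norm = stop_signal / stop_control
    nostop_norm = nostop_signal / nostop_control
    if nostop_norm <= 0:
        raise ValueError("no-stop reporter signal must be positive")
    return 100.0 * stop_norm / nostop_norm
```

  For example, a normalized stop-codon signal of 150 against a normalized no-stop signal of 10,000 corresponds to 1.5% readthrough.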
  
  (29) Page 10: "Therefore, the C-rich coding sequence triggers ribosome sliding at the stop codon, resulting in 3'UTR translation in all three reading frames." Sliding is an imprecise term. It is presumably a stop codon readthrough accompanied by frameshifting.
  
  We agree with the Reviewer’s suggestion and have replaced the word “sliding” with “readthrough”.
  
  (30) Page 10: The citation to Figure S3H is incorrect, as there is no panel H.
  
  We are glad to have this opportunity to fix this error. We have now added panel H to Figure S3 in the revised manuscript.
  
  (31) Page 10: "When the ribosome occupancy in the CDS was normalized, loss of ABCE1 led to a modest increase of stop codon peaks (Figure S4C)". Is this increase reproducible in replicates and statistically significant, as it seems very slight?
  
  The increased ribosome peak at stop codons in cells lacking ABCE1 is not significant, partly due to incomplete depletion of ABCE1 as shown in Figure S4A. Since ABCE1 is not the focus of this study, we did not attempt to knock out ABCE1, which could cause cellular toxicity.
  
  (32) Page 11: "Notably, the elevated ribosome density occurred at all stop codons, an indication of global effects." Where are the data substantiating this claim?
  
  We apologize for the confusion here. In the revised manuscript, we have deleted this sentence from the main text.
  
  (33) Page 11: "A closer look revealed that silencing ABCE1 increased the ribosome density at the -15 nt position". This claim is not convincing in the 29 nt read data, where it should be observed.
  
  We agree with the Reviewer that the increased ribosome density at the -15 nt position is more evident for shorter footprints. We have revised the sentence in the main text.
  
  (34) Page 11: "Since the 3' end of 18S rRNA contains a highly conserved U-rich sequence (GAUCAUUA), the GA-rich sequence element of mRNA could follow U:A and U:G base pairing near the exit site (Figure 5A and S5A). By contrast, the C-rich sequence motif on mRNA would escape the 18S rRNA checkpoint, resulting in faster mRNA passthrough." This seems simplistic, as there would also be three G-A or A-G mispairings with 18S rRNA at other positions of the (G/A)AAGAAGA motif. It is also unclear what the C-rich motif actually is, making it impossible to determine how many pairings it could make with the 18S rRNA sequence.
  
  Unlike base pairing on RNA structures, the putative rRNA:mRNA interaction is dynamic because of the continuous movement of mRNA along the ribosome channel. In fact, perfect base pairing might not be instrumental. Therefore, the difference between GA-rich and C-rich sequences is reflected in the accumulated effect. As mentioned above, the C-rich sequences are derived from both eRF1-seq and MPRA.
  
  (35) Figure S5B: Showing this sequence is misleading. While not described, it is presumably the DNA sequence of the plasmid, not the rRNA sequence, as it shows 100% mutant sequence. They need to sequence the 3' end of rRNA isolated from ribosomes to confirm the presence of mutant ribosomes at appreciable levels.
  
  The Reviewer is correct that the sequences shown in Figure S5B are from the plasmids. To avoid such confusion, we have removed the sequences in the updated Figure S5B.
  
  (36) Page 12: "When mRNAs are stratified based on the sequence motif upstream of stop codons, we found that overexpression of the 18S mutant reduced the differential termination pausing between GA-rich and C-rich sequences (Figure 5C)". It is not explained what GA-rich or C-richness means precisely. Moreover, the same kind of analysis done in Figure 1C should have been conducted here to determine the LOGOs for high and low pausing for WT vs mutant 18S rRNA.
  
  We understand why the Reviewer repeatedly asks about the GA-rich and C-rich sequences, partly due to the lack of clarity in our original description of the analysis. GA-rich transcripts were defined as those in which G or A nucleotides make up more than 65% (i.e., more than 9 nt) of the 15-nt sequence upstream of the stop codon, whereas C-rich transcripts were defined as those in which C makes up more than 40% (more than 6 nt). We have now updated the Methods section in the revised manuscript.
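  These definitions can be written as a small classifier. A minimal Python sketch, assuming the 15-nt window ends immediately before the first nucleotide of the stop codon and treating the 65% and 40% cutoffs as strict thresholds (at least 10 G/A, or at least 7 C, out of 15 nt); the function names are hypothetical and this is not the authors' code:

```python
def upstream_window(mrna, stop_start, width=15):
    """Return the `width` nt immediately 5' of the stop codon (0-based stop_start)."""
    if stop_start < width:
        raise ValueError("not enough sequence upstream of the stop codon")
    return mrna[stop_start - width:stop_start]


def classify_upstream(seq, ga_percent=65, c_percent=40):
    """Label an upstream sequence as 'GA-rich', 'C-rich' or 'other'.

    Integer arithmetic avoids floating-point ties at the cutoffs.
    With 15 nt, >65% G+A means at least 10 G/A and >40% C means at least 7 C,
    so with the default cutoffs the two classes cannot overlap.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    ga = seq.count("G") + seq.count("A")
    c = seq.count("C")
    if 100 * ga > ga_percent * n:
        return "GA-rich"
    if 100 * c > c_percent * n:
        return "C-rich"
    return "other"
```

  Sequences with exactly 9 G/A or exactly 6 C fall into neither class, which is why the boundary cases are worth stating explicitly in a Methods section.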
  
  (37) Page 12: "Notably, the 3' end sequence of 18S rRNA is highly conserved (Figure S5D)". There is no Figure S5D in the figures.
  
  We are glad to have this opportunity to fix this error. We have now added panels D and E to Figure S5 in the revised manuscript.
  
  (38) Page 13: "Further supporting the sequence specificity of termination pausing, testis mRNAs with prominent stop codon peaks are enriched with GA-sequences upstream of the stop codon (Figure S6C). The same group of mRNAs, however, barely exhibit termination pausing in liver." Again, motif analysis of high and low pausing should have been done here.
  
  The motif analysis in mouse tissue samples is less informative because GA-rich sequences will be over-represented in testis, whereas the same group will be under-represented in liver. We had to select the shared mRNAs for comparative analysis. We thank the Reviewer for understanding.
  
  (39) Page 13: "While liver exhibited a similar distribution of Rps26 and RACK1 in polysome fractions, testis showed an evident depletion of Rps26 in polysome (Figure 6C). Notably, a substantial amount of Rps26 is present in the ribosome-free fraction of testis." They failed to normalize Rps26 levels in polysomes for bulk polysome levels, as indicated by the A260 tracings to determine if polysomes are depleted of Rps26, or rather, there is less polysomal Rps26 simply because polysomes are less abundant.
  
  We agree with the Reviewer’s point regarding the different polysome profiles of testis and liver. Because the amount of polysomes is difficult to normalize, we used RACK1, a constitutive component of the ribosome, to quantify the amount of polysomes.
  
  (40) Page 14: "Indeed, normal mode analysis (NMA) by anisotropic network models suggests that, in the absence of Rps26, both the -3 to -9 extension of the mRNA and the 3' end of 18S rRNA can twist and approximate to each other with improved mutual parity (Figure 7B)." It is unclear what this means.
  
  Normal Mode Analysis (NMA) by Anisotropic Network Model (ANM) is a coarse-grained computational method used to study biomolecular dynamics by modeling proteins as a network of nodes connected by springs. Unlike the Gaussian Network Model (GNM), ANM calculates the full 3D directional preference of motion, enabling characterization of conformational changes, domain movements, and flexibility in large macromolecules. We have added a citation (Bahar, I. et al. 2005) in the revised manuscript.
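  The ANM idea can be made concrete in a few lines. The sketch below assumes NumPy, one node per residue or nucleotide (for example Cα or P atoms), a uniform spring constant γ = 1 and a 15 Å contact cutoff; these are common defaults assumed here, since the settings used for Figure 7B are not stated. It builds the 3N × 3N Hessian and diagonalizes it: the first six eigenvalues of a well-connected network are zero (rigid-body translation and rotation), and the remaining eigenvectors are the normal modes, the slowest of which describe collective, large-scale motions. It illustrates the method only; established packages such as ProDy implement ANM for real structures.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the anisotropic network model (ANM) Hessian.

    coords: (N, 3) array of node positions in angstroms.
    Node pairs closer than `cutoff` are joined by springs of stiffness `gamma`.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = coords[j] - coords[i]
            dist2 = float(diff @ diff)
            if dist2 == 0.0:
                raise ValueError("coincident nodes")
            if dist2 > cutoff ** 2:
                continue
            # Off-diagonal 3x3 super-element: -gamma * (r_ij r_ij^T) / |r_ij|^2
            block = -gamma * np.outer(diff, diff) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements are minus the sum of the off-diagonal ones
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(coords, cutoff=15.0, gamma=1.0):
    """Return eigenvalues and eigenvectors of the non-trivial ANM modes.

    The six lowest modes of a rigid (well-connected) 3D network are
    rigid-body motions with zero eigenvalue, so they are discarded.
    """
    eigvals, eigvecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    return eigvals[6:], eigvecs[:, 6:]
```

  Each column of the returned eigenvector matrix is a 3N-dimensional displacement pattern; comparing such patterns with and without a given protein in the model is the kind of analysis the reply describes.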
  
  (41) Page 14: "To investigate whether Rps26 haploinsufficiency affects ribosome dynamics at stop codons, we knocked down Rps26 from HEK293 cells using shRNA (Figure S7A)". Haploinsufficiency properly refers to a heterozygous null/WT genotype, not shRNA knockdown.
  
  The Reviewer is correct in terms of haploinsufficiency. We have replaced the word “haploinsufficiency” with “reduced Rps26 levels” in the revised manuscript.
  
  (42) Page 14: "The reciprocal change echoes the tissue-specific differences in initiation and termination (Figure 6A). " It's unclear why these peaks should be reciprocally related mechanistically, so examining changes in their ratio may not be incisive. Rps26 KD could reduce the efficiency of termination independently of pausing. And does Rps26 KD affect eRF1 occupancies in parallel with 80S occupancies?
  
  A prior study reported that Rps26 regulates translation initiation by recognizing Kozak sequence elements (Ferretti, et al. NSMB 2017). We therefore speculate that the role of Rps26 in termination might be correlated, although we do not have direct evidence. We have further clarified this point in the Discussion section of the revised manuscript.
  
  (43) Page 14: "The increased termination pausing, once again, primarily occurs at stop codons preceded by GA-rich sequences (Figure 7C)". No statistical analysis of replicates was done to see if the increase is significant, as it is quite small. They could have stratified mRNAs according to the number of base-pairs they can form with 18S rRNA rather than using this nebulous GA-richness, and see if the conclusion still holds.
  
  The metagene analysis shown in Figure 7C is standard for comparing ribosome footprint distributions. We agree that the increase of the termination peak at stop codons preceded by GA-rich sequences is not as striking as might be expected; this is an underestimate because only a small fraction of ribosomes have substoichiometric Rps26.
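  A stop-codon metagene profile of this kind can be sketched as follows: each transcript's footprint coverage is divided by its own mean CDS density (the normalization raised in points 31 and 44), aligned at the stop codon, and averaged across transcripts. This Python sketch assumes per-nucleotide coverage with the P-site offset already applied and a ±30-nt window; the function name, window sizes and filtering rules are illustrative assumptions rather than the authors' pipeline:

```python
def stop_metagene(transcripts, upstream=30, downstream=30):
    """Average CDS-normalized footprint profile aligned at stop codons.

    transcripts: iterable of (coverage, cds_start, stop_pos) tuples, where
    coverage is a list of per-nucleotide footprint counts and stop_pos is the
    0-based index of the stop codon's first nucleotide.
    Returns a list of length upstream + downstream; index `upstream`
    corresponds to the first nucleotide of the stop codon.
    """
    total = [0.0] * (upstream + downstream)
    used = 0
    for coverage, cds_start, stop_pos in transcripts:
        if stop_pos - upstream < 0 or stop_pos + downstream > len(coverage):
            continue  # skip transcripts too short to fill the window
        cds = coverage[cds_start:stop_pos]
        cds_mean = sum(cds) / len(cds) if cds else 0.0
        if cds_mean == 0:
            continue  # cannot normalize transcripts without CDS coverage
        window = coverage[stop_pos - upstream:stop_pos + downstream]
        for k, value in enumerate(window):
            total[k] += value / cds_mean
        used += 1
    if used == 0:
        raise ValueError("no transcript passed the filters")
    return [value / used for value in total]
```

  Because every transcript is scaled by its own CDS density, a stop-codon peak in the averaged profile is not driven by a few highly translated mRNAs, and comparing profiles between conditions (or between GA-rich and C-rich groups) uses the same footing.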
  
  (44) Page 14: "Remarkably, when mRNAs are stratified based on the sequence motif upstream of stop codons, we found that overexpression of Rps26 reduced the ribosome density (>50%) at stop codons preceded by the GA-sequence (Figure 7E)." They failed to normalize reads to the CDS occupancies to control for fewer ribosomes reaching the stop codons, especially considering that depletion of elongating 80S appeared to occur just upstream of stop codons on Rps26 OE. The same problem exists for the C-rich mRNAs. Also, their interpretation of the effects of Rps26 OE depends on there being Rps26-lacking 40S subunits in WT unstressed cells, which seems unlikely and has not been established directly. Finally, they didn't show increased Rps26 content in 40S subunits on Rps26 OE, which is also required.
  
  This question is the same as #7, which we have fully addressed in this letter (page 7).
  
  (45) Page 15: "To affirm the mechanistic connection between stop codon pausing and termination fidelity, we conducted HiBiT reporter assays that showed increased 3'UTR translation in cells with Rps26 overexpression (Figure 7F)." But both the C-rich and GA-rich reporters show increased expression on Rps26 OE. Why should that be if the C-rich sequences don't base pair with 18S rRNA in WT cells and are unaffected by Rps26 depletion? These data suggest that some other mechanism underlies the increased expression of the GA-rich reporters seen on Rps26 OE.
  
  The Reviewer’s concern is valid, and we agree that additional mechanisms might contribute to the increased reporter expression. The simplest explanation is that Rps26 overexpression promotes ribosome biogenesis, which globally increases mRNA translation. Supporting this notion, more polysomes were observed in cells with Rps26 overexpression (Figure S7E).
  
  (46) Page 15: "Without pausing at stop codons, terminating ribosomes are likely to undergo incomplete dissociation, resulting in continuous translation in 3'UTR." The language here is imprecise. Are they proposing reinitiation by unrecycled 80S ribosomes, or stop codon read-through with or without frameshifting, or both?
  
  This question is the same as #2, which we have fully addressed in this letter (page 3).
  
  (47) Page 15: "Importantly, lack of termination pausing leads to stop codon-associated random translation, giving rise to mixed C-terminal extension." Again, what does this mean? Read-through generally accompanied by frameshifting?
  
  Stop codon-associated random translation differs from ribosome readthrough, reinitiation, or frameshifting. We have extensively clarified this confusion in the revised manuscript.
  
  (48) Page 16: "For terminating ribosomes, the prolonged dwell time at stop codons offers an extended window for eRF1 loading, peptide cleavage, and ribosome recycling." This sentence is confusing because the eRF1-Seq data suggest that the pause occurs after eRF1 decodes the stop codon, with delayed peptide cleavage and recycling.
  
  We thank the Reviewer for the effort to improve our manuscript. We have rephrased the entire paragraph in the revised manuscript.
  
  Reviewer #3 (Recommendations for the authors):
  
  The manuscript is well-written, and the conclusions are overall well-supported by the data. I have only a few relatively minor questions and comments:
  
  (1) For termination sites overlapping with coding regions, the lack of 3-nt periodicity downstream of these sites could result from overlapping translation of multiple ORFs, rather than indicating that translation readthrough events can happen in multiple frames. Could the authors clarify this interpretation?
  
  We appreciate the Reviewer’s positive comments on our manuscript. The Reviewer is correct that overlapping ORFs would result in the lack of 3-nt periodicity. Although overlapping ORFs are common near canonical start codons, ORFs overlapping canonical stop codons are rare. Nevertheless, we have rephrased the statement in the revised manuscript.
  
  (2) The observation that multiple eRF1-seq peaks are located within CDS regions suggests that eRF1 may compete with A-site tRNAs during elongation. This is an interesting finding. Do the authors think this competition could lead to premature termination, or is it more likely to represent elongation pausing? Additionally, do the authors observe corresponding ribosome pausing peaks at these sites in conventional Ribo-seq data?
  
  The Reviewer’s comment on eRF1-seq peaks in the CDS is insightful. We agree that premature termination is possible because of competition. However, we do not observe corresponding ribosome pausing peaks in regular Ribo-seq, presumably due to the low frequency of such events.
  
  (3) Regarding the regulation of ribosome pausing across tissue types, how robust are these results? For example, are the tissue-specific effects (such as stronger pausing in the testis) consistent among different mice or across age groups, given that many aspects of translational regulation are known to change with aging?
  
  We found that the tissue-specific distribution of ribosome footprints is highly reproducible, especially in liver and testis. Notably, the lack of termination peaks in liver has also been reported by other independent studies (Gobert, et al. PNAS 2020), arguing that such an effect is not a result of sequencing bias. We have not compared mice of different ages, but aging-associated translational regulation is an interesting topic that awaits further investigation.
  
  Reviewer #4 (Recommendations for the authors):
  
  (1) Translation termination has been studied by Ribo-seq in several organisms, including mammalian cells and yeast. In those cases, what is analyzed is not the peak height at the stop codon, but rather the difference in ribosome density before and after the stop. Thus, analyzing peak height is not validated. I understand that this is relevant only for the ribosome profiling experiments (and Ezra-seq), not the eRF1 profiling. But the large majority of the data was acquired that way.
  
  With due respect, we disagree with the Reviewer’s point regarding how to study ribosome dynamics at stop codons. Comparing footprint density before and after stop codons does not infer dynamics of terminating ribosomes. By establishing eRF1-seq, we are for the first time able to analyze ribosome behaviors at stop codons, which represents a significant advancement of technological development.
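  For comparison, the density-based measure described in the reviewer's point (1) is usually summarized as the ratio of footprint density in the 3'UTR to that in the CDS. A minimal Python sketch, assuming offset-corrected per-nucleotide coverage and excluding the stop codon itself so that its peak does not inflate the 3'UTR density; the function name and arguments are hypothetical:

```python
def utr_to_cds_density(coverage, cds_start, stop_start, utr3_end):
    """Mean footprint density downstream of the stop codon relative to the CDS.

    coverage: per-nucleotide footprint counts for one transcript (0-based).
    cds_start: index of the first nucleotide of the start codon.
    stop_start: index of the first nucleotide of the stop codon.
    utr3_end: index one past the last 3'UTR nucleotide.
    """
    cds = coverage[cds_start:stop_start]
    utr3 = coverage[stop_start + 3:utr3_end]  # skip the 3-nt stop codon
    if not cds or not utr3:
        raise ValueError("CDS and 3'UTR must both be non-empty")
    cds_density = sum(cds) / len(cds)
    if cds_density == 0:
        raise ValueError("no CDS coverage to normalize against")
    return (sum(utr3) / len(utr3)) / cds_density
```

  Note that the stop-codon peak does not enter this ratio at all, whereas it is exactly the quantity examined in the peak-height approach; the two measures therefore answer different questions.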
  
  (2) Moreover, the data do not reproduce previous findings, and no attempt is made to connect them to previous data. Previous data have shown that stop codon efficacy varies. This is not reproduced (S1C). Similarly, an effect from the +1 residue is not reproduced. The data are not stratified by stop codon, and previous work has shown that different surrounding residues have different effects in the context of different stop codons. Thus, none of the sequencing data is validated or trusted, and it does not reproduce previous findings.
  
  We are certainly aware of previous findings regarding stop codon readthrough. We would like to emphasize that our findings do not contradict established principles of translation termination. Rather, enabled by the development of eRF1-seq, we provide new insights into termination dynamics that extend existing models.
  
  (3) The GA-rich sequences identified by Ezra-seq and eRF1-seq are not the same, and they differ from previously reported sequences (Wangen & Green).
  
  We don’t quite understand why the Reviewer is preoccupied with prior studies without accepting new results obtained from newly developed technology. The GA-rich sequences identified by Ezra-Seq and eRF1-seq are similar, albeit not identical. This is simply because eRF1-seq offers much higher resolution to reveal termination pausing than regular Ribo-seq.
  
  (4) The authors claim that the majority of eRF1 peaks are at stop codons, but that is not true. It is only about 30% of the peaks. Also, not all mRNAs have peaks at the stop codons. That is, at best, problematic. Finally, there are mRNAs that are known to "suffer" from NMD. What do these look like in the Ezra-seq and eRF1-seq data? How about mRNAs that have programmed frameshifts? The eRF1 data is invalid.
  
  The Reviewer is confused about eRF1 peak density versus frequency, which have totally different meanings. Additionally, the Reviewer seems to be surprised that not all mRNAs have peaks at the stop codons. The differential ribosome dynamics at stop codons are an exciting, previously unappreciated feature rather than a problem. Regarding programmed frameshifting, we argue that such events are rare in mammalian cells.
  
  (5) Figure 4 has many flaws; it is hard to know where to start. First, instead of the M/P ratio, one should analyze M/M+P, to normalize out differences in the loading and effects from collisions, which are guaranteed to occur here, but not considered or analyzed. Second, the data are analyzed as if what matters are codons in the P and E site (and beyond, where there are definitely NOT recognized codons). While there is evidence for some interactions, one would think that an additional analysis based on sequence would be helpful. Also, the supplemental data indicate that very rarely are there reciprocal changes (as should be the case), as seen for stop codons. Thus, the assay is at best questionable and likely worse.
  
  The Reviewer appears to be unfamiliar with massively parallel reporter assays (MPRA), which have been widely used to uncover sequence elements crucial in translational regulation. We urge the Reviewer to read our prior study using MPRA to investigate alternative translation initiation (Jia, et al. NSMB 2020). A similar approach has also been used to decipher 5’ UTR sequence elements in mRNA engineering (Sample, et al. Nat Biotech 2019).
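  The two ratios debated in point (5) can be compared directly. A minimal Python sketch, assuming per-reporter read counts from monosome and polysome libraries scaled to counts per million; the function names are hypothetical. Because M/(M+P) = r/(1 + r) with r = M/P, the two measures rank reporters identically; the difference is that M/(M+P) is bounded between 0 and 1, whereas M/P grows without bound as polysome reads approach zero:

```python
def cpm(reads, library_total):
    """Scale raw reads to counts per million for one library."""
    if library_total <= 0:
        raise ValueError("library size must be positive")
    return 1e6 * reads / library_total


def mp_ratio(mono_reads, poly_reads, mono_total, poly_total):
    """Monosome/polysome ratio (M/P) after library-size scaling."""
    m = cpm(mono_reads, mono_total)
    p = cpm(poly_reads, poly_total)
    if p == 0:
        raise ValueError("M/P is undefined without polysome reads")
    return m / p


def monosome_fraction(mono_reads, poly_reads, mono_total, poly_total):
    """Monosome fraction M/(M+P), bounded between 0 and 1."""
    m = cpm(mono_reads, mono_total)
    p = cpm(poly_reads, poly_total)
    if m + p == 0:
        raise ValueError("reporter has no reads in either fraction")
    return m / (m + p)
```

  For a reporter with 30 monosome and 10 polysome reads in libraries of equal size, M/P is 3.0 and M/(M+P) is 0.75.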
  
  (6) Things do not look up for the HiBiT reporter assay. The two sequences clearly have effects on translation independent of stop codon context (Figure 4C), which need to be taken into account. Also, the effect of the sequences varies between the assays in 4C and 4D (2-fold vs. 5-fold), further questioning the assay. Moreover, the authors claim that re-initiation cannot account for HiBiT levels, but that is clearly incorrect. The western blot in Figure 4E does not reproduce the data in 4D: while HiBiT goes up (as in 4D), the putative GFP fusion goes down. Finally, while the second reading frame seems to be more efficient, this is not explained and further argues for an artifact. Previous work (and work herein) suggests that read-through occurs equally in each reading frame.
  
  The Reviewer is confused about the HiBiT-based reporter assay shown in Figure 4C-4E. First, we have included important controls, i.e., same reporters without stop codons, to normalize sequence variation. Second, Figure 4C and 4D used totally different reporters and it is not appropriate to directly compare their values. Third, re-initiation events would not generate fusion proteins containing the N-terminal GFP. The Reviewer is encouraged to re-examine the results presented in Figure 4.
  
  (7) No controls for these assays are presented: e.g., stimulation by antibiotics, ABCE1 depletion, etc.
  
  We are not sure which assay the Reviewer is referring to. For reporter assays shown in Figure 4, we focused on effects of cis-sequence elements, rather than trans-acting factors. We thank the Reviewer for understanding.
  
  (8) Figure 5 has similar problems. I don't understand how Figure 5A is made, but when one overlays the cited structures on Rps26, the molecules are identical. I guess the authors chose to build non-existing sequences differently into the structure. There is no basis for that. In panel C, and the same in Figure 7, the number of analyzed mRNAs varies. This could influence the outcome, and the EXACT same set of mRNAs should be analyzed. But the main problem here is that the authors need to analyze readthrough and not peak height, as detailed above. Essential controls are missing that show what fraction of the 18S rRNA is mutated. Previous work has shown that 2 nt-truncated 18S rRNA is actively degraded. It is hard to believe how 15% of altered ribosomes can abolish 100% of the effect from the C-rich sequences. Important validation is missing: the authors should analyze rRNA sequences in their ribo-seq dataset to demonstrate that they have the mutated rRNAs, and that these enrich and de-enrich as predicted.
  
  The Reviewer’s comment on Figure 5A is baseless. As indicated in the figure legend, Figure 5A was made from the existing cryo-EM structure (PDB: 6ZMW). Regarding the 18S rRNA mutants, we simply followed prior studies (Burman and Mauro. NAR 2012), and there is no evidence indicating degradation of such rRNA mutants. Given the low percentage of ribosomes incorporating the rRNA mutants, the observed effect on termination pausing represents an underestimation rather than an overstatement.
  
  (9) In Figures 5-7, the authors develop a model that the sequence selectivity arises from base pairing between 18S rRNA and the mRNA. If so, then they should really stratify the data by the number of WC pairs that can be formed. And only WC pairs, as GU pairs have a totally different geometry that will likely be discriminated against in this context. Also, the mutation is in a part of the helix that has no effect (Figure S3G). Thus, the data within the manuscript are inconsistent.
  
  As the Reviewer might be aware, GU pairs are commonly found in tRNA and rRNA structures. Since both WC and GU pairs contribute to the mRNA:rRNA interaction, there is no point in stratifying sequences by pairing type. Additionally, we would like to point out that the putative mRNA:rRNA interaction is not static, considering the continuous movement of mRNA along the ribosome channel.
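  The stratification proposed in point (9) needs a count of possible pairs between an mRNA segment and the 18S rRNA 3' end. A minimal Python sketch, assuming a single ungapped antiparallel register against the 8-nt sequence GAUCAUUA quoted in point (34), and counting Watson-Crick and G·U pairs separately so that either grouping (WC only, as the reviewer proposes, or WC plus G·U, as the reply argues) can be examined. It ignores stacking, positional effects and the dynamic mRNA movement emphasized in the reply, and the names are hypothetical:

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(mrna_segment, rrna_3end="GAUCAUUA"):
    """Count WC and G-U pairs for an ungapped antiparallel alignment.

    Both sequences are written 5'->3'; the rRNA is reversed so that
    position i of the mRNA faces the rRNA nucleotide it would pair with.
    Returns (watson_crick_pairs, gu_pairs).
    """
    mrna = mrna_segment.upper().replace("T", "U")
    rrna = rrna_3end.upper().replace("T", "U")[::-1]
    if len(mrna) != len(rrna):
        raise ValueError("segment length must match the rRNA region")
    wc = gu = 0
    for m, r in zip(mrna, rrna):
        if (m, r) in WATSON_CRICK:
            wc += 1
        elif (m, r) in WOBBLE:
            gu += 1
    return wc, gu
```

  For example, the perfect complement UAAUGAUC gives 8 WC pairs, whereas GAAGAAGA gives 3 WC pairs and no G·U pairs in this register, so the choice of register and of which pairs to count matters for any such stratification.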
  
  (10) Figure 6 does not agree with published data (Li et al., Nature 2022). Previous work did not show testis depletion of Rps26 in purified ribosomes. This is the critical difference, as the authors here did not purify ribosomes. Also, another Rps is an essential control, even if purified ribosomes are used. This dataset should not be shared. Depletion from polysomes is hard to believe, as overall, there is less signal in the polysomes.
  
  The Reviewer finally made a good point regarding Rps26 in testis. In our study, we did not separate different cell types such as spermatocytes and therefore we do not know which cell type dominantly influences termination pausing.
  
  Regarding varied Rps26 levels in different tissues, we noticed different polysome profiles between testis and liver. Because the amount of polysomes is difficult to normalize, we used RACK1, a constitutive component of the ribosome, to quantify the amount of polysomes.
  
  (11) Figure 7 has similar problems to Figure 5. Different pools of mRNAs are analyzed; peak height is not validated. Overexpression of Rps26 is not shown, as only Myc is shown, not Rps26. Beyond that, increased occupancy in ribosomes needs to be shown for the effect to come from ribosomes. Given how sick the cells are, it is most likely that all effects are secondary and arise from whatever else is going on in the overexpression or depletion of Rps26. No controls are presented to show specific effects from Rps26.
  
  We are surprised that the Reviewer ignored the supplementary data showing Rps26 levels. Regarding controls, it is not appropriate to use other ribosomal proteins because every ribosomal protein has its own function. We acknowledge that gene knockdown experiments are not perfect, but the results are still informative, especially when different mRNA pools from the same cells are compared.
  
  (11) The authors need to check Rli1/ABCE1 levels in their cells. Their data have features that are indicative of low ABCE1 levels. These include a very small effect from ABCE1 depletion. These could be responsible for some of the effects they observe.
  
  Once again, we are surprised that the Reviewer ignored the supplementary data that already show ABCE1 levels in cells with or without ABCE1 knockdown (Figure S4A). Constantly addressing the Reviewer’s lack of careful reading of our manuscript is frustrating. Nevertheless, we have thoroughly revised the entire manuscript by clarifying interpretations, moderating mechanistic claims, and expanding relevant discussion.
  
  AuthorResponse

Tags

Review 1

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.16.676599v2
www.biorxiv.org

Identification of nuclear pore proteins at plasmodesmata: potential role in intercellular transport?

1
1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  Plasmodesmata are channels that allow cell-cell communication in plants; based on the functional similarities between facilitated transport within plasmodesmata and into the nucleus, the authors speculate that nuclear pore complex proteins might be involved in plasmodesmata function. If supported, this would transform our understanding of cell-to-cell communication in plants. The authors localize nuclear pore complex proteins to plasmodesmata using proteomics and heterologous overexpression; however, the data are incomplete since key controls for localization, functionality, and expression level of fluorescent protein fusions are absent.
  
  Thank you for the constructive reviews. We have tried to address the comments as outlined below. Specifically, we added new data to the manuscript assessing the protein levels of three independent stable Arabidopsis lines expressing NUP62-GFP from its own promoter, using mass spectrometry quantification. These experiments were carried out to evaluate whether the observed PD localization of NUP62-GFP to peripheral puncta might be an artifact caused by inadvertent overexpression and resulting mistargeting. Quantitative analysis shows no indication of significant overexpression of NUP62-GFP.
  
  To assess whether the localization of NUPs is distinct from localization of an ER marker, we have now included a comparison of the NUP43-mVenus localization with that of the mCherry-HDEL luminal ER marker, revealing distinct localization patterns. The peripheral puncta thus do not appear to be due to simple ER accumulation.
  
  To evaluate whether the CPR5-mCitrine fusion is functional, we tested whether the fusion construct was able to complement the loss-of-function cpr5-1 mutant. In two independent complementation lines (cpr5-1/CPR5:CPR5-mCitrine), the roots of 14-day-old seedlings were significantly longer than those of the cpr5-1 mutant, and four-week-old plants showed a more WT-like growth phenotype. Although we did not detect CPR5-mCitrine fluorescence, the construct appears to restore the wild-type phenotype, indicating that the lines express a functional CPR5 protein.
  
  We have restructured the figures and provided additional information in the figure legends.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Plasmodesmata are channels that allow cell-cell communication in plants; based on the functional similarities between facilitated transport within plasmodesmata and into the nucleus, the authors speculate that nuclear pore complex proteins might be involved in plasmodesmata function. In this manuscript, they localize nuclear pore complex proteins to plasmodesmata using proteomics and heterologous overexpression. They also document a possible plasmodesmata transport defect in a mutant affecting one nuclear pore complex protein.
  
  Strengths:
  
  The main strength of this manuscript is the interesting and novel hypothesis. This work could open exciting new directions in our understanding of plasmodesmata function and cell-cell communication in plants. They also localized many NUPs (12/35 Arabidopsis NUPs).
  
  Weaknesses:
  
  The main weakness of this manuscript is that the data are incomplete. While the authors appropriately and frequently acknowledge caveats to their data, two controls are essential to interpret the results that fluorescently-tagged NUPs localize to the plasmodesmata: (1) assessment of the expression level of these fluorescently-tagged NUPs to determine whether the plasmodesmata localization might be an overexpression artefact;
  
  As we outlined in the manuscript, we also considered the possibility that the peripheral localization could be a consequence of overexpression, in particular in the transient expression system. To be able to control the levels, NUP genes were expressed under the control of the β-estradiol-inducible XVE promoter, which allows for β-estradiol dose-dependent gene expression (Bashandy et al., 2015; Schlücking et al., 2013). We assessed the dependence of localization on expression levels by studying NUP localization under conditions of reduced estradiol concentrations for induction and shortened incubation time. We validated that the fluorescence was substantially reduced relative to the standard estradiol concentration experiments; however, we still detected both nuclear and peripheral localization of the NUPs (Figure 4C-F).
  
  We also considered that in stable transformants the expression of one extra copy of a NUP62-GFP fusion under the control of the native promoter could cause a moderate overexpression and as a consequence lead to artifactual accumulation in the periphery (Figure 3C-E).
  
  To evaluate the level of NUP62-GFP fusion protein relative to untransformed controls, we quantified the levels of NUP62 in three independent transgenic fluorescent WT/NUP62p:NUP62-GFP Arabidopsis lines and in Arabidopsis WT using mass spectrometry (new Figure 3F). The new data indicate that there is no significant increase in NUP protein amounts in the lines expressing the fusion construct relative to WT.
  
  We now write in the revised manuscript (lines 200-205):
  
  “NUP62 protein abundance in two-week-old cotyledons of the stable NUP62p:NUP62-GFP transformants was not statistically different to NUP62 protein levels in WT (Figure 3F). Notably, the punctate fluorescence at the cell periphery, encompassing both PD-associated and non-PD-associated localization, were not detectable or absent in roots and young leaves of four-day-old seedlings (Figure 3D). However, it cannot be excluded that the GFP fusion impacts NUP62 localization.” We provide a new Methods section describing the mass spectrometry analysis of the cotyledons (lines 582-590).
  
  The use of antibodies in wild-type tissue would be one way to avoid overexpression when determining the localization of NUPs in planta. To investigate the localization of NUPs at physiological expression levels, we attempted to immunolocalize NUPs using antibodies. However, the anti-NUP antibodies available to us were not optimized for immunolocalization, and we were unable to detect any fluorescence in the cells, either at the NPC or at the periphery.
  
  (2) assessment of the function of the fluorescently-tagged NUPs, either by molecular complementation of a knockout mutant phenotype or by biochemical methods to test whether the fluorescently-tagged NUP incorporates into nuclear pore complexes. Conducting these experiments for even one fluorescently-tagged NUP would substantially strengthen this manuscript.
  
  We agree with the reviewer that validating the functionality of NUP fusion proteins would be valuable. Previously, C-terminally fused Arabidopsis NUPs, such as NUP93a-GFP, GP210-GFP, and NUP58-GFP, were reported to localize to the nuclear envelope when stably expressed in transgenic Arabidopsis lines (Tamura et al., 2010). Consistent with previous reports for the transmembrane NUPs GP210 and CPR5 (Gu et al., 2016; Tamura et al., 2010), C-terminally fused GP210 and CPR5 localized to the nuclear envelope but not to the nucleoplasm when expressed heterologously in N. benthamiana (see Figure 3-figure supplement 1). We found that several soluble NUPs also localized to the nucleoplasm (PpNUP98.1, PpNUP62, AtNUP62, AtHOS1) (Figure 1-figure supplement 1, Figure 3, Figure 3-figure supplement 1). Previous studies have reported that several FG-NUPs (e.g. NUP98a/b or NUP62) and Y-complex NUPs (e.g. HOS1, NUP96, and NUP107) also localize to the nucleoplasm rather than specifically to the nuclear envelope when expressed as fusion proteins (Chen et al., 2023; Gallemí et al., 2016; Huang et al., 2024; Lazaro et al., 2012). Of note, for NUP98a, Gallemí and colleagues (2016) interpreted the nucleoplasmic localization as confirmation that, like vertebrate NUP98, Arabidopsis NUP98a is a dynamic NUP rather than merely a key structural element of the NPC. HOS1 was reported to interact with ICE1, CO, FVE, and HDA6 in the nucleoplasm (Dong et al., 2006; Jung et al., 2012; Lazaro et al., 2012), indicating that HOS1 might shuttle dynamically between the nuclear pore and the nucleoplasm, which could also explain the observed nucleoplasmic localization. In Drosophila, the FG-NUPs NUP98, NUP62, and NUP50 localized both to the NPC and to the nucleoplasm, where they interacted with genes (Kalverda et al., 2010). The nucleoplasmic localization could thus be functionally relevant. Yet, as we state multiple times in the manuscript, we cannot rule out that soluble NUPs mislocalize under overexpression conditions.
  
  For this revision, we generated two new independent transgenic Arabidopsis lines stably expressing CPR5-mCitrine under the control of its own promoter in the cpr5-1 mutant background (cpr5-1/CPR5p:CPR5-mCitrine). Roots were significantly longer in both transgenic cpr5-1/CPR5p:CPR5-mCitrine lines than in the cpr5-1 mutant, and four-week-old plants showed a more WT-like growth phenotype (new Figure 7-figure supplement 1, G–I). However, we could not detect fluorescence in 10- to 14-day-old seedlings, which could have various causes, such as cleavage of the FP and its subsequent degradation without accumulation elsewhere in the cells.
  
  In the new manuscript we write in lines 275-283:
  
  “To assess whether the CPR5-mCitrine fusion protein is functional in Arabidopsis, we tested whether CPR5p:CPR5-mCitrine (including all introns) expression in the cpr5-1 mutant background results in a rescue of the severe growth phenotype of the cpr5-1 loss-of-function mutant (Bowling et al., 1997). Indeed, roots were significantly longer in the two independent transgenic cpr5-1/CPR5p:CPR5-mCitrine Arabidopsis lines compared to the cpr5-1 mutant, and four-week-old plants showed a more WT-like growth phenotype (Figure 7-figure supplement 1, G–I). However, we could not detect fluorescence in 10-14 day old seedlings, which could be due to a variety of reasons, such as cleavage of the FP and degradation of the FP without accumulating elsewhere in the cells. The lack of fluorescence in the transgenic lines requires further investigation.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors aim to address whether nuclear pore complex components localize and function at PD in plant cells to mediate cell-to-cell communication.
  
  Strengths:
  
  (1) Novelty and Significance:
  
  The core hypothesis, drawing parallels between PD and NPC transport, is highly original and addresses a critical gap in understanding plant intercellular communication. The idea that phase-separated domains formed by FG-NUPs could act as diffusion barriers at PD offers a plausible and sophisticated explanation for their complex transport properties, including size exclusion and facilitated translocation. This could fundamentally change how we view PD function.
  
  (2) Comprehensive Evidence:
  
  The study employs a rigorous and diverse set of experimental approaches, including a comprehensive bioinformatic analysis of both moss and Arabidopsis NUPs in available PD proteomic datasets, extensive imaging analysis of Nup localization in vivo, and functional transport assays using a loss-of-function nup mutant (cpr5). The transport assay is particularly important to provide functional evidence linking CPR5 to PD-mediated transport. The finding that callose levels were not significantly different in cpr5 mutants under these conditions is helpful and supports a distinct, callose-independent mechanism of transport regulation.
  
  (3) Objectivity:
  
  The authors are forthright in discussing the limitations and potential artifacts of their own data, clearly distinguishing between observations and definitive conclusions.
  
  Weaknesses:
  
  While the claims are generally justified as hypotheses or consistent observations, the authors themselves extensively detail the caveats, which are worth reiterating for clarity:
  
  (1) Potential Overexpression Artifacts in Localization:
  
  Although efforts were made to control expression levels, the authors acknowledge that transient overexpression could still lead to NUP accumulation at PD, either as a physiologically relevant accumulation under excess conditions or due to mis-targeting, or even as storage depots. The resolution of confocal microscopy also does not allow for a definitive conclusion on the nature of the location.
  
  We would like to add that, in addition to the experiments using estradiol-controlled transient overexpression to localize NUP fusions, we also provided localization data from Arabidopsis transformants that stably express one extra copy of a NUP62-GFP fusion under the control of the native promoter. In cotyledons, NUP62-GFP localized to the nucleus and to the periphery, in many cases to PD (Figure 3C-E). During the revision, we tested whether the extra copy of NUP62 could cause overexpression that might lead to artifactual accumulation in the periphery.
  
  To evaluate the level of NUP62-GFP fusion protein relative to untransformed controls, we quantified NUP62 levels in three independent transgenic fluorescent WT/NUP62p:NUP62-GFP Arabidopsis lines and in Arabidopsis WT using mass spectrometry (new Figure 3F). The new data indicate no significant increase in NUP62 protein amounts in the lines expressing the fusion construct relative to WT.
  
  We now write in the revised manuscript (lines 200-205):
  
  “NUP62 protein abundance in two-week-old cotyledons of the stable NUP62p:NUP62-GFP transformants was not statistically different to NUP62 protein levels in WT (Figure 3F). Notably, the punctate fluorescence at the cell periphery, encompassing both PD-associated and non-PD-associated localization, were not detectable or absent in roots and young leaves of four-day-old seedlings (Figure 3D). However, it cannot be excluded that the GFP fusion impacts NUP62 localization.” We provide a new Methods section describing the mass spectrometry analysis of the cotyledons (lines 582-590).
  
  (2) Proteomics Purity:
  
  The authors note that the presence of NUPs in PD fractions/proteomics cannot definitively rule out contamination, as PD cannot currently be purified to absolute homogeneity and is often contaminated with other organelles, including the nucleus.
  
  We would like to add that, despite their low abundance in plant cells, NUPs were enriched in cell wall and PD fractions relative to total cell extracts (revised Figure 2-figure supplement 2). To evaluate whether NUP enrichment might be a consequence of contamination by nuclear fractions, we additionally evaluated the enrichment of nucleolar proteins and histones for the revision. As shown in the revised Figure 2-figure supplement 2, other nuclear proteins did not show significant enrichment, supporting the notion that NUPs were specifically enriched in PD fractions, consistent with the localization of NUP-FP fusions. We note, however, that these data do not unambiguously demonstrate that NUPs are bona fide PD components.
  
  (3) CPR5 Mutant Interpretation:
  
  While cpr5 mutants exhibited reduced macromolecular transport, the authors state that they cannot exclude that the reduced transport is due to secondary effects in the cpr5 mutants, which show rather severe phenotypic defects. This is an important distinction, as CPR5 has known roles in defense responses and hormone signaling that could indirectly influence PD integrity, independent of callose deposition. The lack of effect on small molecule transport is a good control, but the broader pleiotropic effects of cpr5 mutants remain a consideration.
  
  We agree with the assessment of the reviewer. The mutant is compromised in many ways, and thus the effects we observe could be indirect. This is also stated in the manuscript (lines 314-317).
  
  (4) Conceptual Distinction between NPC and PD:
  
  The authors correctly point out that while similarities exist, the physical assembly of NUPs at PD must differ from that at the NPC due to the presence of the desmotubule and smaller cytoplasmic sleeve width at PD. Moreover, nucleocytoplasmic transport depends on karyopherin proteins that interact with the NPC central channel to complete the transport. Yet the role of karyopherins in this case is not clear. Therefore, the proposed "PD pore complex" may bear some NPC features, but not be identical.
  
  Reviewer 2 summarized the key concerns that we highlighted and discussed in the manuscript regarding differences between PD and NPC architecture. In particular, we noted that one of the major differences in PD is the presence of the desmotubule (lines 370-372). We also highlighted that we did not detect all NUPs at PD (lines 375-376). Although this is a negative result, it may also be consistent with differences in the assembly of NUPs in or near PD versus the NPC. We fully agree with the reviewer that the proposed “PD pore complex” may not be identical to the NPC, and we also discussed that the NUPs seen at PD could represent sites of accumulation in the ER near PD.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This manuscript presents a step towards testing the hypothesis that plasmodesmata have homology to nuclear pores. The similarities between the two structures have long been noted as both structures allow the transport of proteins and nucleic acids, and both structures are composed of curved membranes. The manuscript has identified nuclear pore proteins (NUPs) in plasmodesmal protein fractions and uses live imaging in a non-endogenous system and functional assays of a mutant to propose that this might be a bona fide association.
  
  The conclusions the authors seek to draw are that: NUPs are present in plasmodesmal protein fractions; NUPs localise at plasmodesmata; NUPs might form a pore-gating complex at plasmodesmata, regulating non-specific (2xGFP) and specific (SHR) transport through plasmodesmata.
  
  The authors then use these conclusions to propose the possibility that phase separation mediates transport through plasmodesmata. If there is phase separation at plasmodesmata or a nuclear pore-like complex, it would revolutionise the community. However, this data is insufficient to act as a cornerstone for such a discovery.
  
  Strengths:
  
  The strength of the manuscript lies in the boldness and novelty of the idea.
  
  Weaknesses:
  
  The weaknesses lie in the lack of informative controls. The authors' own assessments of their data suggest they agree with this - in their abstract alone, they point out that the transport defects they observe might be off-target effects, and suggest there is a requirement in the future to determine whether the NUPs are bona fide PD components.
  
  Across the proteomic and live imaging experiments, the conclusions could be stronger if they compared the NUP localisation and accumulation with ER proteins - the question of whether NUPs behave like other ER proteins is not addressed. As NUPs reside in the nuclear envelope, continuous with the ER, and the ER traverses plasmodesmata, a comparison between the NUPs and ER proteins would be extremely informative.
  
  We agree with the comments of the reviewer. To assess whether NUPs show localization patterns similar to those of ER proteins, we transiently co-expressed NUP43-mVenus with the luminal ER marker mCherry-HDEL in N. benthamiana. The comparison reveals distinct localization patterns for NUP43-mVenus and mCherry-HDEL (see new Figure 5 and new Figure 5-figure supplement 1). NUP43-mVenus appears to be associated with the ER, but restricted to subregions that partially overlap with aniline blue-labeled pit fields (new Figure 5, new Figure 5-figure supplement 1).
  
  In the new version of the manuscript, we write (lines 209-214):
  
  “We assessed whether NUP localization is distinct from ER localization in N. benthamiana leaves that heterologously co-expressed NUP43-mVenus and the ER luminal marker mCherry-HDEL. The localization patterns of NUP43-mVenus and of the mCherry-HDEL luminal ER marker were clearly distinct (Figure 5, Figure 5-figure supplement 1). NUP43-mVenus may be associated to the ER, however restricted to subregions of the ER, which partially overlay with aniline blue-labeled pit fields (Figure 5, Figure 5-figure supplement 1).”
  
  Regarding the proteomic identification of NUPs in plasmodesmal fractions, the authors place significant weight on their own metric for PD enrichment, the PD score. As I understand it, this is a metric derived from the addition of two factors: a two-component enrichment score, which is the difference between the intensity of peptides of a given protein in the PD fraction and the cell wall fraction added to the difference between the intensity of peptides of that protein in the PD fraction and the total cell fraction; and a feature score, a factor that describes the representation of protein domains contained in the given protein in the plasmodesmal fraction relative to the representation of those domains in proteins of the whole proteome. The features chosen for analysis are not indicated, and the feature factor, as I understand it, is a score common to all proteins with a given feature. While each of the factors carries a measure of meaning and information, I do not understand how adding them is mathematically or biologically meaningful.
  
  The feature score was defined based on the PD proteome analysis described previously (Gombos et al., 2023). Features of known PD proteins were extracted and weighted against the entire Arabidopsis proteome. Structural features included the Pfam domains PF00722 (GHL), PF06955 (XET_C), PF08372 (PRT_C), PF00335 (Tetraspanin), and PF00168 (C2 domain). Subcellular localization features included plasma membrane (PM), endoplasmic reticulum (ER), extracellular space (EX), and cell wall (CW). Functional features were assigned according to MapMan bins 10, 15, 26, and 30. To clarify the approach, we added a more detailed explanation of the feature score to the Methods of the revised manuscript.
  
  We agree with the reviewer that experimental values and feature factors represent two distinct, independent parameters. The PD score aims to identify proteins that are not only experimentally enriched in the plasmodesmal fraction but also share structural features characteristic of bona fide plasmodesmata-associated proteins, reducing the number of false positive candidates driven by either parameter alone in PD proteome lists. From a mathematical standpoint, we combined the two normalized factors in the PD score by summation, treating them as contributing equally to a protein’s PD association tendency.
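  To make the summation described above concrete, the following Python sketch computes a PD-score-like value. It is an illustration under stated assumptions, not the authors' pipeline: intensities are assumed to be log2-transformed, both factors are assumed to be min-max normalized to [0, 1] before summation, and the feature weighting (log2 ratio of a feature's frequency among known PD proteins to its frequency in the whole proteome) is a hypothetical stand-in for the procedure of Gombos et al. (2023). All names (`Protein`, `feature_weights`, `pd_scores`) and the toy numbers are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Hypothetical record for one protein in a PD proteome list."""
    name: str
    log_pd: float  # assumed log2 intensity in the PD fraction
    log_cw: float  # assumed log2 intensity in the cell wall (CW) fraction
    log_tc: float  # assumed log2 intensity in the total cell (TC) extract
    features: set = field(default_factory=set)  # e.g. {"PF00168", "ER"}


def feature_weights(known_pd, proteome):
    """Hypothetical weighting: log2 of a feature's frequency among known PD
    proteins relative to its frequency across the whole proteome."""
    weights = {}
    for feat in set().union(*(p.features for p in known_pd)):
        f_pd = sum(feat in p.features for p in known_pd) / len(known_pd)
        f_all = sum(feat in p.features for p in proteome) / len(proteome)
        if f_all > 0:
            weights[feat] = math.log2(f_pd / f_all)
    return weights


def enrichment(p):
    # Two-component enrichment: (PD - CW) + (PD - TC) on the log scale.
    return (p.log_pd - p.log_cw) + (p.log_pd - p.log_tc)


def feature_score(p, weights):
    # Sum of the weights of the features a protein carries; unweighted features add 0.
    return sum(weights.get(f, 0.0) for f in p.features)


def min_max(values):
    # Rescale to [0, 1]; a constant list maps to all zeros.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def pd_scores(proteins, weights):
    """PD score = normalized enrichment + normalized feature score (equal weight)."""
    e = min_max([enrichment(p) for p in proteins])
    f = min_max([feature_score(p, weights) for p in proteins])
    return {p.name: ei + fi for p, ei, fi in zip(proteins, e, f)}


# Toy example with made-up numbers, for illustration only.
proteins = [
    Protein("A", 20.0, 18.0, 17.0, {"ER"}),
    Protein("B", 15.0, 16.0, 15.5),
    Protein("C", 22.0, 19.0, 18.0, {"PF00168", "PM"}),
]
toy_weights = {"PF00168": 2.0, "PM": 1.0, "ER": 0.5}
for name, score in sorted(pd_scores(proteins, toy_weights).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

  Because each factor is rescaled to the same range before being added, neither dominates by virtue of its units; a protein scores highest when experimental enrichment and PD-typical features agree, which corresponds to the equal-weighting choice described above.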
  
  Conclusion:
  
  The conclusions of the study are not fully supported in the absence of ER controls. Of note, the imaging is ambiguous because the proteins do not show a discrete plasmodesmal association. This is a localisation reminiscent of cortical ER association and needs to be further investigated to determine whether it is a true and specific plasmodesmal association.
  
  We agree with the reviewer’s comments. In the revised manuscript, we now include a comparison of NUP43-mVenus localization with that of the luminal ER marker mCherry-HDEL, which reveals distinct localization patterns (see new Figure 5, new Figure 5-figure supplement 1). NUP43-mVenus may be associated with the ER; however, it is restricted to subregions of the ER that partially overlap with aniline blue-labeled pit fields (new Figure 5, new Figure 5-figure supplement 1). Whether NUP localization is distinct from cortical ER requires further investigation.
  
  The conclusions drawn from Figure 1, Figure Supplement 4 are confusing. The text describing this data says that "NUPs were enriched in cell wall and PD fractions compared to total cell extract, while the abundance of other nuclear envelope proteins was unaffected by the PD purification and showed no enrichment in PD fractions". However, the data show that there is no difference in the normalised protein intensity for the NUPs across TC, CW, and PD fractions. The only sample that shows enrichment in PDs is the PDLP/MCTPs.
  
  To address this point, we rephrased the text (lines 146-152). Of all NUPs identified in our PD proteome, 75% were more abundant in PD fractions (Figure 2-figure supplement 2), exceeding the proportions observed in TC (60%) and CW (~50%) fractions. In contrast, other nuclear proteins, such as nuclear envelope proteins, nucleolar proteins, or histones, showed PD intensities that fell within or overlapped the ranges observed in TC or CW. The native abundance of NUPs was lower than that of proteins from other compartments, which may explain why the enrichment did not reach statistical significance (p = 0.24 for PD vs. TC). By comparison, the corresponding p-values for proteins from other nuclear compartments were higher, ranging from 0.5 to 0.9.
  
  Regarding the possibility that there is a pore-gating complex at plasmodesmata. If NUPs are specifically located at plasmodesmata, this is a strong hypothesis. The authors approach this functionally by assaying for protein and dye movement through plasmodesmata in the cpr5 mutants. These experiments suggest that cpr5 mutants have reduced transport through plasmodesmata for both proteins, but not for a smaller dye. They infer that the latter finding suggests that the cpr5 mutant has no alterations in plasmodesmal number, but this is completely unsupported - in their introduction, the authors identify how PD structure can modify transport capacity, so there are many technical and biological phenomena that could explain these data.
  
  We wrote in the manuscript: “The cpr5 mutants showed no detectable defect in small molecule transport indicative of WT-like PD density and preservation of the capability to mediate small molecule transport as shown by ‘Drop-ANd-See’ trans-leaf diffusion assays.”
  
  Indeed, we did not study PD density, e.g. by quantifying the fluorescence of a PD marker. In principle, PD density might be altered and permeability adjusted by unknown mechanisms to allow WT-like small molecule transport. Strikingly, we observed transport differences for larger cargo. As we cannot exclude potential changes in PD density, we have deleted the conclusion on PD density and rewritten the sentence, which now reads: “The cpr5 mutants showed no detectable defect in small molecule transport indicative of preservation of the capability to mediate small molecule transport as shown by ‘Drop-ANd-See’ trans-leaf diffusion assays” (lines 310-312).
  
  I note for their DANS assays that the diffusion of dye from ad- to abaxial surface varies in the path followed (indicated by the asymmetry of the surfaces) and is not consistent within a leaf, let alone between leaves. This presents challenges in quantification and data interpretation that have not been addressed, and so the data cannot be confidently concluded to be an indicator of a different phenomenon rather than a less sensitive measure of the same.
  
  Indeed, in our hands, the spread of the small molecule dye did not proceed radially and was very often asymmetrical. Therefore, we quantified the fluorescent area by identifying pixels with fluorescence above a threshold, instead of determining the diameter of the fluorescent area. We describe the analysis in the figure legend and briefly in the Methods section:
  
  “Fluorescent areas on the abaxial side were identified using auto threshold and Fiji YEN-algorithm with user modifications. The same threshold setting was used for the adaxial side. The extent of dye diffusion was quantified by the ratio between the areal spread of fluorescence on the abaxial side and the areal spread of fluorescence on the adaxial side.” (Figure 7)
  
  Furthermore, to avoid positional artifacts in comparisons between plants and genotypes, we assessed only the 4th leaf and, 24 hours later, the 5th leaf, applying the dye at the same position on each leaf.
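  The area-ratio quantification described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Fiji workflow: images are assumed to be plain 2-D intensity arrays, the threshold is assumed to have already been chosen on the abaxial image (in the authors' workflow, via Fiji's Yen auto-threshold with user modifications, which is not reproduced here), and a pixel is assumed to count as fluorescent when its intensity strictly exceeds the threshold. The function names are hypothetical. Counting pixels rather than measuring a diameter is what keeps the measure usable when the dye spreads asymmetrically.

```python
from typing import Sequence

# A grayscale image as rows of pixel intensities (assumed representation).
Image = Sequence[Sequence[float]]


def fluorescent_area(image: Image, threshold: float) -> int:
    """Number of pixels whose intensity is above the threshold."""
    return sum(1 for row in image for px in row if px > threshold)


def diffusion_ratio(abaxial: Image, adaxial: Image, threshold: float) -> float:
    """Abaxial fluorescent area divided by adaxial fluorescent area.

    The same threshold (chosen on the abaxial image) is applied to both sides,
    so both areas are measured against a common intensity cut-off.
    """
    adaxial_area = fluorescent_area(adaxial, threshold)
    if adaxial_area == 0:
        raise ValueError("no fluorescent pixels on the adaxial side")
    return fluorescent_area(abaxial, threshold) / adaxial_area


# Toy images with irregular, non-circular fluorescent regions.
adaxial = [
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
]
abaxial = [
    [0, 0, 0, 0],
    [0, 120, 0, 0],
    [0, 0, 90, 0],
    [0, 0, 0, 0],
]
print(diffusion_ratio(abaxial, adaxial, threshold=50))
```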
  
  Further, as the authors themselves acknowledge, altered protein movement might also arise from an off-target developmental phenotype. Many proteins have been shown to have no association with plasmodesmata but an indirect effect on their function. This hasn't been investigated and so cannot be ruled out.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  This is a really interesting hypothesis, but the support is incomplete.
  
  (1) P. 5 "Although the single insertion Arabidopsis lines tested here should have FG-NUP62-GFP levels closer to native conditions than the heterologous overexpression of FG-NUP62-mVenus in N. benthamiana, it cannot be excluded that the levels in tested lines are still higher than the native levels, or that the fluorescence protein fused to the NUP affects localization." I appreciate the authors' cautious interpretation of their results, but they could exclude both of these possibilities. The first is relatively easy: test the expression level of the transgene compared to endogenous NUP expression; although transcript and protein levels are not tightly correlated, this can give some estimate of whether the transgene is overexpressed. The second would be to conduct complementation assays of a knockout mutant. I understand that this would be difficult if nup mutants are lethal, but it is pretty common practice to transform heterozygotes and isolate homozygotes expressing fluorescent protein to conduct complementation assays. Anyhow, there is a defect in the cpr5 mutants that the authors could assess in complementation assays. Alternatively, the authors could use biochemical approaches to determine whether FP-tagged NUPs are incorporated into nuclear pore complexes. These three experiments, even for only one NUP, would provide compelling evidence that the authors are localizing a functional NUP fusion protein at near-native expression levels. This is essential to support their speculation that NUPs play a biological role in PD.
  
  Thank you for these three important recommendations: the quantification of NUP FP expression, the complementation of a mutant phenotype with NUP FP expression, and the assessment whether NUP FPs are incorporated into the NPC.
  
  First, to evaluate the abundance of NUP62-GFP fusion protein relative to untransformed controls, we quantified NUP62 abundance in three independent transgenic fluorescent WT/NUP62p:NUP62-GFP Arabidopsis lines and in Arabidopsis WT using mass spectrometry (new Figure 3F). The new data indicate no significant increase in NUP62 protein amounts in the lines expressing the fusion construct relative to wild type.
  
  We now write in the revised manuscript (lines 200-205):
  
  “NUP62 protein abundance in two-week-old cotyledons of the stable NUP62p:NUP62-GFP transformants was not statistically different to NUP62 protein levels in WT (Figure 3F). Notably, the punctate fluorescence at the cell periphery, encompassing both PD-associated and non-PD-associated localization, were not detectable or absent in roots and young leaves of four-day-old seedlings (Figure 3D). However, it cannot be excluded that the GFP fusion impacts NUP62 localization.” We provide a new Methods section describing the mass spectrometry analysis of the cotyledons (lines 582-590).
  
  Second, to test whether a NUP fusion is functional, we assessed whether CPR5-mCitrine can complement the cpr5-1 mutant phenotype. We generated two new independent transgenic Arabidopsis lines stably expressing CPR5-mCitrine under the control of its own promoter in the cpr5-1 mutant background (cpr5-1/CPR5p:CPR5-mCitrine). Roots were significantly longer in both transgenic cpr5-1/CPR5p:CPR5-mCitrine lines than in the cpr5-1 mutant, and four-week-old plants showed a more WT-like growth phenotype (new Figure 7-figure supplement 1, G–I). However, we could not detect fluorescence in 10- to 14-day-old seedlings, which could have various causes, such as cleavage of the FP and its subsequent degradation without accumulation elsewhere in the cells.
  
  In the new manuscript we write in lines 275-283:
  
  “To assess whether the CPR5-mCitrine fusion protein is functional in Arabidopsis, we tested whether CPR5p:CPR5-mCitrine (including all introns) expression in the cpr5-1 mutant background results in a rescue of the severe growth phenotype of the cpr5-1 loss-of-function mutant (Bowling et al., 1997). Indeed, roots were significantly longer in the two independent transgenic cpr5-1/CPR5p:CPR5-mCitrine Arabidopsis lines compared to the cpr5-1 mutant, and four-week-old plants showed a more WT-like growth phenotype (Figure 7-figure supplement 1, G–I). However, we could not detect fluorescence in 10-14 day old seedlings, which could be due to a variety of reasons, such as cleavage of the FP and degradation of the FP without accumulating elsewhere in the cells. The lack of fluorescence in the transgenic lines requires further investigation.”
  
  Third, to assess whether NUP-FP fusions are also detectable specifically in nuclei, we provide example images of potential nuclear localization of NUP62-GFP in the stable Arabidopsis line (Figure 3C) and of AtGP210-mVenus, AtNUP98b-mVenus, AtCPR5-mCitrine, and AtNUP43-mCitrine in transient expression experiments in N. benthamiana (Figure 3-figure supplement 1).
  
  (2) The rationale for experiments was sometimes unclear. For example, why study Physcomitrium NUPs, then switch to Arabidopsis? Why use heterologous overexpression lines for SIM, rather than the stable Arabidopsis line for NUP62-GFP?
  
  Our initial work focused on the PD proteome in Physcomitrium patens. We had identified NUPs in PD-enriched fractions of the moss (Gombos et al., 2023). To evaluate whether this was a specific feature of the moss, or a technical artifact of PD enrichment in moss extracts, we extended the study to Arabidopsis thaliana and subsequently focused on the higher plant. The text in the manuscript reflects this flow.
  
  The NUP62-GFP stable transgenic Arabidopsis line was generated after the SIM experiments with CPR5-mCitrine. We plan to follow the suggestion of the reviewer to perform SIM experiments with the stable Arabidopsis NUP62p:NUP62-GFP lines.
  
  (3) The organization of the figures was confusing. Why present transient Physco NUP localization, and also Arabidopsis proteomics in Figure 1? Why split the results on transient localization of Arabidopsis NUPs in benth across Figures 2 & 3?
  
  We reorganized the figures and created a separate main figure for the proteome data (now Figure 2 with two figure supplements). We classified the Arabidopsis NUPs into FG-NUPs and structural NUPs and accordingly present the data in two separate figures: Figure 3 and its supplements, dedicated to FG-NUPs, and Figure 4, dedicated to structural NUPs. In the NPC, FG-NUPs play a direct role in facilitating transport, which sets them apart from the structural NUPs.
  
  (4) Why are several NUPs localized to the interior of the nucleus and not restricted to the nuclear membrane (e.g., Figure 1 Sup 1 top two rows, Figure 2)? How does this unusual nuclear localization alter the authors' interpretation of their results?
  
  We observed that the transmembrane NUPs tested localized to the nuclear envelope and not to the nucleoplasm (see Figure 3-figure supplement 1, e.g. AtGP210 and AtCPR5). We found that several soluble NUPs also localized to the nucleoplasm (PpNUP98.1, PpNUP62, AtNUP62, AtHOS1). Previous studies had reported that several FG-NUPs (e.g. NUP98a/b or NUP62) and Y-complex NUPs (e.g. HOS1, NUP96, and NUP107) also localized to the nucleoplasm rather than specifically to the nuclear envelope when expressed as fusion proteins (Chen et al., 2023; Gallemí et al., 2016; Huang et al., 2024; Lazaro et al., 2012). Of note, for NUP98a, Gallemí and colleagues (2016) interpreted the nucleoplasmic localization as confirmation that, like vertebrate NUP98, Arabidopsis NUP98a is a dynamic NUP rather than merely a key structural element of the NPC. HOS1 was reported to interact with ICE1, CO, FVE, and HDA6 in the nucleoplasm (Dong et al., 2006; Jung et al., 2012; Lazaro et al., 2012), indicating that HOS1 might shuttle dynamically between the nuclear pore and the nucleoplasm, which could also explain the observed nucleoplasmic localization. In Drosophila, the FG-NUPs NUP98, NUP62, and NUP50 localized both to the NPC and to the nucleoplasm, where they interacted with genes (Kalverda et al., 2010). The nucleoplasmic localization could thus be functionally relevant. Yet, as we state multiple times in the manuscript, we cannot rule out that soluble NUPs mislocalize under overexpression conditions.
  
  (5) Figure legends are insufficiently detailed. Figure legends should be sufficiently detailed to explain the figure without consulting the main text. For example,
  
  (a) Figure 1A, 3C don't describe the cell type or even the organism that is being imaged. Are Physco proteins expressed in Physco? Arabidopsis? Benth? Leaves?
  
  We added the missing information including cell types and organism.
  
  (b) In Figure 1 Supplement 3, many abbreviations are not defined (HC, MC, etc).
  
  We now define the abbreviations in the figure legend.
  
  (c) In Figure 2B, the legend says "At least 15 images from 3 biological replicates were analyzed for each NUP", but there are MANY more than 15 datapoints in Figure 2B. What do the points represent?
  
  We obtained at least three independent replicates for each data set shown. For AtNUP50b, we analyzed 15 ROIs derived from three biological replicates; for the other NUPs, more experiments were performed, so more ROIs were analyzed.
  
  (d) For all microscopy images, are they single images or reconstructions (e.g., maximum projections)?
  
  We now specify for each image whether it is a single confocal optical section or a maximum projection.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) PD index shall be measured for data in Figures 3D and 3E.
  
  To address this question, we have performed PD index quantification for the data in Figures 4D and 4E and added the information to the main text (lines 178-184):
  
  “In leaves transiently expressing NUP43-mCitrine or CPR5-FP fusions, the fluorescence intensity correlated with the estradiol concentration used, with decreased fluorescence intensity for samples where 2µM estradiol was applied versus the intensity in samples exposed to 20µM estradiol (Figure 4 D,E). Notably, the fluorescence ratio between periphery and nucleus did not differ significantly after expression induction by 2 µM compared to 20 µM β-estradiol (Figure 4F) and PD localization was not eliminated (example for localization of NUP43-mCitrine in Figure 4C; PD index(NUP43, 2µM) = 1.42, PD index(CPR5, 2µM) = 1.40).”
  
  (2) The expression level of the native promoter-driven Nup62-GFP shall be measured and compared with the native level using RT-qPCR. Even if this turns out to be an overexpression line, it would still be useful to support the hypothesis.
  
  To evaluate the level of NUP62-GFP fusion protein relative to untransformed controls, we quantified the levels of NUP62 in three independent transgenic fluorescent WT/NUP62p:NUP62-GFP Arabidopsis lines and in Arabidopsis WT using mass spectrometry (new Figure 3F). The new data indicate that NUP62 protein amounts are not significantly increased in the lines expressing the fusion construct relative to WT.
  
  We now write in the revised manuscript (lines 200-205):
  
  “NUP62 protein abundance in two-week-old cotyledons of the stable NUP62p:NUP62-GFP transformants was not statistically different from NUP62 protein levels in WT (Figure 3F). Notably, the punctate fluorescence at the cell periphery, encompassing both PD-associated and non-PD-associated localization, was not detectable or absent in roots and young leaves of four-day-old seedlings (Figure 3D). However, it cannot be excluded that the GFP fusion impacts NUP62 localization.” We provide a new Methods section for the mass spectrometry analysis of the cotyledons in lines 582-590.
  
  (3) Last sentence in the introduction: Nup136 has been considered as the plant homolog of Nup153.
  
  In the manuscript we wrote:
  
  “The majority of the FG-NUPs were conserved, with only three FG-NUPs lost in the green lineage (NUP153, POM121, NUP358).”
  
  As FG-NUP136 is the plant homolog of NUP153, we now write (lines 90-92):
  
  “The majority of the FG-NUPs were conserved, with two FG-NUPs apparently lost in the green lineage (POM121, NUP358).”
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Generally, my interpretation of the images in this manuscript is that many of the localisations are not clean and discrete plasmodesmal associations and are rather more consistent with cortical ER association. As the ER is a component of plasmodesmata, the ER is continuous with the nuclear envelope, and the authors also predict and show ER localisation of one of their key NUPs, CPR5 in Figure 4B. This is not necessarily surprising. However, what becomes essential is that the authors need to determine whether NUPs behave any differently from other ER proteins. To that end, I think co-localisations with ER-located proteins would be helpful in interpreting these ambiguous localisations.
  
  To address this point, we performed additional colocalization experiments using an ER marker. In the revised manuscript, we now compare the localization of NUP43-mVenus with that of the luminal ER marker mCherry-HDEL, which reveals distinct localization patterns (see new Figure 5 and new Figure 5-figure supplement 1). NUP43-mVenus may be associated with the ER; however, NUP43 is restricted to subregions of the ER, which partially overlap with aniline blue-labeled pit fields (new Figure 5, new Figure 5-figure supplement 1).
  
  (2) The super-resolution images of CPR5 show some clear structures peripheral to plasmodesmata. However, again, I would like to see what an ER protein looks like at this location, as the ER feeds into the plasmodesmata. Is this a specific structure or a general feature of the localisation of an ER protein?
  
  Since mCherry-HDEL (see above) did not show a similar localization or enrichment at PD, we did not perform SIM analyses with this marker.
  
  (3) The authors support their use of the PD score using validated PD proteins as the positive control and contaminants from mitochondria and other organelles as the negative control. No mention is made of where ER proteins are classified. The ER passes through plasmodesmata but might also represent a contaminating pool. As NUPs reside in the nuclear envelope, continuous with the ER, a comparison between the NUPs and ER proteins would be extremely informative.
  
  To evaluate a potential enrichment of ER proteins in the plasmodesmata fraction, we analyzed ER protein enrichment and added the new data as a graph in Figure 2-figure supplement 2. ER-resident proteins did not show significant enrichment in the cell wall fraction relative to the total cell extract, while displaying a slight but consistent enrichment in the plasmodesmata fraction. Notably, NUP enrichment in both the cell wall fraction and the plasmodesmata fraction was higher than that of transmembrane ER-resident proteins. While co-purification of ER membranes cannot be entirely excluded, the enrichment of NUPs in the plasmodesmata fraction may not be due to desmotubule membrane carryover alone. The analysis was incorporated into the revised manuscript (lines 152-155).
  
  (4) Regarding the data analysis and use of the Kruskal-Wallis test, the Kruskal-Wallis test tests differences in the distribution of the data, not differences in the mean or median values. In many cases, it can be inferred that the median changes when the data distribution does, but this is not as confident an inference for means. There are other methods available to compare the means of such datasets.
  
  We used the Kruskal–Wallis test for the nonparametric comparison of more than two data sets. However, we had not stated in the manuscript that we performed Dunn's test for post hoc pairwise comparisons after the Kruskal–Wallis test. In the revised manuscript, we added this information to the Methods, Results and figure legends. For the bombardment experiment data, we now also report mean bootstrapping, as used previously in this context (Johnston and Faulkner, 2021). Mean bootstrap analysis of the bombardment data set was performed with n = 5000 resamples, and we provide the p values and confidence intervals in the figure legend (Figure 7B):
  
  “Mean fluorescent cell counts: n<sub>(WT)</sub> = 2.67, n<sub>(cpr5-T3)</sub> = 1.59, n<sub>(cpr5-1)</sub> = 0.68; median fluorescent cell counts: n<sub>(WT)</sub> = 2, n<sub>(cpr5-1)</sub> = 0, n<sub>(cpr5-T3)</sub> = 1. Based on Bonferroni-corrected Dunn's test for pairwise comparison after Kruskal-Wallis test: a indicates significant difference to WT with p<sub>(cpr5-1)</sub> < 10<sup>-15</sup>; b indicates significant difference to WT with p<sub>(cpr5-T3)</sub> = 0.0004; c indicates p<sub>(cpr5-1 vs. cpr5-T3)</sub> = 0.0002. Mean bootstrap analysis according to (Johnston and Faulkner, 2021) with 95% confidence interval (CI) and bootstrap resampling of B = 5000: CI<sub>WT vs. cpr5-1</sub> [1 x 10<sup>-5</sup>, 0.001], p<sub>(cpr5-1)</sub> = 0.0002; CI<sub>WT vs. cpr5-T3</sub> [1 x 10<sup>-5</sup>, 0.001], p<sub>(cpr5-T3)</sub> = 0.0002; CI<sub>cpr5-1 vs. cpr5-T3</sub> [1 x 10<sup>-5</sup>, 0.001], p<sub>(cpr5-1 vs. cpr5-T3)</sub> = 0.0002.”
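For readers less familiar with mean bootstrapping, the kind of comparison reported in this legend can be sketched as below. This is a minimal, generic percentile bootstrap of the difference in means under stated assumptions (each group resampled independently with replacement, B = 5000, two-sided p-value floored at 1/B). It is not necessarily identical to the exact procedure of Johnston and Faulkner (2021), and the function name and the cell counts are hypothetical.

```python
import numpy as np


def bootstrap_mean_difference(a, b, n_boot=5000, ci=0.95, seed=0):
    """Percentile bootstrap of mean(a) - mean(b).

    Returns the observed difference, the (lower, upper) confidence interval and
    a two-sided bootstrap p-value that cannot fall below 1 / n_boot.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Resample each group independently, with replacement, n_boot times.
    boot_a = rng.choice(a, size=(n_boot, a.size), replace=True).mean(axis=1)
    boot_b = rng.choice(b, size=(n_boot, b.size), replace=True).mean(axis=1)
    diffs = boot_a - boot_b
    alpha = 1.0 - ci
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    # Two-sided p: twice the smaller share of resampled differences on either side of 0.
    tail = min(np.mean(diffs <= 0), np.mean(diffs >= 0))
    p = min(max(2 * tail, 1.0 / n_boot), 1.0)
    return float(a.mean() - b.mean()), (float(lower), float(upper)), float(p)


# Hypothetical fluorescent-cell counts per bombardment site (not the study's data).
wt = [3, 2, 4, 2, 3, 1, 5, 2, 3, 2]
mutant = [0, 1, 0, 0, 2, 1, 0, 0, 1, 0]
diff, (lo, hi), p = bootstrap_mean_difference(wt, mutant)
print(f"mean difference = {diff:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}], p = {p}")
```

With B = 5000, the smallest p-value this sketch can return is 1/5000 = 0.0002, the same value as the bootstrap p-values in the legend.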
  
  (5) The comments that estradiol induction prevents over-expression, or allows for controlled expression, are not experimentally supported or widely established outside this manuscript. I suggest they tone this claim down.
  
  As outlined above, lowering the estradiol concentration reduced the fluorescence intensity of the NUP-FP fusions, as one would expect, notably at both the nuclei and the periphery (Figure 4C-F). The system has been used previously in the Simon lab, from which we obtained the constructs. There is substantial literature on the β-estradiol-inducible XVE promoter system, specifically on β-estradiol dose-dependent gene expression in N. benthamiana leaves (Bashandy et al., 2015; Bleckmann et al., 2010; Borghi, 2010; Schlücking et al., 2013). We assessed the dependence of localization on expression levels by studying NUP localization with a lower estradiol concentration for induction and a shortened incubation time. Interestingly, despite the apparently lower expression, we still find NUPs at PD.
  
  References
  
  Bashandy H, Jalkanen S, Teeri TH. 2015. Within leaf variation is the largest source of variation in agroinfiltration of Nicotiana benthamiana. Plant Methods 11:47. DOI: https://doi.org/10.1186/s13007-015-0091-5
  
  Bleckmann A, Weidtkamp-Peters S, Seidel CAM, Simon R. 2010. Stem Cell Signaling in Arabidopsis Requires CRN to Localize CLV2 to the Plasma Membrane. Plant Physiology 152:166–176. DOI: https://doi.org/10.1104/pp.109.149930
  
  Borghi L. 2010. Inducible gene expression systems for plants. In: Hennig L, Köhler C (Eds). Plant Developmental Biology: Methods and Protocols. Humana Press. p. 65–75. DOI: https://doi.org/10.1007/978-1-60761-765-5_5
  
  Bowling SA, Clarke JD, Liu Y, Klessig DF, Dong X. 1997. The cpr5 mutant of Arabidopsis expresses both NPR1-dependent and NPR1-independent resistance. The Plant Cell 9:1573–84.
  
  Chen G, Xu D, Liu Q, Yue Z, Dai B, Pan S, Chen Y, Feng X, Hu H. 2023. Regulation of FLC nuclear import by coordinated action of the NUP62-subcomplex and importin β SAD2. Journal of Integrative Plant Biology 65:2086–2106. DOI: https://doi.org/10.1111/jipb.13540
  
  Dong C-H, Agarwal M, Zhang Y, Xie Q, Zhu J-K. 2006. The negative regulator of plant cold responses, HOS1, is a RING E3 ligase that mediates the ubiquitination and degradation of ICE1. Proceedings of the National Academy of Sciences 103:8281–8286. DOI: https://doi.org/10.1073/pnas.0602874103
  
  Gallemí M, Galstyan A, Paulišić S, Then C, Ferrández-Ayela A, Lorenzo-Orts L, Roig-Villanova I, Wang X, Micol JL, Ponce MR, Devlin PF, Martínez-García JF. 2016. DRACULA2 is a dynamic nucleoporin with a role in regulating the shade avoidance syndrome in Arabidopsis. Development 143:1623–1631. DOI: https://doi.org/10.1242/dev.130211
  
  Gombos S, Miras M, Howe V, Xi L, Pottier M, Kazemein Jasemi NS, Schladt M, Ejike JO, Neumann U, Hänsch S, Kuttig F, Zhang Z, Dickmanns M, Xu P, Stefan T, Baumeister W, Frommer WB, Simon R, Schulze WX. 2023. A high-confidence Physcomitrium patens plasmodesmata proteome by iterative scoring and validation reveals diversification of cell wall proteins during evolution. New Phytologist 238:637–653. DOI: https://doi.org/10.1111/nph.18730
  
  Gu Y, Zebell SG, Liang Z, Wang S, Kang B-H, Dong X. 2016. Nuclear pore permeabilization is a convergent signaling event in effector-triggered immunity. Cell 166:1526-1538.e11. DOI: https://doi.org/10.1016/j.cell.2016.07.042
  
  Huang P, Zhang X, Cheng Z, Wang X, Miao Y, Huang G, Fu Y-F, Feng X. 2024. The nuclear pore Y-complex functions as a platform for transcriptional regulation of FLOWERING LOCUS C in Arabidopsis. The Plant Cell 36:346–366. DOI: https://doi.org/10.1093/plcell/koad271
  
  Johnston MG, Faulkner C. 2021. A bootstrap approach is a superior statistical method for the comparison of non-normal data with differing variances. New Phytologist 230:23–26. DOI: https://doi.org/10.1111/nph.17159
  
  Jung J-H, Seo PJ, Park C-M. 2012. The E3 ubiquitin ligase HOS1 regulates Arabidopsis flowering by mediating CONSTANS degradation under cold stress. Journal of Biological Chemistry 287:43277–43287. DOI: https://doi.org/10.1074/jbc.M112.394338
  
  Kalverda B, Pickersgill H, Shloma VV, Fornerod M. 2010. Nucleoporins directly stimulate expression of developmental and cell-cycle genes inside the nucleoplasm. Cell 140:360–371. DOI: https://doi.org/10.1016/j.cell.2010.01.011
  
  Lazaro A, Valverde F, Piñeiro M, Jarillo JA. 2012. The Arabidopsis E3 ubiquitin ligase HOS1 negatively regulates CONSTANS abundance in the photoperiodic control of flowering. The Plant Cell 24:982–999. DOI: https://doi.org/10.1105/tpc.110.081885
  
  Schlücking K, Edel KH, Köster P, Drerup MM, Eckert C, Steinhorst L, Waadt R, Batistič O, Kudla J. 2013. A new β-estradiol-inducible vector set that facilitates easy construction and efficient expression of transgenes reveals CBL3-dependent cytoplasm to tonoplast translocation of CIPK5. Molecular Plant 6:1814–1829. DOI: https://doi.org/10.1093/mp/sst065
  
  Tamura K, Fukao Y, Iwamoto M, Haraguchi T, Hara-Nishimura I. 2010. Identification and characterization of nuclear pore complex components in Arabidopsis thaliana. The Plant Cell 22:4084–4097. DOI: https://doi.org/10.1105/tpc.110.079947
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.02.610746v5
osf.io

The view-tolerance of human identity recognition depends on horizontal face information.

1. Public_Reviews 16 Jun 2026
  
  in eLife (unscoped)
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.
  
  Strengths:
  
  I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.
  
  Weaknesses:
  
  I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.
  
  The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that enables identity to be abstracted away from variations in face appearance. However, it did not incorporate the notion that this ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins, & Schweinberger, 2011; Burton et al., 2016; Jenkins et al., 2011; Menon, Kemp, & White, 2018; Mike Burton, 2013).
  
  We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.
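The logic of a view-average model observer can be sketched as follows, under assumptions the response does not spell out: images are stored as a NumPy array indexed by identity and view, "comparison" is a zero-lag Pearson correlation over all pixels, and a probe is assigned to the identity whose view-averaged prototype correlates best with it. The orientation filtering applied to the probes in the actual model is omitted, and all function names and the synthetic images are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Zero-lag normalized cross-correlation (Pearson r) between two images."""
    x = x.ravel() - x.mean()
    y = y.ravel() - y.mean()
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def view_average_prototypes(images):
    """Average each identity across views: (n_id, n_views, H, W) -> (n_id, H, W)."""
    return images.mean(axis=1)


def identify(probe, prototypes):
    """Return the index of the prototype that best matches the probe."""
    scores = [pearson(probe, proto) for proto in prototypes]
    return int(np.argmax(scores))


# Synthetic stand-in for a face set: identity-specific structure plus a
# view-dependent component shared by all identities.
rng = np.random.default_rng(0)
n_id, n_views, size = 5, 4, 16
identity_part = rng.normal(size=(n_id, 1, size, size))
view_part = 0.5 * rng.normal(size=(1, n_views, size, size))
images = identity_part + view_part  # broadcasts to (n_id, n_views, size, size)

prototypes = view_average_prototypes(images)
correct = sum(identify(images[i, v], prototypes) == i
              for i in range(n_id) for v in range(n_views))
print(f"{correct}/{n_id * n_views} probes matched to the right identity")
```

Unlike the view-tolerant observer, the prototype here still contains the probe's own view, diluted by the other views, which is the middle ground the reviewer asked about.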
  
  Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.
  
  In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
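The noise-addition step in this Methods excerpt can be sketched as follows. The excerpt does not specify the noise type or the correlation measure, so the sketch assumes Gaussian white noise, treats every pixel as a face pixel, reuses the 0.55 mean luminance from the normalization step, and implements the cross-correlation as a zero-lag Pearson correlation; all names are hypothetical. With a face RMS contrast of .01 and a noise RMS contrast of .08, the signal-to-noise ratio is .01/.08 = .125.

```python
import numpy as np

MEAN_LUM = 0.55  # assumed mean luminance of the images


def add_noise(face, rng, face_rms=0.01, noise_rms=0.08):
    """Rescale the face to `face_rms` RMS contrast and add independent white noise."""
    signal = face - face.mean()
    signal = signal / signal.std() * face_rms
    noise = rng.normal(0.0, noise_rms, size=face.shape)
    return MEAN_LUM + signal + noise


def pearson(x, y):
    return float(np.corrcoef(x.ravel(), y.ravel())[0, 1])


def view_selective_accuracy(target, probes, rng, n_iter=10, **noise_kw):
    """Share of iterations in which the noisy target correlates best with probes[0].

    probes[0] is the same image as the target; the target and every probe get
    a fresh noise pattern on each of the n_iter iterations.
    """
    hits = 0
    for _ in range(n_iter):
        noisy_target = add_noise(target, rng, **noise_kw)
        scores = [pearson(noisy_target, add_noise(p, rng, **noise_kw)) for p in probes]
        hits += int(np.argmax(scores) == 0)
    return hits / n_iter


rng = np.random.default_rng(0)
faces = rng.random((4, 32, 32))  # hypothetical stand-in images
print("SNR:", 0.01 / 0.08)
print("noise-free accuracy:", view_selective_accuracy(faces[0], faces, rng, noise_rms=0.0))
print("accuracy at SNR .125:", view_selective_accuracy(faces[0], faces, rng))
```

Without noise the target and its matching probe are identical, so the correlation is 1 and the observer is at ceiling, which is the situation the pilot phase describes.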
  
  We also added a supplementary file illustrating the d’ distributions of each model and of human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than that of the view-selective model and of human observers (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans, who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’
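The response does not state how d’ was computed. For an old/new task, the conventional equal-variance signal-detection formula is d’ = z(H) - z(FA), where H and FA are the hit and false-alarm rates and z is the inverse of the standard normal cumulative distribution. A minimal sketch of that formula follows; the log-linear correction for rates of 0 or 1 is an added assumption, not something reported in the text, and the function names and counts are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate, fa_rate):
    """Equal-variance signal-detection sensitivity: d' = z(H) - z(FA)."""
    return _z(hit_rate) - _z(fa_rate)


def loglinear_rates(hits, n_old, false_alarms, n_new):
    """Add 0.5 to each count and 1 to each trial total so rates never reach 0 or 1."""
    return (hits + 0.5) / (n_old + 1), (false_alarms + 0.5) / (n_new + 1)


# Hypothetical observer: 30/40 "old" responses to studied faces,
# 8/40 "old" responses to novel faces.
h, fa = loglinear_rates(30, 40, 8, 40)
print(round(d_prime(h, fa), 3))
```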
  
  Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.
  
  Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under fixed illumination. Ears and a small portion of the neck were visible, while the hair region was removed. All face images had a normalized skin color, and we further converted them to grayscale.
  
  While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, hair style is tuned to the horizontal range of the face stimulus (Dakin & Watt, 2009; Dumont, Roux-Sibilon, & Goffaux, 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with the use of a more ecological stimulus set.
  
  Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell et al., 2006) and hue (color) is only one among others. The grayscaled 3D laser scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the role of shape and surface cues to identity recognition (e.g., Jiang et al., 2009; Russell et al., 2007; Russell et al., 2006; Troje & Bulthoff, 1996; Vuong et al., 2005). Moreover, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon, & Goffaux, 2024).
  
  For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.
  
  Reviewer #2 (Public review):
  
  This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.
  
  One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.
  
  In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
  
  We also added a supplementary file illustrating the d’ distributions of each model and of human observers.
  
  Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.
  
  All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations because natural image energy is unevenly distributed across orientations (e.g., Hansen et al., 2003). Images of faces cropped from their background, as used here, contain most of their energy in the horizontal range (Goffaux & Greenwood, 2016; Keil, 2008, 2009). If not normalized after orientation filtering, this uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was applied across all experimental conditions solely to prevent energy differences from explaining the influence of viewpoint on the orientation tuning profile.
  
  We were not aware of any systematic natural variations of energy across face views. To address this, we measured average face energy (i.e., RMS contrast) in the original stimulus set, before any image processing or manipulation, excluding background pixels. Across yaws, energy ranged between .11 and .14 on a 0 to 1 grayscale, a moderate range compared to the variation we measured across identities (from .08 to .18). It is unclear whether these observations are specific to our stimulus set or generalize to faces we encounter in everyday life. They do, however, indicate that RMS contrast did not vary substantially across views in the present study, and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.
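The energy measurement described above can be sketched as follows. The sketch assumes that RMS contrast is the standard deviation of face-pixel intensities on the 0 to 1 grayscale (background pixels excluded through a boolean mask) and that the yaw and identity ranges are taken over contrasts averaged across the other factor; the text does not say how the averaging was done, and the function names and toy data are hypothetical.

```python
import numpy as np


def rms_contrast(image, face_mask):
    """Standard deviation of intensities (0-1 scale) over face pixels only."""
    return float(image[face_mask].std())


def contrast_ranges(images, masks):
    """images, masks: arrays of shape (n_identities, n_yaws, H, W).

    Returns the (min, max) of yaw-wise mean contrast and of identity-wise
    mean contrast, i.e. how much energy varies with view versus identity.
    """
    n_id, n_yaw = images.shape[:2]
    c = np.array([[rms_contrast(images[i, y], masks[i, y]) for y in range(n_yaw)]
                  for i in range(n_id)])
    by_yaw = c.mean(axis=0)       # average over identities, one value per yaw
    by_identity = c.mean(axis=1)  # average over yaws, one value per identity
    return (by_yaw.min(), by_yaw.max()), (by_identity.min(), by_identity.max())


# Toy image: face pixels alternate 0.4/0.6; the background column (0.9) is masked out.
img = np.array([[0.4, 0.6, 0.9],
                [0.6, 0.4, 0.9]])
mask = np.array([[True, True, False],
                 [True, True, False]])
print(rms_contrast(img, mask))
```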
  
  In the revised Methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’
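The normalization in this Methods excerpt can be sketched as follows, assuming a boolean face mask and a simple linear rescaling of the face pixels to the stated mean (0.55) and RMS contrast (0.15), followed by clipping to [0, 1]. The function name is hypothetical, and counting clipped values over all pixels of the image is an assumption about what "per image" means.

```python
import numpy as np


def normalize_face(image, face_mask, mean_lum=0.55, rms=0.15):
    """Fix the mean luminance and RMS contrast of face pixels; flatten the background.

    Returns the clipped image and the percentage of its pixels that fell outside [0, 1].
    """
    out = np.full(image.shape, mean_lum, dtype=float)  # uniform background
    face = image[face_mask].astype(float)
    out[face_mask] = (face - face.mean()) / face.std() * rms + mean_lum
    clipped_pct = 100.0 * np.mean((out < 0.0) | (out > 1.0))
    return np.clip(out, 0.0, 1.0), float(clipped_pct)


rng = np.random.default_rng(0)
image = rng.random((64, 64))  # hypothetical stand-in image
mask = np.zeros(image.shape, dtype=bool)
mask[16:48, 16:48] = True     # hypothetical face region
normed, pct = normalize_face(image, mask)
print(normed[mask].mean(), normed[mask].std(), pct)
```

Clipping is applied after the percentage is recorded, so the reported value reflects how many pixels the fixed contrast pushed out of range, which is the quantity the 3% criterion constrains.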
  
  Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.
  
  Indeed, the human performance data indicate that while identity recognition remains tuned to horizontal information, the location of the tuning peak varies somewhat across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but also discussed the second: with yaw rotation, certain non-horizontal morphological features, such as the jaw line or nose bridge, may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil, 2008, 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in face feature appearance.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.
  
  Reviewer #1 (Recommendations for the authors):
  
  I generally found the paper to be very well-written, so I have only a few minor comments here.
  
  (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.
  
  Employing a simpler linear framework, that is, a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) × 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance, but its interpretation would be limited. We would have to rely either on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or on a polynomial contrast approach (testing linear, quadratic, and higher-order trends up to the 7th order), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but it would offer no meaningful insight.
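  To make the design arithmetic concrete: 8 orientations × 7 viewpoints gives 56 cells, and the orientation × viewpoint interaction alone carries (8 − 1) × (7 − 1) = 42 degrees of freedom. With 8 equally spaced orientation levels there are 7 orthogonal polynomial contrasts (linear through 7th order). The sketch below builds them with NumPy by QR-orthogonalizing a Vandermonde matrix. It is a generic construction offered for illustration, not the authors' analysis code, and the function name is hypothetical.

```python
import numpy as np

def orthogonal_poly_contrasts(n_levels: int) -> np.ndarray:
    """Return an (n_levels, n_levels - 1) matrix whose columns are orthonormal
    polynomial contrasts (linear, quadratic, ...) for equally spaced levels."""
    x = np.arange(n_levels, dtype=float)
    # Columns are x**0, x**1, ..., x**(n_levels - 1)
    vander = np.vander(x, N=n_levels, increasing=True)
    # QR orthogonalizes the columns in order, so column k spans degree <= k
    q, _ = np.linalg.qr(vander)
    # Drop the constant column; the remaining columns are the trend contrasts
    return q[:, 1:]

contrasts = orthogonal_poly_contrasts(8)   # 8 orientation levels
print(contrasts.shape)                     # one column per trend: linear ... 7th order

n_orient, n_view = 8, 7
print(n_orient * n_view, (n_orient - 1) * (n_view - 1))  # design cells, interaction df
```

  The first column is proportional (up to sign) to the familiar linear contrast (−7, −5, −3, −1, 1, 3, 5, 7). Each of the 7 columns would have to be tested at each of the 7 viewpoints, which illustrates how quickly this approach multiplies comparisons.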
  
  In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.
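  For concreteness, one common way to write a Gaussian tuning function with these four parameters is shown below. Here θ is orientation, μ the peak location, σ the standard deviation, α the peak amplitude above baseline, and β the base amplitude. This is a generic parameterization consistent with the description above. The exact form used in the manuscript may differ, for example in whether orientation is treated as circular. The second line gives the standard conversion from σ to full width at half maximum, a common way of reporting bandwidth.

```latex
d'(\theta) = \beta + \alpha \exp\!\left(-\frac{(\theta - \mu)^2}{2\sigma^2}\right),
\qquad
\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma
```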
  
  Moreover, the use of a nonlinear, Gaussian model is motivated by past work showing that a Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Orientation selectivity at early stages of visual processing has also been modelled using Gaussian functions (or Differences of Gaussians; Ringach, Hawken, & Shapley, 2003).
  
  We revised the data analysis section to include a justification for our use of a Gaussian model: “Therefore, fitting the human sensitivity data with a simple Gaussian model seemed most appropriate, as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work showing that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, such as a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) × 7 (viewpoint) design that is difficult to analyze and interpret.”
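  Below is a minimal sketch of fitting such a Gaussian tuning curve, d′(θ) = β + α·exp(−(θ − μ)²/(2σ²)), to d′ values at the 8 tested orientations. It assumes NumPy, orientations in degrees with a 180° period (so distance to the peak is wrapped), and a 1° grid for μ and σ. For each grid point, α and β are solved exactly by linear least squares, because the model is linear in them once μ and σ are fixed. The function names, grids, and wrapping choice are illustrative assumptions, not the authors' fitting procedure. In practice one would fit each viewpoint separately and could refine the grid optimum with a nonlinear optimizer.

```python
import numpy as np

def gaussian_tuning(theta, mu, sigma, alpha, beta, period=180.0):
    """Gaussian tuning curve over orientation (degrees).

    The distance to the peak is wrapped to [-period/2, period/2) on the
    assumption that orientation is circular with a 180-degree period.
    """
    d = (np.asarray(theta, dtype=float) - mu + period / 2) % period - period / 2
    return beta + alpha * np.exp(-d**2 / (2 * sigma**2))

def fit_gaussian_tuning(theta, dprime, mu_grid=None, sigma_grid=None):
    """Least-squares fit of peak location (mu), standard deviation (sigma),
    peak amplitude (alpha) and base amplitude (beta).

    mu and sigma are grid-searched; for each pair, alpha and beta are
    obtained exactly by linear least squares.
    """
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(dprime, dtype=float)
    mu_grid = np.arange(0.0, 180.0, 1.0) if mu_grid is None else mu_grid
    sigma_grid = np.arange(5.0, 91.0, 1.0) if sigma_grid is None else sigma_grid
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            g = gaussian_tuning(theta, mu, sigma, alpha=1.0, beta=0.0)
            X = np.column_stack([g, np.ones_like(g)])    # columns: alpha, beta
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            sse = float(np.sum((X @ coef - y) ** 2))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma, coef[0], coef[1])
    _, mu, sigma, alpha, beta = best
    return {"mu": mu, "sigma": sigma, "alpha": alpha, "beta": beta}

# Synthetic, noise-free example (not experimental data): peak at horizontal (90 deg)
orientations = np.arange(8) * 22.5   # 0, 22.5, ..., 157.5 degrees
synthetic_dprime = gaussian_tuning(orientations, mu=90.0, sigma=25.0, alpha=1.5, beta=0.5)
params = fit_gaussian_tuning(orientations, synthetic_dprime)
print(params)
```

  Separating the linear parameters from the grid-searched ones keeps the sketch dependency-free and avoids local minima in α and β. The price is that μ and σ are only resolved to the grid spacing.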
  
  (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.
  
  We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.
  
  (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!
  
  The procedure section now starts with the description of the familiarization task.
  
  (4) p. 3 - "Culminates" doesn't seem like the right word here.
  
  We agree and rephrased this way: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.
  
  (5) p. 5 - I think "with the multiple" shouldn't have "the".
  
  Indeed, we removed the “the”.
  
  Reviewer #2 (Recommendations for the authors):
  
  I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity; it might have been more relevant if temporal contiguity had been manipulated in the design. So, I wonder if this section might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant model observers to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.
  
  While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction.
  
  We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.
  
  References.
  
  Burton, A. M., Jenkins, R., & Schweinberger, S. R. (2011). Mental representations of familiar faces. Br J Psychol, 102(4), 943-958. https://doi.org/10.1111/j.2044-8295.2011.02039.x
  
  Burton, A. M., Kramer, R. S., Ritchie, K. L., & Jenkins, R. (2016). Identity From Variation: Representations of Faces Derived From Multiple Instances. Cogn Sci, 40(1), 202-223. https://doi.org/10.1111/cogs.12231
  
  Collin, C. A., Rainville, S., Watier, N., & Boutet, I. (2014). Configural and featural discriminations use the same spatial frequencies: a model observer versus human observer analysis. Perception, 43(6), 509-526. https://doi.org/10.1068/p7531
  
  Dakin, S. C., & Watt, R. J. (2009). Biological "bar codes" in human faces. J Vis, 9(4):2, 1-10. https://doi.org/10.1167/9.4.2
  
  Dumont, H., Roux-Sibilon, A., & Goffaux, V. (2024). Horizontal face information is the main gateway to the shape and surface cues to familiar face identity. PLOS ONE, 19(10), e0311225. https://doi.org/10.1371/journal.pone.0311225
  
  Goffaux, V., & Greenwood, J. A. (2016). The orientation selectivity of face identification. Scientific Reports, 6, 34204. https://doi.org/10.1038/srep34204
  
  Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research, 39(21), 3537-3560. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=10746125
  
  Hansen, B. C., Essock, E. A., Zheng, Y., & DeFord, J. K. (2003). Perceptual anisotropies in visual processing and their relation to natural image statistics. Network, 14(3), 501-526. http://www.ncbi.nlm.nih.gov/pubmed/12938769
  
  Jenkins, R., White, D., Van Montfort, X., & Burton, A. M. (2011). Variability in photos of the same face. Cognition, 121(3), 313-323. https://doi.org/10.1016/j.cognition.2011.08.001
  
  Jiang, F., Dricot, L., Blanz, V., Goebel, R., & Rossion, B. (2009). Neural correlates of shape and surface reflectance information in individual faces. Neuroscience, 163(4), 1078-1091. https://doi.org/10.1016/j.neuroscience.2009.07.062
  
  Keil, M. S. (2008). Does face image statistics predict a preferred spatial frequency for human face processing? Proc Biol Sci, 275(1647), 2095-2100. https://doi.org/10.1098/rspb.2008.0486
  
  Keil, M. S. (2009). "I look in your eyes, honey": internal face features induce spatial frequency preference for human face processing. PLoS Comput Biol, 5(3), e1000329. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19325870
  
  Menon, N., Kemp, R. I., & White, D. (2018). More than a sum of parts: robust face recognition by integrating variation. R Soc Open Sci, 5(5), 172381. https://doi.org/10.1098/rsos.172381
  
  Burton, A. M. (2013). Why has research in face recognition progressed so slowly? The importance of variability. Quarterly Journal of Experimental Psychology, 66(8), 1467-1485. https://doi.org/10.1080/17470218.2013.800125
  
  Näsänen, R. (1999). Spatial frequency bandwidth used in the recognition of facial images. Vision Research, 39(23), 3824-3833. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=10748918
  
  Oruc, I., Shafai, F., Murthy, S., Lages, P., & Ton, T. (2019). The adult face-diet: A naturalistic observation study. Vision Res, 157, 222-229. https://doi.org/10.1016/j.visres.2018.01.001
  
  Ringach, D. L., Hawken, M. J., & Shapley, R. (2003). Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression. Journal of Neurophysiology, 90(1), 342-352. https://doi.org/10.1152/jn.01018.2002
  
  Russell, R., Biederman, I., Nederhouser, M., & Sinha, P. (2007). The utility of surface reflectance for the recognition of upright and inverted faces. Vision Res, 47(2), 157-165. https://doi.org/10.1016/j.visres.2006.11.002
  
  Russell, R., Sinha, P., Biederman, I., & Nederhouser, M. (2006). Is pigmentation important for face recognition? Evidence from contrast negation. Perception, 35(6), 749-759. https://doi.org/10.1068/p5490
  
  Troje, N. F., & Bülthoff, H. H. (1996). Face recognition under varying poses: the role of texture and shape. Vision Res, 36(12), 1761-1771. https://doi.org/10.1016/0042-6989(95)00230-8
  
  Vuong, Q. C., Peissig, J. J., Harrison, M. C., & Tarr, M. J. (2005). The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles. Vision Res, 45(10), 1213-1223. https://doi.org/10.1016/j.visres.2004.11.015
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

osf.io/preprints/psyarxiv/8au9j_v7
www.biorxiv.org

Bidirectional redistribution of actomyosin drives epithelial invagination in ascidian siphon tube morphogenesis

1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper investigates the physical basis of epithelial invagination in the morphogenesis of the ascidian siphon tube. The authors observe changes in actin and myosin distribution during siphon tube morphogenesis using fixed specimens and immunohistochemistry. They discover a biphasic change in actomyosin localization that correlates with changes in cell shape. Initially, there is the well-known relocation of actomyosin from the lateral sides to the apical surface of the cells that will invaginate. This is accompanied by lengthening of the central cells within the invagination, but little invagination. Coincident with a second, more rapid phase of invagination, the authors see a relocalization of actomyosin back to the lateral sides of the cells. This second, "bidirectional" relocation of actin appears to be important, because optogenetic inhibition of myosin in the lateral domain after the initial invagination phase blocked further invagination. Although not noted in the paper, the dependence of the second phase of siphon invagination on actomyosin is interesting and important. During Drosophila mesoderm invagination, a second "folding" phase of invagination has been shown to be independent of actomyosin contraction (Guo et al., eLife 2022). There thus appear to be important differences between the Drosophila mesoderm and ascidian siphon tube systems.
  
  Using the experimental data, the authors create a vertex model of the invagination, and simulations reveal a coupled mechanism of apicobasal tension imbalance and lateral contraction that creates the invagination. The resultant model appears to recapitulate many aspects of the observed cell behaviors, although there are some caveats to consider (described below).
  
  We thank the reviewer for the insightful summary and for bringing the important study by Guo et al. (2022) to our attention. We have now added a dedicated comparison with Drosophila ventral furrow invagination in the Discussion, explicitly highlighting that the second rapid folding phase in Drosophila does not require lateral contractility, whereas in our system lateral contractility is obligatory for the accelerated invagination stage.
  
  Strengths:
  
  The studies and presented results are well done and provide important insights into the physical forces of epithelial invagination, which is important because invaginations are how a large fraction of organs in multicellular organisms are formed.
  
  Thank you for this positive assessment and for recognizing the significance of our work in elucidating the physical mechanisms underlying fundamental morphogenetic processes. We have striven to provide a comprehensive and rigorous analysis, and are grateful for this encouraging feedback.
  
  Weaknesses:
  
  (1) This reviewer has concerns about two aspects of the computational model. First, the model in Figure 5D shows a simulation of a flat epithelial sheet creating an invagination. However, the actual invagination is occurring in a small embryo that has significant curvature, such that nine or so cells occupy a 90-degree arc of the 360-degree circle that defines the embryo's cross-section (e.g., see Figure 1A). This curvature could have important effects on cell behavior.
  
  Thank you for raising the issue of tissue curvature. In the initial version of our model, we treated the tissue as flat based on the local geometry of the anterior epidermis. The embryo at 13 hpf does possess significant curvature. However, its overall transverse cross-section is approximately elliptical, and the region undergoing invagination lies in a relatively low-curvature zone, occupying only a 30° to 40° arc of the entire tissue. More importantly, the embryo undergoes anisotropic elongation and expansion. It becomes significantly flattened during the accelerated invagination stage and eventually adopts a very flat geometry by 18 hpf. We have now included Figure 5—figure supplement 1 to clarify these global morphological transitions.
  
  Nevertheless, curvature does exist during the early stages, and we agree that clarifying its potential role is essential. Therefore, in the revised manuscript, we have updated our vertex model to incorporate a simplified circular geometry. Furthermore, unlike Drosophila ventral furrow formation (Guo et al., eLife, 2022), the invagination here eventually forms a hollow tubular structure, which led us to introduce a surface bending stiffness term into the model. Although global tissue growth is not explicitly modeled, we explored the impact of curvature by varying the initial system size. Our results demonstrate that the invagination process, driven by apico-basal tension imbalance and lateral contraction, is highly localized and remains robust across different curvatures.
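  The link between initial system size and curvature can be made explicit under a simple assumption. Suppose N cells of equal apical width w are arranged on a circular cross-section. The apical radius is then R = N·w/(2π), so curvature 1/R = 2π/(N·w) falls as N grows, and each cell spans 360°/N of arc. For example, the reviewer's estimate of about nine cells per 90° arc corresponds to N = 36 (10° per cell). The sketch below builds such an initial ring of apical and basal vertices. Cell width, cell height, and the function name are hypothetical choices for illustration, not the parameters of the authors' model.

```python
import numpy as np

def ring_epithelium(n_cells: int, cell_width: float = 1.0, cell_height: float = 2.0):
    """Initial vertices for a closed ring of n_cells cells (2D cross-section).

    Assumes the apical surface faces outward and each cell's apical edge is a
    chord spanning 2*pi/n_cells of a circle whose circumference is
    n_cells * cell_width.
    """
    radius = n_cells * cell_width / (2 * np.pi)        # apical radius R
    if cell_height >= radius:
        raise ValueError("cell_height must be smaller than the apical radius")
    angles = np.linspace(0.0, 2 * np.pi, n_cells, endpoint=False)
    unit = np.column_stack([np.cos(angles), np.sin(angles)])
    apical = radius * unit                             # outer (apical) vertices
    basal = (radius - cell_height) * unit              # inner (basal) vertices
    return apical, basal, radius

for n in (36, 324):
    apical, basal, radius = ring_epithelium(n)
    print(n, "cells: arc per cell =", 360.0 / n, "deg, curvature =", 1.0 / radius)
```

  With fixed cell size, going from 36 to 324 cells reduces the curvature ninefold. This is the sense in which varying system size probes the effect of curvature.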
  
  (2) The second concern about the model is that Figure 5 D shows the vertex model developing significant "puckering" (bulging) surrounding the invagination. Such "puckering" is not seen in the in vivo invagination (Figure 1A, 2A). This issue is not discussed in the text, so it is unclear how big an issue this is for the developed model, but the model does not recapitulate all aspects of the siphon invagination system.
  
  Thank you for pointing this out. In our experiments, a similar "puckering" shape is observed during the early stages of morphogenesis (~17 hpf, as seen in Figure 1A), when the tissue is relatively small. However, this feature rapidly disappears as the tissue grows and the overall geometry becomes flatter. This suggests that "puckering" is more pronounced in highly curved epithelia, consistent with mechanical expectations. Previous vertex models of Drosophila ventral furrow formation do not exhibit this effect (Brodland et al., 2010; Polyakov et al., 2014) because they modeled cells within a rigid, immovable boundary. In siphon morphogenesis, by contrast, a tubular structure ultimately forms in an epithelium without strong boundary constraints, so the mechanical boundary conditions are fundamentally different.
  
  In addition, the formation of a hollow tubular structure, supported by strong F-actin accumulation at the tissue surface (Figure 1), indicates that the surface tissue has a bending stiffness, which we have incorporated into the model. This bending term enforces smooth curvature transitions, which can manifest as a "puckering" shape surrounding the invagination. In our previous flat-geometry model, this substantial bending stiffness produced a "puckering" effect around the invagination. In our updated curved vertex model, the phenomenon is still present and is related to tissue curvature. By simulating a larger system with low curvature (N = 324 cells in Figure 6D), we find that the puckering is significantly reduced. This confirms that the shape discrepancy is a size-dependent effect of the bending constraints within a fixed system size that does not account for tissue growth. During development, continuous growth and flattening of the embryo diminish this effect (Figure 5—figure supplement 1), consistent with our model's predictions.
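  The sketch below illustrates how a bending term of this kind can enter a vertex model. It writes a generic 2D cross-sectional energy with four parts: area elasticity, line tensions on apical, basal, and lateral edges, and a bending penalty on the turning angles of the apical surface. This textbook-style form is chosen for illustration only and assumes an open strip of cells. The parameter names (K_A, A0, gamma_apical, gamma_basal, gamma_lateral, kappa) are hypothetical, and the authors' model may define or weight these terms differently. In this form, an apico-basal tension imbalance roughly corresponds to gamma_apical differing from gamma_basal, and lateral contraction to a larger gamma_lateral.

```python
import numpy as np

def polygon_area(pts: np.ndarray) -> float:
    """Unsigned area of a simple polygon via the shoelace formula."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(np.roll(x, -1), y))

def vertex_energy(apical, basal, K_A=1.0, A0=1.0, gamma_apical=1.0,
                  gamma_basal=1.0, gamma_lateral=1.0, kappa=0.0):
    """Energy of an open strip of cells in a 2D cross-sectional vertex model.

    apical, basal: (n_cells + 1, 2) vertex arrays; cell i is the quadrilateral
    apical[i], apical[i+1], basal[i+1], basal[i].
    """
    apical = np.asarray(apical, dtype=float)
    basal = np.asarray(basal, dtype=float)
    n_cells = len(apical) - 1
    # Area elasticity: penalize deviation of each cell area from A0
    areas = np.array([
        polygon_area(np.array([apical[i], apical[i + 1], basal[i + 1], basal[i]]))
        for i in range(n_cells)
    ])
    e_area = 0.5 * K_A * np.sum((areas - A0) ** 2)
    # Line tensions on apical, basal, and lateral edges
    apical_edges = np.diff(apical, axis=0)
    l_apical = np.linalg.norm(apical_edges, axis=1)
    l_basal = np.linalg.norm(np.diff(basal, axis=0), axis=1)
    l_lateral = np.linalg.norm(apical - basal, axis=1)
    e_tension = (gamma_apical * l_apical.sum() + gamma_basal * l_basal.sum()
                 + gamma_lateral * l_lateral.sum())
    # Bending: squared turning angle between consecutive apical edges
    u, v = apical_edges[:-1], apical_edges[1:]
    cross = u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0]
    dot = np.einsum("ij,ij->i", u, v)
    turning = np.arctan2(cross, dot)
    e_bend = 0.5 * kappa * np.sum(turning ** 2)
    return e_area + e_tension + e_bend

# Two flat cells, each 1 wide and 2 tall (area 2 = A0): only tensions contribute
basal = [(0, -2), (1, -2), (2, -2)]
flat = vertex_energy([(0, 0), (1, 0), (2, 0)], basal, A0=2.0, gamma_apical=1.0,
                     gamma_basal=0.5, gamma_lateral=0.2, kappa=1.0)
# Kinking the middle apical vertex adds bending (and tension/area) energy
bent = vertex_energy([(0, 0), (1, -0.3), (2, 0)], basal, A0=2.0, gamma_apical=1.0,
                     gamma_basal=0.5, gamma_lateral=0.2, kappa=1.0)
print(flat, bent)
```

  Because the bending term penalizes sharp turns of the apical surface, it spreads curvature over neighboring cells rather than concentrating it at the invagination edge. This is one way a localized fold can be accompanied by a bulge in a small, highly curved system.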
  
  Furthermore, we note that the cell-cell adhesion between the surface epithelium and the internal bulk cells (a factor not explicitly captured in our current model) likely further suppresses such evagination in vivo, as outward puckering would necessitate the coordinated deformation of the underlying tissues. We aim to investigate the interplay between global growth and local active forces in future work. We have added a detailed description and mechanical explanation of these simulated shapes in the revised manuscript.
  
  (3) In Figure 2A, Top View, and the schematic in Figure 2C, the developing invagination is surrounded by a ring of aligned cell edges characteristic of a "purse string" type actomyosin cable that would create pressure on the invaginating cells, which has been documented in multiple systems. Notably, the schematic in Figure 2C shows myosin II localizing to aligned "purse string" edges, suggesting the purse string is actively compressing the more central cells. If the purse string consistently appears during siphon invagination, a complete understanding of siphon invagination will require understanding the contributions of the purse string to the invagination process.
  
  Thank you for this excellent observation. We agree that the ring-like actomyosin structure is a prominent feature during the initial stages of invagination, and its potential role warrants discussion. We carefully re-examined our data, and our analysis confirms that this myosin ring is most pronounced during the initial invagination stage. This inward compression from the periphery would work in concert with apical constriction to help shape the initial invagination. However, this ring-like myosin pattern diminishes significantly during the accelerated invagination stage, indicating that sustained compression from the purse string is not required for the entire process. We have added a discussion of this point in the revised manuscript. We also agree that future experiments using laser ablation or optogenetic inhibition specifically targeting this actomyosin ring would be valuable for dissecting its precise contribution during the early invagination stage. We have noted this as a future direction in the Discussion.
  
  (4) The introduction and discussion put the work in the context of work on physical forces in invagination, but there is not much discussion of how the modeling fits into the literature.
  
  We thank the reviewer for this suggestion. We have now incorporated additional references and discussion regarding existing theoretical models and the physical forces involved in tissue invagination. These previous studies provided the foundational framework for our updated curved vertex model. We have also added an explanation of how our model differs from these existing works and discussed potential future directions for further investigation.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors propose that bidirectional translocation of actomyosin drives tissue invagination in Ciona siphon tube formation. They suggest a two-stage model in which actomyosin first accumulates apically to drive a slow initial invagination, followed by translocation to lateral domains to accelerate the invagination process through cell shortening. They have shown that actomyosin activity is important for invagination: modulation of myosin activity through expression of myosin mutants altered the timing and speed of invagination. Furthermore, optogenetic inhibition of myosin during the transition between the slow and fast stages disrupted invagination. The authors further developed a vertex model to validate the relationship between contractile force distribution and epithelial invagination.
  
  Thank you for your thoughtful and accurate summary of our work and for your constructive critique.
  
  Strengths:
  
  (1) The authors employed various techniques to address the research question, including optogenetics, the use of MRLC mutants, and vertex modelling.
  
  (2) The authors provide quantitative analyses for a substantial portion of their imaging data, including cell and tissue geometry parameters as well as actin and myosin distributions. The sample sizes used in these analyses appear appropriate.
  
  (3) The authors combined experimental measurements with computer modeling to test the proposed mechanical models, which represents a strength of the study. It provides a framework to explore the mechanical principles underlying the observed morphogenesis.
  
  We are grateful for your positive assessment of the multidisciplinary approaches, quantitative analyses, and the integration of modeling with experiments.
  
  Weaknesses:
  
  (1) The concept of coordinated and sequential action of apical and lateral actomyosin in support of epithelial folding has been documented through a combination of experimental and modeling approaches in other contexts, such as ascidian endoderm invagination (PMID: 20691592) and gastrulation in Drosophila (PMIDs: 21127270, 22511944, 31273212). While the manuscript addresses an important question, related findings have been reported in these previous studies. This overlap reduces the degree of novelty, and it remains to be clarified how their work advances beyond these prior contributions.
  
  We thank the reviewer for raising this important point. In the revised Introduction and Discussion, we have explicitly distinguished our findings from prior studies. Specifically: (1) Unlike ascidian endoderm invagination, where actomyosin shifts from apical to basolateral (Sherrard et al., 2010), our system exhibits a bidirectional redistribution between apical and lateral domains, with the basal domain playing a passive role. (2) Unlike Drosophila ventral furrow invagination, where lateral contractility is not essential for the second folding phase (Guo et al., 2022), our optogenetic inhibition demonstrates that lateral contractility is obligatory for the accelerated invagination stage. These comparisons, now clearly stated in the Introduction and Discussion, establish bidirectional actomyosin redistribution as a distinct mechanical paradigm for sequential morphogenesis. We believe these revisions adequately clarify how our work advances beyond prior contributions.
  
  (2) One of the central statements made by the authors is that the translocation of actomyosin between the apical and lateral domains mediates invagination. The term "translocation" implies that the same actomyosin structures physically move from one location to another, which is not demonstrated by the data. Given the time scale of the process (several hours), it is also possible that the observed spatiotemporal patterns of actomyosin intensity result from sequential activation/assembly and inactivation/disassembly at specific locations on the cell cortex, rather than from the physical translocation of actomyosin structures over time.
  
  We thank the reviewer for this important point. We agree that our data do not demonstrate physical translocation of actomyosin structures, and that the observed patterns could arise from sequential assembly/disassembly over time. To avoid overinterpretation, we have replaced “translocation” with “redistribution” throughout the manuscript (including the title) and toned down the language in the Results and Discussion.
  
  (3) Some aspects of the data on actomyosin localization require further clarification. (1) The authors state that actomyosin translocation is bidirectional, first moving from the lateral domain to the apical domain; however, the reduction of the lateral actomyosin at this step was not rigorously tested. (2) During the slow invagination stage, it is unclear whether myosin consistently localizes to the apical cell-cell borders or instead relocalizes to the medioapical domain, as suggested by the schematic illustration presented in Figure 2C. (3) It is unclear how many cells along the axis orthogonal to the furrow accumulate apical and lateral myosin.
  
  Thank you for your insightful comments, which have helped us significantly improve the clarity and rigor of our actomyosin localization analysis. To address the points raised, we have made several key revisions. First, we have added new quantitative analyses of active myosin intensity from earlier time points (14-15 hpf) to rigorously support the initial lateral-to-apical redistribution phase (Figure 2B). Second, the schematic in Figure 2C has been corrected to show myosin at the apical cell-cell borders. Third, we have clarified that redistribution occurs in a domain of approximately 15-20 cells (the invagination primordium), not only in the central cell.
  
  (4) The overexpression of MRLC mutants appears to be rather patchy in some cases (e.g., in Figure 3A, 17.0 hpf, only cells located at the right side of the furrow appeared to express MRLC T18ES19E). It is unclear how such patchy expression would impact the phenotype.
  
  Thank you for your observation. We acknowledge that mosaic expression is common in Ciona electroporation. For all quantitative analyses, we only selected embryos in which the central cell, along with more than half of the surrounding cells in the primordium, showed clear expression of the plasmid. This selection criterion has been added to the Materials and Methods section.
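  The inclusion rule can be written as a small predicate. This sketch assumes that per-cell expression calls (true/false flags) for the central cell and the surrounding primordium cells are already available from scoring each embryo. The function and argument names are hypothetical, and "more than half" is read as a strict majority of the surrounding cells.

```python
from typing import Sequence

def include_embryo(central_expresses: bool, surrounding_expresses: Sequence[bool]) -> bool:
    """Return True if the embryo meets the mosaic-expression inclusion criterion:
    the central cell expresses the construct and more than half of the
    surrounding primordium cells do too."""
    if not central_expresses or len(surrounding_expresses) == 0:
        return False
    n_expressing = sum(bool(flag) for flag in surrounding_expresses)
    # Strict majority: comparing 2 * count with the total avoids fractions
    return 2 * n_expressing > len(surrounding_expresses)

print(include_embryo(True, [True] * 9 + [False] * 6))   # 9 of 15 surrounding cells express
print(include_embryo(True, [True] * 7 + [False] * 7))   # exactly half: not a majority
print(include_embryo(False, [True] * 15))               # central cell lacks expression
```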
  
  (5) In the optogenetic experiment, it appears that after one hour of light stimulation, the apical side of the tissue underwent relaxation (comparing 17 hpf and 16 hpf in Figure 4B). It is therefore unclear whether the observed defect in invagination is due to apical relaxation or lack of lateral contractility, or both. Therefore, the phenotype is not sufficient to support the authors' statement that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".
  
  We have performed an additional immunostaining experiment for myosin II. The new data (Figure 4—figure supplement 2) show that light stimulation specifically reduced lateral myosin intensity without significantly affecting apical myosin, compared with the dark control. Therefore, the observed block of invagination is primarily due to the loss of lateral contractility.
  
  (6) The vertex model is designed to explore how apical and lateral tensions contribute to distinct morphological outcomes. While the authors raise several interesting predictions, these are not further tested, making it unclear to what extent the model provides new insights that can be validated experimentally. In addition, modeling the epithelium as a flat sheet and not accounting for cell curvature is a simplification that may limit the model's accuracy. Finally, the model does not fully recapitulate the deeply invaginated furrow configuration as observed in a real embryo (comparing 18 hpf in Figure 5D and 18 hpf in Figure 1A) and does not fully capture certain mutant phenotypes (comparing 18 hpf in Figure 5F and 18 hpf in Figure 3B right panel).
  
  Thank you very much for these helpful and constructive comments. We have addressed your concerns through the following model updates and clarifications.
  
  First, we have reformulated our vertex model from a flat sheet to a curved geometry that incorporates initial tissue curvature. We found that the core mechanical mechanism, mediated by the coupling of apical and lateral active contraction, consistently recapitulates the experimental invagination process. By independently inhibiting apical or lateral contractions in the model, we further clarified their distinct mechanical contributions to tissue bending and cell shortening.
  
  Regarding the model predictions concerning the apical-to-lateral redistribution of actomyosin in the original version (previously shown in Figure 6E-H), we agree that these lacked direct experimental validation in the current study and may have strayed from the primary focus on the invagination mechanism itself. Therefore, we have removed these predictive components from the revised manuscript. Instead, we have refocused our analysis on the robustness of the localized active process across tissues of varying sizes and curvatures, particularly because the in vivo invagination is accompanied by global tissue growth and geometry changes.
  
  Finally, we acknowledge that the simulated final shapes do not perfectly match the experimental geometry in every detail. We attribute these discrepancies to the omission of global tissue growth and the simplification of cell-cell adhesions between the surface epithelium and internal bulk cells. While these factors are not the primary drivers of the invagination, they undoubtedly refine the local morphology. We have added discussions of these limitations in the revised manuscript and aim to incorporate precise experimental measurements of tissue growth and inter-layer interactions in future modeling efforts.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript by Qiao et al., the authors seek to uncover force and contractility dynamics that drive tissue morphogenesis, using the Ciona atrial siphon primordium as a model. Specifically, the authors perform a detailed examination of epithelial folding dynamics. Generally, the authors' claims were supported by their data, and the conceptual advances may have broader implications for other epithelial morphogenesis processes in other systems.
  
  Thank you for your positive summary and for recognizing the broader implications of our work.
  
  Strengths:
  
  The strengths of this manuscript include the variety of experimental and theoretical methods, including generally rigorous imaging and quantitative analyses of actomyosin dynamics during this epithelial folding process, and the derivation of a mathematical model based on their empirical data, which they perturb in order to gain novel insights into the process of epithelial morphogenesis.
  
  Thank you for highlighting the strengths of our multidisciplinary methodology.
  
  Weaknesses:
  
  There are concerns related to wording and interpretations of results, as well as some missing descriptions and details regarding experimental methods.
  
  We have revised the manuscript to address your concerns regarding the wording and the details of the methodology.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  Based on the feedback from the reviewers, a focus on the following major points has the potential to improve the overall assessment of the significance of the findings and the strength of the evidence:
  
  (1) It would be helpful to clearly articulate how these findings advance the field beyond what has already been demonstrated or suggested in other systems.
  
  We thank the editor for this helpful suggestion. To better articulate how our findings advance the field, we have revised both the Introduction and Discussion to explicitly contrast our system with previously studied invagination models. Specifically, we highlight that our work demonstrates a bidirectional redistribution of actomyosin between apical and lateral domains, which differs from the apical-to-basolateral shift reported in ascidian endoderm invagination. Moreover, we emphasize that lateral contractility is obligatory for the accelerated invagination stage in our system, whereas in Drosophila ventral furrow invagination the second folding phase can proceed without it. These comparisons are now presented explicitly in the revised manuscript. We believe our findings represent a distinct mechanical paradigm for sequential epithelial morphogenesis.
  
  (2) It would be helpful to clarify the meaning of "translocation" and more explicitly describe the temporal and spatial patterns of active myosin localization during the two steps of invagination.
  
  We have replaced the term “translocation” with “redistribution” throughout the manuscript, including the title. We have also added new quantitative analyses of active myosin intensity from earlier time points (14–15 hpf) to rigorously support the initial lateral-to-apical redistribution phase (Figure 2B). High-resolution top-view images have been included to show the ring‑like localization of myosin at the apical cell‑cell junctions during the initial stage (Figure 2A). The schematic in Figure 2C has been corrected to accurately reflect the predominant localization of active myosin at the apical cell‑cell borders.
  
  (3) It would be helpful to explain how the optogenetic data support the conclusion that "redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination".
  
  We have performed additional experiments combining optogenetic inhibition with subsequent immunostaining of active myosin II (anti-pS19 MRLC). We quantitatively compared the distribution of actomyosin in light‑stimulated versus dark‑control embryos. The new data show that after light exposure, lateral myosin intensity is significantly reduced compared to the dark control, whereas apical myosin levels decrease similarly in both groups. This indicates that the optogenetic manipulation effectively attenuates lateral contractility during the accelerated invagination stage without affecting concurrent apical contractility changes. These results directly support the conclusion that lateral contractility acquisition is essential for invagination progression (Figure 4—figure supplement 2).
  
  (4) It would be helpful to describe how the modeling work fits within the existing literature on modeling epithelial folding and to address discrepancies between the model and the actual biological observations, such as tissue curvature, limited invagination depth in the model, and the "puckering" surrounding the invagination. In addition, certain descriptions of the modeling results should be clarified, as suggested by Reviewer #3.
  
  We thank the referees for the detailed and constructive comments on our modeling work. In response to these suggestions, we have significantly updated the theoretical section of the manuscript. Specifically, we have reformulated the vertex model within a curved geometry that represents the entire tissue, and revised the subsequent analyses to better clarify the mechanical principles driving the observed morphogenesis. We have added relevant references and discussed the mechanistic connections and distinctions between our model and previous studies on epithelial invagination. We hope that our point-by-point responses regarding the modeling work and the corresponding revisions in the manuscript adequately address the reviewers’ concerns.
  
  (5) It would be helpful to elaborate on the methods for quantitative image analysis and statistical tests.
  
  We have thoroughly expanded the Materials and Methods section by adding a dedicated subsection “Quantification and statistical analysis”. This subsection provides step‑by‑step descriptions of how apical, lateral, and basal domains were defined (segmented line, width 1 μm), how normalization was performed (basal intensity set to 1), how center cell height, invagination depth, and lateral cell distance were measured (referencing Figure 1B), and what statistical tests were used (two‑tailed Student’s t‑test, with significance levels indicated). (see revised Materials and Methods, “Quantification and statistical analysis” subsection)
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) This reviewer has concerns about two aspects of the model. First, the model in Figure 5D shows a simulation of a flat epithelial sheet creating an invagination. However, the actual invagination is occurring in a small embryo that has very significant curvature, such that nine or so cells occupy a 90-degree arc of the 360-degree circle that defines the embryo's section (e.g., see Figure 1A). This curvature could potentially have important effects on cell behavior. Ideally, the developed model would reflect the actual geometry of the observed behavior. A more nuanced analysis would provide important insight into whether the embryo's curvature makes a difference. Importantly, any result comparing the planar versus curved system would be interesting because if the model worked equally well in the high curvature or planar systems, the model is robust, or if invagination requires different strategies for high curvature and for planar systems, this is an important finding that reveals the importance of local geometries. I don't think the consideration of invagination from a planar vs curved epithelium has been previously modeled.
  
  We fully agree with the reviewer that comparing planar versus curved systems provides valuable insights into the invagination mechanism. As we addressed in our response to Reviewer #1 (Public Review) - Weakness (1), we have now updated our vertex model to incorporate curved geometries and introduced surface bending stiffness to better reflect the embryo's actual shape. Our systematic comparison reveals that the invagination process, driven by apico-basal tension imbalance and lateral contraction, is indeed highly localized and remains robust across different initial curvatures. We have added Figure 5—figure supplement 1 and corresponding discussions in the revised manuscript to highlight these findings on model robustness and the role of local geometry.
  
  (2) The second concern about the model is that Figure 5D shows the vertex model developing significant "puckering" (evagination) surrounding the invagination. Such "puckering" is not seen in the in vivo invagination (Figures 1A, 2A). This issue is not discussed in the text, so it is unclear how big an issue this is for the developed model. A discussion of this issue in the text would be appropriate. Maybe puckering goes away if a curved epithelium is modeled?
  
  Thank you for this comment. In our model, the "puckering" effect naturally arises due to the presence of surface bending stiffness and the absence of rigid boundary constraints, which resembles the tissue morphology observed at 17 hpf in our experiments. However, our updated simulations show that this effect significantly diminishes as the tissue curvature decreases. We have addressed this concern in detail in our response to Reviewer #1 (Public Review) - Weakness (2) and have included the relevant analysis and discussions in the revised manuscript.
  
  (3) Because of the puckering, it is unclear in the model what measurement is being used to define the invagination depth in Figure 5E. Is the depth from the maximal height of the surrounding epithelial cells? Or the location of the apical surface before invagination begins? It would be helpful to have that parameter better defined, and it would also be helpful to add a line to Figure 5D showing the reference point for invagination depth.
  
  Thank you for your suggestion. We measured the vertical distance from the baseline connecting the maximal height of apical midpoints of the surrounding cells to the apical surface of the center cell, which is consistent with our experimental measurements. We have now added a schematic line and indicators to Figure 5D.
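
  The depth definition above amounts to a small geometric computation. The sketch below illustrates it and is not the authors' analysis code. It assumes a 2D cross-section in which the apical midpoints are (x, y) coordinates ordered across the tissue, with y increasing toward the apical (outer) side. It also assumes the rim on each side of the center cell is the surrounding apical midpoint with maximal height. The function name `invagination_depth` is hypothetical.

```python
from typing import Sequence, Tuple

# (x, y) position of a cell's apical midpoint in a 2D cross-section.
# Assumption: y increases toward the apical (outer) surface of the tissue.
Point = Tuple[float, float]


def invagination_depth(apical_midpoints: Sequence[Point], center_index: int) -> float:
    """Hypothetical helper: vertical distance from the rim-to-rim baseline
    down to the apical midpoint of the center cell.

    apical_midpoints must be ordered along the tissue (left to right), and
    center_index selects the invaginating center cell.
    """
    left = apical_midpoints[:center_index]
    right = apical_midpoints[center_index + 1:]
    if not left or not right:
        raise ValueError("need surrounding cells on both sides of the center cell")

    # Rim of the furrow on each side: the surrounding apical midpoint with maximal height.
    lx, ly = max(left, key=lambda p: p[1])
    rx, ry = max(right, key=lambda p: p[1])
    cx, cy = apical_midpoints[center_index]

    # Height of the straight baseline joining the two rims, evaluated at the center cell's x.
    t = (cx - lx) / (rx - lx)
    baseline_y = ly + t * (ry - ly)

    # Positive values mean the center cell lies below the baseline (i.e., invaginated).
    return baseline_y - cy


if __name__ == "__main__":
    # Toy cross-section with five cells; the center cell (index 2) is displaced inward.
    cells = [(-2.0, 0.0), (-1.0, 0.5), (0.0, -1.5), (1.0, 1.0), (2.0, 0.0)]
    print(invagination_depth(cells, center_index=2))
```

  For the toy cross-section, the script prints 2.25. The baseline uses the highest surrounding midpoints rather than the pre-invagination apical level, so any puckering at the rims is included in the measured depth.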
  
  (4) In Figure 2A Top View, as well as the schematic in Figure 2C, the developing invagination is surrounded by a ring of aligned cell edges characteristic of a "purse string" type actomyosin cable that would create pressure on the invaginating cells, which have been documented in multiple systems. Notably, the schematic in Figure 2C shows myosin II localizing to aligned "purse string" edges, suggesting the purse string is actively compressing the more central cells. If the purse string consistently appears during siphon invagination, a complete understanding of siphon invagination will require understanding the contributions of the purse string to the invagination process. For this paper, a discussion of the possible involvement of a purse string would be helpful for the readers, but follow-up work could include laser cutting or optogenetic blockage of the purse string contractility.
  
  Thank you for your suggestion. We agree that the ring-like actomyosin structure is a prominent feature during the initial stages of invagination, and its potential role warrants discussion. We carefully re-examined our data. Our analysis confirms that this myosin ring is most pronounced during the early initial invagination stage (Figure 2A). This inward compression from the periphery would work in concert with apical constriction to help shape the initial invagination. However, this ring-like myosin pattern significantly diminishes in the accelerated invagination stage. We propose that the purse string may play a collaborative role in the early phase. We agree that follow‑up work (e.g., laser cutting or optogenetic manipulation) would be valuable and have noted this as a future direction in the Discussion.
  
  (5) The introduction and discussion put the work in the context of work on physical forces in invagination, but there is not much discussion of how the modeling fits into the literature. Did the current work advance the state of modeling of such phenomena? What were the strengths and limitations of the modeling in this paper compared to what has been done previously?
  
  Thank you for this suggestion. While we have incorporated additional literature in the revised manuscript as mentioned in our response to Reviewer #1 (Public Review) - Weakness (4), we would like to further clarify the specific advances and limitations of our modeling framework. Our updated vertex model builds upon established foundational frameworks but advances the state of modeling by: (i) incorporating dynamic apico-lateral tension variations coupled with actomyosin signals, and (ii) achieving localized, activity-mediated morphogenesis without the need for external rigid boundary constraints—a feature that distinguishes it from many classical models. We also recognize the model's current limitations. Specifically, it does not explicitly account for compressive stress and global geometric changes induced by tissue growth. The mechanical interactions between surface epithelial cells and the underlying internal bulk cells are also simplified. These factors represent important directions for our future work. We have added a dedicated paragraph in the Modeling and Discussion sections to contrast our model with existing literature and to explicitly state these strengths and limitations.
  
  (6) Figure 4D. Minor point, but the labeling on the X-axis is out of register with the bar graphs.
  
  We have corrected the alignment of the X‑axis labels with the bar graphs in Figure 4D. The figure has been updated accordingly.
  
  (7) Figure 4B does not have a scale bar.
  
  We have added a scale bar to Figure 4B (10 μm).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Live imaging is necessary to demonstrate bidirectional translocation by visualizing the movement of the actomyosin network between the apical and lateral domains. Alternatively, a term other than "translocation" should be used to describe the observation.
  
  We agree that live imaging of actomyosin movement would be ideal but is technically challenging in this system. Instead, we have replaced the term “translocation” with the more accurate and conservative term “redistribution” throughout the manuscript, including the title, to avoid implying physical movement of the same molecules. This addresses the reviewer’s concern.
  
  (2) The optogenetic tool could be used to its full potential by manipulating myosin spatially or temporally, for example, by inhibiting myosin at various stages or subcellular locations, which would provide an opportunity to thoroughly test the domain and stage-specific needs for actomyosin. That said, I recognize that such experiments may be challenging in the model system used in this study.
  
  We thank the reviewer for this suggestion. We have indeed attempted spatially restricted optogenetic activation in the Ciona atrial siphon system, but found it technically very challenging due to tissue geometry and light scattering. We appreciate the reviewer's understanding of these technical limitations.
  
  (3) Some additional characterization of the optogenetics tool, such as the distribution of active myosin and F-actin post-stimulation, could further strengthen the interpretation of the inhibitory effect on invagination.
  
  We thank the reviewer for this suggestion. After optogenetic inhibition, we fixed and stained embryos for active myosin II. The results (Figure 4—figure supplement 2) show that light exposure significantly reduces lateral myosin intensity compared to the dark control, while apical myosin decreases similarly in both groups. This confirms that the optogenetic manipulation selectively attenuates lateral contractility without affecting apical changes. We have added this data to the Results section.
  
  (4) It would be helpful to address how heterogeneity in MRLC mutant overexpression might impact the interpretation of the outcome.
  
  We acknowledge that mosaic expression is common in Ciona electroporation. For all quantitative analyses, we only selected embryos in which the center cell and more than half of the surrounding cells in the primordium showed clear expression of the plasmid. This selection criterion has been added to the Materials and Methods section.
  
  (5) For Figure 2, it would be helpful to include the en face view of the cells at different apical-basal depths to better demonstrate the changes in the subcellular localization of myosin at different stages.
  
  We have added top‑view images in Figure 2A at both the apical and a deeper (lateral) plane. These images clearly show the ring‑like localization of active myosin at the apical cell‑cell junctions during the initial stage. Together with the cross‑sectional views, they adequately demonstrate the subcellular localization changes.
  
  (6) The Methods section should include more detailed descriptions of image quantification procedures. For example, for Figure 2B, how were the apical and lateral signals defined, and how were background intensities determined? In addition, the methods used for statistical tests should be clearly stated.
  
  We agree that detailed quantification procedures are essential. We have therefore expanded the Materials and Methods with a new subsection “Quantification and statistical analysis”. This subsection includes precise definitions of apical, lateral, and basal domains (segmented line, width 1 μm), background subtraction (region outside the tissue), normalization (basal intensity set to 1), and descriptions of how cell height, invagination depth, and lateral distance were measured (referencing Figure 1B). Statistical tests (two‑tailed Student’s t‑test) and significance levels are clearly stated.
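
  The normalization steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. It assumes the pixel values along each domain's 1 μm-wide segmented line and from a background region outside the tissue have already been extracted from the image. The helper name `normalize_domain_intensities` and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def normalize_domain_intensities(
    profiles: Dict[str, Sequence[float]],
    background: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: background-subtract the mean intensity of each domain
    and rescale so that the basal domain equals 1.

    profiles maps a domain name ("apical", "lateral", "basal") to pixel values
    sampled along that domain's 1 μm-wide segmented line; background holds
    pixel values from a region outside the tissue.
    """
    bg = mean(background)
    corrected = {domain: mean(values) - bg for domain, values in profiles.items()}

    basal = corrected["basal"]
    if basal <= 0:
        raise ValueError("basal signal must exceed background to serve as the reference")

    return {domain: value / basal for domain, value in corrected.items()}


if __name__ == "__main__":
    # Toy values for one cell (arbitrary fluorescence units).
    cell = {"apical": [130, 150], "lateral": [70, 90], "basal": [50, 50]}
    print(normalize_domain_intensities(cell, background=[10, 10]))
```

  For the toy cell, the script prints `{'apical': 3.25, 'lateral': 1.75, 'basal': 1.0}`. Per-cell normalized values from two conditions could then be compared with `scipy.stats.ttest_ind(a, b)`, whose defaults (`equal_var=True`, two-sided alternative) correspond to a two-tailed Student's t-test.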
  
  (7) The discrepancies between the model and experimental data, as described above, should be acknowledged. Commentary on how the model's assumptions and setup might contribute to these differences would be helpful.
  
  We thank the reviewer for this suggestion. As detailed in our response to Reviewer #2 (Public Review) - Comment (6), we have included the discrepancies between the model and experimental results in the Modeling and Discussion sections. We have added comments explaining how our key modeling assumptions might contribute to these differences. Specifically, while we have updated the model to a curved geometry, the omission of continuous global tissue growth and expansion could affect the final invagination depth and shape. Meanwhile, the neglect of mechanical interactions between the surface epithelium and the internal bulk cells prevents the model from fully capturing the constraints that refine the local furrow configuration in vivo. By clarifying these limitations, we now provide a more balanced view of the model's scope and its role in identifying the primary mechanical drivers of invagination.
  
  Reviewer #3 (Recommendations for the authors):
  
  General comments:
  
  (1) Methods: More information is needed to describe how imaging and quantification were performed. A couple of examples:
  
  (a) In Figure 1, how were the apical and basal surface area of the center cell quantified?
  
  (b) In Figure 1, Supplement 1, how was fluorescence intensity measured? Was there a constant area or volume that was quantified between samples? This is important because a decreasing apical surface can cause the signal to appear "concentrated" and increased.
  
  We thank the reviewer for this important suggestion. We have added a dedicated subsection “Quantification and statistical analysis” in the Materials and Methods. This subsection includes precise definitions of apical, lateral, and basal domains (segmented line, width 1 μm), background subtraction (region outside the tissue), normalization (basal intensity set to 1), and descriptions of how cell height, invagination depth, and lateral distance were measured (referencing Figure 1B). Statistical tests (two‑tailed Student’s t‑test) and significance levels are also stated.
  
  (2) The manuscript could use some editing and proofreading for grammar.
  
  The manuscript has been carefully edited for grammar and clarity. We thank the reviewer for the suggestion.
  
  Specific points:
  
  (1) Figure 1A: Could the authors please annotate the location of the center cell throughout the time course? This would make it easier for the reader to understand what is being quantified.
  
  We have added arrows to indicate the center cell at each time point in Figure 1A. This makes it easier for readers to follow the quantification.
  
  (2) Figure 1 Supplement 1A, Line 143, "...before 15 hpf, F-actin concentration decreased at the lateral domains..."
  
  It is not clear that the graph shows a decrease in the lateral domains when taking the error bars into account. It is possible that the F-actin concentration is stable in the lateral domains before 15 hpf. Are there some statistical analyses that can be performed?
  
  We re-analyzed the F-actin data and agree that the change before 15 hpf is not statistically convincing given the error bars. However, we have added new quantitative analysis of active myosin (p-MLC) at 14–15 hpf (Figure 2B), which shows a clear and significant shift from lateral to apical enrichment during this early phase. This myosin dynamic strongly supports our hypothesis of bidirectional redistribution. The corresponding text has been updated in the Results section.
  
  (3) Figure 1 Supplement 1A, Line 147-148, "...after 16 hpf, during which apical F-actin levels showed a gradual decline." Based on the graph, it does not appear that apical F-actin levels show a gradual decline after 16 hpf; rather, they may be steady or slightly increase.
  
  We agree with the reviewer. Our original statement was inaccurate. What we intended to emphasize was that at 16 hpf, the F-actin level at the lateral domain exceeded that at the apical domain. The detailed changes of F-actin after 16 hpf were not a focus of our discussion. We have revised the text accordingly to avoid any misinterpretation. The correction has been made in the Results section.
  
  (4) Figure 2C Hypothesis and line 169-170, "Initially, actomyosin translocated from the lateral regions to the apical domains..."
  
  Related to the comment above, it is not clear that one can state that the actomyosin "translocated". The quantification does not necessarily demonstrate a loss of actin at the lateral domain in the initial stage, and even if there was a loss of lateral actomyosin, one would require experiments (perhaps photoconversion experiments) to demonstrate that machinery from the lateral region was transferred to the apical surface, rather than a process of new assembly at the apical surface.
  
  We fully agree with the reviewer. We have replaced the term “translocation” with “redistribution” throughout the manuscript, including the title, to avoid implying physical movement of the same actomyosin structures. The text in the Results and Discussion has been revised accordingly.
  
  (5) A similar comment is relevant to the subsequent statement in line 175, "actomyosin translocated from the apical domains to the lateral regions." Without direct experiments to demonstrate movement of the actomyosin machinery, it is possible that there is de novo assembly of actomyosin in the lateral region rather than translocation.
  
  This wording ("translocation") becomes important primarily because it is in the title and appears to be one of the authors' major conclusions.
  
  We fully agree with the reviewer that the wording is critical given our main conclusion. We have therefore systematically replaced “translocation” with “redistribution” across the manuscript (title, results, and discussion).
  
  (6) Figure 4, Lines 215-216, "These results confirm that the redistribution of myosin contractility from the apical to lateral regions is essential for the development of invagination."
  
  This experiment did not specifically test the redistribution of myosin; rather, the authors demonstrated that myosin contractility globally is necessary for invagination. In these experiments, is it known where the myosin is?
  
  We have performed additional immunostaining experiments (new Figure 4—figure supplement 2) to directly examine myosin distribution after optogenetic inhibition. The results show that light exposure specifically reduces lateral myosin intensity compared to the dark control, while apical myosin decreases similarly in both groups. This demonstrates that the optogenetic manipulation selectively attenuates lateral contractility. We have revised the conclusion to state that the acquisition of lateral contractility is essential for invagination progression. The new data and revised text are in the Results section.
  
  (7) Figure 4B, minor point: It would be helpful if the authors included a timestamp for the bottom row images (Dark 1 h).
  
  Thank you for pointing this out. Timestamps have been added to the bottom row images (Dark 1 h) in Figure 4B.
  
  (8) Figure 5E, F, minor point: It seems that the label on the red curve has a typo; it should be T18ES19E (rather than T18AS19E).
  
  Thank you for pointing out this typo. We have corrected it in the revised manuscript (now Figure 6A, B).
  
  (9) Figure 5F and corresponding text: Can the authors please clarify what is meant by "Coupled mode" as marked in the schematic? Is this meant to refer to simultaneous apical constriction and lateral contraction? Or sequential?
  
  We thank the reviewer for this question. By "coupled mode," we refer to the mechanical synergy between apical and lateral contractions in driving the final invagination. As observed in our experimental data and recapitulated in the model, these two processes occur sequentially rather than simultaneously. We have revised the corresponding text to explicitly clarify this sequential process.
  
  (10) Figure 6A, B, Lines 274-275: "...the invagination depth increased significantly under higher alphaa (Figure 6A), while the central height remained relatively independent of alphaa (Figure 6B)." This caused me some confusion until I realized that "Figure 6B" might be a typo and should be Figure 6C.
  
  We sincerely apologize for this confusion. In the revised manuscript, this specific section and the corresponding figures have been updated.
  
  (11) Line 287, typo: I believe that "Figure 5B" should be Figure 6B.
  
  We sincerely apologize for this confusion. In the revised manuscript, this specific section and the corresponding figures have been updated.
  
  (12) Figure 6A, B, comparing invagination depth with varying apical or lateral actomyosin intensity: The authors state that "invagination depth increased significantly under higher alphaa", but describe "mild invagination depth variation" with varied lateral actomyosin intensity. The graphs seem to suggest that there is increased invagination depth when either apical or lateral actomyosin intensity is increased, and that the increase is to a similar extent. Can the authors comment on what they think the differences are, if the apical effect is "significant" but the lateral effect is "mild"?
  
  We thank the reviewer for this meticulous observation. We agree, and we regret that our original description was not sufficiently precise. In the revised manuscript, we have re-analyzed the distinct contributions of apical and lateral tensions using the updated curved vertex model, which provides a more accurate mechanical decoupling. We have accordingly replaced the previous wording with a more rigorous description of the simulations and streamlined the corresponding figures to ensure the conclusions are clearly supported.
  
  (13) Figure 6H, Lines 307-309, "...stronger regional translocation and redistribution contribute to the rapid reduction in height of invaginating cells..."
  
  It appears from the graph that this is really only apparent at high alpha (total actomyosin); at empirically determined levels (alpha = 1), the effect of varying ratio is less dramatic. Can the authors comment on how significant they consider this effect?
  
  We thank the reviewer for this insightful comment. We agree that the theoretical predictions regarding translocation strength in the original model lacked sufficient experimental validation. To maintain the scientific rigor of our study, we have removed the sections concerning the translocation ratio and the corresponding Figure 6H from the revised manuscript. Instead, we now refocus our analysis on the core mechanical drivers of invagination that are directly supported by our observations. We also have added discussions acknowledging other factors not fully captured in the current model (e.g., tissue growth), which we aim to investigate in future work.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.26.672310v2
www.biorxiv.org

FMRP Regulates Neuronal RNA Granules Containing Stalled Ribosomes, Not Where Ribosomes Stall

1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We have addressed all the reviewers’ comments through new experiments, additional analyses, or, in some cases, additional text. Below is a summary of the major changes in the manuscript.
  
  (1) We have added a considerable amount of new characterization of the biochemical enrichment of the ribosome clusters, including EM of the ribosome clusters, UV absorbance profiles, immunoblots of additional targets, and additional replicates (new Figure 1). In summary, we provide better evidence that (i) the biochemical enrichment works and (ii) the loss of FMRP has no effect on this biochemical enrichment of ribosome clusters.
  
  (2) We have now reanalyzed all of the data in Figs. 5-8 using only the RPF data obtained after removing PCR duplicates. Other than the comparison between the nuclease treatments (Fig. 3), only these deduplicated data are now used. Moreover, we have reanalyzed these data following the reviewers' suggestions, including PCA (Fig S5-1), GSEA (Fig 5), and normalization for group size when comparing significance to total mRNAs (Fig 6-7). We now also include a new analysis (Fig S7-1) to better explain how the loss of FMRP mainly affects FMRP targets defined by CLIP, but not all mRNAs resistant to run-off.
  
  (3) We are now more conservative in our nomenclature; we use "pellet" instead of "RNA granule (RG)" and "fraction 5/6" instead of "ribosome clusters (RC)". We have added a section to the discussion about the relationship between the RNA granules measured using imaging of hippocampal neurites and the biochemical purification of ribosome clusters in the pellet, as requested by the reviewers.
  
  (4) We have made many other minor changes to the text and analysis, which can be found in the specific response to the reviewers.
  
  (5) One major additional change requested by the reviewers that we did not implement was to repeat our experiments at different time points. We have added a paragraph to the discussion outlining (i) why this was not done and (ii) the caveats that apply to our conclusions in the absence of these data.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.
  
  Strengths:
  
  (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.
  
  (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.
  
  (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.
  
  (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.
  
  Thank you for these positive comments on the paper.
  
  Weaknesses:
  
  (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.
  
  Unfortunately, myelin strongly interferes with the ability to use this protocol to purify ribosome clusters in older brains (see Khandjian et al., 2004). It would be possible to repeat the ribopuromycylation experiments at later times in culture, but since we cannot compare these to a comparable time in the brain, we have chosen not to do this experiment. We acknowledge this limitation in the discussion, noting that our results are only a snapshot of development and that different results may be observed at different times.
  
  (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.
  
  We agree with the reviewer and have removed all references to distal granules. We clarified that we did not measure RPM puncta close to the neuron because the much stronger RPM signal made defining puncta more difficult, and thus, we cannot determine if there are differences between proximal and distal puncta.
  
  (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP.
  
  We respectfully disagree that the study provides no molecular insight into the function of FMRP. Disproving that FMRP is important for stalling and for determining the position of stalling would remove one of the major hypotheses about the function of FMRP, and showing that a major hypothesis in the literature is unlikely to be correct is, at least to me, providing insight. Moreover, we do show an effect of the loss of FMRP on the RPM puncta that represent neuronal RNA granules containing stalled ribosomes. This also provides insight.
  
  Reviewer #2 (Public review):
  
  In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knockout mouse brains to analyze their protein/RNA content, determine a single-particle cryo-EM structure of the contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated puncta in distal neurites of cultured primary neurons from the same mice, together with the enhanced resistance of these puncta to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of FMRP, whose loss underlies a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.
  
  In order to improve this study, our main suggestions are as follows:
  
  (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive puncta. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion, the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."
  
  How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.
  
  Thank you for your comments. We have been more conservative in the revised paper, referring to the pellet fraction as the pellet fraction rather than the RNA granule fraction to acknowledge the possibility that these two modalities differ. However, we respectfully disagree that our main message requires RPM-labeled RNA granules in neurites and the ribosome clusters isolated by sedimentation to be “equivalent”. We do believe they are related and have added a section in the discussion on this important point.
  
  (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.
  
  We have removed all discussion of the secretory mRNAs, as our attempts to obtain independent evidence for this finding by examining ribophorin enrichment in the pellet across different Mg<sup>2+</sup> concentrations did not support this interpretation (data not shown in the paper). I understand that the change in nuclease treatment is somewhat out of place narratively, but it is clearly relevant to this work. We disagree that our previous work requires a ‘correction’. We believe that the nuclease resistance of the mRNA at the entrance site is important. Using a nuclease treatment similar to that of our previous publication, we reproduce our results from rats in mice; thus, that work is not wrong. We have a paper in preparation suggesting that the secondary structure of the mRNA at this location may be important for stalling, and we therefore feel strongly that this result should remain in the manuscript.
  
  (3) The fold changes reported in Figure 7 (log2 fold changes ranging from -0.2 to +0.25) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".
  
  We agree that the changes are small and indeed did not appear in the DEG analysis. However, because we are analyzing a large set of mRNAs in this analysis, the results are highly significant and remain significant when using the new statistical tests suggested by the reviewer below. We now emphasize that these are small changes and remind readers that none of the individual mRNA changes were significant in the DEG analysis.
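  As a purely illustrative calculation (the numbers are hypothetical, not values from the data), a one-sample z statistic for a mean log2 fold change d̄ across n mRNAs with standard deviation s shows how a small average shift can become significant when n is large, because the standard error shrinks with the square root of n:

```latex
z = \frac{\bar{d}}{s/\sqrt{n}}, \qquad
\bar{d} = 0.1,\; s = 1,\; n = 1000
\;\Rightarrow\; z = 0.1\sqrt{1000} \approx 3.16
```

  A z of about 3.16 corresponds to a two-sided p of roughly 0.0016, even though a shift of 0.1 in any single mRNA would be far from significant on its own. This is a simplified stand-in, not the test used in the manuscript.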
  
  (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.
  
  We now use only the data after removing PCR duplicates for all analyses except those in Figure 3. The number of peaks observed is determined mainly by the threshold used, as stated in the methods: “To be identified as a peak, the zenith of an abundance site for the reads must be 4x higher than the average of the total transcript.” Because fewer reads remained after removing PCR duplicates, fewer peaks reached this threshold.
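  To illustrate how the number of peaks can depend on which reads are kept, here is a minimal Python sketch of a relative threshold like the one quoted above. It assumes, hypothetically, that each read is a (position, UMI) pair and that PCR duplicates are reads sharing both; the function names, the toy counts, and reading "4x higher" as "at least 4 times the per-position mean" are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical read representation: (position_on_transcript, umi) tuples.
# Reads sharing both position and UMI are treated as PCR duplicates.

def coverage(reads, length):
    """Per-position read counts along a transcript of `length` nucleotides."""
    counts = Counter(pos for pos, _umi in reads)
    return [counts.get(i, 0) for i in range(length)]

def dedup(reads):
    """Collapse PCR duplicates so each (position, UMI) pair counts once."""
    return list(set(reads))

def call_peaks(cov, fold=4.0):
    """Positions whose count is at least `fold` times the transcript-wide mean.

    Assumption: "4x higher than the average" is read as >= 4 * mean.
    """
    if not cov:
        return []
    mean = sum(cov) / len(cov)
    if mean == 0:
        return []
    return [i for i, c in enumerate(cov) if c >= fold * mean]

# Toy transcript of 20 nt (illustrative numbers only).
LENGTH = 20
reads = [(3, "a")] * 10                        # one molecule amplified 10 times
reads += [(12, f"m{k}") for k in range(10)]    # ten distinct molecules
reads += [(p, f"bg{p}") for p in range(LENGTH) if p not in (3, 12)]  # background

before = call_peaks(coverage(reads, LENGTH))
after = call_peaks(coverage(dedup(reads), LENGTH))
print("peaks before dedup:", before)  # [3, 12]
print("peaks after dedup: ", after)   # [12]
```

  Because the threshold is relative to the transcript mean, a position inflated by amplified copies of a single molecule can pass before deduplication and fail afterwards, which is one way the total peak count can fall.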
  
  (5) Figure 9 / S9-1: the density of puncta in both WT and FMR1-KO actually increases after treatment with HHT or anisomycin (Figure S9-1 B-C). Even if a large fraction were now "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figure S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) subsample to the same size and repeat their statistical analysis to improve its credibility.
  
  The box and whisker plots emphasize the median and not the average. We now also show the averages in Figure S9-1, which indicate a slight decrease for both HHT and anisomycin.
  
  We apologize for the typo in the legend of Figure 9: the correct number is 171, not 5,171. We now use random sampling in Figures 6 and 7, where the sample sizes differ substantially.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Li et al. describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).
  
  Strengths:
  
  (1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes that are part of stress granules, and focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.
  
  (2) The interpretation of the results could be interesting if supported by solid data. The idea that FMRP could control the formation and release of stress granules, rather than the elongation by stalled ribosomes, is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.
  
  (3) The authors focused on recapitulating previous findings, published elsewhere (Anadolu et al., 2023) by the same group, but using rat tissue, rather than mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in the formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu et al., 2023.
  
  (4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.
  
  Thank you for these positive comments. We have now added a more extensive characterization in Figure 1.
  
  Weaknesses:
  
  (1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems not to reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from the small, 2 from the large subunit)?
  
  We have increased N to improve the robustness of the enrichment analysis and added several additional RBPs. In addition to Coomassie staining, we now include UV absorbance profiles and electron micrographs of these fractions showing the presence of 80S ribosome clusters in the fractions we are using.
  
  (2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes, such as those related to actin dynamics, molecular transport, and translation, are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by p-adjust or log2 Fold Change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest the removal of this analysis and the incorporation of a gene set enrichment analysis (ranked by p-adjust). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.
  
  Thank you for the suggestion. We now use GSEA to examine differences in gene sets between WT and FMR1-KO mice and find some significant changes (new Fig. 5). The old analysis is still included as a supplemental figure for comparison with our earlier paper. We have also included a PCA (Fig. S5-1) to show reproducibility across biological replicates.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) RNA sequencing comparison between WT and FMR1 KO mice should be carried out at a later developmental stage, which may reveal a larger difference between these two groups.
  
  There are a number of studies that have already done this analysis, including in specific brain regions (10.1016/j.neuron.2017.07.013; 10.7554/eLife.46919; 10.3389/fnmol.2017.00340; https://doi.org/10.1016/j.neuron.2023.06.009). The main goal of our RNA-seq was to provide a standard for the RPF studies, not to identify differences in RNA-seq between WT and FMR1-KO mice. In the response to public review point 1, we explain why we do not look at later developmental time points.
  
  (2) The same is true in characterizing the effect of FMRP on the RNA granules.
  
  See response to public review point 1, which addresses this point.
  
  (3) No evidence is provided for the effectiveness of DHPG stimulation in DIV8-10 neurons; this is needed for justification using neurons at this stage.
  
  We have previously shown that DHPG stimulation in these neurons at this developmental time from cultures made from rat brain is sufficient to decrease the number of RPM puncta and to induce an increase in the synthesis of proteins in an initiation resistant manner (Graber et al, 2013; Graber et al, 2017). This is now more clearly stated in the manuscript. Moreover, here we replicate the result of DHPG in WT mice at reducing the number of RPM puncta.
  
  (4) In Figure 9 B, it is not clear whether the neurites indicated are axons or dendrites. Since neurons are still in the early stages of dendritogenesis/synaptogenesis, it is important to make that distinction.
  
  We have previously characterized RNA granules in axons and dendrites in hippocampal cultures from rats at this time (Miller et al, 2009, MCN 40:485-495), and they are similar. While it is likely that the vast majority of the neurites at this time are dendrites, we did not use markers, so we conservatively use the term neurites.
  
  (5) In Figure 1 (and elsewhere), fraction 5/6 is used as a polysome or RNA cluster fraction. The authors have not provided a UV absorption profile and only have S6 as evidence to call this a polysome fraction. In the Coomassie gel, is this fraction any different from fractions 7/8 or 9/10? What is the justification for using this fraction?
  
  The main justification for these fractions is consistency with our previous paper (Anadolu et al, 2023) and with the Khandjian study comparing polysomes to the pellet using the same fractionation protocol (El-Fatimy et al, 2016). We now provide a UV absorption profile (Fig. 1C) and electron micrographs (Fig. 1D) showing the ribosome clusters in this fraction. We do not believe our results would be fundamentally different from those obtained with other heavy fractions.
  
  Minor comments
  
  (1) The font size is very small in the figures; please increase it.
  
  We have worked hard to increase the font size in all the figures.
  
  (2) In the result section for Figure 3B - it is written 'majority of these mRNA are non-coding mRNA' - this doesn't make sense.
  
  Corrected
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) There are lots of mistakes (e.g. word omissions or duplications, grammatical errors) throughout the text, too many to list here.
  
  We have carefully edited the text to try to minimize these mistakes.
  
  (2) In many positions related to their improved nuclease digestion protocol, samples are labelled "M ...", which apparently stands for "high magnesium and high nuclease treatment group". I would suggest switching to something more intuitive, such as "... (improved digestion)".
  
  We have removed most of the comparisons between these samples. In what remains (Figure 3), we use "Low Nuclease" to refer to the sample with low magnesium and low nuclease.
  
  (3) Figure 1,3 - It would be tremendously illuminating to see a polysome trace (UV260 absorbance) in addition to Coomassie-stained SDS-PAGE to underscore the interpretation of the different fractions by the authors. As it stands, there is no way of telling whether there are any polysomes present at all. This can also be done by hand using a UV absorption reader if no built-in device is available to the authors.
  
  We have now done this (Fig. 1C) and also provided EM of this fraction to show the presence of ribosomes in this fraction.
  
  (4) I don't understand why the authors switched from calling fraction 5/6 the "polysome fraction" in their previous work to calling it "ribosome cluster fraction" in this work. The argument given "[...] due to its structural similarity to ribosomes in RNA Granules (Anadolu et al., 2023), we conservatively call this the ribosome cluster fraction (RC)." does not instill confidence that these two fractions are indeed distinct.
  
  We agree with the reviewer and regret this decision. We now call the pellet "the pellet" and fraction 5/6 "fraction 5/6".
  
  (5) Figure 1C - There are clear scanning or compression artefacts in the blot images (most prominently in the eEF2 lanes) that should be corrected.
  
  We have replaced all images in Figure 1 and have increased the N of this experiment considerably.
  
  (6) Figure 1C - The authors claim that WT mouse RG is enriched in FMRP compared to the RC or starter fraction, but there is also a lot more protein loaded in the RG (especially when compared to RC). It is also hard to believe from the Coomassie staining that despite the much stronger presence of low MW bands (which is where ribosomal proteins migrate) in fraction 5/6, the S6 western blot signal is actually comparable between RC and RG. Can the authors please provide more detail on the loading of these fractions and supply quantification of FMRP in all three fractions, normalized by total protein? This might also be the source of the discrepancy they report: contrary to their expectation, ribosomes (as measured by S6 signal / S6 signal in starter fraction) are increased in FMR1-KO brains.
  
  We have repeated all of these experiments and changed our method of quantification (see Methods). We no longer use the starting material in our quantification. With the additional data and the change in method, we no longer see an increase in S6 in the FMR1-KO pellet fraction.
  
  (7) Figure 1 - I believe "D-F)" should only read "D-E)" based on the axis titles, and instead "FG)" should be added before the next sentence. Instead of "Staufen" it should be specified in the Figure that "Stau2" was quantified. "Staufen (59kd)" should read "Stau2 (59 kDa)" and "anti-Staufen (52kb)" should read "anti-Stau2 (52 kDa)" and the same for all other similar instances. It is further hard to believe that e.g., "Staufen2 (59kd)" (see above) is not significantly enriched with N=5, a very low spread, and over 1.5x enrichment. The authors should double-check that the appropriate statistical test was employed.
  
  Figure 1 has been completely redone, and the two Staufen bands are enriched in this new analysis.
  
  (8) Figure S4-2 - Most of the detail in the corresponding figure legend should be moved to the Materials and Methods section.
  
  Details relevant to the methods in this figure legend have now been moved to the Materials and Methods section.
  
  (9) Figure 4A - The displayed/segmented tRNA densities appear unusually distorted. I would recommend displaying segmented densities of the original homogeneous reconstructions, not of separated and later fused partial maps.
  
  Figure 4 was modified according to the suggestions of this reviewer.
  
  (10) Figure 9 C-D, S9-1 B-E - Are not all conditions also including puromycin as in B above? If so, it should be added to both the figure and the figure legend.
  
  The reviewer is correct, and the figure and legend have been changed to reflect this.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) "Loss of FMRP causes Fragile X syndrome. In humans, the loss of FMRP occurs due to the expansion of a CGG repeat in the 5' untranslated region (UTR) of the gene, leading to excessive methylation and transcriptional inhibition."
  
  Comment: Genes don't have 5'UTRs; they have exons encoding 5'UTRs. I suggest rephrasing this statement.
  
  This sentence has been rephrased.
  
  (2) "Several of these functions have been implicated in Fragile X syndrome, including FMRP's regulation of miRNA repression, splicing, translation initiation, and translational elongation".
  
  Comment: Is this a typo? miRNA instead of mRNA?
  
  No, this is correct. FMRP has been implicated in the regulation of microRNAs (miRNAs) in a number of studies.
  
  (3) "elongation rates are also increased in mouse models of FMRP".
  
  Comment: Mouse models of Fragile X?
  
  This has been corrected.
  
  (4) "Parts of this work were included in the Master's thesis of the first author (Li, 2024)."
  
  This has been removed.
  
  (5) Comment: Graphs in Figure 1 need proper y-axis labeling. What is the normalization method? What are the values presented in the y-axis?
  
  Figure 1 has been completely changed and the Y-axes are now clear in this new version.
  
  (6) "Thus, by looking at the percentage of puromycylation present in the presence of anisomycin, we can estimate the number of ribosomes in this state. "
  
  Comment: Are the authors really estimating the number of ribosomes in a resistant state? One could argue that they are collecting populational information regarding resistance to anisomycin.
  
  We have rephrased this sentence to be more conservative about what we are measuring.
  
  (7) Comment: Page 11 - Why did the authors assume magnesium would affect the conformation state of the ribosomes? What is the rationale behind increasing the [Mg2+]?
  
  Most ribosome preparations use 10 mM MgCl<sub>2</sub>. However, most neuroscientists use physiological buffers that contain 2.5 mM MgCl<sub>2</sub>. In bacteria, this makes a large difference, but the evidence from eukaryotes is not clear. Since this is a collaboration between these two schools of thought, and since there were some free 60S subunits in the EM (Anadolu et al, 2024), we decided to switch to 10 mM MgCl<sub>2</sub>.
  
  (8) Page 11- "In other words, high Mg2+ decreased the abundance of mRNAs normally cotranslationally inserted into the ER which are unlikely to be components of transporting RNA granules containing stalled ribosomes and solidified our focus on the M protocol in the analyses below."
  
  We have removed this from the paper, as additional experiments aimed at solidifying this interpretation failed to detect an effect on secretory mRNAs.
  
  (9) Comment: The whole "abundance", "enrichment", and "occupancy" nomenclature is hard to follow.
  
  We have rewritten this section.
  
  (10) Page 13 - "There were only 2 protein coding genes that were significantly different between the abundance of FMR1-KO and WT in protein coding genes - FMR1 and Wdfy1 (Extended Data Table 5-2). There were no significantly different genes between WT and FMR1-KO occupancy and enrichment. Thus, no difference rose to significance, given the large number of mRNAs used in this analysis."
  
  Comment: It seems like this is repeating the same information three times.
  
  This has been changed.
  
  (11) Page 13 - "Similar to previous experiments with rats, the most abundant mRNAs resistant to run off were significantly abundant, occupied and enriched in both WT and FMRP RPFs (Fig 6)"
  
  The Shah et al. dataset we use was based on the most abundant mRNAs resistant to run-off. While we agree it is not surprising that they are also abundant in the pellet we observe, this would not necessarily be true unless the pellet is actually enriched in stalled mRNAs.
  
  (12) Page 14 - "These mRNAs had been identified by cross-linking FMRP with mRNA, fragmenting the mRNA, immunoprecipitating the mRNA still associated with FMRP and sequencing this mRNA."
  
  We shortened this description.
  
  (13) Page 14 - "Interestingly, while still significant, there appeared to be a decrease in the relative abundance of these mRNAs in the FMR1-KO RG (Fig 6B)"
  
  Comment: It is hard to observe this decrease in the boxplots. Second, the statistical tests for the bioinformatics analyses are not the most appropriate, given the large discrepancy in the number of mRNAs present in the experimental group ("All mRNAs") and the filtered groups.
  
  We have redone the statistics using repeated random sampling of all the mRNAs, so that each sampled group contained the same number of mRNAs as the group being compared. This lowered the significance for some groups, but most remain highly significant. This analysis has also been affected by switching to the data from the PCR-deduplicated RPFs. Owing to this improvement in the data, the changes we now observe are more evident in the box-and-whisker plots.
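  One way to implement size-matched resampling is sketched below in Python. It assumes the per-mRNA values (for example, log2 fold changes) are held in plain lists and swaps in an empirical p-value computed from group medians; the function name, the use of the median, the number of draws, and the one-sided direction are illustrative assumptions, and the test actually used in the manuscript (e.g., a rank-based test repeated on each draw) may differ.

```python
import random
import statistics

def size_matched_resampling(group, background, n_draws=1000, seed=0):
    """Empirical one-sided p-value that `group` has a higher median than
    random subsets of `background` of exactly the same size.

    group      -- values for the filtered mRNA set (hypothetical input)
    background -- values for all mRNAs (hypothetical input)
    """
    if len(group) > len(background):
        raise ValueError("group cannot be larger than the background")
    rng = random.Random(seed)              # fixed seed for reproducibility
    observed = statistics.median(group)
    k = len(group)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background, k)   # size-matched draw, no replacement
        if statistics.median(draw) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_draws + 1)

# Toy data (illustrative only): values 0..999 stand in for "All mRNAs".
background = list(range(1000))
high_group = list(range(900, 1000))   # shifted upwards
low_group = list(range(100))          # shifted downwards

print(size_matched_resampling(high_group, background))  # very small
print(size_matched_resampling(low_group, background))   # 1.0
```

  Drawing background subsets of the same size as the filtered group removes the sample-size imbalance the reviewer raised, and repeating the draw many times avoids depending on a single random subset.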
  
  (14) Page 16 - "To rule out that peaks were due to amplification artifacts in the preparation of RPFs we repeated these analyses after removing PCR duplicates (Fig. S8-1; Extended Data Table S8-3) and found over 95% of the peaks identified without removing PCR duplicates were defined as a peak in at least one of the biological replicates after removing duplicates. More importantly, we found similar results with enrichment of FXS motif and enrichment of negatively charged amino acids in the FMR1-KO only, WT only and both peaks after removing PCR duplicates (Fig. S8-1; Extended Data Table S8-3)."
  
  Comment: It is unclear why the authors needed to include the analysis without PCR duplicate removal. This is an essential step to guarantee the robustness of ribo-seq findings. I recommend removing the whole analysis from Figure 8 from the manuscript and including only the post-duplicate removal analysis.
  
  As mentioned above, we completely agree with this point; we now show only these data and have redone all the figures with them (except Fig. 3).
  
  (15) Figure 9 - I am unsure that the data is convincing enough to demonstrate reinitiation of mRNA granules induced by DHPG. I suggest a colocalization experiment with another protein well known to be localized to RNA granules, such as G3BP1. In addition, repeat the experiment with an additional group where elongation is blocked after the addition of DHPG, which presumably would prevent the reduction in the WT puncta density.
  
  These are interesting additional experiments, but they are outside the scope of what we can manage. We have previously shown colocalization of Staufen, FMRP and UPF1 with these puncta (Graber et al, 2013; Graber et al, 2017) and shown that these puromycylated puncta also colocalize with nascent peptides detected using the SunTag technique. While we think doing the experiment in the presence of an elongation inhibitor would be interesting, we disagree that it would prevent the reduction in WT puncta density. We believe that what is happening is the loss of the liquid-liquid phase separation of the ribosome clusters due to dephosphorylation of RBPs like FMRP and UPF1 (Graber et al, 2017), and this would reduce the puncta density whether or not the ribosomes were activated for translation.
  
  Nevertheless, we have tried to temper the conclusions made from this result, emphasizing what we know (RPM puncta are decreased) as opposed to actual reactivation of stalled polysomes which we are not measuring.
  
  Discussion - Page 18 - "Nevertheless, if FMRP binding was the critical determinant for presence in neuronal RNA granules, we would have expected to observe more differences." This is not true. If the data is poorly collected, you will not see differences.
  
  This statement was removed.
  
  (16) "A proportion of the stalled ribosomes that are not stored in large RNA granules may still be pelleted in the sucrose gradients. This fraction may be greater in the absence of FMRP."
  
  Comment: The authors are right about this and touch on my original point that the characterization of the biochemical fractionation is not convincing enough. I'd suggest probing against more proteins that are contained in RNA granules.
  
  We have added several proteins to the biochemical characterization shown in Figure 1, and we have added a section to the discussion about the relationship between neuronal RNA granules and the sedimented pellet fraction.
  
URL

biorxiv.org/content/10.1101/2025.02.21.639553v5
www.biorxiv.org

Divergent C. elegans toxin alleles are suppressed by distinct mechanisms

1. Public_Reviews 16 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We incorporated Reviewer #2’s suggestion to change the name of mll-1 because it overlaps with the name of a human gene. We use the updated gene names in our responses below to minimize confusion. The updated gene names for the toxin-antidote system we described are listed below.
  
  tmrl-1 - Toxin-induced Maternal Rod Lethality (formerly mll-1). After we establish that B0250.8 is also a toxin, we refer to this gene as the “N2 tmrl-1 allele”.
  
  amrl-1 - Antidote of Maternal Rod Lethality (formerly smll-1)
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The article by Zdraljevic et al. reports the discovery of a third toxin-antidote (TA) element in C. elegans, composed of the genes mll-1 (toxin) and smll-1 (antidote). Unlike previously characterized TA systems in C. elegans, this element induces larval arrest rather than embryonic lethality. The study identifies three distinct haplotypes at the TA locus, including a hyper-divergent version in the standard laboratory strain N2, which retains a functional toxin but lacks a functional antidote. The authors propose that small RNA-mediated silencing mechanisms, dependent on MUT-16 and PRG-1, suppress the toxicity of the divergent toxin allele. This work provides insights into the evolutionary dynamics of TA elements and their regulation through RNA interference (RNAi).
  
  Overall, there are many things to like about this paper and only a few small quibbles, which will not require more than a little rewriting or relatively minor analyses.
  
  Strengths:
  
  (1) The discovery of a maternally deposited TA element with delayed toxicity due to delayed mRNA translation of the maternally deposited toxin mRNA is a significant addition to the literature on selfish genetic elements in metazoans.
  
  (2) Identifying three haplotypes at the TA locus provides a snapshot of potential evolutionary trajectories for these elements, which are often inferred but rarely demonstrated in naturally occurring strains. The genomic analysis of 550 wild isolates contextualizes the findings within natural populations, revealing geographic clustering and evolutionary pressures acting on the TA locus.
  
  (3) The study employs various techniques, including CRISPR/Cas9 knockouts, FISH, long-read RNA sequencing, and population genomics. The use of inducible systems to confirm toxicity and antidote functionality is particularly robust. This multifaceted approach strengthens the validity of the findings.
  
  (4) The authors provide compelling evidence that small RNA pathways suppress toxin activity in strains lacking a functional antidote. This highlights an alternative mechanism for neutralizing selfish genetic elements.
  
  Weaknesses:
  
  (1) The introduction focuses strongly (for good reason) on bacterial TA systems and then jumps to TA systems in C. elegans. It's unclear why TA systems in other eukaryotes are not discussed.
  
  We briefly introduced bacterial TA systems because of their ubiquity and then focused on C. elegans TA systems. We chose aspects of previously described Caenorhabditis TA elements that were relevant to the narrative we presented. Furthermore, we have previously reviewed TA systems extensively and have added a citation to that review in the revised manuscript (Burga et al. 2020).
  
  (2) Similarly, there is a missed opportunity to discuss an analogy between the suppressor mechanism discovered here and the hairpin RNA suppressors of meiotic drive identified by Eric Lai and colleagues. Discussing these will provide a fuller context of the present study's findings and will not affect their novelty.
  
  Thank you for pointing this out. We added a mention of the Stellate and Dox systems in our discussion.
  
  (3) While the evidence for RNAi-mediated suppression is strong, the claim that positive selection drove diversification at piRNA binding sites requires further discussion and clarification. The elevated dN and dS are unusual (how unusual relative to other genes in the vicinity? What is hyper-divergent, statistically speaking?), but there is no a priori reason that there would be selection on piRNA binding sites within the mll-1 transcript to facilitate its recognition by endogenous RNAi machinery; what is the selective pressure for mll-1 to do so? Most TA systems would like to avoid being suppressed by the host. One cannot make the argument that this was motivated by the loss of the antidote, because the loss of the antidote would be instantly suicidal, so the cadence of events described, requiring hypermutation of the mll-1 transcript, does not work.
  
  We largely agree with the reviewer’s point, which we believe is based on the following sentence in the discussion: “We propose that positive selection for piRNA binding sites in the tmrl-1 transcript drove the diversification of this gene toward the N2 version.” We have removed this argument from the discussion in the revised manuscript.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In the manuscript by Walter-McNeill, Kruglyak, and team, the authors provide solid evidence of another toxin-antidote (TA) system in C. elegans. Generally, TA systems involve selfish and linked genetic elements, one encoding a toxin that kills progeny inheriting it, unless an antidote (the second element) is also present. Currently, only two TA systems have been characterized in this species, pointing to the importance of identifying new instances of such systems to understand their transmission dynamics, prevalence, and functions in shaping worm populations.
  
  Strengths:
  
  This novel TA system (mll-1/smll-1) was identified on LGV in wild C. elegans isolates from the Hawaiian islands, by crossing divergent strains and observing allele frequency distortions by high-throughput genome sequencing after 10 generations. These allele frequency distortions were subsequently confirmed in another set of crosses with a separate divergent strain, and crosses of heterozygous males or hermaphrodites resulted in a pattern of L1 lethality in progeny (with a rod arrest phenotype) that suggested the maternal transmission of this TA system from the XZ1516 genetic background. By elegantly combining the use of near-isogenic lines, CRISPR editing to generate knock-outs, and a transgene rescue of the antidote gene, the authors identified the genes encoding the toxin and the antidote, which they refer to as mll-1 and smll-1. Moreover, the specific mll-1 isoform responsible for the production of the toxin was identified and mll-1 transcripts were observed by FISH in early and late embryos, as well as in larvae. Inducible expression of the toxin in various strains resulted in larval arrest and rod phenotypes. The authors then characterized the genetic variation of 550 wild isolates at the toxin/antidote region on LGV and distinguished three clades: (1) one with the conserved TA system, (2) one having lost the toxin and retaining a mostly functional antidote, and (3) one having lost the antidote and retaining a divergent yet coding toxin (this includes the reference strain Bristol N2, in which the homologous toxin gene has acquired mutations and is known as B0250.8). Further, the authors show that this region is under positive selection. These data are compelling and provide very strong evidence of a new TA system in this species.
  
  Weaknesses:
  
  The question remained as to how one clade, including N2, could retain the toxin gene but not possess a functional antidote. In the second part of the manuscript, the authors hypothesized that small RNA targeting (RNAi) of the toxin transcript could provide the necessary repression to allow worms to survive without the antidote. Through a meta-analysis of multiple small RNA datasets from the literature, the authors found evidence to support this idea, in which the toxin transcript is targeted by 22G siRNAs whose biogenesis is dependent on the Mutator foci protein, MUT-16. They note that from previous studies, mut-16 null mutants displayed a varied penetrance of larval arrest. In their own hands, mut-16 mutants displayed 15% varied larval arrest and 2% rod phenotypes. In an attempt to link B0250.8 to mut-16/siRNAs, they made a double mutant and examined body length as a proxy for developmental stage. Here, they observed a partial rescue of the mut-16 size defect by B0250.8 mutation. Finally, the authors also highlight data from further meta-analysis, which predicts the recognition of B0250.8 by several piRNAs. Also based on existing data from the literature, the authors link loss of Piwi (PRG-1), which binds piRNAs, to a depletion of 22G-RNAs targeting B0250.8 and an upregulation of B0250.8 expression in gonads, suggesting that piRNAs are the primary small RNAs that target B0250.8 for downregulation. The data in this portion of the manuscript are intriguing, but somewhat preliminary and incomplete, as they are based on little primary experimentation and a collection of different datasets (which have been acquired by slightly different methods in most cases). This portion of the study would require subsequent experimentation to firmly establish this mechanistic link. 
For example, to be able to claim that "the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation" the identified piRNA sites should be mutated and protein and transcript levels analysed in wild-type and in the strain with mutated piRNA sites. At a minimum, the protein levels in wild-type and mut-16, prg-1, and/or wago-1 mutants should be measured by western blot and/or by live imaging (introducing a GFP or some other tag to the endogenous protein via CRISPR editing) to show that the toxin is not accumulated as a protein in wt, but increases in levels in these mutants. mRNA levels in Figure S5A suggest there is still some expression of the B0250.8 transcript in a wild-type situation.
  
  We thank the reviewer for their thoughtful assessment of our manuscript, and we appreciate that they recognized that the data linking the small RNA machinery to B0250.8 suppression is intriguing. While the reviewer claims our analysis is preliminary and incomplete, we believe we present an appropriate multi-faceted approach for establishing the small RNA-mediated suppression mechanism we describe.
  
  First, the reviewer states that we rely on “little primary experimentation”. Our primary experiments show that loss of the N2 tmrl-1 allele partially rescues ∆mut-16 developmental delay and arrest phenotypes. Therefore, we provide direct evidence that the N2 tmrl-1 functionally contributes to the ∆mut-16 phenotype. Furthermore, we overexpressed the N2 tmrl-1 allele to show that this gene is a toxin.
  
  It is true that we use previously published datasets to establish a small RNA-mediated mechanism that likely explains our observations. The reviewer suggests that our claims are weakened by relying on a “collection of different datasets (which have been acquired by slightly different methods in most cases)”. We believe instead that evidence collected from multiple labs using an array of different techniques strengthens our conclusions. We show that N2 tmrl-1-targeting small RNAs have been identified across multiple datasets (references 26, 32, 33, 34). Taken together, these datasets support a mechanistic framework for the suppression of the N2 tmrl-1 that involves PRG-1-dependent piRNA binding, MUT-16-dependent 22G siRNA, and the secondary Ago WAGO-1 binding.
  
  The reviewer suggests several experiments, but we do not view them as essential to support our claims.
  
  (a) piRNA site mutagenesis: we present multiple lines of evidence that the N2 tmrl-1 transcript is post-transcriptionally targeted by small RNAs in a piRNA-mediated manner; we do not claim that specific piRNA sites are necessary and sufficient for this silencing. The suggested experiment would be valuable for future work but is beyond the scope of our study.
  
  (b) Characterization of TMRL-1 protein levels: We agree that this experiment would provide definitive evidence of complete small RNA-mediated suppression of the N2 tmrl-1 transcript. As we explain above, however, we do show that removing the N2 tmrl-1 allele partially rescues the ∆mut-16 growth defect, demonstrating that when this gene’s regulation is disrupted, it induces toxicity. Importantly, we observed no tmrl-1-induced toxicity when we overexpressed a version of this gene with a stop codon, indicating that it acts as a protein.
  
  Finally, the reviewer questions our claim that: "the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation."
  
  We agree that this statement is too definitive given our current data. We have revised it to: "Multiple lines of evidence suggest that the N2 tmrl-1 allele is recognized by piRNAs, leading to MUT-16-dependent 22G siRNA production and post-transcriptional silencing of the transcript."
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The paper suggests that antidote pseudogenization occurred because RNAi replaced its function, but does not explore whether this process is ongoing or complete across all N2-like strains.
  
  We explored this possibility, but we realize that we did not explicitly state so in the manuscript. The B0250.4 (amrl-1) gene is pseudogenized in all strains within the N2 clade. We have modified the following sentence in the results section to explicitly state this observation:
  
  “While the previously described C. elegans TA elements are characterized by their absence in susceptible strains (2, 3), all members of the N2-like susceptible clade harbor a divergent allele of tmrl-1 with an intact coding sequence, as well as a pseudogenized version of amrl-1.”
  
  (2) Some figures (e.g., allele frequency distortions) could benefit from additional annotations to guide interpretation. In general, the figures make the reader work harder than they need to.
  
  We revised the figure captions to improve clarity.
  
  Although mll-1 and smll-1 were identified as toxin and antidote genes, their molecular mechanisms remain unclear and are very interesting.
  
  We agree that identifying the molecular mechanism associated with the toxin and antidote would be of interest, but is beyond the scope of the current paper.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Because the rod phenotype was important in identifying the TA system, it seems important to include representative images of this phenotype throughout the paper.
  
  We added a supplemental figure (Fig. S1B) showing the self-progeny of a QX1211/XZ1516 heterozygote.
  
  (2) In Figure 2A, we were confused as to why there were so few reads of mll-1. We may be misunderstanding something, so could the authors explain this to us? We would have expected more reads of mll-1, given the diagram showing that the breakpoints of the NIL were beyond (closer to the right end of) the mll-1 locus, and that the phenotype correlates with the presence of the toxin (frequency of 0.20 L1 arrest).
  
  The lack of sequencing depth arises because the sequence divergence between QX1211 and XZ1516 is too high to accurately map short sequencing reads derived from QX1211 to the XZ1516 genome. We added the following sentence to the figure caption to add clarity:
  
  “The XZ1516 and QX1211 genomes are so divergent that short reads derived from QX1211 do not align to the XZ1516 genome in the 200 bp windows that lack corresponding read depth, as indicated by the absence of a gray bar.”
  
  (3) The use of TOF in Figure 4 as a proxy for animal length, instead of directly indicating or measuring animal length, hinders the comparison of these results with other studies (e.g., most often in the literature, we see images of worms and measurements of their sizes, or some other morphological marker is used to demonstrate the proportion of worms in a particular developmental stage). Nonetheless, we think the approach is clever and certainly enables analysis of a large sample population. However, a wild-type control is missing from these experiments to give a sense of the typical distribution one would expect. Without this, one interpretation of the B0250.8 knockout data shown in B is that loss of B0250.8 results in ~10% larval arrest, which seems higher than would be expected for a wild-type N2 strain and should be explained; again, if the wild-type control showed the same pattern, that would be useful to know. The title for Figure 4 should be revised, as this figure suggests, but does not provide definitive evidence, that B0250.8 is suppressed by sRNAs/sRNA pathways. See the next point for providing more definitive data to support this model.
  
  There is a long list of publications that rely on the large particle sorter to infer how growth rate is affected in various mutants and environmental conditions (See Andersen et al. 2015, ref 28 in the manuscript, and the papers that reference this work). As the reviewer pointed out, the use of time of flight, which is simply the amount of time an object obstructs a laser at a constant flow rate, enables accurate measurement of tens of thousands of individual animals for comparison.
  
  The reviewer is correct to point out that without a wild type N2 control, it is impossible to tell what a typical distribution looks like. However, the experiment includes all strains necessary to make the comparisons that enable us to draw the conclusion that the N2 tmrl-1 allele contributes to larval arrest in the absence of MUT-16.
  
  We agree with the reviewer’s point that this figure does not provide evidence that B0250.8 is suppressed by small RNAs, and we have therefore changed the figure title.
  
  The new figure title: The N2 tmrl-1 allele contributes to larval arrest in the absence of MUT-16
  
  (4) To be able to claim that "the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation" the identified piRNA sites should be mutated and protein and transcript levels analysed in wild-type and in the strain with mutated piRNA sites. At a minimum, the protein levels in wild-type and mut-16, prg-1, and/or wago-1 mutants should be measured by western blot and/or by live imaging (introducing a GFP or some other tag to the endogenous protein via CRISPR editing) to show that the toxin is not accumulated as a protein in wt, but increases in levels in these mutants. mRNA levels in Figure S5A suggest there is still some expression of the B0250.8 transcript in a wild-type situation.
  
  The reviewer makes several good suggestions for experiments to determine whether the conclusions we draw from publicly available high-throughput sequencing datasets apply in our context. However, we disagree that the quoted statement “the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation” is not supported by the evidence we present from Reed et al. 2020. The data presented by Reed et al. clearly show that the N2 tmrl-1 transcript is heavily targeted by 22G siRNAs, and that the accumulation of these siRNAs depends on the presence of MUT-16 and PRG-1. The dependence on PRG-1 implicates piRNAs in mounting the 22G response.
  
  (5) Importantly, it was not the mll-1/B0250.8 transcript itself that was shown to interact with WAGO-1 in the Seroussi et al. eLife paper (Lines 257-259). That study investigated the sRNAs associated with every AGO and computationally inferred the targets of each AGO from the enriched sRNA sequences. Therefore, it is the siRNAs antisense to mll-1/B0250.8 that were detected in association with WAGO-1, making it likely that WAGO-1 is the secondary AGO that targets this transcript. The argument the authors make holds true, but the authors should revise how they describe the evidence supporting that argument to accurately reflect the existing data.
  
  Thank you for catching this mistake. We have updated the text to accurately reflect the results from the Seroussi et al. 2023 publication:
  
  “Recent work has shown that the N2 tmrl-1 transcript-derived small RNAs co-immunoprecipitated with WAGO-1, providing additional evidence that this transcript is regulated by the endogenous RNAi machinery”
  
  (6) It seems likely that the authors explored the possibility that another antidote may be present in the third clade. Could they discuss what they did to rule out this explanation in lieu of piRNA/siRNA regulation?
  
  We did not look for another antidote in the third clade because this clade is defined by the presence of an antidote and the absence of a toxin. Figure 3C shows the result of a cross between a third clade strain (NIC195) and XZ1516. The conclusion we draw from this experiment is that the antidote present in NIC195 provides near complete resistance to the XZ1516 toxin.
  
  (7) Line 156, legend of Figure S3, and line 273: There was no marker used to indicate that these are the primordial germ cells. Best practices would indicate using a fluorescent marker (e.g., PIE-1 GFP or PGL-1 GFP or PRG-1 GFP, etc.) to definitively identify these as PGCs.
  
  We agree with the reviewer’s point. Because we did not use a germ cell marker to confirm their identity, we do not definitively state that tmrl-1 transcripts localize to the primordial germ cells.
  
  Minor comments:
  
  (1) A minor suggestion: incorporating some of the results now shown in the supplementary figures - Figures S1, S3, and S4 - into the main figures may make the manuscript easier to read.
  
  We constructed the manuscript in a way we thought was straightforward. The figures listed by the reviewer are supplemental to the main conclusions of the manuscript, so we decided to leave them as supplemental figures.
  
  (2) Line 87, Figure S1A: include numbers on the y-axis.
  
  The numbers are included on the y-axis and we explain the x-axis tick marks in the figure caption.
  
  (3) Figures 1B, 2B, 3C, 4B, S1B, S4: statistical analyses missing.
  
  We have added a summary of the statistical analysis to the captions of Figures 1B, 2B, 3C, and S1B. We also added more detail on the analysis of Figure 4A, which is the panel from which we draw conclusions. Figure S4 is observational data, and the only conclusion drawn from that figure is that the N2 tmrl-1 gene encodes a toxin. It was toxic in 100% of the individuals we examined, so statistical testing is not warranted.
  
  (4) Line 100, "The rod progeny were all homozygous for QX1211 alleles at the locus on the right arm of chromosome V that displayed the allele frequency distortion in the mapping populations". Is this supported by data? While there is strong evidence to suggest it, the way it is currently written makes it seem that the rod progeny have been genotyped (by sequencing or PCR?). Is this the case? If not, the authors should revise the statement accordingly.
  
  Yes, this is indeed the case and we have updated the text to reflect that we performed PCR of a QX1211-specific indel to verify the genotypes on the right arm of chromosome V.
  
  (5) Figure 2A: lower panel missing x axis label.
  
  The top panel is a cartoon representation of the NILs, and the x-axis is labeled for the top panel, highlighting the mapped element.
  
  (6) Line 140 to 148: The authors should provide data to support these statements.
  
  The lines the reviewer refers to read: “Long-read RNA sequencing revealed two distinct mll-1 isoforms, a short isoform with three predicted exons and a long isoform with eight predicted exons (Fig. S2A). We constructed plasmids with inducible versions of each mll-1 isoform. When we injected susceptible strains with the short mll-1 isoform array, every F1 individual carrying the array died, with 64% of larvae exhibiting the rod phenotype, indicating that uninduced expression levels of the short mll-1 isoform are sufficient to induce lethality. By contrast, we were able to isolate susceptible strains that maintained the long mll-1 isoform array or a short mll-1 isoform array with a premature stop codon in mll-1. We observed no rod progeny upon induction of these arrays, indicating that the short isoform encodes the functional toxin, and that the toxin acts as a protein.”
  
  (7) Line 193: It would be interesting to see if there is structural conservation between mll-1 and B0250.8 using alpha-fold. Have the authors done this?
  
  We did attempt to look for structural conservation but we found the confidence in the structural predictions to be very low, which didn’t warrant a comparison.
  
  (8) Line 206-207: Could the authors explain why the frequency of the rod phenotype is so low when presumably over-expressing B0250.8? Does this indicate that B0250.8 is not as functional a toxin as mll-1, or is it sufficiently repressed by sRNAs and not actually overexpressed? Further, what are "abnormal" phenotypes? This should be clarified for the reader.
  
  It is likely that the overexpression and misexpression of toxic proteins is causing the abnormal phenotypes. The rod phenotype probably manifests when the gene is expressed at the appropriate developmental stage and tissue to cause the phenotype, whereas abnormal phenotypes manifest when the expression is not in the correct stage or location. A summary of the observed phenotypes is provided in Supplementary Table 7.
  
  (9) Line 216 and thereafter: indicate that B0250.8 is now referred to as mll-1.
  
  We incorporated this suggestion.
  
  (10) Line 228-231: missing to state that this is shown in Figures 4A-B.
  
  This comment and the following one suggest that we did not provide enough clarity in this section. We modified the line to the following:
  
  Consistent with this report, in an agar plate-based preliminary assay we observed that ~15% of ∆mut-16 progeny arrest at various larval stages, and 2% of progeny are rod, which is suggestive of derepression of tmrl-1 in N2.
  
  This lets readers know that this initial characterization of the mut-16 knockout strain is distinct from the data presented in Figure 4.
  
  (11) Line 230: the Figure shows ~25% of arrest for the deletion mutant of mut-16, but the text says ~15%.
  
  The 15% the reviewer points out was obtained from a preliminary agar plate-based experiment in which we attempted to characterize the mut-16 deletion strains. We then turned to a higher-throughput approach to screen more animals of each genotype, which we report in Figure 4.
  
  (12) Line 233: TOF, and not animal length, was compared. The authors should indicate that TOF is used as a proxy for animal length.
  
  We made the suggested change. The new sentences read:
  
  To do so, we compared time of flight (TOF) measurements—a proxy for animal length, developmental stage, and growth rate (28)—between a strain with a single knockout of mut-16 and one with a double knockout of mut-16 and the N2 tmrl-1 (a strain with a single knockout of the N2 tmrl-1 served as a negative control). We observed a reduction in TOF and an increase in the fraction of worms in larval stages in the mut-16 knockout strain, and these effects were partially rescued in the double knockout strain (Fig. 4).
  
  (13) Line 237-239: This claim may be overstated without additional data. Consider adding a "likely" to the statement.
  
  The line in question:
  
  These results indicate that the reduced growth rate observed in the mut-16 knockout strain is partially mediated by derepression of the N2 mll-1 allele.
  
  We modified it to reflect the reviewer’s concern:
  
  These results indicate that the reduced growth rate observed in the mut-16 knockout strain is partially mediated by the presence of the N2 tmrl-1 allele, likely because tmrl-1 is derepressed in mut-16 knockout strains.
  
  (14) Line 257: Figure S5C should be moved to line 259.
  
  We made the suggested move.
  
  (15) Is the name mll-1 firmly established? We ask because MLL1 is a human mutation commonly associated with leukemia, and it may lead to some confusion in the field. This is a minor point, but we wanted to bring it forth.
  
  This name was not firmly established. We modified the names to not overlap with known gene names:
  
  tmrl-1 - Toxin-induced Maternal Rod Lethality
  
  amrl-1 - Antidote of Maternal Rod Lethality
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.26.591160v3
www.biorxiv.org

Msc1 facilitates glucose starvation-induced remodeling of the nucleus-vacuole junction

1. EMBOpress 16 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  
  General Statements
  
  We thank the reviewers for their careful evaluation of our manuscript and for the many constructive suggestions. Overall, the reviewers found the identification of Msc1 as a glucose starvation-responsive NVJ-associated factor to be novel and potentially important, while also raising several important concerns regarding the mechanistic interpretation of our findings and the topology/localization of Msc1. We particularly appreciate the reviewers' comments regarding potential overinterpretation of several conclusions. In the revised manuscript we will substantially revise the wording throughout the text to more carefully distinguish correlation from causation and to avoid unsupported mechanistic conclusions. In addition, we plan to address the reviewers' concerns through a combination of additional experiments, revised data presentation, clarification of methodological details, and expanded discussion of alternative interpretations.
  
  1. Description of the planned revisions
  
  In the point-by-point response below, the reviewers' comments are presented in italics, and our responses are provided below each comment.
  
  Reviewer #1
  
  Comment:
  
  *This is a nice, smallish study of Msc1, a fungal protein of unknown function. The authors show it localises to the NVJ when the NVJ expands in late-log/stationary phase, at which stage its transcription is increased 80-fold, an induction one whole order of magnitude greater than shown by Nvj1 itself. This indicates that Msc1 may be a previously unappreciated master regulator of the NVJ. There are some interesting phenotypes of deleting Msc1, including some cell death and loss of Nvj1, mostly through destabilisation since the transcriptional effect is marginal.*
  
  *While no mechanism for Msc1 is discovered, that might be too much to ask for in this first paper. However, there are ways to begin to address this that the authors should look into.*
  
  *My major issue with the paper is that it makes no link to the previously studied homologues of Msc1 in S. pombe (Ish1/Les1; see Asakawa et al. 2022). Admittedly, S. pombe has no Nvj1 homolog, but there is a physical relationship between nucleus and vacuole (Chadwick et al. (2020) 10.1088/1478-3975/aba510). Also, the paper on Ish1/Les1 developed a phenotype to test Ish1 (toxicity of overexpression) that might be useful for studies of Msc1.*
  
  *The current MS should link to work on Ish1/Les1 in S. pombe, relating to several features:*
  
  *Topology. Given the high similarity between Msc1 and Ish1/Les1, they are (a priori) likely to share considerable form and function. If Msc1 is a soluble protein in the ER lumen, then the previous report that Ish1/Les1 have TMDs is wrong. The report here should make that link and carefully explain how the S. pombe paper is wrong. Also explain how it is possible for Msc1 (and Ish1/Les1) to stay restricted to the nuclear envelope (in many images it is diffuse throughout the NE). The only mechanism I can think of is binding an integral protein that sorts to the inner NE by known mechanisms (or possibly binding to an outer-NE protein that binds to an inner-NE one, like SUN/KASH). I cannot think of any other example of a soluble protein restricted to the NE, so this is quite a claim. An alternative view that could be investigated and should definitely be discussed is that Msc1 (and by implication Ish1 and Les1) has a TMD even though it is extracted by carbonate. Something similar has been reported for some single-TMD proteins in mitochondria (Kim et al. (2015) 10.1002/pro.2817). Investigations would include proteomics showing whether the protein is normally full length (as coded by the open reading frame) or clipped (indicating the signal sequence is removed for a soluble protein). Such data may already be available in published mass spec datasets.*
  
  We agree that the relationship between Msc1 and the previously characterized S. pombe homologs Ish1/Les1 should be discussed more carefully, particularly with respect to membrane topology. In the revised manuscript, we will cite and discuss the Ish1/Les1 studies and further investigate the topology and localization of Msc1 through several additional experiments. First, as suggested by the reviewer, we will examine whether the N-terminal region of Msc1, which is predicted to function as a signal sequence and as a weak transmembrane domain, undergoes proteolytic processing. To address this, we plan to perform mass spectrometry-based analyses and examine whether the N-terminus is retained in the mature protein. As an alternative approach in case the N-terminal peptide cannot be reliably detected by mass spectrometry, we will generate an Msc1 mutant lacking the predicted N-terminal 22-amino-acid signal sequence and compare its migration on SDS-PAGE with that of the full-length protein. This analysis should provide an additional assessment of whether the predicted signal sequence is removed during Msc1 maturation.
  
  In addition, following comments from multiple reviewers, we will repeat the alkaline carbonate extraction experiments using additional ER membrane protein controls to more carefully evaluate the membrane association properties of Msc1.
  
  Furthermore, we plan to perform additional split-GFP localization analyses to test whether Msc1 localizes within the perinuclear ER lumen. Specifically, we will express GFP1-10 either within the ER lumen or within the nucleoplasm and examine in which compartment co-expression of Msc1-GFP11 results in GFP fluorescence.
  
  Finally, as suggested by the reviewer, we agree that interactions with integral membrane proteins may explain the restricted localization of Msc1 within the nuclear envelope/NVJ region. In our preliminary experiments, we obtained results suggesting a physical interaction between Msc1 and Nvj2. Therefore, we plan to further investigate the interaction of Msc1 with Nvj2, as well as with other known NVJ-associated proteins, to better understand the mechanism underlying its localization and enrichment at the NVJ.
  
  *Minor Issues*
  
  *The Abstract switches from the response to lack of glucose to terminology about 'stress response'. This could appear to be an effort to make the work seem more interesting. If the idea is to remain, it needs some support, with the introduction of the idea that yeast experiences stress (as opposed to "normal" transcription-driven programmatic changes in relation to changing levels of glucose in normal cultures).*
  
  To avoid overstating our findings, we will revise the Abstract and related text to use more precise terminology and to more clearly describe the observed responses.
  
  Introduction para 1 seems to be dedicated to the idea that a set of intracellular structures (here MCS) are 'dynamically and coordinately remodeled in response to metabolic and stress conditions'. This conclusion applies widely and may not be noteworthy. The paragraph needs a bit of rethinking.
  
  While nutrient-dependent changes in the NVJ itself have long been recognized, we believe that dynamic remodeling of multiple MCSs in response to environmental and metabolic conditions has only recently become more broadly appreciated in the field. We therefore think that discussing the emerging concept that diverse MCSs undergo dynamic reorganization under different physiological conditions provides important context for the present study. Nevertheless, we will revise the Introduction to explain this point more clearly and concisely.
  
  Figure 2D: I could not find the Nsg1 result described in the text.
  
  We will repeat the experiment independently and quantify the immunoblot results shown in Figure 2D. The resulting quantitative data will be added to Figure 2D, and the Results section will be revised accordingly to describe these findings, including the Nsg1 phenotype.
  
  P6: "Strikingly, GS-dependent transcriptional activation of NVJ1 was significantly suppressed in msc1∆ cells (Fig. 4B)." This overstates the strength of the result. Instead state that the induction diminishes from 6-fold to 4-fold, and give the p value.
  
  We will revise the text to provide a more quantitative description of the result, including the corresponding p value.
  
  *Language: Avoid rhetorical wording (e.g., "dramatic"): just state the results (e.g., 80-fold induction) and let the results be dramatic or striking by themselves.*
  
  We will also revise the text to avoid rhetorical wording and instead describe the results in a more direct and quantitative manner.
  
  Reviewer #2
  
  *The study describes the finding of the nuclear envelope protein Msc1 as a new component of the nucleus-vacuole junction (NVJ) membrane contact site under conditions of glucose starvation. Msc1 was previously known only as a nuclear envelope protein, presumably localizing to the nuclear lumen, with a role in DNA damage repair. The main finding of this study is the glucose starvation-induced upregulation and NVJ localization of Msc1 (Figure 1). The second main finding is that the loss of Msc1 results in impaired induction of Nvj1 expression upon glucose starvation (Fig. 3A); Nvj1 is the main component of the NVJ, responsible for NVJ formation via direct interaction with Vac8. The effect of Msc1 loss on Nvj1 expression levels is transcriptional (Fig. 4B). The glucose starvation-mediated induction of some other previously identified NVJ components, Nsg1 and Nsg2, is also impaired in the msc1D mutant, while the expression of Ypf1 is affected to a lesser degree. The data supporting these two main findings are solid (Figure 1; Figure 3A; Figure 4A, B).*
  
  *The study further shows that the loss of Msc1 results in a loss of NVJ localization of the NVJ components Tsc13, Ypf1 and, to a lesser degree, Hmg2. The microscopy data look solid; however, the interpretation of this finding is not clear. In my view, the most likely explanation is that the effect of Msc1 loss on the localization of NVJ components to the NVJ is due to the impaired glucose starvation-induced Nvj1 expression in the msc1D mutant.*
  
  MAJOR COMMENTS:
  
  *Here are suggested experiments that would strengthen the study:*
  
  *- It is difficult to imagine how a NE protein could affect the expression levels of other NVJ proteins. This key finding would be supported by a complementation experiment in which MSC1 is expressed from a vector, to test whether this rescues the phenotype (to make sure that the observed phenotype is not due to an off-target effect of the msc1D deletion).*
  
  As suggested by the reviewer, we plan to perform complementation experiments by expressing Msc1 from a plasmid in msc1∆ cells to confirm that the observed phenotypes are specifically caused by loss of MSC1.
  
  *- If technically feasible under glucose starvation conditions, this hypothesis could be tested by overexpressing Nvj1 from an inducible or some other promoter.*
  
  We agree that this is an important point. As suggested by the reviewer, we plan to overexpress Nvj1 using a constitutive promoter and examine whether this suppresses the phenotypes observed in msc1∆ cells.
  
  *- The effect of the msc1D deletion on Tsc13 protein levels (preferably using the same Tsc13-GFP strain as used in microscopy; anti-Tsc13 or anti-GFP antibodies could be used).*
  
  We will examine Tsc13 protein levels in msc1∆ cells using the same Tsc13-GFP strain used for microscopy.
  
  *- The results concerning the localization of Msc1-GFP in the elo3D mutant have been interpreted as "accelerated localization", "expansion of the size of Msc1-NVJ domain", etc. However, the levels of Msc1-GFP in the elo3D mutant are higher compared to WT (Figure 2D). Considering this, it is very likely that the larger surface area measured in the elo3D mutant is a consequence of this. This could potentially be checked by comparing sets of WT and elo3D images adjusted to a similar fluorescence intensity. In any case, this possibility should definitely be addressed in the interpretation of the result.*
  
  We agree that the increased Msc1-GFP signal in elo3∆ cells could contribute to the apparent increase in NVJ area. However, in our previous study (Fujimoto and Tamura, J. Cell Biol., 2026), we observed accelerated NVJ expansion under glucose starvation and in elo3∆ cells using Ypf1, whose expression levels are largely unchanged under these conditions. We therefore think that the observed phenotype is unlikely to be explained solely by increased Msc1 expression. Nevertheless, because Msc1 protein levels are clearly elevated in elo3∆ cells, we will revise the text to describe these results more carefully and fairly, citing our previous findings.
  
  *- There is an impression that the data have been overinterpreted, and the conclusions should be written much more carefully. Examples:*
  
  *o "Here, we show that Msc1 is a GS-responsive NVJ factor that plays an important role in functional NVJ remodeling." Based on the data shown, the effect of Msc1 could be indirect. This statement should be rewritten or argued much better.*
  
  *o "we find that GS-dependent induction of NVJ1 transcription is attenuated in msc1Δ cells, suggesting that proper NVJ remodeling contributes to the execution of stress-responsive transcriptional programs." This is unclear; which data support this?*
  
  *o "Together, these findings position Msc1 as an upstream regulator linking GS signaling to functional maturation of the NVJ and associated cellular adaptation responses." Same comment as above.*
  
  *o "...suggesting that Msc1 functions as a GS-responsive regulator of NVJ functions."*
  
  *o "...these findings suggest that Msc1 acts upstream of Ypf1 in orchestrating GS-induced NVJ functional maturation."*
  
  *o "Collectively, these results indicate that Snf1 acts upstream of Msc1 to drive GS-induced NVJ remodeling, whereas reduced Elo3 activity further accelerates this process and promotes Msc1 accumulation." I am not sure the available data support this.*
  
  *o "These results indicate that although Msc1 ...... it is required for efficient GS-dependent functional maturation of the NVJ domain."*
  
  *o "These observations suggest that loss of Msc1 does not cause a general defect in transcriptional activation but rather impairs the proper execution and dynamic range of GS-dependent transcriptional responses." This is unclear.*
  
  *o "Within this context, the robust induction of NVJ1 appears to be particularly sensitive to Msc1 deficiency." This sentence would benefit from being rewritten.*
  
  *o "Together, these results indicate that Msc1 contributes to transcriptional reprogramming associated with NVJ remodeling during GS." This sounds overstated.*
  
  *o "the observation that loss of Msc1 attenuates GS-dependent induction of NVJ1 raises the possibility that NVJ remodeling influences stress-responsive gene expression programs."*
  
  We appreciate the reviewer's concern that several interpretations in the current manuscript may extend beyond what is directly supported by the available data. We will therefore revise these statements throughout the manuscript to provide more balanced interpretations and avoid overstating our conclusions. In addition, several planned experiments, including complementation analyses, Nvj1 overexpression experiments, additional localization analyses, quantitative protein analyses, and identification of NVJ-associated proteins that interact with Msc1, may further clarify the relationship between Msc1, NVJ remodeling, and glucose starvation responses. We will revise the text accordingly based on the results obtained from these additional experiments.
  
  *OTHER COMMENTS FIGURE BY FIGURE (SOME ARE MAJOR AND OVERLAP WITH THE ABOVE COMMENTS, SOME ARE MINOR):*
  
  *Figure 1:*
  
  *Figure 1A and B show that Msc1-GFP expression is upregulated in cells starved for glucose for 24 h, but not in nitrogen-starved cells.*
  
  *o Sizes of the markers (protein ladder) would be helpful.*
  
  We will reprocess the immunoblot images from the original data and revise the figure layout to include molecular weight markers.
  
  *Figure 2: - Comment: It is not clear if these are the same strains as analyzed by microscopy (GFP-tagged Msc1). This should be specified in the legend of Figure 2D.*
  
  *- Comment: o Since the levels of Msc1-GFP in the elo3D mutant are higher compared to WT (Figure 2D), the larger surface area measured in C may be a consequence of this.*
  
  *o It is not clear whether Figures 2A and 2D analyze the same strains (Western blot and microscopy: do both show GFP-tagged Msc1, detected with anti-GFP?). This should be specified in the legend of Figure 2D. Since the increased area measured in Figure 2C could be due to increased Msc1-GFP levels in this mutant strain, the WB should check the levels of Msc1-GFP in the same strain and under the same conditions as analyzed in Figure 2C.*
  
  *o Does Tim23 serve as a loading control in Figure 2D?*
  
  We added "Tim23 was used as a loading control." to the legend of Figure 2D.
  
  *o It would be good to have protein ladder sizes marked in the Western blots.*
  
  *o Since the increase in Msc1 levels in the elo3D mutant could be significant for the interpretation of the results, it would be helpful to have quantification of the protein levels in the WB (normalized to a loading control).*
  
  We will clarify in the Figure 2 legend that Figure 2A shows cells expressing GFP-tagged Msc1 analyzed by fluorescence microscopy, whereas Figure 2D shows untagged strains analyzed by immunoblotting with an anti-Msc1 antibody. We will also clarify that Tim23 was used as a loading control and add molecular weight markers to the Western blots. We agree that the increased Msc1-GFP levels in elo3∆ cells could influence the apparent increase in NVJ area measured in Figure 2C. As noted above, our previous findings using Ypf1 suggest that accelerated NVJ expansion in elo3∆ cells is unlikely to be explained solely by increased Msc1 expression (Fujimoto and Tamura, J. Cell Biol., 2026). Nevertheless, we will revise the text to describe these results more carefully and discuss them in the context of our previous findings.
  
  In addition, we will quantify the Western blot signals in Figure 2D normalized to the loading control and include these data in the revised manuscript.
  
  *Figure 3*
  
  *Together these data show that localization of other NVJ proteins to the NVJ depends on the presence of Msc1.*
  
  *- Comment: From the available data, it is possible that Msc1 recruits these components by direct interaction, by modifying the structure of the NVJ, or in an indirect manner; this should be discussed in the Discussion.*
  
  *- Comment: The signal of Tsc13-GFP in log-growing cells is very weak, so the quantification may be unreliable. I would remove this condition (log-grown cells) from the quantification in C due to the low signal, since it is not crucial to the interpretation of the data. If the authors prefer to leave it, that is fine.*
  
  *- The title of Figure 3 is "Msc1 supports stability and recruitment of NVJ-associated proteins". I am not sure what "stability" means here; the data do not address stability or recruitment directly. I suggest changing the figure title into a statement describing what is shown in the figure, for example: "The loss of Msc1 results in decreased Nvj1 levels and decreased localization of NVJ proteins to the NVJ". The main text (e.g., the Discussion) could then note that these data suggest that Msc1 supports recruitment of NVJ-associated proteins, likely in an indirect manner, based on the finding that the loss of Msc1 leads to lower expression of Nvj1.*
  
  *- Is it possible that the effect of Msc1 loss on NVJ localization of Tsc13 is due to downregulation of Tsc13 expression? Considering the effect of the msc1D deletion on the expression of some NVJ proteins (Figure 3A) and on Tsc13-GFP localization, Tsc13 expression levels should be checked. It would be optimal to do the WB with the same Tsc13-GFP-expressing strain and under the same growth conditions as used for microscopy in Figure 3B.*
  
  *- Expression levels of Ypf1 are lower in the msc1D strain than in the WT (Fig. 3A). Could this account for the smaller NVJ area in this mutant (Fig. 3B)?*
  
  We agree that the current data do not distinguish whether Msc1 affects localization of NVJ-associated proteins directly, indirectly through changes in NVJ structure, or through other indirect mechanisms. We also agree that the term "stability" used in the current Figure 3 title is not sufficiently supported by the available data, as our experiments do not directly address protein stability. To address this issue, we plan to overexpress Nvj1 in msc1∆ cells and examine the expression and localization of NVJ-associated proteins including Nsg1 and Nsg2. Based on the results obtained from these additional experiments, we will revise the Figure 3 title and discuss these possibilities more carefully in the revised manuscript.
  
  Regarding the quantification of Tsc13-GFP localization in log-growing cells, although the NVJ signal is relatively small and weak under these conditions, we carefully verified the signal during quantification. We also consider this dataset important because it suggests that the effect of Msc1 is relatively limited during logarithmic growth. We therefore currently prefer to retain these data in the revised manuscript.
  
  As suggested by the reviewer, we will revise the Figure 3 title to more directly describe the observed phenotypes.
  
  We will also examine Tsc13 protein levels in msc1∆ cells using the same Tsc13-GFP strain and growth conditions used for the microscopy analyses. In addition, we will quantitatively analyze expression levels of Ypf1 and other NVJ-associated proteins in msc1∆ cells and discuss how these changes may contribute to the observed localization phenotypes.
  
  *Figure 4. Figure 4A shows mRNA levels of MSC1, NVJ1 and YPF1 in glucose-starved cells compared to log-growing cells. - Comment: I would move Figure 4A to Figure 1. Figure 4B shows mRNA levels of the analyzed genes in WT and msc1D mutant strains, in log-growing cells and under glucose starvation. The data show that the loss of Msc1 leads to a decrease in NVJ1 mRNA under glucose starvation. The expression of the other NVJ proteins analyzed is not affected. - Comment: Would Figure 4A-B fit better together with the data showing Nvj1 levels in the msc1D mutant from a previous figure (3A)?*
  
  *Figure 4C shows PI staining of cells after 5 days of glucose starvation. The loss of Msc1 leads to a twofold increase in PI-positive cells (in contrast to the nvj1D mutant, which is similar to WT), indicating that cell viability after 5 days of glucose starvation is decreased in the absence of Msc1. - Comment: Since nvj1D shows no phenotype, this is likely due not to a non-functional NVJ but to another function of Msc1; the question is which. This could be discussed in the Discussion. - Comment: These data are informative; however, it is not clear why they are placed together with the mRNA data in Figure 4.*
  
  We appreciate these suggestions and agree that the figure organization could be improved. Following the reviewer's recommendations, and taking into account the results of the additional experiments described above, we will reorganize the figure layout to better align related datasets and improve the overall flow of the manuscript. We also agree that the increased PI staining observed in msc1∆ cells is unlikely to be explained solely by loss of NVJ function, since nvj1∆ cells do not show a comparable phenotype. We will therefore discuss this point more carefully in the revised Discussion and consider additional functions of Msc1 that may contribute to cell survival during glucose starvation.
  
  Figure S1. - Comment: As in Figure 2, Msc1-GFP has a much stronger signal in the elo3D mutant than in WT, which could influence (or likely influences) the measured area. One way to test this might be to image WT cells with a higher laser power or a longer exposure, to obtain a signal similar to that seen in the elo3D mutant, and then repeat the quantification.
  
  Taking the result as it currently stands, I suggest removing Figure S1.
  
  As discussed above for Figure 2C, increased Msc1-GFP levels in elo3∆ cells could influence the apparent increase in NVJ area. We agree that this analysis is not central to the main conclusions of the current manuscript. Therefore, together with the additional experiments described above, we will re-evaluate the organization of the supplementary figures and revise the figure layout accordingly. Based on the revised dataset, we will determine whether Figure S1 should be removed, relocated, or incorporated into a more appropriate context in the revised manuscript.
  
  *Figure S3. Validation of anti-Msc1 antibody: could be moved to S1.*
  
  We will move the current Figure S3 to Figure S1 in the revised manuscript.
  
  Reviewer #3
  
  *Summary: In this study, the authors identify Msc1 as a factor associated with nucleus-vacuole junctions (NVJs) during glucose starvation. Using Saccharomyces cerevisiae as a model system, and combining immunoblotting and microscopy approaches, they report a functional connection between Msc1 and the NVJ component Nvj1.*
  
  Major comments: - Are the key conclusions convincing? Overall, the main conclusions are largely convincing. However, several interpretations are overstated and should be phrased more cautiously (see specific comments below).
  
  Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Yes. In several instances, the data support correlation rather than causation, and the authors should clearly indicate when conclusions are speculative.
  
  Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. For some conclusions, either:
  
  the interpretation should be weakened, or
  
  additional experiments are needed to fully support the claims
  
  Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. If Western blot membranes are available, additional controls could likely be addressed by reprobing, which would require minimal effort and a short timeframe. Suggested microscopy experiments would require strain construction and are therefore expected to take approximately 2-3 weeks.
  
  Are the data and the methods presented in such a way that they can be reproduced? Some methodological details are insufficiently described and should be clarified to ensure reproducibility.
  
  Are the experiments adequately replicated and statistical analysis adequate? The authors do not specify which tests for normality were performed. It is therefore difficult to assess whether the use of Student's t-test is appropriate. In at least one case (comparison of three groups), a t-test is not appropriate and should be replaced with a suitable multiple-comparison test.
  
  Minor comments: - Specific experimental issues that are easily addressable. See below
  
  Are prior studies referenced appropriately? Yes, mostly. The authors should provide a reference supporting NVJ expansion during nitrogen starvation.
  
  Are the text and figures clear and accurate? Yes
  
  Do you have suggestions that would help the authors improve the presentation of their data and conclusions? See below.
  
  We appreciate the reviewer's overall positive evaluation of our study and the recognition that the main conclusions are largely convincing. We also appreciate the reviewer's careful and constructive suggestions regarding interpretation, experimental support, and presentation of the data. As also pointed out by other reviewers, several interpretations in the current manuscript may extend beyond what is directly supported by the available data. We will therefore revise the manuscript throughout to more clearly distinguish between observations directly supported by the data and more speculative interpretations, and to avoid overstating our conclusions. In addition, we are currently performing several additional experiments, including complementation analyses, Nvj1 overexpression experiments, quantitative protein analyses, and additional localization studies, which may further strengthen some of the interpretations. We will also revise the Methods section to provide more detailed information regarding experimental procedures, statistical analyses, and reproducibility, and we will reanalyze the data using appropriate statistical methods where necessary. Finally, we will revise and reorganize several figures and figure legends to improve the overall presentation and clarity of the manuscript, and add references regarding NVJ expansion during nitrogen starvation as suggested by the reviewer.
  
  *Figures and data presentation*
  
  *• Figure 1A: The image is difficult to interpret. The authors should improve visibility, for example by:*
  
  *o using grayscale instead of magenta/green for single channels, or*
  
  *o applying an intensity LUT.*
  
  *This is particularly important as the Nvj1 signal is barely visible.*
  
  We will revise Figure 1A to improve visibility of the fluorescence signals, including the Nvj1 signal, by adjusting the image presentation methods as suggested by the reviewer.
  
  *• Figure 1B: The use of Tim23 as a loading control is not appropriate. The authors should justify why a mitochondrial protein was used as a reference.*
  
  Although Tim23 is a mitochondrial protein, we previously confirmed that its abundance is not substantially affected by glucose starvation, and it therefore serves as a suitable loading control in this experimental setting (Fujimoto and Tamura, J. Cell Biol., 2026). In the revised manuscript, we will clarify the rationale for using Tim23 as a loading control. We will also normalize immunoblot signals to Tim23 and explicitly state this in the text.
  
  *• Figure 1C: The experimental design and interpretation are problematic:*
  
  *o Using an ER protein together with mitochondrial markers in the proteinase K protection assay is not appropriate for the stated conclusions.*
  
  Because ER and mitochondrial membranes are both present in the membrane fraction used for the proteinase K protection assay, we believe that mitochondrial marker proteins can still serve as controls for proteinase K accessibility. However, we agree that the integrity of the ER membrane itself was not directly assessed in the current experiment. We therefore plan to repeat the experiment using appropriate ER membrane protein controls.
  
  *o The claim that Msc1 is not an integral membrane protein is not sufficiently supported, particularly if a polyclonal antibody was used. *
  
  Similar concerns regarding the topology and membrane association of Msc1 were also raised by other reviewers. To address these issues, we are currently performing additional experiments, including detailed analyses of the N-terminal region of Msc1 and further localization studies (see also our response to the first comment from Reviewer #1). We also plan to examine the fission yeast Msc1 homolog Les1, whose localization has been analyzed in greater detail previously (Asakawa et al. Genes Cells. 2022, 27(11):643-656. doi: 10.1111/gtc.12981). In addition, we are currently investigating NVJ-associated proteins that interact with Msc1, which may provide further mechanistic insight into the localization and function of Msc1 at the NVJ.
  
  *o The authors should provide additional evidence for localization (or use alternative approaches). *
  
  As also mentioned in our response to Reviewer #1, we plan to perform additional localization analyses using a split-GFP approach. Specifically, we will express GFP1-10 either within the ER lumen or within the nucleoplasm and examine in which compartment co-expression of Msc1-GFP11 results in GFP fluorescence.
  
  *• Figure 1D:*
  
  *o The authors conclude that deletion of NVJ1 and VAC8 reduces Msc1 colocalization. However, an alternative explanation is that NVJs are not formed under these conditions.*
  
  *o This conclusion should therefore be phrased more cautiously. Alternatively, a known NVJ marker should be included to demonstrate NVJ formation.*
  
  We agree that reduced Msc1 localization in nvj1∆ and vac8∆ cells could simply reflect impaired NVJ formation itself. To address this possibility, we plan to examine NVJ formation in these mutants using split-GFP-based NVJ probes that we previously developed (Tashiro et al., Front Cell Dev Biol. 2020, doi: 10.3389/fcell.2020.571388). If NVJ formation is indeed disrupted under these conditions, we will phrase the interpretation more cautiously.
  
  *o The argument involving Ypf1 is weak, as the observed effect could be indirect and mediated via another factor. *
  
  The relationship between Msc1 and Ypf1 will be described more cautiously in the revised manuscript.
  
  *• Figure 2B: The statistical analysis (Student's t-test) is not appropriate for the dataset presented.*
  
  The statistical analysis for Figure 2B will be revised using a more appropriate method.
  
  *Additional point: The authors again use a mitochondrial protein as a loading control in Figure 1D, which requires justification.*
  
  As mentioned above, Tim23 was selected as a loading control based on our previous study showing that its abundance remains unchanged during glucose starvation (Fujimoto and Tamura, J. Cell Biol., 2026). This explanation will be added to the revised manuscript.
  
  *Conceptual interpretation*
  
  *• The link between transcriptional reprogramming and NVJ remodeling is not convincingly demonstrated. The data suggest a temporal correlation but do not establish causality.*
  
  *• The PI staining experiments show increased cell death in the absence of Msc1. However, a causal relationship to NVJ function is not demonstrated. An alternative explanation (e.g., an additional role of Msc1 in processes such as DNA repair) should be considered or discussed.*
  
  These points will be appropriately discussed in the revised manuscript, taking into account the results of additional experiments, including those examining the effects of Nvj1 expression in msc1∆ cells.
  
  *• The claim that Msc1 localizes to the perinuclear space is not sufficiently supported:*
  
  *o Appropriate ER/nuclear envelope controls are missing.*
  
  As noted above, we will perform additional split-GFP-based analyses to further investigate the localization of Msc1. We will also perform additional experiments to further examine the membrane topology of Msc1, including controls using antibodies against ER proteins and alkaline extraction analysis of Les1, a fission yeast homolog of Msc1 with a characterized membrane topology. In addition, we will test whether Les1 can complement the msc1∆ mutant.
  
  *o As an alternative, structural predictions (e.g., transmembrane helix prediction) could strengthen this claim. *
  
  The N-terminal region of Msc1 is predicted to function as a weak transmembrane segment and a signal sequence. We will incorporate these predictions into the revised manuscript and perform additional experiments to examine the topology and potential processing of this region as mentioned above.
  
  *Literature and references*
  
  *• The authors should provide a reference supporting NVJ expansion during nitrogen starvation.*
  
  The appropriate reference will be cited in the revised manuscript.
  
  *Methods*
  
  *• The antibody section is incomplete; all antibodies used need to be specified.*
  
  The antibody information will be completed in the revised Methods section.
  
  *• Cultivation conditions require more detail:*
  
  *o duration of growth*
  
  *o timing and conditions of the glucose starvation shift*
  
  The cultivation conditions will be described in greater detail in the revised Methods section.
  
  2. Description of the revisions that have already been incorporated in the transferred manuscript
  
  Reviewer #1
  
  *Previous reports of Msc1 in patches (page 3): the citation of Breker et al. (LOQATE) seems wrong because that database shows Msc1 at the ER, not at the NVJ; Medina-Suarez et al. is also not great: it shows the NE with some patches (not high penetrance) plus some cER. So I suggest the authors simply rely on their own bioRxiv paper.*
  
  We agree that the LOQATE database only weakly shows punctate localization of Msc1 and will therefore remove this citation. However, we believe that Medina-Suarez et al. still provides relevant support because Msc1 exhibits a localization pattern resembling the NVJ in a subset of cells, and therefore we plan to retain this reference.
  
  Table S1: needs Msc1-GFP adding to some lines.
  
  We revised Table S1 accordingly.
  
  *Avoid unnecessary abbreviations: GS creates a novel word that has no obvious meaning and makes the manuscript hard to read rapidly. It would be better to use "glucose starvation" in all cases, especially the abstract. *
  
  P7: "These results indicate that loss of Msc1 impairs NVJ function more severely than loss of Nvj1 alone." Here NVJ function might not be the target of Msc1 deletion, since nvj1-deletion does not show increased cell death. Also, in general very little is known about NVJ function as very few phenotypes can be pinned down to loss of the NVJ. Better here to say "cell function" (that may involve some aspect of Msc1's interactions at NVJs) instead.
  
  We revised the wording accordingly.
  
  Reviewer#2
  
  *- It is not certain what the term "stability of multiple NVJ proteins" means. Could another term be used, or this explained? *
  
  We agree that the term "stability" could imply a specific mechanism that is not directly demonstrated in our study. Therefore, we have revised the text to more accurately reflect our findings by referring to the abundance of NVJ proteins rather than their stability: "Together with these observations, our results suggest that Msc1 plays a central role in maintaining the abundance of multiple NVJ proteins, including Nvj1, Ypf1, Nsg1, and Nsg2, during glucose starvation."
  
  *o The title of the Figure 2 is: "Snf1 signaling and VLCFA metabolism modulate NVJ partitioning of Msc1" - what is "NVJ partitioning" - for me it would be clearer to write "Snf1 signaling and VLCFA metabolism modulate the localization of Msc1 to NVJ" *
  
  As suggested by the reviewer, we will revise the Figure 2 title from "Snf1 signaling and VLCFA metabolism modulate NVJ partitioning of Msc1" to "Snf1 signaling and VLCFA metabolism modulate the localization of Msc1 to the NVJ."
  
  *Figure 1 A and B shows that Msc1-GFP expression is upregulated in cells starved for glucose for 24h, but not in nitrogen-starved cells. - Comments: o Is Tim23 used as a loading control? If yes, it should be stated in the figure legends and/ or main text. *
  
  We have revised the figure legends to indicate that Tim23 was used as a loading control.
  
  *o Which antibody is used for Western in B? *
  
  We have revised the figure legend to specify that the immunoblots shown in Fig. 1B were probed with anti-Msc1, anti-Nvj1, anti-Ypf1, and anti-Tim23 antibodies.
  
  * - Comment: It would be helpful to explain the abbreviation "PK" in Figure 1C Figure legend. *
  
  We have revised the Figure 1C legend to define PK as proteinase K.
  
  * Figure 1 D: Msc1-GFP localization to the NVJ is dependent on Nvj1, Vac8, but not Nsg1 and 2 and Ypf1 - Comment: a typo: "(D) Fluorescence microscopy images of the indicates strains..." should be "indicated". - Comment: "Single focal planes were shown." Would be better in present tense "are shown". *
  
  We have corrected "indicates" to "indicated" and revised "Single focal planes were shown" to "Single focal planes are shown" in the Figure 1D legend.
  
  *Figure S2. - The list of genes analyzed and the conditions analyzed are different in the figure and in the legend. Probably the figure is correct. *
  
  We revised the Figure S2 legend.
  
  3. Description of analyses that authors prefer not to carry out
  
  Reviewer#1
  
  Comment:
  
  Function/structural form: the manuscript is light on describing what Msc1 is: it shares the same repeat structure that has been described in Ish1/Les1. The S. pombe work described the repeats wrongly as motifs, when AlphaFold2 confidently predicts them as structurally characteristic domains with 2 parallel helices separated by a loop. It would be interesting to speculate a bit on how these might function in the NVJ. One major mystery of the NVJ is its extreme uniformity, shown especially well by cryo-ET (Millen et al. (2008) 10.1111/j.1600-0854.2008.00789.x). This suggests some long-range oligomerisation: is it possible that Msc1 provides that? Possible experiments include expressing S. pombe Ish1/Les1, either whole or as chimeras with Msc1, to see if they function and are extractable. If that is not to be done here, it should at least be discussed.
  
  Regarding the suggested experiments using S. pombe Ish1/Les1 or chimeric constructs, we agree that these would be interesting approaches. However, because we plan to prioritize additional analyses of Msc1 topology, including detailed characterization of the N-terminal region and repeated alkaline carbonate extraction experiments using ER membrane protein controls, we do not currently plan to pursue extensive chimera-based functional analyses within the scope of the present revision.
  
  With respect to the possibility that Msc1 contributes to long-range oligomerization underlying the structural uniformity of the NVJ, we currently consider this possibility less likely. In density-gradient centrifugation analyses performed under mild detergent conditions, the apparent molecular size of Msc1 was not particularly large, and we therefore did not obtain evidence supporting formation of a stable large oligomeric complex by Msc1.
  
  Reviewer#2
  
  - The authors refer to a previous study showing that nvj1D deletion does not affect protein levels of several NVJ proteins, however, it would be nice to have this data shown here - i.e. the localization of Tsc13, Ypf1 (and Hmg2) in the nvj1D mutant, especially since the study cited has not been peer-reviewed yet: "Notably, our previous work showed that loss of Nvj1 or Ypf1 does not affect the protein levels of each other or those of other NVJ-associated factors such as Nsg1 and Nsg2 (Fujimoto and Tamura, 2025)."
  
  We believe this point may reflect two partially distinct issues: (i) whether loss of Nvj1 affects the protein levels of NVJ-associated factors, and (ii) whether loss of Nvj1 affects their NVJ localization. In our previous study, we showed that loss of Nvj1 does not affect the protein levels of Ypf1, Nsg1, or Nsg2, whereas their NVJ localization does require Nvj1 (Fujimoto and Tamura, 2026; J Cell Biol. 225. doi:10.1083/jcb.202506071, now published). In addition, a previous study demonstrated that Tsc13 localizes to the NVJ in an Nvj1-dependent manner (Kvam et al., 2005). We also showed that loss of Ypf1 prevents efficient accumulation of Hmg2 at the NVJ (Fujimoto and Tamura, J. Cell Biol. 2026). We therefore believe that these localization dependencies have already been sufficiently established in previous studies.
  
  PeerReviewed
2. EMBOpress 16 Jun 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  
  Referee #2
  
  Evidence, reproducibility and clarity
  
  The study describes the finding of the nuclear envelope protein Msc1 as a new component of the membrane contact site nucleus vacuole junction (NVJ) under the conditions of glucose starvation. Msc1 was previously known only as a nuclear envelope protein, presumably localizing to the nuclear lumen, and for its role in DNA damage repair. The main finding of this study is the glucose starvation-induced upregulation and NVJ localization of Msc1 (Figure 1). The second main finding is that the loss of Msc1 results in impaired induction of the expression of Nvj1 (the main component of the NVJ, responsible for the formation of the NVJ via direct interaction with Vac8) upon glucose starvation (Fig. 3 A). The effect of Msc1 loss on Nvj1 expression levels is transcriptional (Fig. 4 B). The glucose starvation-mediated induction of some other previously identified NVJ components, Nsg1 and Nsg2, is also impaired in the msc1D mutant, while the expression of Ypf1 is affected to a lesser degree. The data supporting these two main findings are solid (Figure 1; Figure 3 A; Figure 4 A, B).
  
  The study further shows that the loss of Msc1 results in a loss of NVJ localization of the NVJ components Tsc13, Ypf1 and, to a lesser degree, Hmg2. The microscopy data look solid; however, the interpretation of this finding is not clear. In my view, the most likely explanation is that the effect of Msc1 loss on the localization of NVJ components to the NVJ is due to the impaired glucose starvation-induced Nvj1 expression in the msc1D mutant.
  
  Major comments:
  
  Here are suggested experiments that would strengthen the study:
  
  It is difficult to imagine how an NE protein could affect expression levels of other NVJ proteins. This key finding would be supported by a complementation experiment in which MSC1 is expressed from a vector, to test whether this rescues the phenotype (to make sure that the observed phenotype is not due to an off-target effect of msc1D deletion).
  
  If technically feasible under the glucose starvation conditions, this hypothesis could be tested by overexpressing Nvj1 from an inducible or some other promoter.
  
  The authors refer to a previous study showing that nvj1D deletion does not affect protein levels of several NVJ proteins, however, it would be nice to have this data shown here - i.e. the localization of Tsc13, Ypf1 (and Hmg2) in the nvj1D mutant, especially since the study cited has not been peer-reviewed yet: "Notably, our previous work showed that loss of Nvj1 or Ypf1 does not affect the protein levels of each other or those of other NVJ-associated factors such as Nsg1 and Nsg2 (Fujimoto and Tamura, 2025)."
  
  The effect of msc1D deletion on Tsc13 protein levels should be tested (preferably using the same Tsc13-GFP strain as used in microscopy; anti-Tsc13 or anti-GFP antibodies could be used).
  
  Other Major comments:
  
  The results concerning the localization of Msc1-GFP in the elo3D mutant have been interpreted as "accelerated localization", "expansion of the size of Msc1-NVJ domain" etc. However, the levels of Msc1-GFP in the elo3D mutant are higher compared to WT (Figure 2 D). Considering this, it is very likely that the larger surface area measured in the elo3D mutant is a consequence of this. This could potentially be checked by comparing images of WT and elo3D cells adjusted to a similar fluorescence intensity. In any case, this possibility should definitely be addressed in the interpretation of the result.
  
  There is an impression that the data has been overinterpreted, and the conclusions should be written much more carefully. Examples:
  
  "Here, we show that Msc1 is a GS-responsive NVJ factor that plays an important role in functional NVJ remodeling." - based on the data shown, the effect of Msc1 could be indirect. The statement above should be rewritten or argued much better.
  
  "we find that GS-dependent induction of NVJ1 transcription is attenuated in msc1Δ cells, suggesting that proper NVJ remodeling contributes to the execution of stress-responsive transcriptional programs" - this is unclear; which data support this?
  
  "Together, these findings position Msc1 as an upstream regulator linking GS signaling to functional maturation of the NVJ and associated cellular adaptation responses." - same comment as above
  
  "...suggesting that Msc1 functions as a GS-responsive regulator of NVJ functions."
  
  "...these findings suggest that Msc1 acts upstream of Ypf1 in orchestrating GS-induced NVJ functional maturation."
  
  "Collectively, these results indicate that Snf1 acts upstream of Msc1 to drive GS-induced NVJ remodeling, whereas reduced Elo3 activity further accelerates this process and promotes Msc1 accumulation." - not sure if the available data support this.
  
  "These results indicate that although Msc1 ...... it is required for efficient GS-dependent functional maturation of the NVJ domain."
  
  "These observations suggest that loss of Msc1 does not cause a general defect in transcriptional activation but rather impairs the proper execution and dynamic range of GS-dependent transcriptional responses." - this is unclear
  
  "Within this context, the robust induction of NVJ1 appears to be particularly sensitive to Msc1 deficiency." - this sentence would benefit from being re-written.
  
  "Together, these results indicate that Msc1 contributes to transcriptional reprogramming associated with NVJ remodeling during GS." - this sounds overstated.
  
  "the observation that loss of Msc1 attenuates GS-dependent induction of NVJ1 raises the possibility that NVJ remodeling influences stress-responsive gene expression programs."
  
  It is not certain what the term "stability of multiple NVJ proteins" means. Could another term be used, or this explained?
  
  OTHER COMMENTS FIGURE BY FIGURE - SOME ARE MAJOR (overlapping to the above comments), SOME ARE MINOR:
  
  Figure 1: Figure 1 A and B shows that Msc1-GFP expression is upregulated in cells starved for glucose for 24h, but not in nitrogen-starved cells. - Comments: o Is Tim23 used as a loading control? If yes, it should be stated in the figure legends and/ or main text. o Size of the markers (protein ladder) would be helpful. o Which antibody is used for Western in B? - Comment: It would be helpful to explain the abbreviation "PK" in Figure 1C Figure legend. Figure 1 D: Msc1-GFP localization to the NVJ is dependent on Nvj1, Vac8, but not Nsg1 and 2 and Ypf1 - Comment: a typo: "(D) Fluorescence microscopy images of the indicates strains..." should be "indicated". - Comment: "Single focal planes were shown." Would be better in present tense "are shown".
  
  Figure 2: - Comment: It is not clear if these are the same strains as analyzed by microscopy (GFP-tagged Msc1). This should be specified in the Figure legend 2 D. - Comment: o Since the levels of Msc1-GFP in the elo3D mutant are higher compared to WT (Figure 2 D), the larger surface area measured in C may be a consequence of this. o It is not clear if Figures 2 A and D analyze the same strains (western blot and microscopy - do both show GFP-tagged Msc1? - using anti-GFP?). This should be specified in the Figure legend 2 D. Since the increased area measured in Figure 2 C could be due to increased Msc1-GFP levels in this mutant strain, the WB should check the levels of Msc1-GFP in the same strain and under the same conditions as analyzed in Figure 2 C. o The title of Figure 2 is: "Snf1 signaling and VLCFA metabolism modulate NVJ partitioning of Msc1" - what is "NVJ partitioning"? For me it would be clearer to write "Snf1 signaling and VLCFA metabolism modulate the localization of Msc1 to NVJ" o Does Tim23 serve as a loading control in Figure 2 D? o It would be good to have protein ladder sizes marked in Western blots o Since the increase in Msc1 levels in the elo3D mutant could be significant for the interpretation of the results, it would be helpful to have quantification of the protein levels in WB (normalized to a loading control).
  
  Figure 3 Together these data show that localization of other NVJ proteins to the NVJ depends on the presence of Msc1. Comment: - From the available data it is possible that Msc1 recruits these components by direct interaction, by modifying the structure of the NVJ, or in an indirect manner; this should be discussed in the Discussion. Comment: - The signal of Tsc13-GFP in log-growing cells is very weak, therefore the quantification may be unreliable. I would remove this condition (log-grown cells) from the quantification in C due to the low signal, since it is not crucial to the interpretation of the data. If the authors prefer to leave it, that is fine. - The title of Figure 3 is "Msc1 supports stability and recruitment of NVJ-associated proteins" - I am not sure what "stability" is; the data don't address stability or recruitment in a direct manner. I suggest changing the figure title into a statement describing what is shown in the Figure, for example: "The loss of Msc1 results in decreased Nvj1 levels and decreased localization of NVJ proteins to the NVJ", and adding a comment in the main text (e.g. in the Discussion) that these data suggest that Msc1 supports recruitment of NVJ-associated proteins, likely in an indirect manner, based on the finding that the loss of Msc1 leads to lower expression of Nvj1. - Is it possible that the effect of Msc1 loss on NVJ-localized Tsc13 is due to downregulation of Tsc13 expression? Considering the effect of msc1D deletion on the expression of some NVJ proteins (Figure 3 A) and on Tsc13-GFP localization, it would be good to check Tsc13 expression levels. It would be optimal to do the WB with the same Tsc13-GFP-expressing strain and under the same growth conditions as were used for the microscopy in Figure 3 B. - Expression levels of Ypf1 are lower in the msc1D strain than in the WT (Fig. 3 A); could this contribute to the lower NVJ area in this mutant (Fig. 3 B)?
  
  Figure 4. Figure 4 A shows mRNA levels in glucose-starved cells compared to log-growing cells for MSC1, NVJ1 and YPF1. - Comment: I would move Figure 4 A to Figure 1. Figure 4 B shows mRNA levels of the analyzed genes in WT and msc1D mutant strains, in log-growing cells and under glucose starvation. The data show that the loss of Msc1 leads to a decrease in NVJ1 mRNA under glucose starvation. The expression of the other NVJ proteins analyzed is not affected. - Comment: Would Figure 4 A-B fit better together with the data showing Nvj1 levels in the msc1D mutant from a previous figure (3 A)? Figure 4 C shows PI staining of cells after 5 days of glucose starvation. The loss of Msc1 leads to a twofold increase in PI-positive cells (in contrast to the nvj1D mutant, which is similar to WT), indicating that the viability of cells after 5 days of glucose starvation is decreased in the absence of Msc1. - Comment: Since there is no phenotype in nvj1D, this is likely not due to a non-functional NVJ, but to another function of Msc1 - the question is which. This could be discussed in the Discussion. - Comment: This is informative; however, it is not clear why these data are placed together with the mRNA data in Figure 4.
  
  Figure S1. - Comment - as in Figure 2 - Msc1-GFP has a much stronger signal in the elo3D mutant than in WT, which could influence (or likely influences) the measured area. Perhaps one way to test this is to image WT cells with a higher laser power (or a longer exposure) to get a stronger signal, similar to that seen in the elo3D mutant, and then repeat the quantification. - Taking the result as it presently stands, I suggest taking Figure S1 out. Figure S2. - The list of genes analyzed and the conditions analyzed are different in the figure and in the legend. Probably the figure is correct. Figure S3. Validation of anti-Msc1 antibody - Could be moved to S1.
  
  Referee cross-commenting
  
  Rev#1:
  
  I generally agree with the other reviewers. I found an error (?typo) in one thing Reviewer 3 says about Fig 1C: "The claim that Msc1 is an integral membrane protein is not sufficiently supported, particularly if a polyclonal antibody was used." I think they mean: "The claim that Msc1 is NOT an integral membrane protein is not sufficiently supported, particularly if a polyclonal antibody was used." I see that my own review has lots of typos - I will write separately to the editor about those.
  
  Rev#2:
  
  I agree with the Reviewer 3 that the link between transcriptional reprogramming and NVJ remodeling is not convincingly demonstrated.
  
  I agree with the Reviewer 3 that the localization of Msc1 to the perinuclear space is not sufficiently supported. The authors may re-write the conclusion to include this uncertainty, or add experimental data.
  
  I am not sure if I agree with the Reviewer 1 in that the loss of Msc1 leads to the downregulation of Nvj1 "mostly through destabilisation since the transcriptional effect is marginal". Available data does not include the quantification of the Nvj1 protein levels in the msc1- mutant compared to WT, therefore, it is presently unclear how large the downregulation at the protein level is.
  
  I agree with the Reviewer 3 that the Methods section needs a more detailed description, especially of the growth conditions and glucose starvation protocol (at which OD600 were cells diluted to, were cells washed prior to media change, etc.).
  
  Rev#3:
  
  I find Reviewer 2's suggestion of a complementation experiment compelling; this assay would require minimal additional effort and would help exclude off-target effects of the msc1Δ phenotype.
  
  I agree with Reviewer 1 that the use of "GS" is unnecessary and hinders readability; "glucose starvation" should be used throughout.
  
  I agree with Reviewer 1 that a more thorough comparison with homologous proteins in S. pombe (Ish1/Les1), including topology and functional parallels, would substantially strengthen the manuscript.
  
  I thank Reviewer 1 for identifying the misleading phrasing regarding integral versus associated membrane proteins. However, I maintain that the assay in Figure 1C still requires stronger support.
  
  Significance
  
  General assessment - strengths and limitations:
  
  The identification of Msc1 as a new glucose starvation-induced protein that localizes to the NVJ is supported by strong data and represents a novel and strong point of the paper. Furthermore, the finding that the loss of Msc1 results in the impaired expression of several other NVJ-localized proteins under glucose starvation is convincing, although the solidity of these latter data requires some more experimental controls (detailed above). The weak point is the interpretation of the effect of Msc1 loss on the localization of other NVJ proteins; the present conclusions need to be modified, or supported by additional experimental data.
  
  Advance - compare the study to existing knowledge - does it fill the gap?
  
  The study identifies protein Msc1, which was previously known as a nuclear envelope protein involved in DNA damage repair, as a new component of the membrane contact site nucleus vacuole junction (NVJ), whose expression and localization to the NVJ are induced by glucose starvation. What kind of advance does it make - conceptual; incremental...? The finding that a nuclear lumen protein required for DNA damage repair changes localization under certain circumstances (glucose starvation) and potentially has new roles could represent a conceptual advance; however, for that, more experimental data would be needed, specifically to determine the mechanistic role of Msc1 in glucose starvation and to compare it to its role in the DNA damage response. The available data mainly support an incremental advance in our understanding of the structure and regulation of the NVJ.
  
  Audience - broad; specialized; basic research...? The audience of this paper will be interested in basic research; scientists working with yeast may be especially interested.
  
  Describe your expertise: My expertise is in yeast genetics, in the field of degradation-mediated protein quality control.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.03.13.711511
www.biorxiv.org

The insulin / IGF axis is critically important for controlling gene transcription in the podocyte

1. Public_Reviews 15 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the role of the insulin receptor and the insulin-like growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, with proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious, with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.
  
  This is primarily a descriptive study and no technical concerns are raised. The mechanism by which insulin and IGF1 signaling regulates splicing is not directly addressed, but the data potentially implicate phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinine measurements might also enhance our understanding of disease progression, although proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.
  
  Significance:
  
  With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.
  
  Latest comments:
  
  The new reviewer raised two major points: whether the KO effect on splicing is specific to IGF1, and whether the phenotype could be developmental rather than due to splicing. The reviewer raises some important issues, but the evidence that this is specific comes from the literature, in which IR/IGF signaling is already known to regulate splicing, and from the absence of splicing defects in other models that the authors have analyzed. I agree with the reviewer (and authors) that the incomplete floxing of the genes is a major complication. There could be a developmental defect, with mice being born with fewer podocytes, and perhaps the authors should caveat this point. The fact that the mice are born with normal function, and that renal function can be maintained with up to 80% loss of podocytes, suggests that they are likely born with a good number of podocytes and that the dysfunction occurring at 6 months is due to a process, induced by the loss of IR/IGF signaling, that is detrimental to the podocyte.
  
  Thank you for these insightful comments. We fully acknowledge that the mouse model will not have had full insulin receptor and IGF1R knockdown and that this is likely the reason the phenotype took time to develop and was not prominent early on. We agree with this reviewer and new reviewer 4 that if the model had facilitated near complete IR and IGF1R knockdown, a significant neonatal / embryonic phenotype would likely have been obvious. We considered using an inducible mouse model to allow normal development before cre-excision, but our experience is that the CreER and RtTA-tet-on-cre systems are less efficient at excising genes, and hence we did not pursue this (we show evidence of reduced excision with an inducible system in supplementary Figure 1D using a reporter mouse system [this was included in a previous response to the reviewers only]). This was the rationale for making the immortalised podocyte floxed IR and IGF1R cell line to ensure near complete knockdown. This, not surprisingly, was highly detrimental. We then looked mechanistically for pathways (using agnostic proteomics and phospho-proteomics) and found spliceosomal involvement. From our studies we think this was also involved in our mouse model, as SF3B4 was found to be significantly downregulated in the podocytes of double receptor knockdown transgenic mice (Figure 3F).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mouse podocytes. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.
  
  Long-read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.
  
  Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.
  
  Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.
  
  The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well designed, using bar charts with superimposed dot plots.
  
  Methods are generally well described.
  
  Comments on previous version:
  
  Coward and colleagues have done an excellent job of responding to all the reviewer comments.
  
  Thank you.
  
  Reviewer #4 (Public review):
  
  Summary and background:
  
  This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. The insulin/IGF signaling system in mammals evolved as three gene-reduplicated peptides (insulin, IGF-1, and IGF-2) and two receptors, IR and IGF1R, that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes, including the splicing of RNA for protein synthesis.
  
  Thank you for this clear overview. Of note, the podocyte is a terminally differentiated cell, so the growth / cell cycling elements may differ from those of more proliferative cell types in relation to IR/IGF1R-mediated signalling.
  
  Comments on revised version:
  
  The second sentence of the Summary reads "This study sought to elucidate the compound role of the insulin/IGF1 axis in podocytes using transgenic mice and cell culture models deficient in both receptors." The study design and rationale for the proteome analysis described are predicated on the finding that podocyte-specific knockdown of the IR/IGF-1R in mice is associated with development of proteinuria and reduced eGFR by 20 months of life. Since the IR/IGF-1R are critically required for normal development and growth of all cells and organs, the obvious explanation for the observation would be that the model system results in defective podocyte development and deployment (caused by reduced IR/IGF-1) that, in turn, causes subsequent development of proteinuria and glomerulosclerosis (that may be much less dependent on a normal level of IR/IGF-1R expression). Thus, the experimental design does not allow a distinction between podocyte development and steady state function, which are different biologic processes. The data provided do not examine podocyte status immediately after birth to confirm that podocyte number, size and structure are normal in mice that subsequently develop proteinuria and glomerulosclerosis. The response to the reviewer suggests that since this would require additional mice it has not been undertaken in order to reduce animal usage. This is not a valid argument, particularly when the investigators have not even used state-of-the-art methods to measure podocyte number, size and density in adult mice, key parameters that would be required to interpret their data. Counting podocyte nuclear number in glomerular cross-sections is simply an inadequate method, even if it is used and reported in other journals, and particularly where the examples given to justify its use can hardly be viewed as representing first rate science.
  
  Thank you for these comments. As discussed above, we agree that the mouse model was not optimal: despite using a good Cre driver, we did not consistently knock down both receptors. This was the reason we made the IR/IGF1R knockdown cell line. Importantly, we found that knocking down both receptors by >80% was highly detrimental, with evidence of prominent spliceosomal dysfunction. Thank you for the comment about the methodology for assessing podocyte number, which we and other investigators use.
  
  In the absence of studies that would answer the above questions, the investigators should add a sentence to the Discussion dealing with study limitations, as follows: "The study design does not allow us to determine whether the primary effect of reduced IR/IGF-1R expression on the phenotype is during in utero and post-natal podocyte development and deployment, during periods of rapid growth when IGF-1 levels are highest, in steady state adult podocytes, or under all of the above conditions".
  
  Thank you. We have added a section stating that we did not investigate the embryonic and early neonatal phenotype for more subtle changes in our model. We have also added a sentence explaining that we would have liked to use an inducible model, but Cre-driven excision in an inducible model is less efficient than with a constitutive driver, and we think such a model would have shown either a very mild phenotype or none at all due to minimal excision.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.20.594973v4
www.medrxiv.org

Theta-Beta Ratio in Attention Deficit Hyperactivity Disorder: A Multiverse Analysis

1. Public_Reviews 15 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors address whether the theta/beta ratio (TBR) can be used as a clinical biomarker for ADHD.
  
  Strengths:
  
  The data were acquired independently from 2 separate datasets, and there are sufficient subjects for adequate statistical power. The authors applied up-to-date EEG data preprocessing, state-of-the-art feature extraction, and statistical analyses, using a multiverse approach. By testing and comparing all meaningful approaches, defined a priori in the previous meta-analysis, the authors convincingly demonstrate that TBR cannot be used as a clinical biomarker, and that previous positive results can be explained by interactions between different factors (alpha peak frequency, aperiodic component, age).
  
  Weaknesses:
  
  There are no apparent issues: the data come from separate datasets with large sample sizes and were analyzed with state-of-the-art methods.
  
  We thank Reviewer #1 for their positive evaluation of our manuscript and for the constructive recommendations. The reviewer did not raise additional comments requiring a point-by-point response beyond the recommendations addressed below.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript examines whether the theta-beta ratio as derived from EEG data relates to ADHD diagnoses. To do so, it performs a multiverse analysis across a large number of analytical choices, applied to a large EEG dataset, and corroborated in an additional validation set. The results overall show that the TBR is not a reliable indicator of ADHD diagnosis. In discussing the patterns of results across analytical choices, the authors also demonstrate some key points about what appears to be driving the ratio measures, noting that significant results appear to be driven by choices regarding aperiodic-correction and the use of individualized alpha frequencies, suggesting TBR measures can be affected by these features rather than reflecting theta and/or beta activity.
  
  Strengths:
  
  This manuscript addresses a clearly posed and important question in the literature, addressing a longstanding discussion on the relationship between TBR and ADHD, and uses a large dataset and an expansive analysis approach to provide a definitive answer. The strengths of the approach allow for a clear answer, providing a notable contribution to the field.
  
  Weaknesses:
  
  I find no notable weaknesses in the current manuscript nor any major issues that I think challenge the key findings of this manuscript.
  
  We thank Reviewer #2 for their positive evaluation of our manuscript and for the constructive recommendations. The reviewer did not raise additional comments requiring a point-by-point response beyond the recommendations addressed below.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript, Strzelczyk, Vetsch, and Langer tackle an incredibly important question in clinical neuroscience: the use of the theta/beta ratio as a biomarker of attention deficit hyperactivity disorder (ADHD). The theta/beta ratio is argued to be so reliable as an ADHD biomarker that, in the United States, the Food and Drug Administration has approved its use as a biomarker for ADHD diagnosis. However, there is mounting evidence that the theta/beta ratio is likely not really measuring the relative power between two oscillations, the theta rhythm and the beta rhythm, but rather reflects differences in a singular, non-oscillatory aperiodic process. In this very convincing study, Strzelczyk and colleagues take a "multiverse" analysis approach to show that aperiodic activity differences between healthy controls and people with ADHD are driving the apparent theta/beta ratio differences. In a vacuum, where a measure is just a measure and remains useful as long as it is related to a diagnosis, this distinction might not seem important. From a neuroscientific perspective, however, it is critical, because the ratio between two oscillations has fundamentally different underlying physiological mechanisms than aperiodic differences, and this framing has a major impact on guiding research on the diagnosis and treatment of ADHD.
  
  Strengths:
  
  While smaller studies and analyses have already hinted at similar results as shown here, the current study's multiverse analysis approach is comprehensive, convincing, and very well done. The large sample size of 1,499 participants is very impressive, as is the use of an independent validation sample of 381 participants.
  
  Overall, the technical and statistical aspects are very well done: the multiverse approach, the validation set, the resampling methods, and even the shiny apps. The authors should be applauded for being so thorough and making their data and analyses publicly accessible.
  
  Weaknesses:
  
  To be clear, I see no breaking weaknesses in the theoretical foundations, methods, statistical analyses, or interpretations. All of my recommendations below are for the sake of clarity, which I believe is especially important because this is such an important paper that many people should read.
  
  Comments:
  
  (1) Some figures are mislabeled. For example, the legend of Supplementary Figure 1 says (C) shows scalp topographies, but those are in (A); (C) shows power spectra, but it is unclear which ones. I assume it is only the aperiodic part of the spectrum (oscillations removed)? If so, it would be better to plot it on a log-log scale. In fact, I recommend showing all spectra on a log-log scale.
  
  The reviewer is correct that the figure legend was mislabeled. Panel (A) shows the scalp topographies, panel (B) shows the 1/f-uncorrected power spectra, and panel (C) shows the reconstructed aperiodic signal with oscillations removed. We have corrected the figure legend accordingly. In addition, the power spectra and the reconstructed aperiodic signal are now plotted on log-log scales to improve readability and interpretability.
  
  (2) Supplementary Figure 6 is also mislabeled, saying (A) shows age (it does not) and so on.
  
  We thank the reviewer for noticing this error. We have revised the figure legend so that the panel descriptions now match the displayed plots.
  
  (3) In Supplementary Figure 7, is (B) the aperiodic-removed spectrum? The authors are very inconsistent with what they're showing in these spectral plots, and not actually explaining what they're showing: raw spectra, semi-logged or not, aperiodic-removed or oscillations-removed, etc.
  
  Panel (B) in Supplementary Figure 7 shows the aperiodic-adjusted spectrum. We have now corrected the figure labeling and revised the figure legend to explicitly state what is shown in each panel.
  
  (4) For the HBN data, it is said that, "electrode impedances were kept below 40 kΩ, lower than EGI's standard recommendation of 50 (Net Station Acquisition Technical Manual)." For the validation data: "... electrode impedances were maintained below 5 kΩ." These are big impedance threshold differences. Of course, these recommendations differ by recording system, the use of active electrodes, and so on. But such differences can certainly influence signal-to-noise. The fact that the results are so consistent between them is a strength that perhaps should be explicitly called out.
  
  We appreciate the reviewer’s suggestion. We now explicitly state in the discussion section that the consistency of the results across datasets with different EEG systems and impedance thresholds strengthens the generalizability of the findings. The revised text reads as follows:
  
  “Our multiverse results thus converge with this broader literature, providing further evidence that TBR lacks the reliability and discriminative validity required for clinical utility. Beyond methodological convergence across analytical frameworks, the consistency of results across two datasets differing substantially in EEG recording systems and impedance thresholds further strengthens the generalizability of these null findings, suggesting they are unlikely to reflect idiosyncrasies of a specific acquisition protocol.”
  
  (5) The authors cite a lot of foundational and related work here, such as Finley et al., but they should also cite several other highly relevant studies:
  
  Saad et al., "Is the Theta/Beta EEG Marker for ADHD Inherently Flawed?", J Atten Disord, 2015
  
  Donoghue, Dominguez, Voytek, "Electrophysiological frequency band ratio measures conflate periodic and aperiodic neural activity", eNeuro, 2020
  
  Karalunas et al., "Electroencephalogram aperiodic power spectral slope can be reliably measured and predicts ADHD risk in early development", Develop Psychobiol, 2022
  
  Donoghue, "A systematic review of aperiodic neural activity in clinical investigations", Eur J Neurosci, 2025
  
  We thank the reviewer for pointing us to these additional relevant references. We have added the suggested references to the revised manuscript.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) "Multiverse analysis was conducted in RStudio (R version 4.4.1) using the multiverse package (version 0.6.1; Sarma et al., 2021)." OK, cool, but it would be useful to explain what it does compared to running the standard statistical analysis N times.
  
  We thank the reviewer for this helpful recommendation. We have now expanded the Methods section to clarify this point. The revised text reads as follows:
  
  “Multiverse analysis was conducted in RStudio (R version 4.4.1) using the multiverse package (version 0.6.1; Sarma et al., 2021). The multiverse framework differs from simply repeating the same statistical analysis multiple times, because it first requires the researcher to define a structured analysis space consisting of multiple defensible analytic decisions. These decisions are then expanded into all valid combinations, with each combination representing one complete analysis specification, or “universe”, providing a transparent and reproducible record of which analytic decisions were considered and how they were combined. In addition, the package reduces the need to manually write, modify, and track separate analysis scripts for each specification, which helps avoid inconsistencies or coding errors across universes. The results can then be extracted and summarized across the full set of universes to evaluate whether the conclusions are robust across reasonable analytic alternatives or depend on specific combinations of choices.”
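To make the distinction concrete, the expansion step can be sketched in Python. This is not the R multiverse package's API; it is a minimal, hypothetical illustration in which the decision names and options (`band_definition`, `spectral_representation`, `adhd_definition`) are loosely taken from choices mentioned elsewhere in these responses, and `is_valid` is an invented stand-in for any rule that excludes incompatible combinations.

```python
from itertools import product

# Hypothetical decision space; the labels are illustrative, not the authors' code.
DECISIONS = {
    "band_definition": ["canonical", "IAF-relative"],
    "spectral_representation": ["1/f-uncorrected", "aperiodic-adjusted", "aperiodic signal"],
    "adhd_definition": ["categorical", "dimensional"],
}


def expand_universes(space, is_valid=lambda spec: True):
    """Return one dict per valid combination of analytic choices (one 'universe' each)."""
    names = list(space)
    universes = []
    for combo in product(*(space[name] for name in names)):
        spec = dict(zip(names, combo))
        if is_valid(spec):  # drop combinations ruled out a priori
            universes.append(spec)
    return universes


def run_universe(spec):
    """Placeholder for fitting one complete analysis specification."""
    return {"spec": spec, "estimate": None}


universes = expand_universes(DECISIONS)
results = [run_universe(spec) for spec in universes]
print(len(universes))  # 2 * 3 * 2 = 12 universes
```

The design point is that the decision space is declared once and enumerated mechanically, so every result can be traced back to the exact choices that produced it, instead of maintaining N hand-edited scripts.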
  
  (2) I may have missed it, but how many subjects per group do you end up with after all the cleaning? I don't mean what is in Table 1: for each dataset you describe how many were removed at each step, which leaves the reader wondering about the final numbers.
  
  We thank the reviewer for pointing this out. The final group sizes after all cleaning and exclusion steps were not described in the original manuscript. We have therefore revised Table 1 so that it now reports only the remaining participants included in the final analyses after all exclusions were applied. The revised table shows the final sample sizes separately for the HC (N = 228), ADHD-Combined (N = 429), and ADHD-Inattentive (N = 465) groups, together with the corresponding demographic and clinical characteristics. We have also revised the accompanying text in Results section 3.1.1. The same changes were applied to the validation sample, which is reported in the Supplement.
  
  (3) Missing reference, in my opinion: in the Discussion, the sentence "as both oscillatory and aperiodic contributions vary systematically across the lifespan" could do with a reference or two.
  
  We have now added references showing that developmental changes in EEG spectra involve both periodic/oscillatory and aperiodic components. The revised text reads as follows:
  
  “These dynamics may account for the recurring Age × IAF interactions observed in our multiverse analyses, as both oscillatory and aperiodic contributions vary systematically across the lifespan (Merkin et al. 2023; Tröndle et al. 2022; Tröndle et al. 2021; McSweeney et al. 2023; Hill et al. 2022; Stanyard et al. 2024).”
  
  (4) Now the big one: this is a cool visualization, and beta estimates from linear modeling do tell us the strength, BUT I would like to see raw effect sizes. They could be in a table or in the text, to accompany the discussion. What were the raw or adjusted theta, alpha, and beta powers in each group? What about the aperiodic component? Perhaps even some violin plots to show canonical vs. individualized bands. My point is that I am convinced by the frequency analysis, since an entire subspace becomes significant, and your interpretation that this is spurious is satisfactory; but showing that this subspace has tiny effect sizes driven by interactions would be even more convincing, in my opinion.
  
  To complement the regression coefficients from the multiverse models, we now additionally report descriptive standardized effect sizes across representative analytical subspaces. Specifically, we grouped analytical paths according to frequency band definition (IAF-relative vs canonical) and spectral representation (aperiodic signal, 1/f-uncorrected power, and aperiodic-adjusted power). Within each subspace, we computed Cohen’s d values for theta power, beta power, and TBR between ADHD and healthy control groups across all corresponding analytical paths.
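For readers less familiar with the statistic, Cohen's d for two independent groups is the difference in group means divided by the pooled standard deviation. A minimal Python sketch, assuming per-participant values (e.g. TBR) for each group and a plain pooled-SD estimator without the small-sample (Hedges') correction; which group is subtracted from which is an assumption that only affects the sign, and the values below are hypothetical.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Cohen's d = (mean_a - mean_b) / pooled SD, for two independent samples."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.size, b.size
    # Pool the unbiased (ddof=1) variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Hypothetical values for illustration only, not data from the study
hc = [1.0, 2.0, 3.0, 4.0, 5.0]
adhd = [2.0, 3.0, 4.0, 5.0, 6.0]
print(round(cohens_d(adhd, hc), 3))  # 0.632
```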
  
  To visualize the distribution of effects across analytical paths, we added violin plots with overlaid individual paths and mean effect sizes with 95% confidence intervals. Importantly, even in subspaces where interaction effects frequently emerged in the multiverse analysis, the corresponding descriptive group differences remained small, supporting our interpretation that the observed significant effects are driven by subtle interactions and analytical choices rather than large underlying group differences.
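Summarizing one subspace by its mean effect size with a 95% confidence interval could look like the minimal sketch below. It assumes a normal approximation (z = 1.96); with few analytical paths a t quantile would be more conservative, and because all paths reuse the same participants they are not independent samples, so such an interval is best read descriptively. The per-path values are hypothetical.

```python
import numpy as np


def mean_with_ci(path_effects, z=1.96):
    """Mean of per-path effect sizes with a normal-approximation confidence interval."""
    d = np.asarray(path_effects, dtype=float)
    mean = d.mean()
    sem = d.std(ddof=1) / np.sqrt(d.size)  # standard error of the mean across paths
    return mean, mean - z * sem, mean + z * sem


# Hypothetical Cohen's d values, one per analytical path in a subspace
mean, lower, upper = mean_with_ci([0.10, 0.20, 0.30])
print(f"mean d = {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```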
  
  The added text in Results section 3.1.4 reads as follows:
  
  “To complement the regression coefficients from the multiverse models, we additionally examined descriptive standardized effect sizes across representative analytical subspaces. Analytical paths were grouped according to frequency band definition (IAF-relative vs. canonical) and spectral representation (aperiodic signal, 1/f-uncorrected power, and aperiodic-adjusted power). Within each subspace, Cohen's d was computed for theta power, beta power, and TBR for both the HC vs. ADHD-Inattentive and HC vs. ADHD-Combined comparisons. To visualize the distribution of effect sizes across the analytical space, violin plots were constructed with each data point representing the Cohen's d value of a single analytical specification (Figure 8). Across all subspaces and outcome measures, Cohen's d values were small for both comparisons, including subspaces in which interaction effects frequently reached statistical significance in the multiverse analysis. This pattern indicates that even where the multiverse revealed reliable significant effects, the underlying group differences in theta power, beta power, and TBR remained small in magnitude. These findings support the interpretation that the significant interactions observed across analytical specifications are driven by subtle moderation effects and analytical choices rather than large, robust group differences in neural activity.”
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) As a minor clarification, the manuscript could specify if the calculation of aperiodic-adjusted power values was done as subtraction with linear or log power values.
  
  The aperiodic-adjusted power values were computed by subtracting the aperiodic fit from the observed power spectrum in log10 power space. Specifically, both the observed power spectrum and the estimated aperiodic component were log10-transformed, and the aperiodic-adjusted signal was obtained as the difference between these two quantities. The result was then transformed back to linear scale. We have clarified this in the revised manuscript. The revised text reads as follows:
  
  “The aperiodic component was reconstructed based on its fitted parameters and subtracted from the total power spectrum in log10 power space, resulting in an aperiodic-adjusted, 1/f-corrected power spectrum. The resulting values were then transformed back to linear scale and therefore represent power relative to the estimated aperiodic background.”
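Because subtraction in log10 space is division in linear space, the back-transformed aperiodic-adjusted spectrum is the ratio of observed power to fitted aperiodic power: values near 1 mean power at background level, values above 1 mean power above it. A minimal sketch with a synthetic spectrum, assuming the knee-free aperiodic form log10(P) = offset - exponent * log10(f) (as in the FOOOF/specparam "fixed" mode); the offset, exponent, and the 10 Hz peak are invented for illustration.

```python
import numpy as np

freqs = np.arange(1.0, 40.5, 0.5)  # frequency axis in Hz

# Assumed knee-free aperiodic fit: log10(P_ap) = offset - exponent * log10(f)
offset, exponent = 1.0, 1.5
log_aperiodic = offset - exponent * np.log10(freqs)

# Synthetic observed spectrum in log10 space: background plus a Gaussian peak at 10 Hz
log_observed = log_aperiodic + 0.5 * np.exp(-((freqs - 10.0) ** 2) / (2 * 1.0 ** 2))

# Aperiodic adjustment: subtract in log10 space, then transform back to linear scale
adjusted = 10 ** (log_observed - log_aperiodic)

# Equivalent linear-space form: observed power divided by aperiodic power
ratio = 10 ** log_observed / 10 ** log_aperiodic
print(np.allclose(adjusted, ratio))  # True
```

Subtracting linear power values instead would yield a difference in units of power that shrinks toward high frequencies, which is why it matters which space the subtraction is done in.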
  
  (2) The last section of the abstract is a bit repetitive in stating the main finding of what drives the TBR, and this could be edited/condensed.
  
  We agree that the final part of the abstract repeated the main interpretation regarding the role of aperiodic activity and IAF. We have therefore condensed this section to avoid redundancy while preserving the central conclusion. The revised text reads as follows:
  
  Across the multiverse, we found that group differences in TBR were highly contingent on analytical choices, with no evidence for robust main effects of diagnosis, indicating no reliable differences between healthy controls, ADHD-inattentive, and ADHD-combined subtypes. Instead, significant effects emerged primarily as interactions with age and individual alpha frequency (IAF), particularly when TBR was derived from aperiodic-uncorrected power or from the aperiodic signal itself. These interaction patterns replicated across both independent samples and were observed using both categorical and dimensional definitions of ADHD. Together, these findings indicate that previously reported TBR effects are largely driven by variability in aperiodic activity and IAF rather than genuine differences in oscillatory theta-beta dynamics. Our results challenge the interpretation of TBR as a reliable standalone biomarker for ADHD and underscore the importance of multiverse approaches for evaluating candidate neurobiological markers in heterogeneous clinical populations.
  
  (3) As a minor literature note, the finding that ratio measures often largely reflect aperiodic activity rather than oscillatory theta and/or beta per se activity is consistent with a previous (non-clinical) investigation of band ratio measures in a previous report that should perhaps be cited as relevant prior work:
  
  Donoghue, T., Dominguez, J., & Voytek, B. (2020). Electrophysiological Frequency Band Ratio Measures Conflate Periodic and Aperiodic Neural Activity. eNeuro, 7(6), ENEURO.0192-20.2020. https://doi.org/10.1523/ENEURO.0192-20.2020
  
  We appreciate the reviewer’s suggestion. We have added this reference to the Discussion section, where we interpret the observed TBR effects as reflecting variability in the aperiodic background rather than genuine differences in oscillatory theta-beta dynamics. The revised text reads as follows:
  
  “These results suggest that apparent TBR differences may reflect properties of the aperiodic background signal interacting with individual variability in IAF rather than true oscillatory theta or beta activity. This interpretation is consistent with previous work showing that electrophysiological frequency-band ratio measures can conflate periodic and aperiodic neural activity, such that apparent changes in theta/beta or other band ratios may partly reflect changes in the aperiodic spectral component rather than narrowband oscillatory activity (Donoghue et al., 2020).”
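The conflation point can be checked numerically: a spectrum with no oscillations at all still yields a theta/beta ratio, and that ratio changes whenever the aperiodic exponent changes. A minimal sketch, assuming canonical bands of theta = 4-8 Hz and beta = 13-30 Hz (the beta limits are an assumption, not taken from the manuscript) and band power computed as the mean linear power within each band.

```python
import numpy as np


def band_mean_power(freqs, power, low, high):
    """Mean linear power within [low, high] Hz."""
    mask = (freqs >= low) & (freqs <= high)
    return power[mask].mean()


def theta_beta_ratio(freqs, power, theta=(4.0, 8.0), beta=(13.0, 30.0)):
    """Ratio of mean theta-band power to mean beta-band power."""
    return band_mean_power(freqs, power, *theta) / band_mean_power(freqs, power, *beta)


freqs = np.arange(1.0, 40.25, 0.25)
exponents = (1.0, 1.5, 2.0)
# Purely aperiodic spectra P(f) = f^(-chi): no theta or beta oscillation anywhere
tbr = [theta_beta_ratio(freqs, freqs ** -chi) for chi in exponents]
for chi, value in zip(exponents, tbr):
    print(f"exponent {chi:.1f}: TBR = {value:.2f}")
```

The ratio rises monotonically with the exponent even though neither band contains a peak, so a group difference in aperiodic slope alone can appear as a TBR difference.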
  
  (4) In Figure 3, it may be useful to highlight the theta and beta ranges in panel B.
  
  We considered highlighting the theta and beta ranges in Figure 3B, but decided against it. In the multiverse analysis, theta and beta were defined using both canonical frequency bands and IAF-relative bands. The IAF-relative bands differ across participants, therefore marking only the canonical ranges could give the impression that these were the only frequency definitions used in the analyses. We therefore kept the spectra unmarked.
  
  (5) In Figure 5 (and other figures following this motif), it may be useful to color the significant results as green or red based on direction, to match Figure 4.
  
  We have updated Figure 5 and the corresponding figures so that significant positive effects are shown in green and significant negative effects are shown in red, matching the color scheme used in Figure 4.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) P10, L30: "Individualized bands were centered on the IAF, defined as theta = IAF-6 Hz to IAF-4 Hz"; why is theta defined using such a narrow, 2 Hz band here, when canonical theta is usually defined as a 4 Hz wide, 4-8 Hz band?
  
  The individualized theta band was chosen to follow the IAF-based frequency-band framework proposed in the seminal work of Wolfgang Klimesch (1999, 2012), rather than to reproduce the width of the canonical 4-8 Hz theta band. In this framework, frequency bands are defined relative to each participant’s individual alpha frequency. Theta is defined as the range from IAF-6 Hz to IAF-4 Hz, while lower alpha occupies the range closer to the individual alpha peak. The narrower individualized theta band is therefore intended to reduce overlap with lower-alpha activity and to account for inter-individual and developmental differences in alpha peak frequency. The 2020 guidelines from the International Federation of Clinical Neurophysiology (IFCN) reaffirmed Klimesch’s division of the alpha and theta bands (Babiloni, 2020). We have explained the selection of frequency bands in more detail in the manuscript. The revised text in Methods section 2.5.7 (Extraction of power for statistical analyses) reads as follows:
  
  The selection of these frequency bands is grounded in the seminal work of Wolfgang Klimesch (1999), who demonstrated that the alpha band can be divided into distinct lower and upper sub-bands. The lower alpha band extends up to 4 Hz below the IAF, covering a broader range of approximately 3.5 to 4 Hz, while the upper alpha band, which lies above the IAF, is narrower, spanning about 1 to 1.5 Hz. Klimesch also characterized the theta band as a frequency range that is approximately 2 Hz below the lower alpha band (Klimesch, 1999; Klimesch, 2012). The 2020 guidelines from the International Federation of Clinical Neurophysiology (IFCN) reaffirmed Klimesch’s division of the alpha and theta bands (Babiloni, 2020).
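The individualized definition amounts to fixed offsets from each participant's IAF. The sketch below encodes only the limits stated above: theta = IAF-6 Hz to IAF-4 Hz, and lower alpha from 4 Hz below the IAF up to the IAF (taking the 4 Hz end of the stated 3.5-4 Hz range). The IAF values are hypothetical, and IAF estimation itself is not shown.

```python
CANONICAL_THETA = (4.0, 8.0)  # Hz


def individualized_theta(iaf):
    """Theta band relative to the individual alpha frequency: IAF-6 Hz to IAF-4 Hz."""
    return (iaf - 6.0, iaf - 4.0)


def lower_alpha(iaf):
    """Lower alpha band: from 4 Hz below the IAF up to the IAF (assumed 4 Hz width)."""
    return (iaf - 4.0, iaf)


for iaf in (8.5, 10.0, 11.5):  # hypothetical IAF values in Hz
    theta_lo, theta_hi = individualized_theta(iaf)
    alpha_lo, alpha_hi = lower_alpha(iaf)
    print(f"IAF {iaf:.1f} Hz: theta {theta_lo:.1f}-{theta_hi:.1f} Hz, "
          f"lower alpha {alpha_lo:.1f}-{alpha_hi:.1f} Hz, canonical theta "
          f"{CANONICAL_THETA[0]:.0f}-{CANONICAL_THETA[1]:.0f} Hz")
```

For an IAF of 8.5 Hz, the individualized theta band is 2.5-4.5 Hz, while most of the canonical 4-8 Hz band (4.5-8 Hz) falls in this participant's lower alpha range; that overlap is what the narrower, IAF-anchored band is meant to avoid.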
  
  (2) Figure 3 and Supplementary Figures 1, 7, and 8: "Electrodes highlighted on the topographies..." means just the text labels, right? It might be better to show all electrodes as black dots and highlight the selected ones with white dots or something similar.
  
  We have revised the figures to display all electrodes as black dots. In addition, we have clarified in the figure legends that the highlighted electrode labels correspond to the regions of interest used in the analyses.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.64898/2026.01.08.26343676v3
www.biorxiv.org

Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer’s Disease

1. Public_Reviews 15 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript titled "Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer's Disease" describes a study of sleep/wake phenomena in a knock-in mouse strain carrying "three mutations in the human App gene associated with elevated risk for early onset AD". Traditional, in-depth characterization of sleep/wake states, EEG parameters, and response to sleep loss is employed to provide evidence "supporting the use of this strain as a model to investigate interventions that mitigate AD burden during early disease stages". The sleep/wake findings of earlier studies (especially Maezono et al., 2020, as noted by the authors) were extended by several important, genotype-related observations, including age-related hyperactivity onset that is typically associated with increased arousal, a normal response to loss of sleep and to multiple sleep latency testing, and a stronger AD-like phenotype in females. The authors conclude that the AppNL-G-F mice demonstrate many of the human AD prodromal symptoms and suggest that this strain may serve as a model for prodromal AD in humans, confirming the earlier results and conclusions of Maezono et al. Finally, based on state bout frequency and duration analyses, it is suggested that the AppNL-G-F mice may develop disruptions in mechanism(s) involved in state transition.
  
  Strengths:
  
  The study appears to have been, technically, rigorously conducted with high quality, in-depth traditional assessment of both state and EEG characteristics, with the concordant addition of activity and temperature. The major strengths of this study derive from observations that the AppNL-G-F mice: (1) are more hyperactive in association with decreased transitions between states; (2) maintain a normal response to sleep deprivation and have normal MSLT results; and (3) display a sex specific, "stronger" insomnia-like effect of the knockin in females.
  
  Weaknesses:
  
  The weaknesses stem from the study's impact being limited due to its being largely confirmatory of the Maezono et al. study, with advances of importance to a potentially more focused field. Further, the authors conclude that AppNL-G-F mice have disrupted mechanism(s) responsible for state transition; however, these were not directly examined. The rationale for this conclusion is stated by the authors as based on the observations that bouts of both W and NREM tend to be longer in duration and decreased in frequency in AppNL-G-F mice. Although altered mechanism(s) of state transition (it is not clear what mechanisms are referenced here) cannot be ruled out, other explanations might be considered. For example, increased arousal in association with hyperactivity would be expected to result in increased duration of W bouts during the active phase. This would also predictably result in greater sleep pressure that is typically associated with more consolidated NREM bouts, consistent with the observations of bout duration and frequency.
  
  Reviewer 1 succinctly summarizes the advances of this study beyond the ground-breaking Maezono et al. (2020) study of this “humanized” mouse model exhibiting amyloid deposition. Whereas Maezono et al. conducted sleep/wake studies on male AppNL-G-F mice at 6 and 12 months of age, we had the unusual opportunity to study both sexes of homozygous AppNL-G-F mice and WT littermates at 14-18 months of age and to conduct a longitudinal assessment of many of the same individuals at 18-22 months. In addition to baseline sleep/wake and EEG spectral analyses, we (1) measured subcutaneous body temperature and activity to obtain a broader picture of the physiology and behavior of this strain at advanced ages; (2) assessed baseline sleepiness in this strain using the murine version of the clinically relevant Multiple Sleep Latency Test (MSLT); (3) evaluated the response of AppNL-G-F mice and WT littermates to a 6-h perturbation of the sleep homeostat; (4) compared the sleep/wake characteristics of male vs. female AppNL-G-F mice at 18-22 months; and (5) to assess the stability of the phenotypes, analyzed these data over a continuous 14-d recording rather than the conventional 24-h recordings typical of most sleep/wake studies, including Maezono et al. We found that a long wake/short sleep phenotype was characteristic of homozygous AppNL-G-F mice at these advanced ages; this phenotype is also evident in the Maezono et al. (2020) study at 12 months of age (but not at 6 months), although those authors do not comment on it and instead focus on the reduced REM sleep, which is particularly evident in female AppNL-G-F mice in our study. Remarkably, despite being awake ~20% longer per day, we find that AppNL-G-F mice are no sleepier than WT mice as determined by the MSLT and that their sleep homeostat is intact when challenged by 6-h sleep deprivation.
At both advanced ages, the long wake/short sleep phenotype is due primarily to longer Wake bouts and shorter bouts of both NREM and REM sleep during the dark phase. Moreover, hyperactivity develops in older AppNL-G-F mice, particularly females, which contributes to this phenotype. We agree with Reviewer 1 that “hyperactivity would be expected to result in increased duration of W bouts during the active phase” and that this could result in more consolidated NREM bouts. Accordingly, we have added the following sentence to the Discussion subsection Impacts of pathology on sleep/wake and activity: “Thus, the hyperactivity evident in Figures 4D, 4D’, and 5D’ could drive the longer wake bouts evident in Figure 7A and result in the longer NREM and REM sleep bouts found in male AppNL-G-F mice (Figure 12A’ and 12A’’).”
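Bout number and bout duration are both derived from the same scored hypnogram by collapsing runs of identical epochs, so the same total wake time can come from many short bouts or a few long ones, with correspondingly more or fewer state transitions. A minimal sketch, assuming a 10-s scoring epoch and single-letter state labels (W, N, R); neither is taken from the manuscript, and real pipelines usually add rules such as minimum bout lengths or brief-arousal criteria that are not modelled here.

```python
from itertools import groupby

EPOCH_SEC = 10  # assumed scoring epoch length in seconds


def bouts(hypnogram):
    """Collapse an epoch-by-epoch hypnogram into (state, duration_sec) bouts."""
    return [(state, sum(1 for _ in run) * EPOCH_SEC) for state, run in groupby(hypnogram)]


def bout_summary(hypnogram):
    """Per-state bout count and mean bout duration, plus the number of state transitions."""
    found = bouts(hypnogram)
    totals = {}
    for state, duration in found:
        count, total = totals.get(state, (0, 0))
        totals[state] = (count + 1, total + duration)
    per_state = {s: {"n_bouts": c, "mean_sec": t / c} for s, (c, t) in totals.items()}
    return per_state, max(len(found) - 1, 0)


# Hypothetical hypnogram: W = wake, N = NREM, R = REM
hyp = ["W"] * 6 + ["N"] * 3 + ["W"] + ["N"] * 4 + ["R"] * 2 + ["W"] * 3
per_state, n_transitions = bout_summary(hyp)
print(per_state, n_transitions)
```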
  
  The suggestion of greater sleep pressure is not borne out by our MSLT studies, as we observed neither the shorter sleep latencies nor the increased sleep during the nap opportunities on the MSLT that we have observed in other mouse strains. Moreover, due to their short sleep phenotype, AppNL-G-F mice should be entering the sleep deprivation study with a greater sleep debt than WT mice, yet we did not observe a stronger homeostatic response (i.e., enhanced EEG Slow Wave Activity) in this strain during recovery from sleep deprivation. Thus, we have suggested that AppNL-G-F mice are unable to transition from Wake to sleep as readily as their WT littermates. Our observations summarized above set the stage for subsequent mechanistic studies in aged AppNL-G-F mice, although, realistically, mice of this age and genotype are a rare commodity.
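Slow Wave Activity is conventionally quantified as EEG power in the delta range during NREM sleep, often normalized to a baseline reference. A minimal numpy-only sketch of the band-power step, assuming a 0.5-4 Hz SWA band, a single Hann-windowed periodogram per epoch, and a synthetic 4-s signal; selecting NREM epochs and baseline normalization are omitted, and none of these parameters are taken from the manuscript.

```python
import numpy as np


def band_power(signal, fs, low, high):
    """Power between low and high Hz from a Hann-windowed one-sided periodogram."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    window = np.hanning(x.size)
    spectrum = np.fft.rfft(x * window)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))  # power spectral density
    # Double every bin except DC (and the Nyquist bin, which exists only for even lengths)
    if x.size % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].sum() * (freqs[1] - freqs[0])


fs = 256.0
t = np.arange(0, 4.0, 1.0 / fs)  # one synthetic 4-s epoch
eeg = 2.0 * np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.sin(2 * np.pi * 20.0 * t)
swa = band_power(eeg, fs, 0.5, 4.0)
print(round(swa, 2))
```

A sinusoid of amplitude A carries mean power A²/2, so the 2 Hz component should contribute roughly 2 to the SWA band, while the 20 Hz component falls outside it.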
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors have used a knock-in mouse model to explore late-in-life amyloid effects on sleep. This is an excellent model as the mutated genes are regulated by the endogenous promoter system. The sleep study techniques and statistical analyses are also first-rate.
  
  The group finds an age-dependent increase in motor activity in advanced age in the NLGF homozygous knock-in mice (NLGF), with a parallel age-dependent increase in body temperature; both effects predominate in the dark period. Interestingly, the sleep patterns do not quite follow the activity and temperature changes. Wake time is increased in NLGF mice, and there is no progression in increased wake over time. NREMS and REM sleep are both reduced, and there is no progression. Sleep-wake effects, however, show a robust light:dark effect with larger effects in the dark period. These findings support distinct effects of this mutation on activity and temperature and on sleep. This is the first description of the temporal pattern of these effects. NLGF mice show wake stability (longer wake bout durations in the dark period, their active period) and fewer brief arousals from sleep. Sleep homeostasis across the lights-on period is normal. Wake power spectral density is unaffected in NLGF mice at either age. Only REM power spectra are affected, with NLGF mice showing less theta and more delta. There are interesting sex differences, with females showing no genotype difference in wake bout number, while males show a genotype effect. Similarly, genotype effects on NREM bout number seem larger in males than in females. Although there was no difference in homeostatic response, there was normalization of sleep-wake activity after sleep deprivation.
  
  Strengths:
  
  Approach (model, extent of sleep phenotyping) and analysis.
  
  Weaknesses:
  
  The weaknesses are summarized below and are viewed as "addressable".
  
  (1) The term insomnia. Insomnia is defined as a subjective dissatisfaction with sleep, which cannot be ascertained in a mouse model. The findings across baseline sleep in NLGF mice support increased wake consolidation in the active period. The predominant sleep period (lights on) is largely unaffected, and the active period (lights off) shows increased activity and increased wake with longer bouts. There is a fantastic clue here: the NLGF effects are consistent with increased hypocretinergic (orexinergic) neuron activity in the dark period and/or increased drive to hypocretin neurons from the PVH.
  
  Although the DSM-5 definition of Insomnia Disorder indeed emphasizes a subjective “complaint of dissatisfaction with sleep quantity or quality”, I think the Reviewer takes an unnecessarily narrow view of the term “insomnia”. Aside from cases of “psychological” insomnia in which there is a mismatch between subjective and objective measures of sleep, most sleep researchers would likely agree that insomnia is objectively characterized by a greater than normal wake time during the sleep period (i.e., low sleep efficiency) due to difficulty in either initiating or maintaining sleep. This view has led to efforts to identify not only the biological causes of insomnia but also animal models in which this disorder can be studied. A PubMed search on the terms “mouse” and “insomnia” retrieves 844 publications, including an authoritative 2023 review in J Sleep Research entitled "Animal Models of Human Insomnia" co-authored by a clinician-scientist who has done human sleep research throughout his career and is an authority on CBT-I, in particular. Similarly, a PubMed search on the terms “fly” and “insomnia” retrieves 18 publications. So, although our intent in the submitted version of the manuscript was to use “insomnia” as an operational term to succinctly mean “less sleep than usual”, in the revised manuscript, we have eliminated use of the term “partial insomnia” and replaced it with the term “insomnia-like phenotype”. In the Discussion section “Impacts of pathology on sleep/wake and activity”, we have revised the opening sentence to read “Insomnia in humans is typically characterized by subjective reports of reduced sleep quality and can be accompanied by objective measures of sleep fragmentation and reduced sleep amounts.”
  
  (2) Sleep-wake transitions are impaired: This should not be termed an impairment. It could actually be beneficial to have greater state stability, especially wake stability in the dark or active period. There is reduced sleep in the model that can be normalized by short-term sleep loss. It is fascinating that recovery sleep normalized sleep in the NLGF in the immediate lights-on and light-off period. This is a key finding.
  
  Due to the Reviewer’s objection regarding “impairment”, we have changed the title of the manuscript to “Long Wake/Short Sleep Bouts and Hyperactivity with Advanced Age in a Mouse Model of Early Onset Alzheimer’s Disease”. In Comments (1) and (2), Reviewer 2 suggests a provocative hypothesis to test. In the section “Impacts of pathology on sleep/wake and activity”, we previously stated “A hyperactive hypocretin/orexin or monoaminergic arousal system or a dysfunctional GABAergic sleep onset system could underlie the longer bouts of Wake in App<sup>NL-G-F</sup> mice.” We have now added this additional sentence: “Indeed, Hcrt neurons in aged mice have been shown to exhibit more frequent neuronal activity driving wake bouts and optogenetic stimulation of Hcrt neurons in aged mice results in prolonged wakefulness (Li et al., 2022).”
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this study, Tisdale et al. studied the sleep/wake patterns in a knock-in mouse model of Alzheimer's disease. The results in this study, together with the established literature on the relationship of sleep and Alzheimer's disease progression, led the authors to propose this mouse model for the mechanistic understanding of sleep states that translates to Alzheimer's disease patients. However, the manuscript currently suffers from a disconnect between the physiological data and the mechanistic interpretations. Specifically, the claim of "impaired transitions" is logically at odds with the observed increase in wake-state stability or possible hyperactivity. Additionally, the description of the methods, the quantification, and the figure presentation could be substantially improved. I detail some of my concerns below.
  
  Strengths:
  
  The selection of the knock-in model is a notable strength as it avoids the artifacts associated with APP overexpression and more closely mimics human pathology. The study utilizes continuous 14-day EEG recordings, providing a unique dataset for assessing chronic changes in arousal states. The assessment of sex as a biological variable identifies a more severe "insomniac-like" phenotype in females, which aligns with the higher prevalence and severity of Alzheimer's disease in women.
  
  Weaknesses:
  
  The study seems to lack a clear hypothesis-driven approach and relies mostly on exploratory investigations. Moreover, the lack of quantitative analytical methods, together with shaky logical conclusions that may not be supported by the data in their current form, leaves room for major improvement.
  
  Given that this paper studies sleep states, the "Methods" section is surprisingly unclear about the specific criteria used to classify them. There is no quantitative description of classifying sleep based on clear, reproducible procedures. There are many reasonably well-characterized sleep scoring systems used in the rat electrophysiological literature, which could be useful here. The authors are generally expected to describe the movement speed, EMG, and/or EEG (theta/delta/gamma) criteria used to classify these epochs. The subjective (manual) nature of this procedure provides no verifiable validation of the accuracy and interpretability of the results.
  
  This was an oversight: the “Classification of Arousal States” section has been modified accordingly.
  
  One of the bigger claims is that "state transition mechanism(s)" are impaired. However, Figure 7 shows that model mice exhibit significantly more long wake bouts (>260s) and fewer short wake bouts (<60s). Logically, an "impaired switch" (the flip-flop model, Saper et al., 2010) results in state fragmentation. The data here show the opposite: the wake state has become too stable. This suggests the primary defect is not in the transition mechanism itself, but possibly in a pathological increase in arousal drive (hyper-arousal), likely linked to the dark-phase hyperactivity shown in Figures 4 and 5. Also, a point to note is that this finding is not new.
  
  Reviewers 1 and 2 also make comments consistent with the alternative interpretation that “the wake state has become too stable.” However, I think we are using different words to say the same thing: that the transition from wake to sleep is impaired whether it is due to hyperarousal or to a defect in the flip/flop switch that results in greater Wake stability. I hope the reviewer would agree that a switch can be impaired in two directions: either it could “flicker” as seems to be the case in narcolepsy or it could get stuck in one position, which is what we suggest here based on the data in Fig. 12A, A’ and A” which show longer bouts of all states (Wake, NREM and REM) in older males. Nonetheless, the hyperarousal hypothesis suggested by the Reviewer is certainly a reasonable alternative. Consequently, we have added the following sentence to the Discussion subsection Impacts of pathology on sleep/wake and activity: “Thus, the hyperactivity evident in Figures 4D, 4D’, and 5D’ could drive the longer wake bouts evident in Figure 7A and result in the longer NREM and REM sleep bouts found in male App<sup>NL-G-F</sup> mice.”
  
  Figure 3 heatmaps lack color bars and units. Spectral power must be quantitatively defined and methods well-explained in the Methods section. Without these, the reader cannot discern if the "reduced power" in females is a global suppression of signal or a frequency-specific shift. Additionally, the representative example used to claim shorter sleep bouts lacks the statistical weight required for a major physiological conclusion. How does a cooler color (the range and its interpretation are unclear) indicate shorter sleep bouts in female mice? The authors should clearly mark the frequency ranges that support their claims. In this figure, there is a question mark following the theta/delta range. The authors should avoid speculation and state their claims based on facts. They should also add the theta and delta ranges in the plot, such that readers can draw their own conclusions.
  
  The Y-axis in the previous version of this figure was labelled 0-25 Hz. This figure was intended to be a descriptive illustration of how unusual the female App<sup>NL-G-F</sup> mice are relative to WT of either sex rather than a quantitative analysis of spectral power. As suggested by Reviewer 2, we have combined this figure with the previous Fig. 14 as the revised Fig. 3 and we have modified the Y-axis labels to more explicitly indicate EEG frequencies. The question mark was legacy text from an earlier version of the manuscript; sorry for the confusion!
  
  Figure 8 and the MSLT results show that model mice are "no sleepier than WT mice" and have a functional homeostatic rebound. This presents a logical flaw in the "insomnia" narrative. True insomnia in AD patients typically involves a failure of the homeostatic process or a debilitating accumulation of sleep debt. If these mice do not show increased sleepiness (shorter latency) despite ~19% less sleep, the authors might be describing a "reduced need" for sleep or a "hyper-aroused" state, possibly not a clinical insomnia phenotype.
  
  Both Reviewers 2 and 3 suggest that we are using “insomnia” incorrectly; we used it as shorthand to denote less sleep per 24 h period. Reviewer 2 states that “Insomnia is defined as a subjective dissatisfaction with sleep” per DSM-5, and Reviewer 3 suggests that the mechanism underlying insomnia in AD patients is “a failure of the homeostatic process or a debilitating accumulation of sleep debt”, which is not in DSM-5. Our clinical colleagues tell us that this is not established fact; some argue that the homeostat is intact and that the input(s) to the homeostat are defective. We agree that less sleep in these mice could be due to a reduced need for sleep or to hyperarousal. Consequently, we have changed the title of the manuscript from “Sleep-Wake Transitions are Impaired…” to the more objective “Long Wake/Short Sleep Bouts and Hyperactivity with Advanced Age in a Mouse Model of Early Onset Alzheimer’s Disease”.
  
  In Figure 9, showing and comparing LFP power as percentages is problematic, as the LFP power distribution is known to be skewed (it follows a power law). This is particularly problematic here because all frequencies above ~20 Hz appear flattened or absent, which severely limits the comparison and biases it toward the highly skewed, very low-frequency portion of the LFP power spectrum, i.e., the delta, theta, and possibly beta ranges. This ignores low, mid, and high gamma as well as ripple band frequencies. NREM sleep is known to have relatively greater ripple band (100-250 Hz) power bursts in hippocampal regions, and REM sleep is known to have synchronous theta-gamma relationships.
  
  We completely agree with the reviewer. There are at least 3 ways that spectral power data can be presented: (1) absolute power; (2) relative power (normalized to a baseline); and (3) power density. In this study, we intentionally presented results in terms of spectral power density so that our results could be compared to those in Figure 3A and 3B of Maezono et al. (2020). This was important because Maezono et al. recorded from mice of 6 and 12 months of age whereas we recorded from older mice, which allowed us to determine which parameters are likely changing with age (and, presumably, greater Aβ deposition).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) A key finding for the AppNL-G-F mouse model is the emergence of hyperactivity that may be responsible for the altered sleep architecture. Further investigation to help determine the mechanism(s) responsible might include cFos expression to help localize or provide evidence for the distributed neuronal activity increase in this model. Additionally, identification of overly active areas might provide targets for their manipulation to test the authors' hypothesis of the mechanism of the altered sleep architecture. Does chronic hyperactivity caused by other mechanisms (DREADDs, LOF of a K channel) mimic the AppNL-G-F mouse model sleep phenotype? These sorts of findings would impact the study's significance.
  
  We agree with the Reviewer that identifying the mechanism underlying the long wake/short sleep phenotype of aged App<sup>NL-G-F</sup> mice would increase the study’s significance. However, we want to underscore that the opportunity to study both sexes of homozygous App<sup>NL-G-F</sup> mice and WT littermates at 14-18 months of age and to conduct a longitudinal assessment of many of the same individuals at 18-22 months was very unusual. Our observations of the phenotype described in this manuscript set the stage for subsequent mechanistic studies in aged App<sup>NL-G-F</sup> mice, although realistically, mice of this age and genotype are a rare commodity.
  
  (2) A more technical area of improvement involves the presentation of the results and the associated critical statistical analyses. Relevant tables and statistics are not always reported (in the results) or properly referenced. In the mixed models, the repeated measures are "time of day", I presume.
  
  Tables 1-6 present statistical results; these 6 Tables are referred to in the Results section a total of 14 times. The text states “The larger sample size in Experiment 2 (N=31 mice) allowed a mixed-effects model ANOVA to be conducted with Genotype, Sex, and Time as factors”. Although “Time of Day” was specified in several places in the Results, thank you for pointing out the omission of “of Day” from the “Data Analysis and Statistics” section; we have added this information accordingly.
  
  (3) The model is presented as age-dependent, but there was little statistical support for this. The subjects spanned a considerable age range, and a direct quantifiable correlation between age and the various measured dependent variables could be helpful in this regard.
  
  The long wake/short sleep phenotype characteristic of homozygous App<sup>NL-G-F</sup> mice that we describe here is also evident in the Maezono et al. (2020) study at 12 months of age but not at 6 months in either the Maezono et al. (2020) or Calafate et al. (2023) studies, although the authors do not comment on this phenotype and instead focus on the reduced REM sleep. Thus, between these studies, there seems to be an age-dependent progression of the phenotype. We have thus added this sentence to the Discussion subsection Sleep/wake and activity phenotypes of 14-18 month vs. 18-22 month old App<sup>NL-G-F</sup> mice: “This long wake/short sleep insomnia-like phenotype is also evident at 12 months of age (Maezono et al., 2020) but not at 6 months (Calafate et al., 2023; Maezono et al., 2020), suggesting a progression in this symptomatology.”
  
  (4) Would a more advanced age point be helpful? Would sleep fragmentation be likely to appear with more advanced age?
  
  The text states “Recordings collected throughout the entire 14-day period when Cohort 2 App KI and App WT mice were 21.0-24.3 months of age”. Mice on a C57BL/6J background are considered old at 18-24 months. Fig. 6B’ shows a strong trend (p=0.0558) toward shorter NREM bouts in App KI mice at 18-22 months during the dark phase at the same time that long wake bouts are evident (Fig. 6A’), strongly indicative of sleep/wake fragmentation but not quite significant with the sample size measured.
  
  (5) How does the onset of sleep-architecture-related symptoms relate to the cognitive impairment onset in AppNL-G-F mice?
  
  We have added this sentence to the Conclusions: “In a fear conditioning paradigm, impaired learning ability has been correlated with REM sleep duration in 13 month old but not 7 month old App<sup>NL-G-F</sup> mice (Maezono et al., 2020).”
  
  (6) It is importantly concluded that the AppNL-G-F mouse phenotype is "stronger" in females. What is meant here by "stronger" and can this be quantified?
  
  We have eliminated use of “stronger” and replaced with “more evident” or “more apparent”.
  
  (7) Would ovariectomized females still show partial insomnia?
  
  This is an interesting question, particularly because the hyperactivity shown in Figure 7C is most evident in females. The average age of cessation of estrus cyclicity in C57BL/6J mice occurs between 13-16 months of age (Nelson et al., 1982, Biol Reproduction). The female KI mice in Cohort 2 ranged from 21.0 to 23.3 months of age at the time of recording and thus can be expected to be functionally ovariectomized.
  
  (8) The statement, "...female AppNL-G-F mice exhibited the most wakefulness and the least amount of sleep each day", sounds like a tautology.
  
  It was an intentional statement to underscore the long wake/short sleep phenotype.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Introduction:
  
  The authors might mention in paragraph 3 that because these studies each used a mutant protein on a powerful, and not the endogenous, promoter, the effects on sleep may be skewed by overexpression in specific brain areas. In addition, they might mention that sleep homeostasis and sleep changes relative to brain temp and activity have not been examined longitudinally.
  
  We have added the following sentences to the Limitations subsection of the Discussion: “Moreover, because studies of this strain used a mutant protein on a powerful exogenous promoter, the effects on sleep described by us and previous investigators may be skewed by overexpression in specific brain areas” and “Neither the present nor previous studies have assessed the effects of age-related changes in brain temperature on sleep/wake, sleep homeostasis or activity.”
  
  (2) Results:
  
  Figure 2: Images in 1B and 1B' look like IHC labeling in well over 1 and 2% of the brain for Iba-1. Are these images correct?
  
  The use of “%” on the Y-axis was inappropriate and has been corrected. Due to variation in Iba1 immunostaining across WT mice, Iba1 measurements were normalized to WT such that the mean Iba1 area coverage for WT mice within each region of interest was set to 1. The negligible 82E1 signal in WT mice obviated the need for normalization.
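  The normalization described above can be illustrated with a minimal Python sketch, assuming each Iba1 measurement is stored with its region of interest and genotype. The function name `normalize_to_wt`, the record layout, and all numbers are hypothetical and are not data from the study. Each value is divided by the WT mean of its own region, so WT mice average 1 in every region and a KI value of 3.0 reads as threefold the WT area coverage.

```python
from collections import defaultdict
from statistics import mean


def normalize_to_wt(records):
    """Divide each area value by the WT mean of the same region (hypothetical helper)."""
    wt_values = defaultdict(list)
    for r in records:
        if r["genotype"] == "WT":
            wt_values[r["region"]].append(r["area"])
    # WT mean per region of interest; this becomes the reference value of 1.
    wt_mean = {region: mean(vals) for region, vals in wt_values.items()}
    return [{**r, "norm_area": r["area"] / wt_mean[r["region"]]} for r in records]


# Illustrative values only (percent of region covered by Iba1 signal).
records = [
    {"region": "cortex", "genotype": "WT", "area": 1.2},
    {"region": "cortex", "genotype": "WT", "area": 1.8},
    {"region": "cortex", "genotype": "KI", "area": 4.5},
    {"region": "hippocampus", "genotype": "WT", "area": 2.0},
    {"region": "hippocampus", "genotype": "KI", "area": 5.0},
]

normalized = normalize_to_wt(records)
for r in normalized:
    print(f"{r['region']:<12} {r['genotype']}: {r['norm_area']:.2f}")
```

  Normalizing within each region, rather than to a single pooled WT mean, keeps regional differences in baseline microglial density from masquerading as genotype effects.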
  
  Figure 3: I would incorporate this into Figure 14 with the spectra, as it is descriptive but nicely illustrates Figure 14.
  
  Done -- thank you for this excellent suggestion!
  
  Figure 10: The figure supports no significant estrus effects in either WT or NLGF mice. The authors could run the analysis, but this is an important finding.
  
  Agreed but, as indicated in the response to Reviewer 1, the average age of cessation of cyclicity in C57BL/6J mice has been reported to occur between 13-16 months of age (Nelson et al., 1982, Biol Reproduction). The female mice in the older cohort that we recorded were 18-22 months of age.
  
  (3) Discussion:
  
  Page 11, last paragraph: It is hard to say whether activity caused more wake or response to wake is different in these mice (anxiety and hyperactivity are both seen in Alzheimer's disease).
  
  Hypocretin and MCH are touched on but could be elaborated upon, given the light/dark differences.
  
  We agree that the directionality is difficult to ascertain. As mentioned above, we have added a discussion on hyperactivity but, having not made any assessment of anxiety in the present study, we have refrained from further speculation.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Figure 9: Y-axis labels are missing on several plots.
  
  Due to the density of info on this figure, Y-axis labels were intentionally omitted for those panels for which the Y-axis label of the panel to the left applied. Since the reviewer found this to be confusing, we have added Y-axis labels to all panels at the risk of making the figure even more dense!
  
  (2) Figure 14: x tick labels are perplexing - why would they be labelled in such arbitrary decimal points?
  
  As stated in the text, “EEG spectra for each state were analyzed in 0.061 Hz bins”. Consequently, the X-axis tick labels fall on multiples of the 0.061 Hz bin width.
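  Why bin-based tick labels look arbitrary can be shown with a short Python sketch. For an FFT spectrum, the bin width equals the sampling rate divided by the window length. The sampling rate (250 Hz) and window length (4,096 samples, i.e. 16.384 s) below are assumptions chosen only because they yield a 0.061 Hz resolution; they are not parameters reported in the manuscript.

```python
import numpy as np

# Hypothetical acquisition parameters (assumed, not taken from the manuscript).
FS_HZ = 250.0  # EEG sampling rate
N_FFT = 4096   # FFT window length in samples (16.384 s)


def spectral_bins(fs_hz: float, n_fft: int) -> np.ndarray:
    """Return the centre frequencies of a one-sided (real-input) FFT spectrum."""
    return np.fft.rfftfreq(n_fft, d=1.0 / fs_hz)


freqs = spectral_bins(FS_HZ, N_FFT)
resolution = freqs[1] - freqs[0]  # bin width = fs / N
print(f"bin width: {resolution:.5f} Hz")  # bin width: 0.06104 Hz

# Every bin sits on a multiple of the bin width, so axis ticks placed on bins
# carry non-round values such as 0.061, 0.122, 0.183, ...
print(np.round(freqs[1:6], 3))
```

  Under these assumptions, relabelling the axis at round frequencies (e.g., every 5 Hz) would require interpolating between bins or simply placing ticks at the nearest bin.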
  
  (3) Figure S1 is not aligned; some plots cannot even be read.
  
  Figure S1 has been reformatted to portrait mode from the previous landscape version (although no alignment issues were evident when viewed in landscape mode).
  
  (4) For some reason, Tables 1-3 are horizontal, which I couldn't read.
  
  Our apologies, some of the info in Table 1 was omitted during export. We have retained landscape mode for Table 1 and re-formatted Tables 2 and 3 in portrait mode for ease of accessibility.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2025.12.29.696818v2
www.biorxiv.org

Mammalian MemPrep establishes the lipid composition of ER membranes in HEK293T cells

1. EMBOpress 15 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Reply to the reviewers
  
  The authors adapt MemPrep, a protocol they originally developed to purify organelle membranes from yeast, for use in human cell lines. To this end, they established immuno-isolation strategies based on tagged versions of the ER sheet protein SEC61β and the ER tubular protein REEP5 in HEK293T cells. Their purification strategy allowed them to generate highly pure ER sheet- and tubule-enriched fractions, which were then subjected to quantitative lipidomic and proteomic analyses.
  
  Overall, this manuscript is well written and presents a careful interpretation of the data. It introduces MemPrep in mammalian cells as a method that will be useful for studying the membrane lipid and protein composition of organelles, with a particular focus on the ER. As such, the manuscript provides sufficient information and controls to assess the experiments in terms of reproducibility and clarity.
  
  We thank the reviewer for a positive, thorough assessment and for raising important points that helped us to improve the manuscript.
  
  Major comments:
  
  Based on the immunofluorescence images in Figure 1, it is not clear that the tagged and slightly overexpressed versions of SEC61β and REEP5 localize specifically to ER sheets and tubules, respectively, or that these proteins are enriched in these distinct ER subdomains. Perhaps reducing the fixation time, for example to a maximum of 2 minutes, or using PFA fixation, could help to better preserve ER sheet and tubular domains.
  
  To address the localization of the bait proteins in the ER membrane network, we added new co-localization microscopy data and quantifications to the revised manuscript (new Figure 1E,F; new Supplementary Figure S1C,D). Despite its low level of overexpression (new Figure 1C; new Suppl. Fig. S1A), SEC61β localizes to the entire ER membrane network including ER tubules and the nuclear envelope (new Fig. 1E,F).
  
  Considering the new data, we have carefully rephrased all sections regarding the subcellular localization of bait-SEC61β. In the revised manuscript, we use SEC61β as a general ER marker.
  
  Intriguingly, quantitative proteomics of the SEC61β MemPrep isolate demonstrates a selective enrichment of ER sheet-associated proteins compared to the REEP5 MemPrep, which selectively enriches proteins associated with ER tubules (Fig. 5). While we do not claim to 'isolate' ER subdomains, we do enrich them.
  
  We have performed additional microscopy experiments and adjusted our fixation protocol as suggested by the reviewer (Revision Fig. 1). Shortening the fixation time has no apparent impact on the ER structure, while any PFA fixation seems to largely disrupt the ER.
  
  Does expression of tagged SEC61β or REEP5 influence the ER sheet:tubule ratio? In addition, does expression of these constructs affect the lipidome or proteome of the cells?
  
  The reviewer raises an important point, which is experimentally not easy to address. Our imaging modality is not sufficient to make a firm statement about the sheet:tubule ratio in HEK293T cells. We are not aware of any study that firmly quantifies the relative content of sheets and tubules in HEK293T cells. Imaging the ER in HEK293T cells is challenging, and most studies of the ER membrane network use other cell types to study the impact of ER-shaping proteins on the ER membrane network.
  
  In the revised manuscript we state: 'We found no evidence that the expression of the bait constructs disrupts the tubule-to-sheet ratio or other aspects of the ER architecture, but distinguishing ER sheets and ER tubules is challenging in HEK293T cells.'
  
  Furthermore, we have studied whether the expression of the bait constructs affects the cellular proteome (new Suppl. Fig. S1A,B) and lipidome (new Suppl. Fig. S4A-H (previously Suppl. Fig. S3)). The expression of the bait constructs has no substantial impact on the cellular proteome. Most importantly, we find no evidence that proteins characteristic of ER sheets or ER tubules (other than the bait proteins) change their expression level (new Suppl. Fig. S1A,B). In the revised manuscript we state:
  
  'We decided to go one step further and compared the proteomes of wildtype HEK293T cells with the two cell lines using TMT multiplexed, untargeted protein mass spectrometry (Suppl. Fig. S1A, B). This experiment revealed that bait proteins have only a minimal, negligible impact on the cellular proteome (Suppl. Fig. S1A, B). We did not find evidence for a systematic deregulation of proteins known to localize exclusively to ER tubules or other ER subdomains. Furthermore, quantitative proteomics validated the results from immunoblotting (Fig. 1B, C): Expression of bait-SEC61β has barely any impact on the total cellular level of SEC61β (Suppl. Fig. S1A) while the expression of the REEP5-bait results in a 1.8-fold overabundance of REEP5 (Suppl. Fig. S1B).'
  
  Likewise, the expression of the bait constructs has little to no effect on the cellular lipidome as shown in Suppl. Fig. S4A-J. In the revised manuscript we state:
  
  'As a control, we also tested the impact of the bait constructs on the HEK293T whole cell lipidome (Suppl. Fig. 4A-J). Overall, the lipid composition of the virally transduced cells was indistinguishable from HEK293T cells with only minor impact on the level of CL and lysolipids (Suppl. Fig. 4A-J).'
  
  Apart from hypotonic swelling and douncing, could the authors use alternative methods for cell disruption to exclude the possibility that mechanical stress confounds the interpretation of the data?
  
  Thanks to the reviewer's comment, we became aware of a mistake. Our cell lysis buffer is hypertonic and not hypotonic (15% sucrose w/v, 10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) pH 7.4, 300 mM NaCl, 1 mM EDTA, freshly supplemented with protease inhibitor cocktail from Roche). We have corrected all relevant sections in the revised manuscript.
  
  The reviewer is right that different means of mechanical lysis, and/or the incubation of the cells in hypo/hypertonic buffer are likely to have impact on the structure of the ER and to affect the isolation procedure. Changing such critical parameters will likely affect the purity of the preparation. Performing additional MemPrep isolations using different means of cell disruptions goes beyond the scope of this manuscript.
  
  While establishing the MemPrep protocol, we explored various means of mechanical cell disruption: different cannulas, Dounce homogenizers, and a ball-bearing device. We experimented with both hypo- and hypertonic buffers. Given the costs and work associated with lipidomic and proteomic analyses, we tried to find suitable conditions for cell disruption without performing a full analysis each time. Therefore, we performed differential centrifugations, as shown for example in Fig. 2B of the manuscript. The critical factors in deciding whether to pursue a given condition further were 1) the depletion of the mitochondrial TOM22 marker, 2) the enrichment of the ER markers, and 3) the total protein yield in the P100,000 fraction.
  
  In the revised manuscript we state: 'Compared to the MemPrep procedure in yeast, we tested various means of cell disruption and optimized the differential centrifugation protocol.'
  
  and
  
  'Mild cell disruption by Dounce homogenization in a hypertonic buffer is crucial for cracking cells open, but these procedures can disrupt normal ER architecture and might facilitate the undesired mixing of previously well-defined ER subdomains. Despite these limitations, our data underscore the purity of our ER membrane preparations, demonstrate a differential enrichment of ER subdomains (Fig. 5), and establish the lipid composition of the ER membrane (Fig. 6)'.
  
  What is the total amount of lipids and proteins isolated with REEP5- or SEC61β-based MemPrep? Are there differences in the total lipid:protein ratio between these isolates, and could this reflect differences in the ER sheet:tubule ratio?
  
  In response to the reviewer's question, we have added a new Supplementary Table 1 to the manuscript outlining the yield of total protein and total lipid of MemPrep.
  
  The mammalian MemPrep protocol is not yet optimized for determining the lipid:protein ratio in the membrane. At this moment, we do not want to make a statement about the protein-to-lipid ratio in the ER or its subdomains. The isolates still contain material originating from the ER lumen.
  
  The combined analysis of lipid and protein composition demonstrates the capacity of the method. To test that MemPrep can capture changes in ER membrane architecture, it would be useful to compare ER protein and lipid composition across different cellular states, such as stressed versus unstressed cells, or growing versus resting cells.
  
  We agree with the reviewer that a comparison of the ER under different conditions would be extremely interesting. Currently, we see it beyond the scope of this study.
  
  Minor comment:
  
  In line 335, the authors state: "To address this possibility, we performed a new round of REEP5 and SEC61β MemPreps for a direct comparison of the isolates (Fig. 5A, B)." It is unclear whether the MemPrep protocol was altered or whether this refers simply to an additional round of purification. Please clarify.
  
  Thank you. This point was also raised by Reviewers 2 and 3. We have clarified our statements. In the revised manuscript we state:
  
  'Hence, we performed a new round of REEP5 and SEC61β MemPreps in triplicate for a direct comparison of the isolates (Fig. 5A, B) rather than comparing the changes in abundance relative to the respective cell lysates as performed in Figure 3. Knowing that non-ER proteins are less efficiently enriched by the MemPrep procedure than ER proteins (Fig. 3C, D) and that the sensitivity and comprehensiveness of mass spectrometry-based proteomics experiments are reduced with increasing sample complexity (Ting et al, 2011; Beck et al, 2011), we were hoping to gain better insight into the distribution of low-abundance and difficult-to-quantify proteins in the two MemPrep isolates'.
  
  Reviewer #1 (Significance (Required)):
  
  General assessment:
  
  The manuscript establishes MemPrep for mammalian cells as an important discovery tool to investigate how cells coordinate membrane lipid composition with membrane protein composition, and vice versa. This is a rapidly growing research field, which attracts a lot of interest.
  
  MemPrep is based on an immuno-isolation strategy using tagged versions of the ER sheet protein SEC61β and the ER tubular protein REEP5 in HEK293T cells. The purification strategy allowed the generation of highly pure ER sheet- and tubule-enriched fractions, which were then subjected to quantitative lipidomic and proteomic analyses.
  
  The results show that the protein composition differs between the SEC61β- and REEP5-enriched fractions. Yet the lipid composition of ER sheets and tubules is largely indistinguishable. Both fractions are dominated by PC alongside other monounsaturated GPL, and hydroxylated ceramides. These physicochemical properties of the ER lipid bilayer are matched by ER-resident membrane proteins.
  
  Thorough bioinformatic analysis of a subset of ER membrane proteins further revealed that their transmembrane domains have reduced hydrophobicity and increased polarity compared with those of plasma membrane proteins, matching the ER lipidome.
  
  Hence the combined analysis of lipid and protein composition demonstrates the capacity of the method. Many variations of this approach will be possible in the future to understand on the molecular level how cells assemble and control their membranes.
  
  Advance: Other immuno-isolation methods, or "organelle immunoprecipitation" approaches, have been established for lysosomes, the Golgi apparatus, and other organelles.
  
  MemPrep is an important and complementary addition to the technical toolbox for organelle isolation, with a particular focus on the analysis of membrane lipid and protein content.
  
  Audience: The manuscript will be of broad interest to researchers in basic biology as well as clinical and translational research.
  
  Reviewer's field of expertise:
  
  Molecular membrane biology.
  
  Reviewer #2
  
  Jain and colleagues develop a biochemical fractionation procedure in which ER microsomes are enriched through small epitope tags. The manuscript is pitched around the concept that there are ER sheets and tubules and ER proteins differentially localise to them. The authors use REEP5 as a 'tubule' bait and SEC61beta as a 'sheet' bait. These baits are immunoisolated after a sensible membrane fractionation and ER membranes purified. There is a convincing ER proteome as a result, and this is used to compare the TMD properties of the organelle's resident membrane proteins. The authors make the interesting observation that the transmembrane domains are more polar in the ER. They then compare the sheet and tubule preparations and see a difference in the proteome, before comparing the lipidome. There is no difference observed between the lipidome of the sheet and tubule preps; however, they see a difference from the whole cell lysate and use that to compare the ER lipidome against the whole cell.
  
  Overall the manuscript has an interesting premise and the data is well presented, the experiments well performed and the interpretations appropriate. I think there are some issues with the mechanistic insight and novelty, and essentially although the premise is with regards to sheets and tubules there is limited progress in that direction in terms of results. I am reluctant to be too critical overall as there are certainly interesting observations that may be insightful for future studies in the field. I have some more specific comments below:
  
  We thank the reviewer for a thorough, constructive assessment and for highlighting important points that helped us improve the manuscript.
  
  1) The authors cite Nixon-Abell et al., but they do not mention the major point of that manuscript, which is that the 'sheets' in the cellular periphery are instead dense tubular networks. I think this is quite an omission for the introduction, as it suggests the premise is not as clear as stated.
  
  In the revised manuscript we refer to the Nixon-Abell study and two additional studies from the Jokitalo lab. Notably, the Nixon-Abell study does not rule out the existence of ER sheets.
  
  In the revised manuscript we state: ' [...] dense tubular networks in the cell periphery can appear like ER sheets in diffraction-limited microscopy (Nixon-Abell et al, 2016). Furthermore, the edges of ER sheets are populated by curvature-stabilizing proteins also found in ER tubules (Shibata et al, 2010; Shemesh et al, 2014), and ER sheets show different degrees of fenestration dependent on the cell type and the cell cycle phase (Puhka et al, 2007, 2012; Nixon-Abell et al, 2016). Consistent with our microscopic data (Fig. 1E, F) and because ER sheets may be biochemically inseparable from ER tubules, we use SEC61β as a general ER marker.'
  
  We performed additional co-localization studies of the bait proteins with RTN4 and CLIMP63 (new Fig. 1E,F) suggesting that SEC61β can localize across many ER subdomains, including ER tubules and the nuclear envelope.
  
  We have carefully revised our manuscript accordingly, shifting the focus of our discussion away from a molecular description of discrete ER subdomains.
  
  2) The first section when the protocol is discussed essentially relies on looking at other papers to understand. As the manuscript is centrally about this protocol, I think a brief but clear description is more appropriate.
  
  We agree with the reviewer. We added a short section to the Results providing an overview of the MemPrep procedure. We now state:
  
  'To this end, we adapted the MemPrep procedure originally developed for the isolation of organelle membranes from Saccharomyces cerevisiae (S. cerevisiae) (Reinhard et al, 2023, 2024). Mammalian MemPrep relies on a gentle, detergent-free, mechanical lysis of the cells in a hypertonic buffer followed by differential centrifugation to separate ER-derived microsomes from mitochondria-derived membranes. Next, larger organelle fragments are disrupted by brief pulses of sonication, and the resulting vesicles are subjected to affinity purification using magnetic Dynabead-coupled antibodies directed against the cleavable tag of the bait protein. Specifically bound, ER-derived membrane vesicles are washed with harsh, urea-containing buffers and selectively released by proteolytically cleaving the bait tag.'
  
  3) In figure 1C the two markers are supposed to localise to sheets and tubules differentially. To me they look very similar. This, of course, is a major concern. Have the authors co-expressed them (at the same levels in these lines) and seen that indeed they do differentially localise?
  
  The reviewer raises an important point regarding the localization of the bait proteins. While we have not co-expressed the bait proteins in cells, we have performed additional co-localization experiments with RTN4 and CLIMP63 as markers for ER tubules and ER sheets, respectively (new Figure 1E,F; new Suppl. Fig. S1C,D). The implications of these data are discussed in the manuscript.
  
  In light of these new data, we do not refer to SEC61β as an ER sheet marker any longer, instead we refer to SEC61β as a general ER marker. We carefully revised our discussion of the data throughout the manuscript along the line suggested by the reviewer in point 8.
  
  4) I found the TMD polarity section very interesting, but it was not clear to me why they needed their proteomics for this? Could this not be done with annotated ER membrane proteins?
  
  The reviewer is correct. The same type of analysis could have been performed with an even bigger dataset of all ER annotated proteins. One of the co-authors, Joseph Lorent, has performed such analysis at this larger scale (PMID: 40326394). The study by Lorent et al. addressed TMH length and side chain bulkiness (PMID: 40326394) in the ER, Golgi apparatus, and the PM. This work is referenced in the manuscript.
  
  We focused our analysis on the smaller dataset of 83 single-pass proteins found in our proteomics experiments, because we initially planned to perform a comparative analysis of ER proteins in either of the two isolates.
  
  In line with the reviewer's suggestion, we validated our new finding on TMH hydrophobicity in the ER using a larger dataset covering all single-pass TMHs of ER proteins (215 instead of 83), Golgi apparatus proteins (260), and plasma membrane proteins (1322) (Suppl. Fig. S3D).
  
  5) It was not clear to me based on the results section text the difference between the figure 5 proteomics and the previous runs.
  
  This point was also raised by Reviewers 1 and 3. We clarified our statement in the revised manuscript:
  
  'Hence, we performed a new round of REEP5 and SEC61β MemPreps in triplicate for a direct comparison of the isolates (Fig. 5A, B) rather than comparing the changes in abundance relative to the respective cell lysates as performed in Figure 3. Knowing that non-ER proteins are less efficiently enriched by the MemPrep procedure than ER proteins (Fig. 3C, D) and that the sensitivity and comprehensiveness of mass spectrometry-based proteomics experiments are reduced with increasing sample complexity (Ting et al, 2011; Beck et al, 2011), we were hoping to gain better insight into the distribution of low-abundance and difficult-to-quantify proteins in the two MemPrep isolates.'
  
  6) Again in figure 5- are the authors sure that the difference was not due to the over-expression (albeit mild) of their protein.
  
  After performing an important control experiment, we are sure that the mild over-expression of the bait proteins has no impact.
  
  We have compared HEK293T WT cells with the bait protein-expressing cell lines by quantitative proteomics (new Suppl. Fig. S1A,B). The bait proteins have no impact on the cellular proteome and do not affect the abundance of proteins known to be enriched in ER sheets or ER tubules. Hence, the enrichment of these proteins in our MemPrep isolates as shown in Fig. 5 suggests that some of the identity of ER sheets and ER tubules is maintained in our preparations even though they are not resolved by our microscopy experiments (Fig. 1). In the revised manuscript, we carefully discuss the implications of these findings.
  
  7) There were no differences in the ER lipidome between the two baits. This may be because there is no difference between the lipid profile of sheets and tubules, but it is very hard to conclude that.
  
  The reviewer has a point. Even though our findings suggest that we can differentially enrich for ER subdomains (the proteomics data on MemPrep isolates in Fig. 5 can be regarded as the gold standard for this statement), we do not have any knowledge about their biochemical purity. Hence, we have carefully toned down our statements in light of new imaging data (Fig. 1E,F; Suppl. Fig. S1C,D) and new proteomics data (Suppl. Fig. S1A,B).
  
  Along the reasoning of the reviewer, we also rephrased our statements on the difference/similarity of ER subdomains.
  
  8) I do not see it as my job as a reviewer to propose reorganisations and rewrites, so I encourage the authors to feel free to ignore this comment. To me the lipidome and TMD polar observations are the key manuscript findings, and there is very limited insight into the tubules and sheets line of inquiry. I wonder if it would be worth changing the focus of the manuscript overall to rather be about the ER, and not the tubules and sheets.
  
  Again, the reviewer raises an important point that we did not want 'to ignore'. We have carefully revised the manuscript and toned down our interpretations. In the revised manuscript we put more emphasis on the ER lipidome and less so on the composition of specific ER subdomains.
  
  Reviewer #2 (Significance (Required)):
  
  Overall the manuscript has an interesting premise and the data is well presented, the experiments well performed and the interpretations appropriate. I think there are some issues with the mechanistic insight and novelty, and essentially although the premise is with regards to sheets and tubules there is limited progress in that direction in terms of results. I am reluctant to be too critical overall as there are certainly interesting observations that may be insightful for future studies in the field.
  
  Reviewer #3
  
  Summary: Jain et al. provide a clear and thorough manuscript that extends their prior biochemical analysis of the yeast ER lipidome (MEMPREP) to mammalian cells. They use detergent-free lysis and differential speed centrifugation from 293T cells bearing reporters with affinity handles targeted to sheet-like or tubular-like subdomains of the ER, and enrich membranes and membrane-embedded proteins from these sites. The lipidomics reveals a distinct ER lipidome, which is heavily enriched in PC and PI, contains predominantly mono-unsaturated phospholipids and is surprisingly invariant across sheet-like and tubule-like domains. Additional hydrophobicity analysis suggests that ER-localised TMDs are more polar and shorter than PM-resident TMDs, and the authors speculate about co-evolution of the lipidome and proteome to ensure targeting.
  
  Major comments:
  
  I think the data are solid, clear and convincing. The similarity of the lipidomes from sheet and tubule regions of the ER gives a good indication of the robustness of the technique. Whilst the yield is low, the authors go to good lengths to demonstrate purity of ER capture and de-enrichment of other cellular membranes. There is good discussion of the limitations of the technique and good comparison to recent data from other labs, most notably a recent preprint, and I think the manuscripts support each other well. There's a fair amount of speculation in the manuscript, e.g., about lipid headgroup charge density being inferred by the charge distribution on the -1 position, but the speculation is clearly acknowledged.
  
  I think that blotting for SEC61B would really help. A clear comparison to endogenous SEC61B would be helpful. I appreciate that the authors lacked an antibody here, but there are several on CiteAb that seem to detect endogenous protein.
  
  Following the reviewer's advice, we added new data using a commercial antibody directed against SEC61β (new Fig. 1C). We also added proteomics data comparing HEK293T WT cells with the bait-expressing cell lines (new Suppl. Fig. S1A,B).
  
  We also characterized the commercial Proteintech (15087-1-AP) antibody to make sure it recognizes the same epitopes in the tagged and untagged variant of SEC61β.
  
  It's not brilliantly easy to see the 'sharp decline' in relative frequency of hydrophobic amino acids at 21 aa for ER and Golgi; whilst the individual amino acid information is interesting (and some comment could be made about the favouring of Leucines in ER and Golgi TMDs), would this be clearer if the relative frequencies were binned into hydrophobic/aromatic, polar, positive, negative?
  
  The reviewer is right. We have removed our statement regarding a 'sharp decline'. In fact, the decline is rather gradual for ER and Golgi TMHs, but more clear for PM TMHs. This is also reflected in the data shown in Suppl. Fig. S3D and discussed in the revised manuscript.
  
  We state: 'Confirming our expectations based on the predicted TMH length (Suppl. Fig. S3A), we observed a gradual decline in the relative frequency of hydrophobic and aromatic residues at about 21 amino acids for ER (Fig. 4E) and Golgi-associated TMHs (Fig. 4F). Such a decline was more clearly defined for plasma membrane TMHs, but only after 24 aa or more (Fig. 4G).'
  
  We also state: 'We therefore challenged our finding and performed an additional analysis using this larger dataset of all annotated human single-pass TMHs (Fig. S3D) and compared the hydrophobicity profiles of TMHs from the ER (215), the Golgi apparatus (260), and the PM (1322) (Lorent et al, 2025). This analysis further substantiated our finding that the ER and the Golgi apparatus host less hydrophobic TMHs compared to the plasma membrane. Furthermore, we observed that the ER and Golgi profiles display a conical shape with hydrophobic maxima at the center of the membrane's hydrophobic core, while the PM TMHs possess higher hydrophobicity in the cytoplasmic part of the membrane, compared to the exoplasmic part (Fig. S3D).'
  
  We decided to keep Fig. 4, with its single-amino-acid 'resolution', as it was in the original manuscript, because we feel that this representation still has value. It helps connect the physicochemical parameters of an average TMH in an organelle (Fig. 4A-D; Suppl. Fig. S3A-D) with the preferred amino acid composition and distribution (Fig. 4E-G). Nevertheless, some 'noise' is inherent to the data, and we hope that the adaptations to the text avoid any possible confusion for the reader.
  
  The frequency of leucine residues in TMHs from the PM (24.5%) is comparable to that in TMHs from the ER (24.1%) and from the Golgi apparatus (26.3%). Our attempts to identify an organelle-selective usage of certain amino acids did not yield robust and significant results.
  
  Related to this point, it's hard to correlate the degree of polar amino acid incorporation in the TMDs of Golgi, ER, PM proteins (which don't appear to vary in 4E, 4F and 4G) with the variance described in 4C. Is there a better way of displaying this data, or are the polarity measurements calculated by some other metric in 4C?
  
  The reviewer is right. Figure 4A-D and Figure 4E-G are based on different metrics. Figure 4A-D considers different physicochemical parameters of the amino acid side chains (Fig. 4C: Kyte-Doolittle scale). Figure 4E-G only represents the relative frequencies. We believe that both representations can be useful.
  
  Notably, the relative incorporation of polar and apolar amino acids is significantly different between TMHs from the ER and the Golgi versus the TMHs from the PM (Suppl. Fig. S3B,C).
  
  In the revised manuscript we state: 'Our new finding that the TMHs of ER proteins are more polar than the TMHs in the plasma membrane (Fig. 4C) is also reflected by the significantly different number of apolar and polar residues in the TMHs from ER-, Golgi apparatus-, and PM-derived proteins (Suppl. Fig. S3B, C)'.
  
  Indeed, the polarity in Fig. 4A and Fig. 4C is calculated using the Kyte-Doolittle scale, while only the normalized frequency of each amino acid is color-coded in Fig. 4E-G.
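  The difference between a scale-weighted polarity metric and raw residue frequencies can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: the Kyte-Doolittle values are the standard published scale, while the two 20-residue helices, the HYDROPHOBIC_AROMATIC grouping and the positional_fraction helper are hypothetical choices made for illustration only.

```python
from collections import Counter

# Standard Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982);
# higher values are more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical residue grouping, used only for this illustration.
HYDROPHOBIC_AROMATIC = set("AVILMFWYC")


def mean_hydropathy(tmh: str) -> float:
    """Scale-weighted metric: average Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in tmh) / len(tmh)


def relative_frequencies(tmhs: list[str]) -> dict[str, float]:
    """Unweighted metric: fraction of all residues that are each amino acid."""
    counts = Counter(aa for tmh in tmhs for aa in tmh)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def positional_fraction(tmhs: list[str], residues: set[str]) -> list[float]:
    """For each position (TMHs aligned at their N-terminus), the fraction of
    TMHs long enough to reach that position whose residue there is in `residues`."""
    max_len = max(len(t) for t in tmhs)
    fractions = []
    for i in range(max_len):
        covering = [t for t in tmhs if len(t) > i]
        hits = sum(1 for t in covering if t[i] in residues)
        fractions.append(hits / len(covering))
    return fractions


# Two hypothetical 20-residue helices with identical leucine content (25%).
tmh_a = "LLLLLIIIIIAAAAAVVVVV"
tmh_b = "LLLLLSSSSSAAAAATTTTT"

print(relative_frequencies([tmh_a])["L"], relative_frequencies([tmh_b])["L"])
print(round(mean_hydropathy(tmh_a), 3), round(mean_hydropathy(tmh_b), 3))
print(positional_fraction([tmh_a, tmh_b], HYDROPHOBIC_AROMATIC))
```

  Both toy helices contain 25% leucine, yet the first has a mean hydropathy of about 3.58 and the second about 1.03: a frequency view counts residues without weighting them by how apolar they are, so two sets of TMHs can look similar in a frequency plot while differing clearly on a hydropathy scale.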
  
  Minor comments:
  
  Panel 2D isn't labelled on the figure
  
  We represented both MemPreps in a single panel (2C) because we wanted to label the immunoblots only once to avoid redundancy. We are open to changing our panel-labeling strategy if the current representation is confusing.
  
  There is limited co-enrichment of non-ER proteins in the ER-affinity preps, and the authors have done well to deal with misannotated GO terms. It might be worthwhile adding to the discussion that all TMD proteins that localise at steady-state to post-ER compartments must necessarily pass through the ER during biosynthesis. As such, detection of non-ER proteins in ER fractions is not inherently unexpected.
  
  This is of course correct. In the revised manuscript we state: 'Finding non-ER proteins in an ER proteome is not surprising, because a very large number of proteins are first delivered to the ER, before they are sent to other cellular destinations.'
  
  I didn't understand the line on L377 about the new round of extraction featuring inherently less complex proteomes.
  
  This point was also raised by Reviewers 1 and 2. We clarified our statement in the revised manuscript:
  
  'Hence, we performed a new round of REEP5 and SEC61β MemPreps in triplicate for a direct comparison of the isolates (Fig. 5A, B) rather than comparing the changes in abundance relative to the respective cell lysates as performed in Figure 3. Knowing that non-ER proteins are less efficiently enriched by the MemPrep procedure than ER proteins (Fig. 3C, D) and that the sensitivity and comprehensiveness of mass spectrometry-based proteomics experiments are reduced with increasing sample complexity (Ting et al, 2011; Beck et al, 2011), we were hoping to gain better insight into the distribution of low-abundance and difficult-to-quantify proteins in the two MemPrep isolates.'
  
  For line L390-391, in the speculation about progressively more unsaturation as you move ER-Golgi-postGolgi, is there any (published) data from ER-FLIPPR that could inform about the degree of membrane fluidity/packing as you traverse the secretory pathway?
  
  We agree that mentioning evidence on the biophysical changes along the secretory pathway is helpful in this section. In the revised manuscript we state:
  
  'These changes of the lipid acyl chains are associated with biophysical changes of the membrane properties along the secretory pathway as observed by molecular probes reporting on lipid packing and membrane tension (Goujon et al, 2019; López-Andarias et al, 2021, 2022; Wong & Budin, 2024).'
  
  Reviewer #3 (Significance (Required)):
  
  The strengths of the study are the conceptual novelty and information provided - I think this is the first comprehensive reporting of the ER lipidome. This is a major organelle and I think as the lipid biology field develops, resources like this are really important. Moreover, the MEMPREP protocol is applicable for protein extraction from these domains, which will help with functional characterisation of ER subdomains and is a strong technical advance.
  
  Weaknesses relate to the single cell type and overexpression (albeit mild) methodologies. I'm not hugely fussed about this as this manuscript describes an important 1st step.
  
  I'm a cell biologist studying the ER
  
  PeerReviewed
Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.03.20.713196
www.biorxiv.org

Spatial activation of Kinesin-1 by Ensconsin shapes microtubule networks via ncMTOC recruitment

1. EMBOpress 15 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Reply to the reviewers
  
  Reviewer #1
  
  Evidence, reproducibility and clarity
  
  This paper addresses a very interesting problem of non-centrosomal microtubule organization in developing Drosophila oocytes. Using genetics and imaging experiments, the authors reveal an interplay between the activity of kinesin-1, together with its essential cofactor Ensconsin, and microtubule organization at the cell cortex by the spectraplakin Shot, minus-end binding protein Patronin and Ninein, a protein implicated in microtubule minus-end anchoring. The authors demonstrate that the loss of Ensconsin affects the cortical accumulation of non-centrosomal microtubule organizing center (ncMTOC) proteins, microtubule length and vesicle motility in the oocyte, and show that this phenotype can be rescued by a constitutively active kinesin-1 mutant, but not by Ensconsin mutants deficient in microtubule or kinesin binding. The functional connection between Ensconsin, kinesin-1 and ncMTOCs is further supported by a rescue experiment with Shot overexpression. Genetics and imaging experiments further implicate Ninein in the same pathway. These data are a clear strength of the paper; they represent a very interesting and useful addition to the field.
  
  The weaknesses of the study are two-fold. First, the paper seems to lack a clear molecular model, uniting the observed phenomenology with the molecular functions of the studied proteins. Most importantly, it is not clear how kinesin-based plus-end directed transport contributes to cortical localization of ncMTOCs and regulation of microtubule length.
  
  Second, not all conclusions and interpretations in the paper are supported by the presented data.
  
  We thank the reviewer for recognizing the impact of this work. In response to the insightful suggestions, we performed extensive new experiments that establish a well-supported cellular and molecular model (Figure 7). The discussion has been restructured to directly link each conclusion to its corresponding experimental evidence, significantly strengthening the manuscript.
  
  Below is a list of specific comments, outlining the concerns, in the order of appearance in the paper/figures.
  
  Figure 1. The statement: "Ens loading on MTs in NCs and their subsequent transport by Dynein toward ring canals promotes the spatial enrichment of the Khc activator Ens in the oocyte" is not supported by data. The authors do not demonstrate that Ens is actually transported from the nurse cells to the oocyte while being attached to microtubules. They do show that the intensity of Ensconsin correlates with the intensity of microtubules, that the distribution of Ensconsin depends on its affinity to microtubules and that an Ensconsin pool locally photoactivated in a nurse cell can redistribute to the oocyte (and throughout the nurse cell) by what seems to be diffusion. The provided images suggest that Ensconsin passively diffuses into the oocyte and accumulates there because of higher microtubule density, which depends on dynein. To prove that Ensconsin is indeed transported by dynein in the microtubule-bound form, one would need to measure the residence time of Ensconsin on microtubules and demonstrate that it is longer than the time needed to transport microtubules by dynein into the oocyte; ideally, one would like to see movement of individual microtubules labelled with photoconverted Ensconsin from a nurse cell into the oocyte. Since microtubules are not enriched in the oocyte of the dynein mutant, analysis of Ensconsin intensity in this mutant is not informative and does not reveal the mechanism of Ensconsin accumulation.
  
  As noted by Reviewer 3, the directional movement of microtubules traveling at ~140 nm/s from nurse cells toward the oocyte through ring canals was previously reported by Lu et al. (2022) using a reporter line expressing a tagged Ens MT-binding domain. We have therefore added a citation of this crucial work to the revised version of the manuscript (lines 155-157) and removed the photoconversion panel.
  
  Critically, however, our study provides mechanistic insight that was missing from this earlier work: this mechanism is also crucial for enriching MAPs in the oocyte. The failure of Dynein mutants to enrich Ensconsin is a key piece of evidence supporting a model of transport of Ensconsin-loaded MTs (Figure 1D-1F).
  
  Figure 2. According to the abstract, this figure shows that Ensconsin is "maintained at the oocyte cortex by Ninein". However, the figure doesn't seem to prove it: it shows that oocyte enrichment of Ensconsin is partially dependent on Ninein, but this applies to the whole cell and not just to the cell cortex. Furthermore, it is not clear whether Ninein mutation affects microtubule density, which in turn would affect Ensconsin enrichment, and therefore it is not clear whether the effect of Ninein loss on Ensconsin distribution is direct or indirect.
  
  Ninein plays a critical role in Ensconsin enrichment and microtubule organization in the oocyte (new Figure 2, Figure 3, Figure S3). Quantification of the total Tubulin signal shows no difference between control and Nin mutant oocytes (new Figure S3A, B). In Nin mutants, we found decreased Ens enrichment in the oocyte, as well as reduced Ens localization on MTs and at the cell cortex (Figure 2E, 2F, and Figure S3C and S3D).
  
  New quantitative analyses of microtubule orientation at the anterior cortex, where MTs are normally preferentially oriented toward the posterior pole (Parton et al. 2011), demonstrate that Nin mutants exhibit randomized MT orientation compared to wild-type oocytes (new Figure 3C-3E). These findings establish that Ninein, although not essential, favors Ensconsin localization on MTs, Ens enrichment in the oocyte, ncMTOC cortical localization, and more robust MT orientation toward the posterior cortex. They also suggest that Ens levels in the oocyte act as a rheostat to control Khc activation.
  
  The observation that the aggregates formed by overexpressed Ninein accumulate other proteins, including Ensconsin, supports, though does not prove their interactions. Furthermore, there is absolutely no proof that Ninein aggregates are "ncMTOCs". Unless the authors demonstrate that these aggregates nucleate or anchor microtubules (for example, by detailed imaging of microtubules and EB1 comets), the text and labels in the figure would need to be altered.
  
  We have modified the manuscript: we now refer to an accumulation of these components in large puncta rather than aggregates, consistent with previous observations (Rosen et al., 2000). In the revised version, we state that these puncta recruit Shot, Patronin and Ens, without implying direct interaction (line 218).
  
  Importantly, we characterized these Ninein/Shot/Patronin/Ens-containing puncta in more detail in a new Figure S4. To rigorously assess their nucleation capacity, we analyzed Eb1-GFP-labeled MT comets, a robust readout of MT nucleation (Parton et al., 2011, Nashchekin et al., 2016). Although a few Eb1-positive comets occasionally emanate from these structures, confirming their identity as putative ncMTOCs, these puncta function as surprisingly weak nucleation centers (new Figure S4E, Video S1), and their presence does not alter overall MT architecture (new Figure S4F). Moreover, these puncta disappear over time, are barely visible at stage 10B, and do not impair oocyte development or fertility (Figure S4G and Table 1).
  
  Minor comment: Note that a "ratio" (Figure 2C) is just a ratio, and should not be expressed in arbitrary units.
  
  We have amended this point in all the figures.
  
  Figure 3B: immunoprecipitation results cannot be interpreted because the immunoprecipitated proteins (GFP, Ens-GFP, Shot-YFP) are not shown. It is also not clear that this biochemical experiment is useful. If the authors would like to suggest that Ensconsin directly binds to Patronin, the interaction would need to be properly mapped at the protein domain level.
  
  This is a good point: the GFP and Ens-GFP immunoprecipitated proteins are now much clearly identified on the blots and in the figure legend (new Figure 4G). Shot-YFP IP, was used as a positive control but is difficult to be detected by Western blot due to its large size (>106 Da) using conventional acrylamide gels (Nashchekin et al., 2016).
  
  We now explicitly state that immunoprecipitations were performed at 4 °C, where microtubules are fully depolymerized, thereby excluding indirect microtubule-mediated interactions. We agree with this reviewer that we cannot formally rule out interactions bridged by other protein components. This is stated in the revised manuscript (lines 238-239).
  
  One of the major phenotypes observed by the authors in Ens mutants is the loss of long microtubules. The authors draw strong conclusions about the independence of this phenotype from the parameters of microtubule plus-end growth, but the quality of their data does not allow such a conclusion, because they only measured the number of EB1 comets and their growth rate, not the catastrophe, rescue or pausing frequency. Note that kinesin-1 has been implicated in promoting microtubule damage and rescue (doi: 10.1016/j.devcel.2021). In the absence of such measurements, one cannot conclude whether short microtubules arise through defects in the minus-end, plus-end or microtubule shaft regulation pathways.
  
  We thank the reviewer for raising this important point. Our data demonstrate that microtubule (MT) nucleation and polymerization rates remain unaffected under Khc RNAi and ens mutant conditions, indicating that the observed MT alterations must arise through other mechanisms.
  
  As the reviewer suggested, recent studies on Kinesin activity and MT network regulation are indeed highly relevant. Two key studies from the Verhey and Aumeier laboratories examined Kinesin-1 gain-of-function conditions and revealed that constitutively active Kinesin-1 induces MT lattice damage (Budaitis et al., 2022). While damaged MTs can undergo self-repair, Aumeier and colleagues demonstrated that GTP-tubulin incorporation generates "rescue shafts" that promote MT rescue events (Andreu-Carbo et al., 2022). Extrapolating from these findings, loss of Kinesin-1 activity could plausibly reduce rescue shaft formation, thereby decreasing MT rescue frequency and stability. Although this hypothesis is challenging to test directly in our system, it provides a mechanistic framework for the observed reduction in MT number and stability.
  
  Additionally, the reviewer highlighted the role of Khc in transporting the dynactin complex, an anti-catastrophe factor, to MT plus ends (Nieuwburg et al., 2017), which could further contribute to MT stabilization. This crucial reference is now incorporated into the revised Discussion.
  
  Importantly, our work also demonstrates the contribution of Ens/Khc to ncMTOC targeting to the cell cortex. Our new quantitative analyses of MT organization (new Figure 5 B) reveal a defective anteroposterior orientation of cortical MTs in mutant conditions, pointing to a critical role for cortical ncMTOCs in organizing the MT network.
  
  Taken together, we propose that the observed MT reduction and disorganization result from multiple interconnected mechanisms: (1) reduced rescue shaft formation affecting MT stability; (2) impaired transport of anti-catastrophe factors to MT plus ends; and (3) loss of cortical ncMTOCs, which are essential for minus-end MT stabilization and network organization. The Discussion has been revised to reflect this integrated model in a dedicated paragraph ("A possible regulation of MT dynamics in the oocyte at both plus end minus MT ends by Ens and Khc", lines 415-432).
  
  It is important to note that a spectraplakin like Shot can potentially affect different pathways, particularly when overexpressed.
  
  We agree that Shot harbors multiple functional domains and acts as a key organizer of both actin and microtubule cytoskeletons. Overexpression of such a cytoskeletal cross-linker could indeed perturb both networks, making interpretation of Ens phenotype rescue challenging due to potential indirect effects.
  
  To address this concern, we selected an appropriate Shot isoform for our rescue experiments that displayed similar localization to "endogenous" Shot-YFP (a genomic construct harboring shot regulatory sequences) and importantly that was not overexpressed.
  
  Elevated expression of the Shot.L(A) isoform (see Western blot, Figure S8 A), considered the wild-type form with both the CH1 and CH2 actin-binding motifs (Lee and Kolodziej, 2002), caused abnormal localization, such as strong binding to microtubules in nurse cells and the oocyte, confirming the risk of gain-of-function artifacts and inappropriate conclusions (Figure S8 B, arrows).
  
  By contrast, our rescue experiments using the Shot.L(C) isoform (which harbors only the CH2 motif) provide evidence against such artifacts for two reasons. First, Shot-L(C) is expressed at slightly lower levels than the Shot-YFP genomic construct (which is not overexpressed), and at much lower levels than Shot-L(A), despite the use of the same driver (Figure S8 A). Second, Shot-L(C) localization in the oocyte is similar to that of endogenous Shot-YFP, concentrating at the cell cortex (Figure S8 B, compare lower and top panels). Taken together, these controls suggest that the rescue with Shot-L(C) is specific.
  
  Note that this Shot-L(C) isoform is sufficient to complement the absence of the shot gene in other cell contexts (Lee and Kolodziej, 2002).
  
  Unjustified conclusions should be removed: the authors do not provide sufficient data to conclude that "ens and Khc oocytes MT organizational defects are caused by decreased ncMTOC cortical anchoring", because the actual cortical microtubule anchoring was not measured.
  
  This is a valid point. We acknowledge that we did not directly measure microtubule anchoring in this study. In response, we have revised the discussion to more accurately reflect our observations. Throughout the manuscript, we now refer to "cortical microtubule organization" rather than "cortical microtubule anchoring," which better aligns with the data presented.
  
  Minor comment: Microtubule growth velocity must be expressed in units of length per time, to enable evaluating the quality of the data, and not as a normalized value.
  
  This is now amended in the revised version (modified Figure S7).
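For illustration only, the conversion the reviewer asks for (comet displacement in pixels per frame to a velocity in µm/min) can be sketched in Python. This is a minimal sketch, not the analysis used for Figure S7: it assumes each EB1 comet is tracked as consecutive per-frame (x, y) pixel positions, and the pixel size and frame interval are hypothetical parameters.

```python
import math


def comet_velocity_um_per_min(track_px, pixel_size_um, frame_interval_s):
    """Mean speed of one EB1 comet track, in µm/min.

    track_px: list of (x, y) positions in pixels, one per consecutive frame.
    pixel_size_um, frame_interval_s: hypothetical acquisition parameters.
    """
    if len(track_px) < 2:
        raise ValueError("need at least two positions to compute a velocity")
    # Path length in pixels, summed over consecutive frame-to-frame steps.
    path_px = sum(math.dist(a, b) for a, b in zip(track_px, track_px[1:]))
    # Elapsed time covered by the track, converted from seconds to minutes.
    duration_min = (len(track_px) - 1) * frame_interval_s / 60.0
    return path_px * pixel_size_um / duration_min


# Hypothetical track: 3 pixels per frame along x.
track = [(0, 0), (3, 0), (6, 0), (9, 0)]
print(comet_velocity_um_per_min(track, pixel_size_um=0.1, frame_interval_s=2.0))
```

With a 0.1 µm pixel and 2 s frames, a comet moving 3 pixels per frame travels 0.15 µm/s, i.e. 9 µm/min; reporting this value rather than a normalized one lets readers compare it with published EB1 speeds.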
  
  A significant part of the Discussion is dedicated to the potential role of Ensconsin in cortical microtubule anchoring and potential transport of ncMTOCs by kinesin. It is obviously fine that the authors discuss different theories, but it would be very helpful if the authors would first state what has been directly measured and established by their data, and what are the putative, currently speculative explanations of these data.
  
  We have carefully considered the reviewer's constructive comments and are confident that this revised version fully addresses their concerns.
  
  First, we have substantially strengthened the connection between the Results and Discussion sections, ensuring that our interpretations are more directly anchored in the experimental data. This restructuring significantly improves the overall clarity and logical flow of the manuscript.
  
  Second, we have added a new comprehensive figure presenting a molecular-scale model of Kinesin-1 activation upon release of autoinhibition by Ensconsin (new Figure 7D). Critically, this figure also illustrates our proposed positive feedback loop mechanism: Khc-dependent cytoplasmic advection promotes cortical recruitment of additional ncMTOCs, which generates new cortical microtubules and further accelerates cytoplasmic transport (Figure 7 A-C). This self-amplifying cycle provides a mechanistic framework consistent with emerging evidence that cytoplasmic flows are essential for efficient intracellular transport in both insect and mammalian oocytes.
  
  Minor comment: The writing and particularly the grammar need to be significantly improved throughout, which should be very easy with current language tools. Examples: "ncMTOCs recruitment" should be "ncMTOC recruitment"; "Vesicles speed" should be "Vesicle speed", "Nin oocytes harbored a WT growth,"- unclear what this means, etc. Many paragraphs are very long and difficult to read. Making shorter paragraphs would make the authors' line of thought more accessible to the reader.
  
  We have amended and shortened the manuscript in line with this reviewer's feedback. In particular, we have built more focused paragraphs to facilitate reading.
  
  Significance
  
  This paper represents a significant advance in understanding non-centrosomal microtubule organization, in general and in developing Drosophila oocytes in particular, by connecting the microtubule minus-end regulation pathway to Kinesin-1 and Ensconsin/MAP7-dependent transport. The genetics and imaging data are of good quality and are appropriately presented and quantified. These are clear strengths of the study, which will make it interesting to researchers studying the cytoskeleton, microtubule-associated proteins and motors, and fly development.
  
  The weaknesses of this study are due to the lack of clarity of the overall molecular model, which would limit the impact of the study on the field. Some interpretations are not sufficiently supported by data, but this can be solved by more precise and careful writing, without extensive additional experimentation.
  
  We thank the reviewer for raising these important concerns regarding clarity and data interpretation. We have thoroughly revised the manuscript to address these issues on multiple fronts. First, we have substantially rewritten key sections to ensure that our conclusions are clearly articulated and directly supported by the data. Second, we have performed several new experiments that now allow us to propose a robust mechanistic model, presented in new figures. These additions significantly strengthen the manuscript and directly address the reviewer's concerns.
  
  My expertise is cell biology and biochemistry of the microtubule cytoskeleton, including both microtubule-associated proteins and microtubule motors.
  
  Reviewer #2
  
  Evidence, reproducibility and clarity
  
  In this manuscript, Berisha et al. investigate how microtubule (MT) organization is spatially regulated during Drosophila oogenesis. The authors identify a mechanism in which the Kinesin-1 activator Ensconsin/MAP7 is transported by dynein and anchored at the oocyte cortex via Ninein, enabling localized activation of Kinesin-1. Disruption of this pathway impairs ncMTOC recruitment and MT anchoring at the cortex. The authors combine genetic manipulation with high-resolution microscopy and use three key readouts to assess MT organization during mid-to-late oogenesis: cortical MT formation, localization of posterior determinants, and ooplasmic streaming. Notably, Kinesin-1, in concert with its activator Ens/MAP7, contributes to organizing the microtubule network it travels along. Overall, the study presents interesting findings, though we have several concerns we would like the authors to address.
  
  Ensconsin enrichment in the oocyte
  
  Ensconsin is a MAP that binds MTs. Given that microtubule density in the oocyte significantly exceeds that in the nurse cells, its enrichment may passively reflect this difference. To assess whether the enrichment is specific, could the authors express a non-Drosophila MAP (e.g., mammalian MAP1B) to determine whether it also preferentially localizes to the oocyte?
  
  To address this point, we performed a new series of experiments analyzing the enrichment of other Drosophila and non-Drosophila MAPs, including Jupiter-GFP, Eb1-GFP, and bovine Tau-GFP, all widely used markers of the microtubule cytoskeleton in flies (see new Figure S2). Our results reveal that Jupiter-GFP, Eb1-GFP, and bovine Tau-GFP all exhibit significantly weaker enrichment in the oocyte than Ens-GFP. Khc-GFP also shows lower enrichment. These findings indicate that MAP enrichment in the oocyte is MAP-specific, rather than solely reflecting microtubule density or organization. Of note, we cannot exclude that microtubule post-translational modifications contribute to differential MAP binding between nurse cells and the oocyte, but this remains a question for future investigation.
  
  The ability of ens-wt and ens-LowMT to induce tubulin polymerization according to the light scattering data (Fig. S1J) is minimal and does not reflect dramatic differences in localization. The authors should verify that, in all cases, the polymerization product in their in vitro assays is microtubules rather than other light-scattering aggregates. What is the control in these experiments? If it is just purified tubulin, it should not form polymers at physiological concentrations.
  
  The critical concentration (Cr) for microtubule self-assembly in classical BRB80 buffer, as determined by us and others, is around 20 µM (see Fig. 2c in Weiss et al., 2010). Here, microtubules were assembled at a tubulin concentration of 40 µM, well above the Cr. As stated in the Materials and Methods section, we systematically cooled samples to 4 °C after assembly to test for aggregates, since aggregates do not disassemble upon cooling. The decrease in optical density (OD) upon cooling is therefore a direct control that the initial increase in OD is due to the formation of microtubules. Finally, aggregation and polymerization curves are very different: the former displays an exponential shape and the latter a sigmoid assembly phase (see Fig. 3A and 3B in Weiss et al., 2010).
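The reasoning above can be made explicit with a simple critical-concentration model, in which tubulin in excess of Cr polymerizes at steady state, so 40 µM tubulin with Cr ≈ 20 µM should yield roughly 20 µM polymer. The Python sketch below is illustrative and rests on that simplified model; the cold-reversibility helper and the OD readings in the usage note are hypothetical, not the authors' analysis or data.

```python
def expected_polymer_uM(total_tubulin_uM, critical_conc_uM):
    """Steady-state polymer under a simple critical-concentration model.

    Above Cr, tubulin in excess of Cr is incorporated into polymer;
    at or below Cr, essentially no polymer forms.
    """
    return max(total_tubulin_uM - critical_conc_uM, 0.0)


def cold_reversible_fraction(od_baseline, od_plateau, od_after_cooling):
    """Fraction of the assembly-induced turbidity lost after cooling to 4 °C.

    Microtubules depolymerize in the cold, so a value near 1 is expected;
    cold-stable aggregates would keep most of the signal (value near 0).
    """
    signal = od_plateau - od_baseline
    if signal <= 0:
        raise ValueError("no turbidity increase to test")
    return (od_plateau - od_after_cooling) / signal


print(expected_polymer_uM(40, 20))
print(cold_reversible_fraction(0.05, 0.45, 0.07))
```

With hypothetical readings of 0.05 (baseline), 0.45 (plateau) and 0.07 (after cooling), the cold-reversible fraction is 0.95, the pattern expected for microtubules; a tubulin-only control below Cr would give no polymer under this model.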
  
  Photoconversion caveats
  
  MAPs are known to dynamically associate with and dissociate from microtubules. Therefore, interpretation of the Ens photoconversion data should be made with caution. The expanding red signal from the nurse cells to the oocyte may reflect any combination of dynein-mediated MT transport and passive diffusion of unbound Ensconsin. Notably, photoconversion of a soluble protein in the nurse cells would also result in a gradual increase in red signal in the oocyte, independent of active transport. We encourage the authors to discuss these caveats more thoroughly. It may also help to present the green and red channels side by side rather than as merged images, to allow readers to better assess signal movement and spatial patterns.
  
  This is a valid point that mirrors the comment of Reviewers 1 and 3. The directional movement of microtubules traveling at ~140 nm/s from nurse cells toward the oocyte via the ring canals was previously reported by Lu et al. (2022) with excellent spatial resolution. Notably, this MT transport was measured using a fusion protein containing the Ens MT-binding domain. We now cite this relevant study in our revised manuscript and have removed this redundant panel in Figure 1.
  
  Reduction of Shot at the anterior cortex
  
  Shot is known to bind strongly to F-actin, and in the Drosophila ovary, its localization typically correlates more closely with F-actin structures than with microtubules, despite being an MT-actin crosslinker. Therefore, the observed reduction of cortical Shot in ens, nin mutants, and Khc-RNAi oocytes is unexpected. It would be important to determine whether cortical F-actin is also disrupted in these conditions, which should be straightforward to assess via phalloidin staining.
  
  As requested by the reviewer, we performed actin staining experiments, which are now presented in a new Figure S5. These data demonstrate that the cortical actin network remains intact in all mutant backgrounds analyzed, ruling out any indirect effect of actin cytoskeleton disruption on the observed phenotypes.
  
  MTs are barely visible in Fig. 3A, which is meant to demonstrate Ens-GFP colocalization with tubulin. Higher-quality images are needed.
  
  The revised version now provides significantly improved images to show the different components examined. Our data show that Ens and Ninein localize at the cell cortex where they co-localize with Shot and Patronin (Figure 2 A-C). In addition, novel images show that Ens extends along microtubules (new Figure 4 A).
  
  MT gradient in stage 9 oocytes
  
  In ens-/-, nin-/-, and Khc-RNAi oocytes, is there any global defect in the stage 9 microtubule gradient? This information would help clarify the extent to which cortical localization defects reflect broader disruptions in microtubule polarity.
  
  We now provide quantitative analysis of microtubule (MT) array organization in novel figures (Figure 3D and Figure 5B). Our data reveal that both Khc RNAi and ens mutant oocytes exhibit severe disruption of MT orientation toward the posterior (new Figure 5B). Importantly, this defect is significantly less pronounced in Nin-/- oocytes, which retain residual ncMTOCs at the cortex (new Figure 3D). This differential phenotype supports our model that cortical ncMTOCs are critical for maintaining proper MT orientation toward the posterior side of the oocyte.
  
  Role of Ninein in cortical anchoring
  
  The requirement for Ninein in cortical anchorage is the least convincing aspect of the manuscript and somewhat disrupts the narrative flow. First, it is unclear whether Ninein exhibits the same oocyte-enriched localization pattern as Ensconsin. Is Ninein detectable in nurse cells? Second, the Ninein antibody signal appears concentrated in a small area of the anterior-lateral oocyte cortex (Fig. 2A), yet Ninein loss leads to reduced Shot signal along a much larger portion of the anterior cortex (Fig. 2F), a spatial mismatch that weakens the proposed functional relationship. Third, Ninein overexpression results in cortical aggregates that co-localize with Shot, Patronin, and Ensconsin. Are these aggregates functional ncMTOCs? Do microtubules emanate from these foci?
  
  We now provide a more comprehensive analysis of Ninein localization. Similar to Ensconsin (Ens), endogenous Ninein is enriched in the oocyte during the early stages of oocyte development but is also detected in NCs (see modified Figure 2 A and Lasko et al., 2016). Improved imaging of Ninein further shows that the protein partially co-localizes with Ens, and ncMTOCs at the anterior cortex and with Ens-bound MTs (Figure 2B, 2C).
  
  Importantly, loss of Ninein (Nin) only partially reduces the enrichment of Ens in the oocyte (Figure 2E). Both Ens and Kinesin heavy chain (Khc) remain partially functional and continue to target non-centrosomal microtubule-organizing centers (ncMTOCs) to the cortex (Figure 3A). In Nin-/- mutants, a subset of long cortical microtubules (MTs) is present and generates cytoplasmic streaming, although less efficiently than in wild-type (WT) conditions (Figure 3F and 3G). Because Ninein is a non-essential gene, we view it as a facilitator of MT organization during oocyte development.
  
  Finally, our new analyses demonstrate that the large puncta containing Ninein, Shot, Patronin, and Ens, despite their size, appear to be relatively weak nucleation centers (revised Figure S4 E and Video 1). In addition, their presence neither biases overall MT architecture (Figure S4 F) nor impairs oocyte development and fertility (Figure S4 G and Table 1).
  
  Inconsistency of Khc^MutEns rescue
  
  The Khc^MutEns variant partially rescues cortical MT formation and restores a slow but measurable cytoplasmic flow, yet it fails to rescue Staufen localization (Fig. 5). This raises questions about the consistency and completeness of the rescue. Could the authors clarify this discrepancy or propose a mechanistic rationale?
  
  This is a good point. The cytoplasmic flows (the consequence of cargo transport by Khc on MTs) generated by a constitutively active KhcMutEns in an ens mutant condition are less efficient than those driven by Khc activated by Ens in a control condition (Figure 6C). The rescued flow is probably not efficient enough to fully rescue Staufen localization at stage 10.
  
  Additionally, this KhcMutEns variant rescues the viability of embryos from Khc27 mutant germline clone oocytes but not from ens mutants (Table 1). One hypothesis is that Ens harbors additional functions beyond Khc activation.
  
  This incomplete rescue of Ens by an active Khc variant could also be a consequence of the "paradox of co-dependence": Kinesin-1 also transports the antagonizing motor Dynein, which drives cargo transport in the opposite direction (Hancock et al., 2016). The phenotype of a gain-of-function variant is therefore complex to interpret. Consistent with this, KhcMutEns-GFP and KhcΔhinge2, two active Khc variants, each only partially rescue centrosome transport in ens mutant neural stem cells (Figure S10).
  
  Minor points: 1. The pUbi-attB-Khc-GFP vector was used to generate the Khc^MutEns transgenic line, presumably under control of the ubiquitous ubi promoter. Could the authors specify which attP landing site was used? Additionally, are the transgenic flies viable and fertile, given that Kinesin-1 is hyperactive in this construct?
  
  All transgenic constructs were integrated at defined genomic landing sites to ensure controlled expression levels. Specifically, both GFP-tagged KhcWT and KhcMutEns were inserted at the VK05 (attP9A) site using PhiC31-mediated integration. Full details of the landing sites are provided in the Materials and Methods section. Both transgenic lines are homozygous lethal, and the transgenes are maintained over TM6B balancers.
  
  On page 11 (Discussion, section titled "A dual Ensconsin oocyte enrichment mechanism achieves spatial relief of Khc inhibition"), the statement "many mutations in Kif5A are causal of human diseases" would benefit from a brief clarification. Since not all readers may be familiar with kinesin gene nomenclature, please indicate that KIF5A is one of the three human homologs of Kinesin heavy chain.
  
  We clarified this point in the revised version (lines 465-466).
  
  On page 16 (Materials and Methods, "Immunofluorescence in fly ovaries"), the sentence "Ovaries were mounted on a slide with ProlonGold medium with DAPI (Invitrogen)" should be corrected to "ProLong Gold."
  
  This is corrected.
  
  Significance
  
  This study shows that enrichment of MAP7/ensconsin in the oocyte is the mechanism of kinesin-1 activation there and is important for cytoplasmic streaming and for the localization of non-centrosomal microtubule-organizing centers to the oocyte cortex.
  
  We thank the reviewers for their careful review of our manuscript and their positive feedback.
  
  Reviewer #3
  
  Evidence, reproducibility and clarity
  
  The manuscript of Berisha et al. investigates the role of Ensconsin (Ens), Kinesin-1 and Ninein in the organisation of microtubules (MT) in the Drosophila oocyte. In stage 9 oocytes, Kinesin-1 transports oskar mRNA, a posterior determinant, along MTs that are organised by ncMTOCs. At stage 10b, Kinesin-1 induces cytoplasmic advection to mix the contents of the oocyte. Ensconsin/Map7 is a MT-associated protein (MAP) that uses its MT-binding domain (MBD) and kinesin-binding domain (KBD) to recruit Kinesin-1 to the microtubules and to stimulate the motility of MT-bound Kinesin-1. Using various new Ens transgenes, the authors demonstrate the requirement of the Ens MBD and of Ninein for Ens localisation to the oocyte, where Ens activates Kinesin-1 through its KBD. The authors also claim that Ens, Kinesin-1 and Ninein are required for the accumulation of ncMTOCs at the oocyte cortex and argue that the detachment of the ncMTOCs from the cortex accounts for the reduced localisation of oskar mRNA at stage 9 and the lack of cytoplasmic streaming at stage 10b. Although the manuscript contains several interesting observations, the authors' conclusions are not sufficiently supported by their data. The structure-function analysis of Ensconsin (Ens) is potentially publishable, but the conclusions on ncMTOC anchoring and cytoplasmic streaming are not convincing.
  
  We are grateful that the regulation of Khc activity by MAP7 was well received by all reviewers. While our study focuses on Drosophila oogenesis, we believe this mechanism may have broader implications for understanding kinesin regulation across biological systems.
  
  For the novel function of the MAP7/Khc complex in organizing its own microtubule networks through ncMTOC recruitment, we have carefully considered the reviewers' constructive recommendations. We now provide additional experimental evidence supporting a model of flux self-amplification in which ncMTOC recruitment plays a key role. It is well established that cytoplasmic flows are essential for posterior localization of cell fate determinants at stage 10B. Slow flows have also been described at earlier oogenesis stages by the groups of Saxton and St Johnston. Building on these early publications and our new experiments, we propose that these flows are essential to promote a positive feedback loop that reinforces ncMTOC recruitment and MT organization (Figure 7).
  
  1) The main conclusion of the manuscript is that "MT advection failure in Khc and ens in late oogenesis stems from defective cortical ncMTOCs recruitment". This completely overlooks the abundant evidence that Kinesin-1 directly drives cytoplasmic streaming by transporting vesicles and microtubules along microtubules, which then move the cytoplasm by advection (Palacios et al., 2002; Serbus et al, 2005; Lu et al, 2016). Since Kinesin-1 generates the flows, one cannot conclude that the effect of khc and ens mutants on cortical ncMTOC positioning has any direct effect on these flows, which do not occur in these mutants.
  
  We regret the lack of clarity of the first version of the manuscript and some missing references. We propose a model in which the Kinesin-1-dependent slow flows (described by Serbus/Saxton and Palacios/St Johnston) play a central role in amplifying ncMTOC anchoring and cortical MT network formation (see model in the new Figure 7).
  
  2) The authors claim that the streaming phenotypes of ens and khc mutants are due to a decrease in microtubule length caused by the defective localisation of ncMTOCs. In addition to the problem raised above, I am not convinced that they can make accurate measurements of microtubule length from confocal images like those shown in Figure 4. Firstly, they are measuring the length of bundles of microtubules and cannot resolve individual microtubules. This problem is compounded by the fact that the microtubules do not align into parallel bundles in the mutants. This will make the "microtubules" appear shorter in the mutants. In addition, the alignment of the microtubules in wild-type allows one to choose images in which the microtubules lie in the imaging plane, whereas the more disorganized arrangement of the microtubules in the mutants means that most microtubules will cross the imaging plane, which precludes accurate measurements of their length.
  
  As mentioned by Reviewer 4, we have been transparent about the methodology and its limitations, which are fully described in the Materials and Methods section.
  
  Cortical microtubules in oocytes are highly dynamic and move rapidly, making it technically impossible to capture their entire length using standard Z-stack acquisitions. We therefore adopted a compromise approach: measuring microtubules within a single focal plane positioned just below the oocyte cortex. This strategy is consistent with established methods in the field, such as those used by Parton et al. (2011) to track microtubule plus-end directionality. To avoid overinterpretation, we explicitly refer to these measurements as "minimum detectable MT length," acknowledging that microtubules may extend beyond the focal plane, particularly at stage 10, where long, tortuous bundles frequently exit the plane of focus. These methodological considerations and potential biases are clearly described in the Materials and Methods section, and the text now mentions the possible disorganization of the MT network in the mutant conditions (lines 272-273).
  
  In this revised version, we now provide complementary analyses of MT network organization. Beyond length measurements (and the mentioned limitations), we also quantified microtubule network orientation at stage 9, assessing whether cortical microtubules are preferentially oriented toward the posterior axis as observed in controls (revised Figure 3D and Figure 5B). While this analysis is also subject to the same technical limitations, it reveals a clear biological difference: microtubules exhibit posterior-biased orientation in control oocytes, similar to a previous study (Parton et al., 2011), but adopt a randomized orientation in Nin-/-, ens, and Khc RNAi-depleted oocytes (revised Figure 3D and Figure 5B).
  
  Taken together, these complementary approaches, despite their technical constraints, provide convergent evidence for the role of the Khc/Ens complex in organizing cortical microtubule networks during oogenesis.
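One way to quantify a posterior bias of the kind described above is sketched below in Python. It is a hypothetical illustration, not the authors' method: it assumes each cortical MT (or plus-end track, as in the tracking approach of Parton et al., 2011) has been reduced to a direction angle relative to the anteroposterior axis, with 0° pointing toward the posterior, and it reports the fraction of directions with a posterior component together with the circular mean resultant length.

```python
import math


def posterior_bias(angles_deg):
    """Summarize directions measured relative to the AP axis (0° = posterior).

    Returns (fraction_posterior, mean_resultant_length):
    - fraction_posterior: share of directions with a positive component
      along the posterior direction (cos(angle) > 0); about 0.5 when
      orientations are random.
    - mean_resultant_length: circular-statistics R in [0, 1]; near 0 for
      uniformly random directions, 1 when all directions coincide.
    """
    if not angles_deg:
        raise ValueError("no angles to summarize")
    rads = [math.radians(a) for a in angles_deg]
    n = len(rads)
    fraction_posterior = sum(1 for r in rads if math.cos(r) > 0) / n
    # Mean unit vector of all directions; its length is R.
    mean_cos = sum(math.cos(r) for r in rads) / n
    mean_sin = sum(math.sin(r) for r in rads) / n
    return fraction_posterior, math.hypot(mean_cos, mean_sin)


# Hypothetical angle sets, not measurements from the study.
print(posterior_bias([10, -10, 20, -20]))   # tightly posterior-biased
print(posterior_bias([0, 90, 180, 270]))    # evenly spread directions
```

Under randomized orientation, the fraction approaches 0.5 and the mean resultant length approaches 0; a posterior-biased network pushes both upward, which makes the two summaries easy to compare between control and mutant oocytes.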
  
  3) "To investigate whether the presence of these short microtubules in ens and Khc RNAi oocytes is due to defects in microtubule anchoring or is also associated with a decrease in microtubule polymerization at their plus ends, we quantified the velocity and number of EB1 comets, which label growing microtubule plus ends (Figure S3)." I do not understand how the anchoring or not of microtubule minus ends to the cortex determines how far their plus ends grow, and these measurements fall short of showing that plus-end growth is unaffected. It has already been shown that the Kinesin-1-dependent transport of Dynactin to growing microtubule plus ends increases the length of microtubules in the oocyte, because Dynactin acts as an anti-catastrophe factor at the plus ends. Thus, khc mutants should have shorter microtubules independently of any effects on ncMTOC anchoring. The measurements of EB1 comet speed and frequency in Fig. S2 will not detect this change and are not relevant for their claims about microtubule length. Furthermore, the authors measured EB1 comets at stage 9 (where they did not observe short MTs) rather than at stage 10b. The authors' argument would be better supported if they performed the measurements at stage 10b.
  
  We thank the reviewer for raising this important point. The short microtubule (MT) length observed at stage 10B could indeed result from limited plus-end growth. Unfortunately, we were unable to test this hypothesis directly: strong endogenous yolk autofluorescence at this stage prevented reliable detection of Eb1-GFP comets, precluding velocity measurements.
  
  At least during stage 9, our data demonstrate that MT nucleation and polymerization rates are reduced in neither Khc RNAi nor ens mutant conditions, indicating that the observed MT alterations must arise through other mechanisms.
  
  In the discussion, we propose the following interconnected explanations, supported by recent literature and the reviewers' suggestions:
  
  1- Reduced MT rescue events. Two seminal studies from the Verhey and Aumeier laboratories have shown that constitutively active Kinesin-1 induces MT lattice damage (Budaitis et al., 2022), which can be repaired through GTP-tubulin incorporation into "rescue shafts" that promote MT rescue (Andreu-Carbo et al., 2022). Extrapolating from these findings, loss of Kinesin-1 activity could plausibly reduce rescue shaft formation, thereby decreasing MT stability. While challenging to test directly in our system, this mechanism provides a plausible framework for the observed phenotype.
  
  2- Impaired transport of stabilizing factors. As that reviewer astutely points out, Khc transports the dynactin complex, an anti-catastrophe factor, to MT plus ends (Nieuwburg et al., 2017). Loss of this transport could further compromise MT plus end stability. We now discuss this important mechanism in the revised manuscript.
  
  3- Loss of cortical ncMTOCs. Critically, our new quantitative analyses (revised Figure 3 and Figure 5) also reveal defective anteroposterior orientation of cortical MTs in mutant conditions. These experiments suggest that Ens/Khc-mediated localization of ncMTOCs to the cortex is essential for proper MT network organization, and possibly minus-end stabilization as suggested in several studies (Feng et al., 2019, Goodwin and Vale, 2011, Nashchekin et al., 2016).
  
  Altogether, we now propose an integrated model in which MT reduction and disorganization may result from multiple complementary mechanisms operating downstream of Kinesin-1/Ensconsin loss. While some aspects remain difficult to test directly in our in vivo system, the convergence of our data with recent mechanistic studies provides an interesting conceptual framework. The Discussion has been revised to reflect this comprehensive view in a dedicated paragraph ("A possible regulation of MT dynamics in the oocyte at both plus and minus MT ends by Ens and Khc", lines 415-432).
  
  4) The Shot overexpression experiments presented in Fig.3 E-F, Fig.4D and Table S1 are very confusing. Originally, the authors used Shot-GFP overexpression at stage 9 to show that there is a decrease of ncMTOCs at the cortex in ens mutants (Fig.3 E-F) and speculated that this caused the defects in MT length and cytoplasmic advection at stage 10B. However, the authors later state on page 8 that: "Shot overexpression (Shot OE) was sufficient to rescue the presence of long cortical MTs and ooplasmic advection in most ens oocytes (9/14), resembling the patterns observed in controls (Figures 4B right panel and 4D). Moreover, while ens females were fully sterile, overexpression of Shot was sufficient to restore that loss of fertility (Table S1)". Is this the same UAS Shot-GFP and VP16 Gal4 used in both experiments? If so, this contradiction puts the authors' conclusions in question.
  
  This is an important point that requires clarification regarding our experimental design.
  
  The Shot-YFP construct is a genomic insertion on chromosome 3. The ens mutation is also located on chromosome 3 and we were unable to recombine this transgene with the ens mutant for live quantification of cortical Shot. To circumvent this technical limitation, we used a UAS-Shot.L(C)-GFP transgenic construct driven by a maternal driver, expressed in both wild-type (control) and ens mutant oocytes. We validated that the expression level and subcellular localization of UAS-Shot.L(C)-GFP were comparable to those of the genomic Shot-YFP (new Figure S8 A and B).
  
  From these experiments, we drew two key conclusions. First, cortical Shot.L(C)-GFP is less abundant in ens mutant oocytes compared to wild-type (the quantification has been removed from this version). Second, despite this reduced cortical accumulation, Shot.L(C)-GFP expression partially rescues ooplasmic flows and microtubule streaming in stage 10B ens mutant oocytes, and restores fertility to ens mutant females.
  
  5) The authors based their conclusions about the involvement of Ens, Kinesin-1 and Ninein in ncMTOC anchoring on the decrease in cortical fluorescence intensity of Shot-YFP and Patronin-YFP in the corresponding mutant backgrounds. However, there is a large variation in average Shot-YFP intensity between control oocytes in different experiments. In Fig. 2F-G the average level of Shot-YFP in the control is 130 AU, while in Fig. 3G-H it is only 55 AU. This makes me worry about the reliability of such measurements and the conclusions drawn from them.
  
  To clarify this point, we have harmonized the method used to quantify the Shot-YFP signals in Figure 4E with the methodology used in Figure 3B, based on the original images. The levels are not strictly identical (control, Figure 2B: 132.7 ± 36.2 versus control, Figure 4E: 164.0 ± 37.7). Such differences are common when experiments are performed several months apart and by different users.
  
  6) The decrease in the intensity of Shot-YFP and Patronin-YFP cortical fluorescence in ens mutant oocytes could be because of problems with ncMTOC anchoring or with ncMTOCs formation. The authors should find a way to distinguish between these two possibilities. The authors could express Ens-Mut (described in Sung et al 2008), which localises at the oocyte posterior and test whether it recruits Shot/Patronin ncMTOCs to the posterior.
  
  We tried to obtain the fly stocks described in the 2008 paper by contacting former members of Pernille Rørth's laboratory. Unfortunately, we learned that the lab no longer exists and that all reagents, including the requested stocks, were either discarded or lost over time. To our knowledge, these materials are no longer available from any source. We regret that this limitation prevented us from performing the straightforward experiments suggested by the reviewer using these specific tools.
  
  7) According to the Materials and Methods, the Shot-GFP used in Fig.3 E-F and Fig.4 was the BDSC line 29042. This is Shot L(C), a full-length version of Shot missing the CH1 actin-binding domain that is crucial for Shot anchoring to the cortex. If the authors indeed used this version of Shot-GFP, the interpretation of the above experiments is very difficult.
  
  The Shot.L(C) isoform lacks the CH1 domain but retains the CH2 actin-binding motif. Truncated proteins containing this domain fused to GST retain a weak ability to bind actin in vitro. Importantly, the function of this isoform is context-dependent: it cannot rescue shot loss-of-function in neuron morphogenesis but fully restores Shot-dependent tracheal cell remodeling (Lee and Kolodziej, 2002).
  
  In our experiments, when the Shot.L(C) isoform was expressed under the control of a maternal driver, its localization to the oocyte cortex was comparable to that of the genomic Shot-YFP construct (new Figure S8). This demonstrates unambiguously that the CH1 domain is dispensable for Shot cortical localization in oocytes, and that CH2-mediated actin binding is sufficient for this localization. Of note, a recent study showed that actin networks are not equivalent, highlighting the need for specific Shot isoforms harboring specialized actin-binding domains (Nashchekin et al., 2024).
  
  We note that the expression level of Shot.L(C)-GFP in the oocyte appeared slightly lower than that of Shot-YFP (expressed under endogenous Shot regulatory sequences), as assessed by Western blot (Figure S8 A).
  
  Critically, Shot.L(C)-GFP expression was substantially lower than that of Shot.L(A)-GFP, which harbors both the CH1 and CH2 domains. Shot.L(A)-GFP was overexpressed (Figure 8 A) and ectopically localized on MTs in both nurse cells and the ooplasm (Figure S8 B middle panel and arrow). These observations indicate that the Shot.L(C)-GFP rescue experiment was performed at near-physiological expression levels, strengthening the validity of our conclusions.
  
  8) Page 6 "converted in NCs, in a region adjacent to the ring canals, Dendra-Ens-labeled MTs were found in the oocyte compartment indicating they are able to travel from NC toward the oocyte through ring canals". I have difficulty seeing the translocation of MT through the ring canals. Perhaps it would be more obvious with a movie/picture showing only one channel. Considering that Dendra-Ens appears in the oocyte much faster than MT transport through ring canals (140nm/s, Lu et al 2022), the authors are most probably observing the translocation of free Ens rather than Ens bound to MT. The authors should also mention that Ens movement from the NC to the oocyte has been shown before with Ens MBD in Lu et al 2022 with better resolution.
  
  We fully agree with the caveat raised by the reviewer: we may have observed the translocation of free Dendra-Ensconsin. We have therefore removed this inconclusive experiment and instead refer to the work of the Gelfand lab. Lu et al. (2022) reported, with very good resolution, that MTs travel at ~140 nm/s from the nurse cells toward the oocyte through the ring canals. Notably, this directed movement of MTs was measured using a fusion protein encompassing the Ens MT-binding domain.
  
  9) Page 6: The co-localization of Ninein with Ens and Shot at the oocyte cortex (Figure 2A). I have difficulty seeing this co-localisation. Perhaps it would be more obvious in merged images of only two channels and with higher resolution images.
  
  10) "a pool of the Ens-GFP co-localized with Ch-Patronin at cortical ncMTOCs at the anterior cortex (Figure 3A)". I also have difficulty seeing this.
  
  We have performed new high-resolution acquisitions that provide clearer and more convincing evidence for the cortical distribution of these proteins (revised Figure 2A-2C and Figure 4A). These improved images demonstrate that Ens, Ninein, Shot, and Patronin partially colocalize at cortical ncMTOCs, as initially proposed. Importantly, the new data also reveal a spatial distinction: while Ens localizes along microtubules extending from these cortical sites, Ninein appears mostly confined to small cytoplasmic puncta adjacent to cortical microtubules, although it is also present on them.
  
  11) "Ninein co-localizes with Ens at the oocyte cortex and partially along cortical microtubules, contributing to the maintenance of high Ens protein levels in the oocyte and its proper cortical targeting". I could not find any data showing the involvement of Ninein in the cortical targeting of Ens.
  
  Upon loss of Ninein, we found decreased Ens localization to MTs and to the cell cortex region (new Figure S3 A-B).
  
  12) "our MT network analyses reveal the presence of numerous short MTs cytoplasmic clustered in an anterior pattern." "This low cortical recruitment of ncMTOCs is consistent with poor MT anchoring and their cytoplasmic accumulation." I could not find any data showing that the short cortical MTs observed at stage 10B in ens mutant and Khc RNAi oocytes were cytoplasmic and poorly anchored.
  
  The sentence was removed from the revised manuscript.
  
  13) "The egg chamber consists of interconnected cells where Dynein and Khc activities are spatially separated. Dynein facilitates transport from NCs to the oocyte, while Khc mediates both transport and advection within the oocyte." Dynein is involved in various activities in the oocyte. It anchors the oocyte nucleus and transports bcd and grk mRNA to mention a few.
  
  The text was amended to reflect Dynein involvement in transport activities in the oocyte, with the appropriate references (lines 105-107).
  
  14) The cartoons in Fig.2H and 3I exaggerate the effect of Ninein and Ens on cortical ncMTOCs. According to the corresponding graphs, there is a 20% and a 50% decrease, respectively.
  
  The new cartoons (revised Figures 3E and 4F) have been amended to reflect the ncMTOC values as well as MT orientation (Figure 3E).
  
  Significance
  
  Given the important concerns raised, the significance of the findings is difficult to assess at this stage.
  
  We sincerely thank the reviewer for their thorough evaluation of our manuscript. We have carefully addressed their concerns through substantial new experiments and analyses. We hope that the revised manuscript, in its current form, now provides the clarifications and additional evidence requested, and that our responses demonstrate the significance of our findings.
  
  Reviewer #4 (Evidence, reproducibility and clarity (Required)):
  
  Summary: This manuscript presents an investigation into the molecular mechanisms governing spatial activation of Kinesin-1 motor protein during Drosophila oogenesis, revealing a regulatory network that controls microtubule organization and cytoplasmic transport. The authors demonstrate that Ensconsin, a MAP7 family protein and Kinesin-1 activator, is spatially enriched in the oocyte through a dual mechanism involving Dynein-mediated transport from nurse cells and cortical maintenance by Ninein. This spatial enrichment of Ens is crucial for locally relieving Kinesin-1 auto-inhibition. The Ens/Khc complex promotes cortical recruitment of non-centrosomal microtubule organizing centers (ncMTOCs), which are essential for anchoring microtubules at the cortex, enabling the formation of long, parallel microtubule streams or "twisters" that drive cytoplasmic advection during late oogenesis. This work establishes a paradigm where motor protein activation is spatially controlled through targeted localization of regulatory cofactors, with the activated motor then participating in building its own transport infrastructure through ncMTOC recruitment and microtubule network organization.
  
  There's a lot to like about this paper! The data are generally lovely and nicely presented. The authors also use a combination of experimental approaches, combining genetics, live and fixed imaging, and protein biochemistry.
  
  We thank the reviewer for this enthusiastic and supportive review, which helped us further strengthen the manuscript.
  
  Concerns: Page 6: "to assay if elevation of Ninein levels was able to mis-regulate Ens localization, we overexpressed a tagged Ninein-RFP protein in the oocyte. At stage 9 the overexpressed Ninein accumulated at the anterior cortex of the oocyte and also generated large cortical aggregates able to recruit high levels of Ens (Figures 2D and 2H)... The examination of Ninein/Ens cortical aggregates obtained after Ninein overexpression showed that these aggregates were also able to recruit high levels of Patronin and Shot (Figures 2E and 2H)." Firstly, I'm not crazy about the use of "overexpressed" here, since there isn't normally any Ninein-RFP in the oocyte. In these experiments it has been therefore expressed, not overexpressed. Secondly, I don't understand what the reader is supposed to make of these data. Expression of a protein carrying a large fluorescent tag leads to large aggregates (they don't look cortical to me) that include multiple proteins - in fact, all the proteins examined. I don't understand this to be evidence of anything in particular, except that Ninein-RFP causes the accumulation of big multi-protein aggregates. While I can understand what the authors were trying to do here, I think that these data are inconclusive and should be de-emphasized.
  
  We have revised the manuscript by replacing "overexpressed" with "expressed" (lines 211 and 212). In addition, we now provide new localization data in both cortical (new Figure S4 A, top) and medial focal planes (new Figure S4 A, bottom), demonstrating that Ninein puncta (the term used in Rosen et al., 2019), rather than aggregates, are located cortically. We also show that live IRP-labelled MTs do not colocalize with Ninein-RFP puncta. In light of the new experiments and the comments from the other reviewers, the corresponding text has been revised and de-emphasized accordingly.
  
  Page 7: "Co-immunoprecipitations experiments revealed that Patronin was associated with Shot-YFP, as shown previously (Nashchekin et al., 2016), but also with EnsWT-GFP, indicating that Ens, Shot and Patronin are present in the same complex (Figure 3B)." I do not agree that association between Ens-GFP and Patronin indicates that Ens is in the same complex as Shot and Patronin. It is also very possible that there are two (or more) distinct protein complexes. This conclusion could therefore be softened. Instead of "indicating" I suggest "suggesting the possibility."
  
  We have toned down this conclusion and now write "suggesting the possibility" (lines 238-239).
  
  Page 7: "During stage 9, the average subcortical MT length, taken at one focal plane in live oocytes (see methods)..." I appreciate that the authors have been careful to describe how they measured MT length, as this is a major point for interpretation. I think the reader would benefit from an explanation of why they decided to measure in only one focal plane and how that decision could impact the results.
  
  We appreciate this helpful suggestion. Cortical microtubules are indeed highly dynamic and extend in multiple directions, including along the Z-axis. Moreover, their diameter is extremely small (approximately 25 nm), which makes it technically challenging to accurately measure their full length over several microns at high resolution with our Zeiss Airyscan confocal microscope. In addition, Z-stack acquisition is relatively slow and therefore poorly suited to capturing the rapid dynamics of these microtubules. Consequently, our length measurements represent a compromise and most likely underestimate the actual lengths of microtubules growing outside the focal plane. We note that other groups have encountered similar technical limitations (Parton et al., 2011).
  
  Page 7: "... the MTs exhibited an orthogonal orientation relative to the anterior cortex (Figures 4A left panels, 4C and 4E)." This phenotype might not be obvious to readers. Can it be quantified?
  
  We have now analyzed the orientation of microtubules (MTs) along the dorso-ventral axis. Our analysis shows that ens and Khc RNAi oocytes (new Figure 5B), and, to a lesser extent, Nin mutant oocytes (new Figure 3D), display a more random MT orientation than wild-type (WT) oocytes. In WT oocytes, MTs are predominantly oriented toward the posterior pole, consistent with previous findings (Parton et al., 2011).
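  One common way to turn "more random orientation" into a single number is circular statistics. The sketch below is illustrative only and is not the analysis used in the manuscript: it assumes each MT has already been reduced to one angle in radians measured from a chosen reference axis, and the function name `mean_resultant_length` is hypothetical. The mean resultant length R is close to 1 when MTs share a common direction and approaches 0 as orientations become uniformly spread.

```python
import math
from typing import Sequence


def mean_resultant_length(angles: Sequence[float], axial: bool = False) -> float:
    """Return the mean resultant length R (between 0 and 1) of a set of angles.

    angles: MT orientations in radians relative to a reference axis
            (the reference convention is an assumption of this sketch).
    axial:  set True when MT polarity is unknown, so that theta and theta + pi
            describe the same orientation; angles are then doubled first.
    """
    if len(angles) == 0:
        raise ValueError("at least one angle is required")
    factor = 2.0 if axial else 1.0
    # Average the unit vectors (cos, sin) associated with each angle.
    c = sum(math.cos(factor * a) for a in angles) / len(angles)
    s = sum(math.sin(factor * a) for a in angles) / len(angles)
    # R is the length of the mean vector: 1 if all angles agree, ~0 if evenly spread.
    return math.hypot(c, s)


if __name__ == "__main__":
    # Hypothetical angles for illustration, not measured data.
    aligned = [0.0, 0.1, -0.1, 0.05, -0.05]
    spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
    print(f"aligned R = {mean_resultant_length(aligned):.3f}")
    print(f"spread  R = {mean_resultant_length(spread):.3f}")
```

  Comparing R between genotypes, for example with a permutation test or a Rayleigh test of uniformity, could then express the difference between WT and mutant oocytes quantitatively.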
  
  Page 8: "Altogether, the analyses of Ens and Khc defective oocytes suggested that MT organization defects during late oogenesis (stage 10B) were caused by an initial failure of ncMTOCs to reach the cell cortex. Therefore, we hypothesized that overexpression of the ncMTOC component Shot could restore certain aspects of microtubule cortical organization in ens-deficient oocytes. Indeed, Shot overexpression (Shot OE) was sufficient to rescue the presence of long cortical MTs and ooplasmic advection in most ens oocytes (9/14)..." The data are clear, but the explanation is not. Can the authors please explain why adding in more of an ncMTOC component (Shot) rescues a defect of ncMTOC cortical localization?
  
  We propose that cytoplasmic ncMTOCs can bind the cell cortex via the Shot subunit, which is so far the only ncMTOC component that harbors actin-binding motifs. We therefore propose that elevating cytoplasmic Shot levels increases the probability that Shot encounters the cortex by diffusion when flows are absent. This is now explained in lines 282-285.
  
  I'm grateful to the authors for their inclusion of helpful diagrams, as in Figures 1G and 2H. I think the manuscript might benefit from one more of these at the end, illustrating the ultimate model.
  
  We have carefully considered and followed the reviewer's suggestions. In response, we have included a new figure illustrating our proposed model: the recruitment of ncMTOCs to the cell cortex through low Khc-mediated flows at stage 9 enhances cortical microtubule density, which in turn promotes self-amplifying flows (new Figure 7, panels A to C). Note that this Figure also depicts activation of Khc by loss of auto-inhibition (Figure 7, panel D).
  
  I'm sorry to say that the language could use quite a bit of polishing. There are missing and extraneous commas. There is also regular confusion between the use of plural and singular nouns. Some early instances include:
  
  Page 3: thought instead of "thoughted."
  
  Page 5: "A previous studies have revealed"
  
  Page 5: "A significantly loss"
  
  Page 6: "troughs ring canals" should be "through ring canals"
  
  Page 7: lives stage 9 oocytes
  
  Page 7: As ens and Khc RNAi oocytes exhibits
  
  Page 7: we examined in details
  
  Page 7: This average MT length was similar in Khc RNAi and ens mutant oocyte..
  
  We apologize for these errors and have made the appropriate corrections to the manuscript.
  
  Reviewer #4 (Significance (Required)):
  
  This work makes a nice conceptual advance by showing that motor activation controls its own transport infrastructure, a paradigm that could extend to other systems requiring spatially regulated transport.
  
  We thank the reviewers for their evaluation of the manuscript and helpful comments.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.04.15.648882
www.biorxiv.org

Atomistic simulations reveal sub-μs contact dynamics in MUT-16 condensates

1. Public_Reviews 15 Jun 2026
  
  in eLife
  
  Author response:
  
  Response to the eLife Assessment
  
  We thank the Editors and the Reviewers for their helpful suggestions, which will help us strengthen and test the key conclusions of this study of condensate dynamics at atomic resolution. In response to the Editors, we will make clearer in the Results and Discussion how the present work advances beyond our initial study of MUT-16 condensates, the scaffold of Mutator foci (Gaurav K et al., Biophys. J. 2025; 124:3987–4004). That study used a multiscale approach — residue-level (CALVADOS2) and near-atomic (Martini3) coarse-grained simulations together with in vitro experiments — to establish that the foci-forming region (FFR) phase separates whereas the adjacent MUT-8-binding region (M8BR) does not, and used atomistic simulations of that non-phase-separating region to dissect client–scaffold recognition. In this way the multi-scale simulations helped to provide a molecular basis for previous in vivo observations by Uebel et al. (PLOS Genet. 2018; 14(7):e1007542). That study did not, however, resolve with atomic resolution the interactions within the phase-separated FFR condensate itself. The present study addresses precisely this gap: from 10 µs of atomistic molecular dynamics of the FFR condensate, we characterise the sub-µs contact dynamics and the protein–ion and protein–water interactions that govern the condensed phase at atomistic resolution — observables inaccessible to the coarse-grained models used previously, but key to understanding the properties of Mutator foci and ultimately how they underpin biological function in small RNA biology.
  
  Reviewer 1:
  
  (1) I have several questions regarding the system preparation that require clarification. The authors state that "65 copies of the coarse-grained MUT-16 FFR were embedded in a slab-shaped simulation," but it is not clear how this initial configuration was generated. Were the molecules randomly distributed in the simulation box, or were they initially arranged in a preformed condensate? Alternatively, were they randomly inserted and allowed to self-assemble into a condensate during NpT simulations? In Figure 1, the atomistic snapshot appears to show a well-defined condensate at the center of the simulation box. It would be important to clarify how this configuration was obtained: Was it generated from coarse-grained simulations starting from random initial conditions? Or was a preassembled condensate used as input? Related to this, how do the authors ensure that the simulations are equilibrated? While 20 μs appears to be a reasonably long simulation time for coarse-grained simulations, it would be useful to demonstrate equilibration explicitly. For example, the authors could plot the center-of-mass positions (in the long axis of the simulation box) of individual proteins over time to show that all molecules reach a steady state and remain within the condensate without systematic drift.
  
  We thank the reviewer for these important clarifying questions regarding system preparation and equilibration.
  
  The initial structure for the atomistic simulation was generated by randomly inserting 65 copies of the coarse-grained MUT-16 FFR into a slab-shaped simulation box using the gmx insert-molecules tool. The molecules were therefore not pre-arranged in a condensate; instead, they were allowed to spontaneously self-assemble from this random configuration during NpT simulations using the Martini3-IDP force field over 20 μs. The well-defined condensate visible in Figure 1 is thus the product of this unbiased self-assembly process.
  
  To make this workflow transparent to the reader, we will revise Figure 1 to include a two-panel illustration of the Martini3 simulation: a snapshot at t = 0 ns showing the randomly distributed chains, and a snapshot at t = 20 μs showing the assembled condensate, connected by an arrow indicating the subsequent backmapping step to the atomistic representation. We believe this will clearly communicate the sequential nature of the pipeline (random insertion → coarse-grained self-assembly → atomistic backmapping).
  
  We appreciate the concrete suggestion for demonstrating equilibration. We will add a supplementary figure showing the center-of-mass positions of individual protein chains along the long axis of the simulation box as a function of simulation time. This will allow readers to verify that molecules converge into the condensate phase and reach a steady state without systematic drift, providing explicit evidence that 20 μs coarse-grained simulation time is sufficient for equilibration under these conditions.
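  The drift check proposed by the reviewer can be sketched as follows. This is an illustrative sketch, not the analysis the authors will run: it assumes the per-chain centre-of-mass (COM) coordinates along the long box axis have already been extracted from the trajectory into a NumPy array, already unwrapped and re-centred on the condensate, and that condensate boundaries `z_lo` and `z_hi` are known (for example from the density profile). The helper name `com_drift_report` is hypothetical.

```python
import numpy as np


def com_drift_report(z_com, times, z_lo, z_hi, tail_fraction=0.5):
    """Summarise chain COM behaviour along the long box axis.

    z_com: array (n_frames, n_chains) of chain COM positions along z (nm),
           assumed unwrapped and re-centred on the condensate.
    times: array (n_frames,) of simulation times (e.g. microseconds).
    z_lo, z_hi: assumed condensate boundaries along z (nm).
    tail_fraction: final fraction of the trajectory used for the checks.

    Returns (frac_inside, slopes):
      frac_inside: per-frame fraction of chains whose COM lies in [z_lo, z_hi]
      slopes: per-chain linear drift of the COM over the tail (nm per time unit)
    """
    z_com = np.asarray(z_com, dtype=float)
    times = np.asarray(times, dtype=float)
    start = int(len(times) * (1.0 - tail_fraction))
    tail_z, tail_t = z_com[start:], times[start:]

    # Steady state: the fraction of chains inside the slab should stay constant.
    inside = (tail_z >= z_lo) & (tail_z <= z_hi)
    frac_inside = inside.mean(axis=1)

    # No systematic drift: each chain's COM slope should be close to zero.
    # np.polyfit accepts a 2-D y and fits every column independently.
    slopes = np.polyfit(tail_t, tail_z, deg=1)[0]
    return frac_inside, slopes
```

  Plotting `frac_inside` against time and the distribution of `slopes` would give the two pieces of evidence the reviewer asks for: all chains remain in the condensate, and none drifts systematically.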
  
  (2) The authors experimentally observe UCST behavior for these condensates. Do the coarse-grained or atomistic simulations reproduce this behavior?
  
  While atomistic simulations may be too computationally demanding to systematically explore temperature dependence, coarse-grained simulations could be used to test whether condensates are stable at lower temperatures and dissolve at higher temperatures. Such an analysis would provide valuable support for the experimental observations.
  
  We thank the reviewer for this valuable suggestion. In our previous coarse-grained simulations, we used a force field that does not capture UCST versus LCST behavior (Gaurav K et al. Biophys. J. 2025; 124:3987–4004). It will be very interesting to revisit these simulations with a coarse-grained force field that can capture UCST and LCST behavior, such as the Mpipi-T (Chakravarti & Joseph, Protein Sci 2025;34(10):e70284) and HPS-T models (Dignon GL et al. ACS Cent. Sci. 2019; 5(5):821–830). We plan to perform additional coarse-grained simulations at multiple temperatures using the HPS-T force field, which has been shown to capture UCST versus LCST behavior (Changiarath A et al. bioRxiv 2024) in accordance with previous in vitro experiments. These simulations will allow us to test whether the MUT-16 FFR condensates remain stable at lower temperatures and dissolve at higher temperatures, providing direct computational support for the experimentally observed UCST behavior. We will include this analysis in the revised manuscript.
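  A minimal sketch of how such a temperature scan could be read out, assuming a slab geometry centred in the box: at each temperature, compare the mean protein density in the slab centre with the mean density near the box edges. The helper `dense_dilute_ratio` and its region widths are hypothetical illustration choices, not part of the HPS-T or Mpipi-T tooling.

```python
import numpy as np


def dense_dilute_ratio(z_centers, density, box_z, dense_halfwidth, dilute_margin):
    """Ratio of mean dense-phase to mean dilute-phase protein density.

    z_centers: bin centres of a z density profile (nm); slab assumed centred at box_z / 2.
    density: protein density per bin (any consistent unit).
    dense_halfwidth: bins closer than this to the box centre count as dense phase.
    dilute_margin: bins closer than this to either box edge count as dilute phase.
    Both regions must contain at least one bin.
    """
    z = np.asarray(z_centers, dtype=float)
    rho = np.asarray(density, dtype=float)
    offset = np.abs(z - box_z / 2.0)
    dense = rho[offset < dense_halfwidth].mean()
    dilute = rho[offset > box_z / 2.0 - dilute_margin].mean()
    # A ratio that falls toward 1 as temperature rises would indicate
    # dissolution of the condensate, i.e. UCST-like behaviour.
    return np.inf if dilute == 0 else dense / dilute
```

  Evaluated on time-averaged profiles at each temperature, a ratio well above 1 indicates two coexisting phases, whereas a ratio near 1 indicates a single homogeneous phase.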
  
  (3) Regarding the analysis of ions, several points could be clarified and extended:
  
  a) It would be helpful to report the total number of ions and quantify how many are located inside vs. outside the condensate. While qualitative trends can be inferred from density profiles, quantitative analysis would strengthen the conclusions.
  
  b) It would also be interesting to analyze the number of contact ion pairs (e.g., Na⁺-Cl⁻ pairs), as described in J. Chem. Phys. 156, 044505 (2022). It is known that some ion models tend to overestimate ion pairing and underestimate solubility (e.g., J. Chem. Phys. 153, 010903 (2020)).
  
  c) In this context, the use of scaled-charge models has been shown to improve the description of ionic solutions and biomolecular systems (e.g., J. Phys. Chem. Lett. 2019, 10, 23, 7531-7536). I would suggest that, at least for one trajectory, the authors perform a test simulation using scaled charges (e.g., scaling by ~0.8) to evaluate whether ion distributions and protein-ion interactions are significantly affected.
  
  We thank the reviewer for these insightful suggestions regarding the ion analysis. We agree that a more quantitative treatment of ion behavior would strengthen the manuscript. To address all three points collectively, we will expand the existing Figure S7 with additional panels. These will include quantitative counts of Na⁺ and Cl⁻ ions partitioning inside versus outside the condensate, complementing the existing density profiles, and Na⁺–Cl⁻ radial distribution functions to estimate contact ion pair populations following J. Chem. Phys. 156, 044505 (2022).
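  The two ion counts described above can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the authors' pipeline: coordinates are in nm in an orthorhombic periodic box, the condensate occupies the slab `z_lo <= z <= z_hi`, and the contact-pair cutoff (0.35 nm here) is an assumed placeholder that should be read off the first minimum of the computed Na⁺–Cl⁻ radial distribution function. Function names are hypothetical.

```python
import numpy as np


def count_inside_outside(z, z_lo, z_hi):
    """Return (n_inside, n_outside) for ions with z coordinates `z` (nm)."""
    z = np.asarray(z, dtype=float)
    inside = int(np.count_nonzero((z >= z_lo) & (z <= z_hi)))
    return inside, z.size - inside


def count_contact_pairs(na_xyz, cl_xyz, box, cutoff=0.35):
    """Count Na-Cl pairs closer than `cutoff` (nm) using the minimum image.

    cutoff=0.35 nm is an assumed placeholder for the first RDF minimum.
    Memory scales with n_Na * n_Cl, which is fine for a few hundred ions per frame.
    """
    na = np.atleast_2d(np.asarray(na_xyz, dtype=float))
    cl = np.atleast_2d(np.asarray(cl_xyz, dtype=float))
    box = np.asarray(box, dtype=float)
    d = na[:, None, :] - cl[None, :, :]   # (n_na, n_cl, 3) separation vectors
    d -= box * np.round(d / box)          # minimum-image convention
    r = np.linalg.norm(d, axis=-1)
    return int(np.count_nonzero(r < cutoff))
```

  Averaging these counts over frames, separately for the dense and dilute regions, would give the quantitative partitioning and contact-ion-pair numbers requested by the reviewer.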
  
  Following the Reviewer's suggestion, we will run a simulation with scaled charges (~0.8 scaling factor, J. Phys. Chem. Lett. 2019, 10(23):7531–7536) to evaluate the sensitivity of our results to the choice of ion model, and we will compare the ion distributions obtained with standard versus scaled charges. We will discuss the contact ion pair results in the context of known force field limitations regarding ion pairing (J. Chem. Phys. 153, 010903 (2020)) and assess whether the scaled-charge treatment leads to any qualitatively different conclusions.
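  As background for this test: in the charge-scaling (electronic continuum correction) approach, ionic charges are multiplied by a factor λ, commonly motivated as λ = 1/√ε_el, where ε_el ≈ 1.78 is the electronic (high-frequency) dielectric constant of water, giving λ ≈ 0.75. Because Coulomb energies are bilinear in the charges, the effect on each pairwise interaction follows directly:

```latex
U_{ij}(r) = \frac{q_i q_j}{4\pi\varepsilon_0 r}
\quad\longrightarrow\quad
U_{ij}^{\lambda}(r) = \frac{(\lambda q_i)(\lambda q_j)}{4\pi\varepsilon_0 r} = \lambda^{2}\,U_{ij}(r),
\qquad \lambda = 0.8 \;\Rightarrow\; \lambda^{2} = 0.64
```

  Assuming only the ions are scaled, Na⁺–Cl⁻ interactions drop to 64% of their full-charge value, whereas the interaction of a scaled ion with an unscaled protein or water partial charge is reduced only by the factor λ = 0.8. This asymmetry is one reason the test can shift ion pairing and protein–ion contacts by different amounts; whether protein charges are also scaled depends on the force field variant chosen.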
  
  (4) Finally, while the selected water model is known to be accurate, it would be useful to assess its performance for concentrated salt solutions. For example, the authors could estimate the density of a 6 m salt solution and compare it with experimental data or validated models (e.g., J. Chem. Phys. 151, 134504 (2019)). This would help clarify to what extent the conclusions depend on the chosen force field.
  
  We thank the reviewer for this important suggestion. We agree that while the chosen water model is well established for biomolecular simulations, its performance under concentrated salt conditions is a legitimate concern that is worth explicitly validating in the context of this work. We will perform a short bulk simulation of a 6 m NaCl solution and compute the solution density, comparing it to experimental data (J. Chem. Phys. 151, 134504 (2019)). This straightforward validation will allow us to quantify how well our water and ion force field combination reproduces the thermodynamic properties of concentrated salt solutions, and to transparently discuss any deviations and their potential implications for the ion partitioning and protein–ion interaction results presented in the manuscript. The results will be added to the supplementary information alongside the expanded ion analysis in Figure S7.
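  The bookkeeping for such a bulk test is simple and can be sketched as follows. The helper names are hypothetical, the molar masses are standard values, and the box volume is assumed to be the NpT-averaged volume reported by the simulation; the resulting density would then be compared with the experimental reference cited above.

```python
WATER_MOLAR_MASS = 18.015   # g/mol
NACL_MOLAR_MASS = 58.44     # g/mol
AVOGADRO = 6.02214076e23    # 1/mol
NM3_TO_CM3 = 1e-21          # 1 nm^3 = 1e-21 cm^3


def ion_pairs_for_molality(n_water: int, molality: float) -> int:
    """Number of NaCl pairs giving `molality` (mol NaCl per kg of water).

    Moles of NaCl = molality * (water mass in kg); converting back to a
    molecule count, Avogadro's number cancels out.
    """
    kg_water_per_mol = WATER_MOLAR_MASS / 1000.0
    return round(molality * n_water * kg_water_per_mol)


def density_g_per_cm3(n_water: int, n_pairs: int, volume_nm3: float) -> float:
    """Mass density of a water + NaCl box from molecule counts and box volume."""
    mass_g = (n_water * WATER_MOLAR_MASS + n_pairs * NACL_MOLAR_MASS) / AVOGADRO
    return mass_g / (volume_nm3 * NM3_TO_CM3)


if __name__ == "__main__":
    n_water = 1000                                   # hypothetical box size
    n_pairs = ion_pairs_for_molality(n_water, 6.0)   # 6 m NaCl
    print(n_water, "waters need", n_pairs, "NaCl pairs")
```

  For 1000 water molecules, 6 m corresponds to about 108 ion pairs; larger boxes reduce the rounding error in the realized molality.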
  
  (5) In the Introduction, it would be helpful to elaborate further on the possible driving forces of LLPS in this region. Are there prior hypotheses or evidence pointing to specific interactions (e.g., cation-π, π-π, electrostatic interactions)? While this work addresses these questions, a brief discussion of previous experimental or theoretical insights would provide useful context.
  
  We thank the reviewer for this helpful suggestion. We will expand the Introduction to briefly discuss the known molecular driving forces of LLPS in IDR-containing proteins. Specifically, we will discuss the role of π–π interactions between aromatic residues (Vernon et al. eLife 2018; 7:e31486), cation–π interactions between aromatic and positively charged residues such as tyrosine–arginine pairs, which have been experimentally demonstrated to drive condensate formation in proteins such as FUS (Qamar et al. Cell 2018; 173:720–734), and the broader sequence-encoded molecular grammar governing these interactions in prion-like RNA-binding proteins (Wang et al. Cell 2018; 174:688–699; Rekhi et al. Nat Chem 2024; 16:1113–1124). We will discuss previous findings on how ions shape interactions in condensates (MacAinsh et al. eLife 2024; 13:RP100282). We will also note the contribution of electrostatic interactions arising from charge patterning within the IDR, and contextualize how these general principles apply to the specific sequence composition of MUT-16 FFR, motivating the simulation-based investigation presented in this work.
  
  (6) On page 18, the authors state: "MUT-16 FFR satisfies the length (172 residues), aromatic content (20.35%), and Arg enrichment (85.71%) criteria. Its charge content (10.47%) and charge balance (38.89% positive charge fraction) are slightly below the nominal thresholds." It would be very helpful to include a schematic representation of the protein sequence highlighting these features (aromatic residues, charge distribution, etc.) in the corresponding figure, to provide a more intuitive understanding.
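  The composition metrics quoted above, and a simple text version of the requested schematic, can be sketched as follows. The residue classes are assumptions of this sketch (F/W/Y counted as aromatic, K/R as positive, D/E as negative, His treated as neutral), as is reading "Arg enrichment" as R/(R+K); the manuscript's exact definitions may differ. Under these definitions, the reported values for the 172-residue FFR would be consistent with, for example, 35 aromatic residues (20.35%) and 18 charged residues (10.47%), of which 7 are positive (38.89%) and 6 of those are Arg (85.71%). The example sequence below is a toy string, not the MUT-16 sequence.

```python
AROMATIC = set("FWY")   # assumption: His not counted as aromatic
POSITIVE = set("KR")    # assumption: His treated as neutral
NEGATIVE = set("DE")


def composition(seq: str) -> dict:
    """Aromatic, charge and Arg-enrichment fractions of a protein sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    n_arom = sum(aa in AROMATIC for aa in seq)
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    n_charged = n_pos + n_neg
    return {
        "length": n,
        "aromatic_fraction": n_arom / n,
        "charged_fraction": n_charged / n,
        "positive_charge_fraction": n_pos / n_charged if n_charged else 0.0,
        "arg_enrichment": seq.count("R") / n_pos if n_pos else 0.0,  # R / (R + K)
    }


def schematic(seq: str) -> str:
    """One symbol per residue: '*' aromatic, '+' positive, '-' negative, '.' other."""
    symbols = []
    for aa in seq.upper():
        if aa in AROMATIC:
            symbols.append("*")
        elif aa in POSITIVE:
            symbols.append("+")
        elif aa in NEGATIVE:
            symbols.append("-")
        else:
            symbols.append(".")
    return "".join(symbols)


if __name__ == "__main__":
    toy = "MRKDEFYW"  # hypothetical toy sequence
    print(schematic(toy))
    print(composition(toy))
```

  Printing the schematic line directly under the sequence gives a quick text view of aromatic and charge patterning before it is turned into a figure.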
  
  We thank the reviewer for this helpful suggestion. We will include a figure showing a schematic representation of the MUT-16 FFR sequence, with aromatic residues, charged residues (positive and negative), and arginine content highlighted.
  
  (7) A question regarding ion hydration: What is the coordination environment of the ions that bridge proteins? Are they still hydrated by water molecules, or does the reduced water content inside the condensate significantly affect their solvation? Typically, Na⁺ and Cl⁻ ions have coordination numbers of around 5–6 in aqueous solution. Do protein interactions and reduced solvent conditions within the condensate alter this coordination? A brief analysis or discussion would be valuable.
  
  We will calculate the coordination numbers of Na⁺ and Cl⁻ ions that mediate residue–residue bridging interactions inside the condensate and compare them against ions in the bulk dilute phase. This will directly reveal the degree to which bridging ions retain or lose their hydration shell when engaging with protein residues, and whether the condensate environment meaningfully perturbs ion solvation. The results will be presented as an additional figure in the Supplementary Information.
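  A minimal sketch of the coordination-number count described above follows. It assumes a single trajectory frame with coordinates in nm, an orthorhombic box, and water oxygens as the coordinating atoms; protein oxygens could be passed in the same way. The 0.32 nm cutoff is an illustrative assumption for the Na⁺–O first hydration shell. In practice it would be read off the first minimum of the simulation's own ion–oxygen radial distribution function, and a larger value (roughly 0.38 nm) is typical for Cl⁻–O. The function name is hypothetical.

```python
import numpy as np


def coordination_numbers(ion_xyz, ligand_xyz, box, cutoff=0.32):
    """Count ligand atoms within `cutoff` (nm) of each ion, minimum-image PBC.

    ion_xyz: (n_ions, 3); ligand_xyz: (n_ligands, 3), e.g. water oxygens;
    box: (Lx, Ly, Lz) of an orthorhombic box in nm.
    """
    ion_xyz = np.asarray(ion_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    box = np.asarray(box, dtype=float)
    # Displacements for every ion-ligand pair, shape (n_ions, n_ligands, 3).
    d = ion_xyz[:, None, :] - ligand_xyz[None, :, :]
    # Minimum-image convention for a rectangular box.
    d -= box * np.round(d / box)
    dist = np.linalg.norm(d, axis=-1)
    return (dist < cutoff).sum(axis=1)


box = [3.0, 3.0, 3.0]
ions = [[1.5, 1.5, 1.5], [0.05, 1.5, 1.5]]
offsets = 0.24 * np.vstack([np.eye(3), -np.eye(3)])  # six first-shell waters
waters = np.vstack([np.array([1.5, 1.5, 1.5]) + offsets,
                    [[1.5, 1.5, 2.1],     # 0.6 nm from ion 1: outside the shell
                     [2.85, 1.5, 1.5]]])  # 0.2 nm from ion 2 across the boundary
print(coordination_numbers(ions, waters, box))
```

Averaging these counts separately over bridging ions in the dense phase and over ions in the dilute phase gives the comparison proposed in the response.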
  
  Reviewer 2:
  
  (1) The large amount of detail in the results section sometimes makes it difficult to identify the central take-home messages. I encourage the authors to more clearly highlight the principal findings and the physical insights that may generalize to other condensate-forming systems. The authors may also consider streamlining parts of the Results section to improve focus and readability.
  
  We thank the reviewer for this constructive feedback. We will revise the Results section by adding brief concluding remarks at the end of each subsection that explicitly state the key physical insight emerging from that analysis. We will consider which secondary findings can be moved to the Supplementary Information. We will also strengthen the Conclusion section to more clearly distil the principal findings of the study as a whole and highlight the broader insights that may generalize to other condensate-forming systems, ensuring the central take-home messages are clearly communicated to the reader.
  
  Reviewer 3:
  
  (1) In its current form, several technical issues need to be addressed before the main conclusions can be considered robust. Most importantly, the simulated sequence is 172 residues long, while the atomistic slab has box dimensions of only 12 nm in two directions. This length scale is comparable to the expected end-to-end distances of a disordered 172-residue chain. It is therefore not clear whether individual protein chains interact with their own periodic images, which could substantially affect overall chain dynamics and subsequently bias contact lifetimes, residue-residue interaction statistics, and the inferred condensate dynamics. The authors should check, for each chain, histograms of end-to-end distances. For chains for which more than ~2-3% of the end-to-end distances exceed ~11 nm, the authors should explicitly check for self-image interactions (for example, using "gmx mindist -pi") and report whether such interactions occur and for what fraction of the trajectory. Without this control, at least in the Supporting Information, I do not think the simulation-derived contact dynamics are sufficiently trustworthy.
  
  We thank the reviewer for raising this important point. Indeed, the box size in the x and y dimensions is only marginally sufficient, which may influence the dynamics in our simulations and could affect our conclusions. In response, we will perform a control simulation with a larger box, increasing the x and y dimensions to ~16 nm, and compare the contact dynamics of the resulting trajectory with our original results. This control simulation is initiated from an independently assembled coarse-grained condensate (see our response to Question 6) and therefore also addresses the replica-independence concern raised there.
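  The check suggested by the reviewer can be approximated in NumPy as shown below. This is a rough analogue of the `gmx mindist -pi` check, not a replacement for it. The sketch assumes an orthorhombic box and whole (unwrapped) coordinates for a single chain. It also assumes that only lateral (x, y) images matter, because the slab's z dimension is much longer. The 11 nm threshold follows the reviewer's suggestion, and the function names are hypothetical.

```python
import numpy as np


def exceedance_fraction(ree_nm, threshold_nm=11.0):
    """Fraction of frames whose end-to-end distance exceeds the threshold."""
    ree = np.asarray(ree_nm, dtype=float)
    return float(np.mean(ree > threshold_nm))


def min_lateral_self_image_distance(chain_xyz, lx, ly):
    """Smallest distance (nm) between a chain and its own x/y periodic images.

    chain_xyz must hold whole (unwrapped) coordinates of a single chain.
    """
    xyz = np.asarray(chain_xyz, dtype=float)
    best = np.inf
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            if i == 0 and j == 0:
                continue  # skip the chain itself
            shifted = xyz + np.array([i * lx, j * ly, 0.0])
            d = np.linalg.norm(xyz[:, None, :] - shifted[None, :, :], axis=-1)
            best = min(best, float(d.min()))
    return best


# A fully extended 11 nm "chain" along x in a 12 nm x 12 nm box.
chain = np.column_stack([np.linspace(0.0, 11.0, 12), np.zeros(12), np.zeros(12)])
print(exceedance_fraction([10.0, 11.5, 12.0, 9.0]))
print(min_lateral_self_image_distance(chain, 12.0, 12.0))
```

For an analysis of this kind, the first function would be applied per chain to a time series of end-to-end distances. The second function would then be evaluated frame by frame only for chains flagged by the first.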
  
  (2) A second major concern is the treatment of ions. The manuscript makes important conclusions about Na⁺ association and Na⁺-mediated bridging, but the atomistic ion model is not explicitly stated. This is a reproducibility problem and also affects interpretation; for example, standard Amber ions are known to bind too strongly to oppositely charged residues. In their results, one acidic residue appears to interact on average with roughly two Na⁺ ions, which is not obviously expected from charge balance alone. The authors should state the exact Na⁺/Cl⁻ parameters used, justify their compatibility with TIP4P-D and the protein force field, and explicitly interpret why such a strong Na⁺ association with acidic residues is observed.
  
  We thank the reviewer for raising this important point. We will explicitly state in the Methods section how the Na⁺ and Cl⁻ ions were modelled in our setup, including their force field parameters, and discuss the compatibility of these parameters with TIP4P-D and the protein force field. In the presented simulations we used the Joung and Cheatham parameters (Joung and Cheatham, J. Phys. Chem. B 2008, 112 (30), 9020–9041) with σ = 0.243934 nm and ε = 0.365846 kJ mol⁻¹ for Na⁺, and σ = 0.447766 nm and ε = 0.148913 kJ mol⁻¹ for Cl⁻. While similar setups have been used previously, these ion parameters were originally developed for TIP3P water and have not been optimized for TIP4P-D, so a lack of compatibility could affect our conclusions.
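  The sketch below shows how these per-ion parameters enter the Na⁺–Cl⁻ interaction. It applies the Lorentz–Berthelot combining rules (arithmetic mean of σ, geometric mean of ε), the convention commonly used with Amber-family force fields, to the values quoted above. It is an illustration only. Any pair-specific Lennard-Jones overrides in the actual topology would take precedence and are not modelled here.

```python
import math

# Per-ion Lennard-Jones parameters as quoted in the response (nm, kJ/mol).
NA = {"sigma": 0.243934, "eps": 0.365846}
CL = {"sigma": 0.447766, "eps": 0.148913}


def lorentz_berthelot(a, b):
    """Cross sigma (arithmetic mean) and epsilon (geometric mean)."""
    return (a["sigma"] + b["sigma"]) / 2, math.sqrt(a["eps"] * b["eps"])


def lj_energy(r, sigma, eps):
    """12-6 Lennard-Jones pair energy in kJ/mol at separation r (nm)."""
    sr6 = (sigma / r) ** 6
    return 4 * eps * (sr6 * sr6 - sr6)


sigma_nacl, eps_nacl = lorentz_berthelot(NA, CL)
r_min = 2 ** (1 / 6) * sigma_nacl  # position of the LJ minimum
print(f"sigma = {sigma_nacl:.5f} nm, eps = {eps_nacl:.5f} kJ/mol, r_min = {r_min:.4f} nm")
print(f"U_LJ(r_min) = {lj_energy(r_min, sigma_nacl, eps_nacl):.5f} kJ/mol")  # equals -eps
```

With these inputs, the combined σ is about 0.3459 nm, ε is about 0.233 kJ mol⁻¹, and the Lennard-Jones minimum lies near 0.388 nm. Electrostatics, which dominate ion pairing, are not included in this sketch.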
  
  In response to the Reviewer, and also in response to Reviewer 1 (Question 3), we will perform a sensitivity check by running an additional molecular dynamics simulation with scaled ion parameters as suggested by Reviewer 1 (Kirby and Jungwirth, J. Phys. Chem. Lett. 2019, 10(23), 7531–7536). In this way we will assess to what extent the degree of Na⁺ association with acidic residues is sensitive to the choice of ion parameters, and discuss the implications for our conclusions regarding Na⁺-mediated bridging interactions.
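  The electronic continuum correction advocated by Kirby and Jungwirth is one example of what "scaled ion parameters" can mean. It scales ionic charges by 1/√ε_el, with ε_el ≈ 1.78 for water, giving a factor of about 0.75. The sketch below assumes this charge-scaling variant and ignores any accompanying σ adjustments. It shows that the Coulomb energy of an ion pair is then reduced by the factor 1/ε_el (about 0.56) at any separation. The 0.3 nm separation of a +1/−1 pair is purely illustrative.

```python
import math

# Coulomb prefactor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (GROMACS units).
ONE_4PI_EPS0 = 138.935458


def coulomb_energy(q1, q2, r_nm):
    """Vacuum Coulomb pair energy in kJ/mol for charges in units of e."""
    return ONE_4PI_EPS0 * q1 * q2 / r_nm


def ecc_factor(eps_electronic=1.78):
    """Charge-scaling factor 1/sqrt(eps_el) of the electronic continuum correction."""
    return 1.0 / math.sqrt(eps_electronic)


lam = ecc_factor()
r = 0.3  # illustrative separation of a +1/-1 pair, nm
full = coulomb_energy(+1.0, -1.0, r)
scaled = coulomb_energy(+lam, -lam, r)
print(f"scale factor {lam:.3f}; energy ratio {scaled / full:.3f}")
```

Because the reduction applies uniformly to all ion–charge pairs, it would weaken the Na⁺–carboxylate attraction that underlies the reported association. The specific scheme used in the planned simulation may differ.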
  
  (3) More generally, because the manuscript is centered on contact lifetimes, the choice of the atomistic force field needs stronger justification. Salt bridges, cation-pi contacts, pi-pi stacking, ion coordination, and water-mediated interactions are all force-field-sensitive. Since there is no direct experimental observable used here to validate the simulations, the authors should discuss the expected limitations of the chosen force field (while I do acknowledge that testing different force fields would be computationally too demanding).
  
  We thank the reviewer for this fair comment. We will add a short discussion justifying our choice of the TIP4P-D water model and the Amber99sb-star-ILDN-q force field, discussing their performance for disordered proteins. We will explicitly acknowledge that absolute contact lifetime values should be interpreted with caution given the inherent force field sensitivities of salt bridges, cation–π, and π–π interactions, while relative trends and qualitative insights are expected to be more robust. We believe this transparent discussion will strengthen the manuscript and place our findings in the appropriate context for the reader.
  
  (4) I also find the sequence-comparison section somewhat confusing. The authors compare one specific IDR, MUT-16 FFR, with the average properties of human IDRs and then frame it as more representative than FUS LCD. It is not clear how informative this is because IDR behavior depends strongly on sequence-specific patterning, molecular connectivity, and the particular interaction network of each protein. Averages over human IDRs may provide a broad context, but they do not necessarily define what is physically or biologically representative for phase separation. In addition, FUS LCD is not intended to be a representative human IDR; it is an unusually low-complexity, phase-separating domain. Therefore, the "more representative than FUS" framing should be toned down. At most, this analysis shows that MUT-16 FFR is compositionally less extreme than FUS LCD.
  
  We thank the reviewer for this valid criticism. We agree that the framing of MUT-16 FFR as "more representative than FUS LCD" is an overstatement, and we will revise the text accordingly. The comparison against human IDR averages was intended to provide broad compositional context rather than make claims about functional or dynamical representativeness, and we will make this distinction explicit. We will reframe the statement to simply note that MUT-16 FFR is compositionally less extreme than FUS LCD, without implying broader representativeness, which as the reviewer correctly points out cannot be inferred from sequence composition alone given the strong dependence of IDR behavior on sequence-specific patterning and interaction networks.
  
  (5) The ion- and water-bridging analyses are also potentially overinterpreted. A distance-based simultaneous contact with two residues does not by itself establish functional mediation or regulation of condensate dynamics. The authors should either add appropriate controls, such as local-density-normalized baselines or randomized-contact expectations, or soften the language to describe these as geometrically defined co-contact events rather than mechanistic bridging interactions.
  
  We thank the reviewer for this valid point. We agree that distance-based co-contact events do not by themselves establish mechanistic bridging or functional regulation, and we will revise the manuscript language throughout to describe these observations as geometrically defined co-contact events rather than mechanistic bridging interactions. We will also explore appropriate controls, such as local-density-normalized baselines or randomized-contact expectations. In this respect, we will also consider our results in light of a recent paper showing that salt bridges are overestimated in atomistic molecular dynamics simulations (Ivanović et al., JACS Au 2026, 6(3), 1900–1913). We will ensure the interpretation is appropriately cautious and does not overstate the mechanistic implications of these findings.
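  A randomized-contact expectation of the kind mentioned above could be sketched as follows. The first step counts ions that are simultaneously within a cutoff of two residue sites. That count is then compared with the same count for ions placed uniformly at random inside a chosen region, for example the dense-phase slab, which acts as a crude local-density control. The 0.5 nm cutoff, the region bounds, and all function names are assumptions for illustration.

```python
import numpy as np


def count_co_contacts(ion_xyz, site_a, site_b, box, cutoff=0.5):
    """Number of ions within `cutoff` (nm) of both sites at once (minimum image)."""
    ion_xyz = np.asarray(ion_xyz, dtype=float)
    box = np.asarray(box, dtype=float)

    def dist_to(site):
        d = ion_xyz - np.asarray(site, dtype=float)
        d -= box * np.round(d / box)
        return np.linalg.norm(d, axis=-1)

    return int(np.sum((dist_to(site_a) < cutoff) & (dist_to(site_b) < cutoff)))


def random_baseline(n_ions, site_a, site_b, box, lo, hi, n_draws=500,
                    cutoff=0.5, seed=0):
    """Mean and s.d. of co-contact counts for ions placed uniformly in [lo, hi)."""
    rng = np.random.default_rng(seed)
    counts = np.array([
        count_co_contacts(rng.uniform(lo, hi, size=(n_ions, 3)),
                          site_a, site_b, box, cutoff)
        for _ in range(n_draws)
    ])
    return counts.mean(), counts.std()


box = [10.0, 10.0, 10.0]
a, b = [1.0, 1.0, 1.0], [1.6, 1.0, 1.0]
ions = [[1.3, 1.0, 1.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
observed = count_co_contacts(ions, a, b, box)
mean, sd = random_baseline(len(ions), a, b, box, lo=[0, 0, 0], hi=[3, 3, 3])
print(observed, mean, sd)
```

An observed count well above the randomized mean would suggest enrichment beyond what density alone explains. A count close to the mean would support describing these events as geometric co-contacts.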
  
  (6) Finally, the independence of the atomistic replicas is unclear. The manuscript should state whether all ten all-atom simulations were initiated from the same coarse-grained condensate configuration or from distinct CG frames. If the starting structures came from one CG trajectory, the authors should report how far apart those frames were in simulation time and provide evidence that the initial atomistic configurations are structurally independent. If only velocities differ, the simulations should not be described as fully independent structural replicas.
  
  We thank the reviewer for this important clarification request. We confirm that all ten atomistic replicas were initiated from the same coarse-grained condensate configuration following backmapping, but were equilibrated independently using different random velocity seeds. Only the last 800 ns of each trajectory was used for analysis, discarding the initial 200 ns as equilibration. We will add these details explicitly to the Methods section and make clearer that these simulations are not fully independent structural replicas. We will report the overlap of residue–residue contact maps between replicas to provide an indication of how the contact statistics have decorrelated, given the shared starting structure.
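  One simple way to quantify the overlap of residue–residue contact maps between replicas is sketched here under assumed inputs. Each map is taken to be a symmetric matrix of contact frequencies (the fraction of frames in contact). The sketch reports two measures: the Pearson correlation of the upper-triangle entries, and the Jaccard overlap of contacts above a frequency threshold. The threshold of 0.1 is an arbitrary assumption.

```python
import numpy as np


def contact_map_similarity(freq_a, freq_b, threshold=0.1):
    """Pearson r of contact frequencies and Jaccard overlap of thresholded contacts.

    freq_a, freq_b: symmetric (n_res, n_res) matrices of contact frequencies.
    Only the upper triangle (i < j) is compared, so each pair counts once.
    """
    a = np.asarray(freq_a, dtype=float)
    b = np.asarray(freq_b, dtype=float)
    iu = np.triu_indices_from(a, k=1)
    va, vb = a[iu], b[iu]
    pearson = float(np.corrcoef(va, vb)[0, 1])
    on_a, on_b = va > threshold, vb > threshold
    union = np.sum(on_a | on_b)
    jaccard = float(np.sum(on_a & on_b) / union) if union else 1.0
    return pearson, jaccard


rep1 = np.array([[0.0, 0.5, 0.0],
                 [0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])
rep2 = np.array([[0.0, 0.5, 0.0],
                 [0.5, 0.0, 0.5],
                 [0.0, 0.5, 0.0]])
print(contact_map_similarity(rep1, rep2))
```

Comparing these values between the ten shared-start replicas and the independently assembled control would indicate how far the contact statistics have decorrelated.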
  
  In response to this question and also to Question 1, we are initiating an all-atom simulation from an independently formed CG condensate (16 nm × 16 nm × 60 nm). This will provide a valuable check of the conclusions drawn from our ten initial simulation trajectories.
  
  References
  
  Blazquez S, Conde MM, Abascal JLF, Vega C. J. Chem. Phys. 2022;156(4):044505.
  
  Chakravarti A, Joseph JA. Protein Sci. 2025;34(10):e70284.
  
  Changiarath A, Flores-Solis D, Michels JJ, Herrera Rodriguez R, Hanson SM, Schmid F, Zweckstetter M, Padeken J, Stelzl LS. bioRxiv. 2024. doi:10.1101/2024.03.16.585180.
  
  Dignon GL, Zheng W, Kim YC, Mittal J. ACS Cent. Sci. 2019;5(5):821–830.
  
  Gaurav K, Busetto V, Páez-Moscoso DJ, Changiarath A, Hanson SM, Falk S, Ketting RF, Stelzl LS. Biophys. J. 2025;124:3987–4004.
  
  Ivanović MT, Holla A, Nüesch MF, von Roten V, Schuler B, Best RB. JACS Au. 2026;6(3):1900–1913.
  
  Joung IS, Cheatham TE III. J. Phys. Chem. B. 2008;112(30):9020–9041.
  
  Kirby BJ, Jungwirth P. J. Phys. Chem. Lett. 2019;10(23):7531–7536.
  
  MacAinsh M, Dey S, Zhou HX. eLife. 2024;13:RP100282.
  
  Panagiotopoulos AZ. J. Chem. Phys. 2020;153(1):010903.
  
  Qamar S, et al. Cell. 2018;173:720–734.
  
  Rekhi S, Garcia CG, Barai M, Rizuan A, Schuster BS, Kiick KL, Mittal J. Nat. Chem. 2024;16:1113–1124.
  
  Uebel CJ, Anderson DC, Mandarino LM, Manage KI, Aynaszyan S, et al. PLOS Genet. 2018;14(7):e1007542.
  
  Vernon RM, Chong PA, Tsang B, Kim TH, Bah A, Farber P, Lin H, Forman-Kay JD. eLife. 2018;7:e31486.
  
  Wang J, et al. Cell. 2018;174:688–699.
  
  Zeron IM, Abascal JLF, Vega C. J. Chem. Phys. 2019;151:134504.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.30.715404v2
www.biorxiv.org

Dual-feature selectivity enables bidirectional coding in visual cortical neurons

1. Public_Reviews 11 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This manuscript used deep learning to highlight the role of inhibition in shaping selectivity in primary and higher visual cortex. The findings hint at hitherto unknown axes of structured inhibition operating in cortical networks with a potentially key role in object recognition.
  
  The multi-species approach of testing the model in macaque and mouse is excellent, as it improves the chances that the observed findings are a general property of mammalian visual cortex. However, it would be useful to delineate any notable differences between these species, which are to be expected given their lifestyle.
  
  The overall performance of the model appears to be excellent in V1, with over 80% performance, but it falls substantially in V4. It would be important to consider the implications of this finding; for example, in the context of studying temporal lobe structures that are central to recognizing objects. Would one expect that model performance decreases further here, and what measures could be taken to avoid this? Or is this type of model better restricted to V1 or even LGN?
  
  While the manuscript delineates novel axes of inhibitory interactions, it remains unclear what exactly these axes are and how they arise. What are the steps that need to be taken to make progress along these lines?
  
  Reviewer #2 (Public review):
  
  The classic view of sensory coding states that (excitatory) neurons are active to some preferred stimuli and otherwise silent. In contrast, inhibitory neurons are considered broadly tuned. Due to the gigantic potential image space, it is hard to comprehensively map the tuning of individual neurons. In this tour de force study, Franke et al. combine electrophysiological recordings in macaque (V1, V4) and mouse (V1, LM, LI) visual cortex with large-scale screens based on digital twin models, as well as beautiful systems identification (most/least activating stimuli). Based on these digital twins, they discover dual-feature selectivity (which they validate both in macaques and mice). Dual-feature selectivity involves a bidirectional modulation of firing rates around an elevated baseline. Neurons are excited by specific preferred features and systematically suppressed by distinct, non-preferred features. This tuning was identified by excellently combining advances in AI & high-throughput ephys.
  
  The study is comprehensive and convincing. Overall, this work showcases how in silico experiments can generate concrete hypotheses about neuronal coding that are difficult to discover experimentally, but that can be experimentally validated! I think this work is of substantial interest to the neuroscience community. I'm sure it will motivate many future experimental and computational studies. In particular, it will be of great interest to understand when and how the brain leverages dual-feature selectivity. The discussion of the article is already an interesting starting point for these considerations.
  
  Strengths:
  
  (1) Using computational models to predict neuronal responses allowed them to go through millions of images, which may not be possible in vivo.
  
  (2) The cross-species and cross-area consistency of the results is another major strength, suggesting that the results may reflect a fundamental strategy of mammalian cortical processing.
  
  (3) They show that the feature causing peak excitation in one neuron often drives suppression in another. This may be an efficient coding scheme where the population covers the visual manifold. I'd like to understand better why the authors believe that this shows that there are low-dimensional subspaces based on preferred and non-preferred stimulus features (vs. many more, but some axes are stronger).
  
  We thank the reviewers for their constructive and helpful feedback on our manuscript. We are delighted that they found the study to be “comprehensive and convincing” and a “tour de force” in its combination of electrophysiological recordings with large-scale digital twin screening. We appreciate that the reviewers highlighted the strengths of our multi-species approach and the “cross-species and cross-area consistency” of the results, noting that the work showcases how in silico experiments can generate concrete, experimentally validatable hypotheses. Overall, we agree with the assessment of the reviewers. We have made the following changes to the text to clarify and strengthen the manuscript, without introducing new analyses or altering the conclusions.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Page 3: The authors state that RFs were mapped using sparse noise, with the goal of ensuring that the RFs align with the visual stimulus, but no data appear to be shown regarding this alignment. It would be important to provide a full analysis of the sparse noise-mapped RFs for both V1 and V4. Also, is it correct that the V4 data analyzed here came from a single animal? This could potentially be problematic and would need to be addressed, for example, by also performing the V1 analyses separately for each animal. Please elaborate.
  
  We have added a sentence to the Results section clarifying the sparse noise RF mapping procedure, noting that probe insertions were targeted orthogonal to the cortical surface so that neurons sampled along the probe depth share overlapping receptive fields, allowing a single stimulus configuration to adequately drive the entire recorded population. We have also corrected the text to clarify that V4 data were collected from 2 animals (not 3 as previously stated in an earlier draft), consistent with the Methods section.
  
  (2) Page 4: Only half the neurons in V4 are "high confidence" in terms of test image performance, which seems a little low and probably significantly lower than the corresponding value for V1 of 84%. It is unclear how to interpret this confidence, but it seems to suggest that half of the V4 neurons are not well captured by the model. If true, this fraction appears large enough to cast doubt on the validity of the V4 results. Please elaborate.
  
  We have expanded the text to explicitly discuss the lower proportion of high-confidence in-silico neurons in V4 relative to V1. We attribute this to the greater complexity of V4 tuning compared to V1, as well as missing contextual information such as image surrounds and sequential image context—factors that likely limit model performance in higher visual areas. We note that our restriction of analyses to high-confidence neurons provides resilience against these limitations, and that the goal was not to maximize predictive performance per se but to identify response patterns—dual-feature selectivity—that are robust across neurons, areas, and species.
  
  (3) Page 5: It seems that identical L2 norms are valid for discounting contrast variations, particularly if the neural responses are linear, since the L2 norm is computed on the entire RF. It might be judicious to attenuate the claim that contrast variation has no effect.
  
  We have softened the claim that contrast variation has no effect. The revised text now states that L2 normalization controls for root-mean-squared contrast but does not fully equate effective contrast in nonlinear cells, whose responses depend on the spatial structure of the stimulus beyond its total energy. We note that residual contrast-dependent effects, particularly in the suppressive regime, cannot be entirely excluded.
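  The point about root-mean-squared contrast can be checked numerically. After mean subtraction, an image x with N pixels has RMS contrast ‖x‖₂/√N. Scaling every image to the same L2 norm therefore equates RMS contrast by construction, while leaving the spatial arrangement of that energy free to differ. The sketch assumes mean-subtracted images, which may not match the study's exact preprocessing.

```python
import numpy as np


def l2_normalize(img, target_norm):
    """Mean-subtract an image and rescale it to a fixed L2 norm."""
    x = np.asarray(img, dtype=float)
    x = x - x.mean()
    return x * (target_norm / np.linalg.norm(x))


def rms_contrast(img):
    """Root-mean-squared deviation of pixel values from their mean."""
    x = np.asarray(img, dtype=float)
    return float(np.sqrt(np.mean((x - x.mean()) ** 2)))


rng = np.random.default_rng(0)
smooth = l2_normalize(np.outer(np.sin(np.linspace(0, np.pi, 32)), np.ones(32)), 10.0)
noisy = l2_normalize(rng.normal(size=(32, 32)), 10.0)
# Both are 10 / sqrt(32 * 32) = 0.3125 up to rounding, despite different structure.
print(rms_contrast(smooth), rms_contrast(noisy))
```

A linear filter with unit-norm weights would see equal total energy in both images. A nonlinear cell, for example one with rectified subunits, can still respond very differently to the two, which is the residual effect acknowledged above.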
  
  (4) Page 6: The authors acknowledge that, at least for simple cells, a phase shift in the grating and concomitant ON-OFF overlap is an inhibitory axis, which is correct. It does not really become clear what other axes were found, and whether any of these represent a novel discovery about V1.
  
  We have clarified the description of inhibitory axes in V1, noting that while phase-shifted stimuli represent a well-established suppressive axis for simple cells reflecting linear On-Off subfield structure, and complex cells exhibit no coherent suppressive pattern due to phase pooling, neither model class accounts for the multidimensional suppressive structure we observe. We have made explicit that our unbiased approach reveals suppressive structure spanning simultaneous changes across orientation, spatial frequency, phase, and texture, exceeding what any single known suppressive mechanism predicts.
  
  (5) Page 7: Dreamsim is based on human similarity judgements, whereas the data is from macaques. Is there any evidence suggesting that macaque similarity judgements might be similar to those of humans?
  
  We have added a paragraph to the Discussion acknowledging that DreamSim was trained on human perceptual similarity judgments while our neuronal data are from macaques. We note that this cross-species application is supported by the deep homology between primate ventral visual streams, and that natural-image similarity judgments have been found to be highly consistent across macaques and humans. Importantly, we clarify that we deploy DreamSim not as a model of macaque perception but as an image feature embedding to test whether stimuli that cluster in perceptual space evoke similar neuronal responses—a use that is robust to the precise calibration of the metric. We also note that we are developing custom macaque-specific embeddings for future work.
  
  (6) Page 7: How many images were in the test set?
  
  We have added the number of test images to the relevant text (n=75 for V1, n=150 for V4) and to the Figure 1 caption.
  
  (7) Page 8: As mentioned above, performing the analysis on V1 data of individual subjects and demonstrating similar digital twins might be an additional way to confirm the models' accuracy.
  
  We have added text noting that for V4, digital twin models were fit independently per neuron without sharing information across animals, and that extreme image sets identified by the model elicited correspondingly extreme responses in neurons from the other animal, confirming that identified selectivity patterns are not idiosyncratic to individual subjects.
  
  (8) Page 11: The mouse data is presented very briefly only, and the authors seem to imply that there is a high degree of coding similarity between this rodent species and macaques and, by extension, humans. Were there any notable differences between the mouse and macaque data?
  
  We have added text explicitly noting that while macaque and mouse visual cortex differ substantially in their functional organization and the complexity of neuronal selectivity, the broader principle—that non-sparse neurons are jointly defined by distinct excitatory and suppressive feature sets—generalizes across mammalian visual systems. We clarify that this does not imply that mouse and macaque visual cortex share similar functional organization or equivalent complexity of neuronal selectivity; rather, within the representational regime of each area, neurons are organized such that excitatory and suppressive feature sets are jointly structured and distinct.
  
  (9) Page 13: One main finding of the study is that inhibition appears to operate along additional dimensions that had not been previously recognized, but what is the nature of these dimensions, how do they arise, and how do they relate to known inhibitory effects in V1, such as centre-surround effects? The fact that suppression is tuned in response to natural images or other complex objects is not a new finding, and there is plenty of published work along these lines; the authors may want to cite Tamura et al. (doi:10.1152/jn.01267.2003). I am not sure introducing the term "dual feature selectivity" is really a major conceptual advance.
  
  We have added a citation to Tamura et al. (2004) in the Discussion, alongside other prior work documenting suppression by non-optimal stimuli. We have also expanded the Discussion to more carefully position our findings relative to existing work on feature-selective suppression, noting that while prior work has established that inhibition can be structured and feature-selective, our results suggest a broader organizing principle: within each visual area, there exists a set of feature combinations from which individual neurons draw both their excitatory and suppressive preferences.
  
  (10) Page 14: The authors enumerate a number of technical limitations, which is to be commended. It would be useful for them to comment on the particular advantages of the digital twin model, compared to a more traditional analysis of the responses to the thousands of natural images that were experimentally obtained. It seems likely that the main finding, i.e. tuned inhibition, is also evident directly in this population (?). While the digital twin is to some degree validated by the test images, its responses to the much larger set of images studied are not validated, and one must trust that the ResNet50 indeed captures V4 selectivity. It would be useful to discuss some of these points, and highlight a potential way that digital twins (maybe as a shared model between laboratories) can learn from a large number of animals and datasets, and maybe even be used to generate novel visual stimuli suitable to test emergent hypotheses.
  
  We have added a paragraph to the Discussion explicitly contrasting the advantages of digital twin models with direct analysis of experimentally recorded responses, noting that digital twins enable screening of more than one million images per neuron in silico, gradient-based synthesis of stimuli precisely optimized to drive or suppress individual neurons, and cross-model verification of identified selectivity patterns—a test that has no analog when working with fixed experimental image sets.
  
  Reviewer #2 (Recommendations for the authors):
  
  Minor comments:
  
  (1) Call out Figure 1b in the main text.
  
  We have added a callout to Figure 1b in the main text.
  
  (2) Can you make a supplementary figure illustrating more examples with skewness around the middle (e.g. 1.5, 2, 2.5)? Namely, you state that 2 is a good threshold for deciding if it is non-sparse, but you only present clear-cut cases in Figure 2 (with <0.75 and >3.5). I am wondering if 2 is a good threshold?
  
  We have revised the text to clarify that the skewness threshold of 2.0 is adopted purely for analytical convenience to focus subsequent analyses on neurons with sufficiently graded response distributions, and that the key findings are not dependent on the exact threshold chosen. We explicitly note that the underlying distribution of sparsity is continuous, consistent with recent findings (Gondur et al., 2025).
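  A sketch of the sample skewness and a threshold rule follows, for readers unfamiliar with the measure. It assumes the standard moment-based (biased) estimator, and it assumes that neurons with skewness below the threshold are the non-sparse ones. The study's exact estimator is not stated here. As a reference point, an exponential distribution has skewness of exactly 2.

```python
import numpy as np


def sample_skewness(x):
    """Moment-based (biased) sample skewness m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return float(m3 / m2 ** 1.5)


def is_non_sparse(responses, threshold=2.0):
    """Assumed rule: graded (non-sparse) neurons have skewness below threshold."""
    return sample_skewness(responses) < threshold


graded = [1, 2, 3, 4, 5]        # symmetric, skewness 0
sparse = [0] * 99 + [10]        # one strong response, heavy right tail
print(sample_skewness(graded), is_non_sparse(graded))
print(sample_skewness(sparse), is_non_sparse(sparse))
```

Because skewness varies continuously, moving the threshold only shifts which neurons fall near the boundary. This matches the response's point that the key findings do not depend on the exact value.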
  
  (3) The reference "A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex." has no authors. I know it's anonymous, but perhaps state that for now? I also commend the inclusion of a paper that is anonymously under review at ICLR 2026. I don't find Unk, 2025 in the list of references. Perhaps related?
  
  We have updated the reference “A tale of two tails” to include the authors (Gondur et al., 2025) and ensured it appears consistently in the reference list. We have also resolved the missing “Unk, 2025” citation, which now correctly refers to this same work.
  
  (4) Why do you use a different model for the analysis in Figure 8?
  
  We have added text to the Methods and Results clarifying why a distinct architecture was used for the V4 evaluator model in Figure 8. Specifically, the V4 generator model uses a fixed, pretrained ResNet50 backbone whose weights are deterministic; any re-trained model sharing this backbone would not constitute a genuinely independent evaluation. By contrast, for V1, the ConvNeXt core is fine-tuned from different random initializations, producing architecturally equivalent but computationally independent models. A truly independent V4 evaluator therefore required a fundamentally different architecture.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.16.665209v4
www.biorxiv.org

Neural activity profiles reveal overlapping, intermingled subpopulations spanning area borders in mouse sensorimotor cortex

1. Public_Reviews 11 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  In preparation for release of the analysis code used in the paper, we made many analyses more parallel to one another in their exact preprocessing. This resulted in very slight changes to many panels, but these changes are nearly invisible and conclusions did not change. In one case, though, we realized that the way we were presenting data was potentially misleading (the timing plot in Figure 3A). The original plot was of the distribution of pixel values from the spatially smoothed map instead of distributions over individual neurons. We have now swapped it out for better interpretability and changed the accompanying text accordingly.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Here, the authors address the organization of reach-related activity in layer 2/3 across a broad swath of anterodorsal neocortex that included large subregions of M1, M2, and S1. In mice performing a novel variant of a water-reaching task, the authors measured activity using two-photon fluorescence imaging of a GECI expressed in excitatory projection neurons. The authors found a substantial diversity of response patterns using a number of metrics they developed for characterizing the PETHs of neurons across reach conditions (target locations). By mapping single-neuron properties across the cortex, the authors found substantial spatial variation, only some of which aligned with traditional boundaries between cortical regions. Using Gaussian mixture models, the authors found evidence of distinct response types in each region, with several types prominent in multiple cortical regions. Aggregating across regions, four primary subpopulations were apparent, each distinct in its average response properties. Strikingly, each subpopulation was observed in multiple regions, but subpopulation members from different regions exhibited largely similar response properties.
  
  Strengths:
  
  The work addresses a fundamental question in the field that has not previously been addressed at cellular resolution across such a broad cortical extent. I see this as truly foundational work that will support future investigation of how the rodent brain drives and controls reaching.
  
  The quantification is thoughtful and rigorous. It is great that the authors provide an explanation for and intuition behind their response metrics, rather than burying everything in the Methods.
  
  The Discussion and general contextualization of the results are thorough, thoughtful, and strong. It is great that the authors avoid the over-interpretation of classical observations regarding cortical organization that is endemic in the field.
  
  All things considered, this is the best paper regarding spatial structure in the motor system I have ever read. The breadth of cellular resolution activity measurement, the rigor of the quantification, and the clear and open-minded interrogation of the data collectively have produced a very special piece of work.
  
  Thank you! We really, really appreciate this!
  
  Weaknesses:
  
  The behavioral task is very impressive and an important contribution to the field in its own right. However, given that it appears substantially different from the one used in the previous paper, the characterization of the behavior provided in the Results is too brief. More illustration of the behavior would be helpful. For example, the authors reveal only rather deep into the paper that the mice can whisk to help localize the target location. That should be stated at the outset, when the behavior is first described. Other suggestions for elaborating the behavior description are included below.
  
  Thank you. Although the task will be treated in greater detail in the next paper (where we more closely relate neural activity to the kinematics), we have added more exposition of the task here. In particular, we now include a figure with a characterization of the trial-to-trial variability across reaches to the same target versus across reaches to different targets (Figure 2-figure supplement 1B). This supports the idea that the mice aimed their reaches. We have also expanded that text.
  
  Regarding whisking, we have now revised that text to make clear that we do not know how the mice localize the spout. The original work by Galiñanes and Huber argued that the mice find the spout by sniffing the water; they may do the same here, or may find it via whisking. It is also possible that they whisk simply because the spout moves in and they are excited or startled, or that the whisking is reflexive. We simply have no evidence one way or the other. We have therefore revised the text to make it clearer that whisking-related activation could have occurred for a variety of reasons.
  
  Statistical support for key claims is lacking. For example, "The five areas of interest varied in the fraction of neurons that were modulated: M2 had 14%, M1 had 23%, S1-fl had 30%, S1-hl had 25%, and S1-tr had 27%" - I cannot locate the statistical tests showing that these values are actually different. Another example is Figure 7, where a key observation is that distributions of PETH features are distinct across regions. It is clear that at least some distributions are not overlapping, but a clearer statistical basis for this key claim should be provided.
  
  Good idea. For the proportions, we have now added a chi-square test for homogeneity, which shows that the proportions vary across areas, followed by pairwise two-proportion Z tests (Bonferroni-corrected for multiple comparisons), whose results are shown as a binary matrix in Figure 3-figure supplement 1B. For the area distributions in the t-SNE space (Figure 7), we have added a two-dimensional Kolmogorov-Smirnov test, again corrected for multiple comparisons, with p-values quoted in the text.
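
  A minimal sketch of the proportion tests described above, in Python with only the standard library. The counts are hypothetical (1,000 cells per area, chosen only to reproduce the reported percentages) and are not the real data; the 0.05 threshold is an assumption, and the closed-form chi-square survival function used here is valid only for an even number of degrees of freedom, which holds for five areas (4 degrees of freedom). The two-dimensional Kolmogorov-Smirnov test is not sketched.

```python
import math
from itertools import combinations

# Hypothetical counts: (modulated cells, total cells) per area.
# Fractions match the reported 14/23/30/25/27%, but n = 1000 is invented for illustration.
COUNTS = {
    "M2": (140, 1000),
    "M1": (230, 1000),
    "S1-fl": (300, 1000),
    "S1-hl": (250, 1000),
    "S1-tr": (270, 1000),
}


def chi2_homogeneity(counts):
    """Chi-square test of homogeneity on a 2 x k table (modulated vs. not, per area)."""
    total_mod = sum(m for m, _ in counts.values())
    total = sum(n for _, n in counts.values())
    p_pool = total_mod / total
    stat = 0.0
    for m, n in counts.values():
        for observed, expected in ((m, n * p_pool), (n - m, n * (1 - p_pool))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1


def chi2_sf_even_dof(x, dof):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    if dof % 2:
        raise ValueError("this closed form needs an even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def two_proportion_z(m1, n1, m2, n2):
    """Pooled two-proportion Z test; returns (z, two-sided p)."""
    pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (m1 / n1 - m2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


def pairwise_significance(counts, alpha=0.05):
    """Bonferroni-corrected pairwise tests, returned as a symmetric 0/1 matrix."""
    areas = list(counts)
    pairs = list(combinations(areas, 2))
    sig = {a: {b: 0 for b in areas} for a in areas}
    for a, b in pairs:
        _, p = two_proportion_z(*counts[a], *counts[b])
        p_corrected = min(1.0, p * len(pairs))
        sig[a][b] = sig[b][a] = int(p_corrected < alpha)
    return sig


stat, dof = chi2_homogeneity(COUNTS)
print(f"chi2 = {stat:.1f}, dof = {dof}, p = {chi2_sf_even_dof(stat, dof):.2e}")
for area, row in pairwise_significance(COUNTS).items():
    print(area, [row[b] for b in COUNTS])
```

  The homogeneity test answers whether the proportions differ at all; the corrected pairwise tests then say which pairs of areas differ, which is what the binary matrix displays.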
  
  I understand that the authors are planning a follow-up study that addresses the relation between activity patterns and kinematics. One question about interpreting the results here, though, is how much the activity variation across target locations may relate to the kinematic differences across these different conditions, as opposed to true higher-order movement features like reach direction.
  
  We agree this is a very important question. However, having done many of the analyses that address it for the next paper in the series, we do not know of a shortcut to the right answer. This question requires thorough treatment, and so we leave it to subsequent work. Instead, after our speculation about how responses suggest function, we now state explicitly that these hypotheses need testing:
  
  “In each of these cases, determining the relationships of the observed activity patterns to function will require specific attempts to link the activity to kinematics, target location, sensory feedback, and more; these relationships will be addressed in future work.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The functional parcellation of cortical areas is a critical question in neuroscience. This is particularly true for frontal areas in mice. While sensory areas are relatively well characterized by their tuning to sensory stimuli, the situation is much less clear for motor areas. This has become even more ambiguous since recent studies using large-scale neuronal recordings consistently report mixed sensory and motor-related activity throughout the brain, and motor mapping studies have shown that movements evoked by cortical stimulation are by no means limited to motor areas alone. Here, the authors use a correlation approach combining large-scale functional imaging at cellular resolution with movement tracking in mice executing a reaching task. Across multiple recording sessions in the same animals, the authors imaged a large portion of the sensorimotor cortex at cellular resolution, recording the activity of nearly 40,000 neurons. By aligning the calcium signal of each neuron to three task events (the Go cue triggering the reach, the onset of paw lift, and the contact between the paw and the target) for different target positions, the authors identified different response patterns distributed differently across cortical areas. They defined a set of features that describe the neurons' response patterns, representing the temporal dynamics and tuning properties for the different target positions. These features were used to construct cortical maps, and the authors show that, interestingly, gradient maps obtained from the first derivative of the feature maps reveal sharp discontinuities at the boundaries between anatomically defined cortical areas. Using dimensionality reduction of the neuronal response features, the authors found that, despite clear differences in their average response properties, individual neurons from the same cortical areas do not form distinct clusters in the reduced-dimensional space. In fact, most areas contain heterogeneous neuronal populations, and most neuronal populations are present in multiple areas, albeit in different proportions. Interestingly, the authors identified four neuronal subpopulations based on the distance between the components of the Gaussian mixture model used to describe the distribution of neurons within each area. One of these subpopulations is almost exclusively represented in the anterior M2 cortex, while another is broadly distributed across the different areas.
  
  Strengths:
  
  This article is based on an impressive dataset of nearly 40,000 neurons covering a large portion of the sensorimotor cortex and on innovative analytical approaches. This study is likely the first to clearly demonstrate boundaries between cortical areas defined based on the responses of individual neurons. This innovative approach to functional mapping of cortical areas potentially opens up new perspectives for higher-resolution mapping of frontal cortical areas, using a broader repertoire of sensory and motor evoked responses.
  
  Thank you!
  
  Weaknesses:
  
  The second part of the article, which presents multimodal responses in the cortical areas, may be an overly complicated way of showing what numerous recent publications have already demonstrated. Nevertheless, these new analyses expand upon those previous observations by revealing an interesting functional organization of the sensorimotor cortex, highlighting notable similarities and differences between certain areas.
  
  We understand the concern: a number of recent papers have also noted different neuron response characteristics distributed throughout the motor system. We compare our work with these papers in greater detail in our replies to the more specific comments below, but we briefly summarize here. The way previous work handled the data (for example, starting with PCA) mixes what neurons are tuned for and when they are tuned for it with what we refer to as the “response format”: properties like tuning sharpness, response duration, etc. We focused primarily on this response format, and designed our features to be mostly independent of tuning preferences or peak response timing. We therefore pick up on different properties of neurons’ responses than those prior works. In addition, no previous work we know of examined these properties across large swathes of cortex at single-cell resolution in the context of forelimb control. Together, these aspects of our work allowed us to produce high-resolution maps of response properties in a way we have not seen in any prior work.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  In addition to addressing the weaknesses stated above, I suggest the authors also consider the following.
  
  The one big question left unresolved here is whether we should be thinking about these four subpopulations as distinct types with a biological basis and importance, or just reflections of activity pattern heterogeneity. The authors say that "we did not observe tight clusters in feature space separated by gaps," but their discussion here is light and a bit unclear, and their engagement with the issue of types versus heterogeneity, in my view, could be improved. We do not need "gaps" where the density goes to zero in parameter space, but we do need reproducible troughs between peaks. The authors should clarify if there are substantial and reproducible troughs in the parameter space between their four subpopulations.
  
  This is a great idea, and we have added three analyses and additional text to address it. We break this concern down into two more specific questions, based on the next comment by this reviewer.
  
  (1) Are the clusters well separated / do they have troughs between them? (Note that even with troughs, clustering might not be stable if the clustering algorithm is poorly matched to the shapes of the clusters.)
  
  (2) Is the clustering stable? (It can be stable even without troughs, if, for example, the distribution has a long tail and a GMM needs one Gaussian for the body of the distribution and a second for the tail.)
  
  First, to directly address the presence or absence of troughs between clusters, we have added Figure 9-figure supplement 2A and 2B. For each pair of subpopulations, we trained a logistic regression classifier to separate the 5D feature vectors of the neurons in one subpopulation from the feature vectors of the neurons in the other subpopulation, then projected the feature vectors onto the classifier axis. Note that because the subpopulations are defined by GMMs, which have nonlinear boundaries, the (linear) logistic classifier does not typically produce perfect classification. Nevertheless, this analysis provides a window onto how well separated each cluster is from each other cluster in feature space. In 5 of the 6 pairwise comparisons, the distributions are clearly different and have at least some dip in density at the boundary. The one pair of clusters without a trough between them was the forelimb somatomotor and hindlimb somatomotor subpopulations. This was surprising to us, given that their likelihood maps are so strongly distinct, but it presumably reflects an attempt to capture a nonlinear boundary with a linear classifier (see below). Overall, this analysis argues that the clusters have fuzzy edges that blend into one another, but that neurons are concentrated near the centers of the clusters we identified.
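
  To make the projection step concrete, here is a small plain-Python sketch: a logistic regression fit by batch gradient descent (an assumption; the solver and any regularization used in the paper are not specified), projection of each feature vector onto the unit-normalized weight vector, and a histogram in which a dip between the two groups would appear. The toy 2D points are invented stand-ins for the 5D PETH feature vectors.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Unregularized logistic regression via batch gradient descent; returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def project(X, w):
    """Scalar position of each vector along the classifier axis (unit-normalized w)."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [sum(wj * xj for wj, xj in zip(w, xi)) / norm for xi in X]


def histogram(values, n_bins=10):
    """Counts in equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


# Toy stand-ins for two subpopulations' feature vectors (hypothetical values).
pop_a = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.6], [0.4, 0.4]]
pop_b = [[3.0, 3.0], [2.6, 3.2], [3.1, 2.7], [2.8, 2.9]]
w, b = fit_logistic(pop_a + pop_b, [0] * len(pop_a) + [1] * len(pop_b))
proj_a, proj_b = project(pop_a, w), project(pop_b, w)
print(histogram(proj_a + proj_b, n_bins=6))
```

  Because the classifier is linear, a curved boundary between two GMM-defined groups can leave their projections overlapping even when the groups are distinct, which is the caveat raised in the text.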
  
  Second, to address the same question with a different nonlinear method, we have added a version of the t-SNE plot from Figure 7 that is colored and contoured by subpopulation identity rather than by area (Figure 9-figure supplement 2B). Agreement with the GMMs is not a given here either, because t-SNE is a fundamentally different nonlinear transform, independent of the one performed by the GMM classification. Nevertheless, the subpopulations were again nicely separated, though without clear troughs, possibly owing to the inherent difficulty of interpreting point density with t-SNE. Interestingly, here the hindlimb somatomotor subpopulation was the best separated from the other subpopulations, supporting the idea that the lack of separation we observed above with the logistic projections was indeed due to a nonlinear boundary. This analysis again argues that neurons are more likely to have features that lie near the center of a cluster, but that the edges of the clusters run into one another. Additionally, this analysis makes clear that treating the hindlimb somatomotor subpopulation as a separate cluster is supported by other analyses, even if not by the logistic regression projection.
  
  Third, to address the question of cluster stability, we randomly split our data into halves, fit a GMM to each half independently, applied the GMM from one half to the other, and measured the similarity of the resulting clusterings using the Adjusted Rand Index. This produced a value of 0.856, which, for this sensitive measure, argues that the clustering is rather stable (at least for the three clusters that can be found with all data together, which do not include the smaller Anterior subpopulation). Note that we did not perform this analysis on the more complicated version in which we fit a GMM to each area separately and then cluster the resulting components; in our main analysis, the hierarchical clustering agreed with what we found by eye, but choosing the number of clusters for hierarchical clustering is in general very unstable, and so we did not have an objective way to determine the “true” number of clusters.
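
  The agreement score in this split-half procedure can be sketched as follows. This computes the Adjusted Rand Index from the pair-counting contingency table in plain Python; the GMM fitting and cross-application steps are not shown, and the label lists are hypothetical stand-ins for the cluster assignments of the same neurons under the two half-fitted models.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two hard clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    n = len(labels_a)
    # Number of item pairs placed together in both clusterings.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings trivial
        return 1.0
    return (index - expected) / (max_index - expected)


# Cluster labels are arbitrary, so a pure relabeling still scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1, 2], [2, 2, 0, 0, 1]))  # 1.0
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # ≈ 0.24
```

  Because the index is corrected for chance, random assignments score near 0 rather than near the raw pair-agreement rate, which is why it is a fairly strict measure of stability.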
  
  In addition to these new analyses, we note that three analyses we had already included bear strongly on this issue. First, regarding separability of the clusters, the fact that our likelihood maps (Figure 9C-F) were quite distinct for different subpopulations argues that we picked up on ‘real’ differences. Second, Figure 9B showed that when clustering non-overlapping data (different cells from different areas), we obtained clusters that were nearly identical in their feature distributions. Third, Figure 10E used the clusterings from different areas’ data to create likelihood maps, and found that they were extremely similar. Together, these analyses argue strongly that we are finding ‘types’ in a meaningful sense: given that we know the areas do have different distributions of properties, if there were no types, then clustering would yield different clusters for different areas. Given the importance of the question, however, we are grateful that the reviewer encouraged us to find additional ways to make this point!
  
  The original t-SNE plot is beautiful and quasi-fractalic, but it does not show clear signs of four cell types. The single-neuron activity profiles are clearly heterogeneous in very interesting ways, but heterogeneous does not imply a strong or reproducible multimodality that would indicate meaningful cell types. Clustering algorithms will always spit out an answer. If you just have elements uniformly distributed across a parameter space, plus some noise, when you ask for X clusters, you will get X clusters that have different centroids. When you ask an algorithm to cluster without defining the number of clusters, noise can lead the algorithm to produce a particular number of clusters that again will have distinct centroids. The salient question, though, is whether in the present case there is a parameter space in which the clusters are substantially and/or reproducibly distinct. Distinct here would mean that peaks in the density across some parameter space are separated by troughs; again, we don't need true gaps. The more substantial the differences between clusters are (again, not the differences between centroids but the prominence of the density troughs between them), the more biologically meaningful the clustering is likely to be. Reproducibility here could be addressed with resampling methods (e.g., how often do two separate halves of the cells produce the same clusters?).
  
  Please see the reply above, which includes our addressing of this concern.
  
  The Introduction is generally good, but it could further develop existing ideas about how function is distributed across cell types and regions. We would like to be able to imagine different answers to the question of how activity patterns are organized that might have divergent implications for how the circuit works. I understand we have very little to go on in terms of data, but I think it would be helpful for readers to be given more of a sense of what *could* be important.
  
  Good idea. We have added such a paragraph to the Introduction:
  
  “To frame possible outcomes, consider that single neuron responses can vary along many dimensions. Cells could differ according to which movements or time periods they are recruited for (tuning), what movement parameters their activities reflect (encoding), or how their responses are structured across different movements (e.g., nonlinear encoding structure). Further, differences in these response properties across cells could be distributed over the cortical sheet in a variety of ways. Cells could form distinct “categories” or clusters that are spatially well-aligned to the boundaries of anatomically defined regions. Or, categories of neurons might span area boundaries in spatial footprints that do not relate obviously to area boundaries, and that either abut or overlap. At a fine-grained scale, cells with similar responses could be physically located near one another as in primate and feline visual cortex, or similarly responsive neurons might be salt-and-pepper intermingled as seen in rodent visual cortex or in primate motor cortices during reaching behaviors.”
  
  It should be clarified in the Results how the cue relates to the target location. Most would assume a different cue for each location, but this does not appear to be the case. The authors should clarify whether there was some amount of searching for the precise target location after the reach, or else how the block structure or other sensory information allowed mice to learn where exactly the target would be. In the absence of target-specific cues, some sense of how the mice achieved target-specific reach trajectories should be offered.
  
  Related to this, in Figure 1, it would be good to see some individual trajectories, as they all overlap near the target in the current plot. Clearly, the reaches were targeted, but it is unclear how targeted. Some of the adjustments at the end may reflect searching or palpation to resolve the precise spout location. It is very much ok if the mice were not reaching with micron precision each time to each of 15 different targets, but it would be good to provide the reader a better sense of what the mice were doing.
  
  These are important points. First, to clarify, the Cue is just a Go cue, and was the same for all targets. It is now described in the Results as “non-target-specific”. For additional explanation about supplemental analyses to assess “aiming”, see replies to Reviewer #1 Public Review comments above. Finally, regarding how the mice locate the target: we just don’t know. As discussed above, Galiñanes and Huber found evidence for the mice using stereo sniffing, but whisking, listening to the motors, or some other strategy are also conceivable. We simply don’t have data to weigh in on this. We now make this limitation clear where we describe the task.
  
  In Figure 1A, CFA does not look well aligned with Tennant et al. (2011). CFA should only extend to +1 AP. The overlap of CFA and RFO seems strange. RFO also does not totally align with the injection coordinates used in An et al, biorxiv 2022.
  
  Thank you for your attention to these points. Our designation of the red dashed outline in Figure 1A as CFA was consistent with an earlier version of our previous work (Grier et al 2026), in which we referred to the anatomical outline “MOp-ul” from Muñoz-Castañeda et al 2021 as CFA. We have since revised that nomenclature and now refer to the outline as M1-fl, the forelimb representation of primary motor cortex.
  
  Our placement of RFO was obtained by aligning the Allen CCF from Figure 1K of An et al 2022 to our version of the Allen CCF and outlining the hotspot of RFO with a circle. We have slightly shifted RFO posteriorly and medially to align more closely with the injection coordinates reported in the methods of An et al. 2022 of “1.5-1.88 mm anterior from Bregma, 2.25-2.63 mm lateral from the midline.” Because (as far as we understand) the injection coordinates and the map are not perfectly in register, we show a compromise between the two.
  
  We stress that the Figure 1A map is meant to be descriptive in its illustration of the variety of organizational zones that have been identified across mouse sensorimotor cortex.
  
  Discrepancies in the alignment procedure, animal strain, and mapping modality all introduce heterogeneity across mapping attempts that we do not aim to reconcile or resolve here.
  
  Related to this, aspects of the results do seem consistent with the distinction between RFA and CFA, but this is not acknowledged or discussed. For example, the barriers in Figure 6H that lie along the M1/M2 border - these seem consistent with the gap between RFA and CFA. The same could be said for the dim trough along the M1/M2 boundary that appears to separate RFA and CFA in Figure 3B. A slightly more rostral and lateral location of CFA compared to Tennant's definition or the regions backlabeled from cervical spinal injections (see Wang, Maunze et al. J Nsci, 2018) could be expected if flattening the brain under the coverslip for imaging effectively stretches the ML axis, and Bregma (notoriously hard to define reliably at this spatial scale) was defined a bit more caudally here than in other studies. Related to this, it would be better for the field if people described their method for defining Bregma in the Methods. I suggest the authors do this here.
  
  We appreciate the suggestion and have acknowledged the suggested correspondence in the discussion. Given the difference in our approach from those that originally characterized RFA (through ICMS and deep layer projection tracing) we have avoided making overly strong conclusions about this correspondence in our data. See the quoted text below.
  
  “The spatial distribution of modulated cells in Figure 3 suggests a distinction between the caudal forelimb area (CFA, involving M1 and S1-fl) and the rostral forelimb area (RFA) in M2, while the feature gradient boundaries suggest a distinction between M1 and M2 more generally. The absence of a clearly delineated RFA was surprising, given its distinct projection patterns (Carmona et al. 2024; Hira et al. 2013b; Wang et al 2018) and functional differences from CFA (Kristl et al. 2025; Morandell and Huber 2017; Saiki-Ishikawa et al. 2025), but our results might suggest that the activity in layer 2/3 of RFA does not differ markedly from other nearby subregions of M2.”
  
  Regarding bregma, we did not use it for atlas alignment here. Alignment was accomplished through a combination of paw vibration mapping and the location of the central sinus. Bregma’s location was only relevant for our injection of tdTomato labeling, and that labeling was used here only to stabilize the image plane. We include an estimate of it on the map solely in an attempt to be helpful, but we cannot claim we have the most reliable method for defining it.
  
  The authors focus on activity aligned to cue timing. This is sensible, but it could be meaningful to know how this choice affects the definition of organization. If response clustering is largely different across time, it would seem important. I understand that addressing this question may be beyond the scope of this paper. I just wanted to raise the issue with the authors for their future consideration.
  
  We agree that this is important to address directly. There are two aspects to this comment: (1) does it matter if activity from approximately the same time period is aligned to the paw lift or contact instead of the cue? (2) What changes if we use data from a different period of time?
  
  Regarding the first question (alignment), if we instead align our data to lift or contact, more neurons are statistically modulated (see Figure 3C), but everything else is qualitatively similar with one exception: the GMM optimization does not separate the Anterior subpopulation from the Forelimb Motor subpopulation. The Anterior subpopulation has relatively few members, and most of them exhibit their strongest PETH peaks when Cue-aligned, so this makes sense. We now show the modulation maps for all of the locking events (Figure 3-figure supplement 1).
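
  As an illustration of what re-aligning means operationally, the sketch below averages single-trial traces in a fixed window around a per-trial event frame; switching from cue to lift or contact alignment changes only the `event_frames` argument. The function name `aligned_peth`, the frame-based window, and the choice to drop trials whose window runs off the recording are assumptions for illustration, not the authors' pipeline.

```python
def aligned_peth(traces, event_frames, pre, post):
    """Mean trace over frames [event - pre, event + post) across trials.

    traces: list of per-trial fluorescence traces (lists of numbers).
    event_frames: frame index of the alignment event (cue, lift, or contact) per trial.
    """
    windows = []
    for trace, event in zip(traces, event_frames):
        start, stop = event - pre, event + post
        if start < 0 or stop > len(trace):
            continue  # drop trials whose window runs off the recording
        windows.append(trace[start:stop])
    if not windows:
        raise ValueError("no trial has a complete window")
    return [sum(frame) / len(windows) for frame in zip(*windows)]


# Two hypothetical trials; the same traces aligned to two different events.
trials = [[0, 0, 1, 2, 3], [0, 1, 2, 3, 0]]
cue_frames = [0, 0]
lift_frames = [2, 1]
print(aligned_peth(trials, cue_frames, pre=0, post=3))
print(aligned_peth(trials, lift_frames, pre=1, post=3))
```

  When event latencies vary from trial to trial, aligning to the event a neuron actually tracks sharpens its average response, which is one way lift or contact alignment can yield more significantly modulated neurons.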
  
  The issue of the time window is a little more complicated. There are many choices we made in this work, of course, not least of which are the task we used and the features we chose based on hand-inspection of thousands of PETHs. As we noted in the Discussion, different tasks or different features would likely distinguish more subpopulations from one another. We think of the time window as a feature choice, albeit an implicit one. We chose not to include later time points because later windows strongly include reward signals, which are known to be large (Levy et al 2020) and can dominate other aspects of the responses. The largest difference we noted when trying time windows that extended later was that mouth-related areas separated out in the subpopulation analyses, perhaps because of later licking/consummatory responses, but we have not explored this fully enough to speak confidently on this point without much more work and another 10 figures. To keep the scope of the paper manageable, we now call out this choice explicitly (see text below). We thank the reviewer for raising these important points.
  
  “Crafting additional PETH features, or using end-to-end neural network approaches to discover other features, might enable the discovery of additional structure (Minderer et al. 2019; Wang et al. 2023b). For example, our PETH features were chosen to be invariant to the onset time of activity, but these onset times were markedly later in lateral M1 than in adjacent M2 or S1-fl. Including onset times, using a wider window of time that includes more of the reward/licking period, aligning data to other behavioral events, or adding other PETH features would presumably result in finer subdivisions of sensorimotor cortex.”
  
  The map in Figure 4 is very cool, and the spatial structure is quite striking. In terms of the actual values of the onset times in each region, I am a little concerned with a dependence on the level of reach-related activity modulation, especially relative to the level of background activity (potentially related to posture). Less reach-related activity and more background activity, which we might expect for trunk and hindlimb regions, could seemingly skew the onset times earlier. We could be getting the right answer, or an answer that makes intuitive sense, for the wrong reason. Can this potential confound be excluded with some sort of control analysis?
  
  The previous text wasn’t clear. We have now clarified what we meant, very much in line with the reviewer’s thoughts. In addition, note that our change to what is displayed in the histogram (now neurons, previously pixel values) makes clearer that there is a multi-peaked distribution of onset times and it is mostly the prevalence of each peak in each area that varies. The text now reads:
  
  “These distributions over neurons revealed clear differences in the overall profile of activation: early onsets were most prevalent in the S1 trunk and hindlimb regions (perhaps reflecting activity related to the animal stabilizing itself, even if these neurons became more active later), followed by M2 and finally by S1-fl and M1. Nevertheless, each area contained neurons activated at any given time in the trial.”
  
  The "Peak time variation" metric could potentially vary with activity level, with lower, noisier activity levels making cells appear less persistent. Perhaps a control analysis, based on SNR or some reasonable assumptions of the linkage between calcium signals and spiking, could be performed to measure the extent to which this could be creating differences between regions.
  
  Good idea. We have now performed this analysis, and the reviewer was correct: the correlation between peak time variation and a simple metric of SNR (the range of the PETH divided by its maximum s.e.m.) was substantial: ρ=-0.53. We now report this correlation and state in the Results that this metric reflects both true peak time variation and trial-to-trial variability. Thank you for this!
  
  “Peak time variation. To quantify whether a neuron’s firing peaked at the same time for every target or varied by target, we found the time of peak activity in the response to each target, then computed the standard deviation of these peak times across targets. This value is therefore higher if the peak time varied and nearly zero if the timing was consistent. Notably, this measure correlated substantially with the overall signal-to-noise ratio of a neuron’s PETH (Spearman’s ρ=-0.53; Methods), and thus partly measures trial-to-trial variability, not just true peak timing variability. This metric was quite low in M1, indicating highly consistent timing of the activity peak (and reliable responses), and was highest in the posteromedial part of M2 (presumably corresponding to the hindlimb representation) and the posterior tip of S1-hl (Figure 5B).”
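
  The two quantities in this exchange, and their rank correlation, can be sketched in plain Python as follows. The peak-time spread uses the population standard deviation across targets, and SNR is taken as the range of one trial-averaged PETH divided by its largest per-bin s.e.m.; both conventions, like the function names, time units, and toy values, are assumptions rather than the exact Methods.

```python
import math
import statistics


def peak_time_variation(peths, times):
    """Std. dev. across targets of the time at which each target's PETH peaks."""
    peak_times = [times[max(range(len(p)), key=p.__getitem__)] for p in peths]
    return statistics.pstdev(peak_times)


def peth_snr(trials):
    """Range of the trial-averaged PETH divided by its maximum s.e.m. (needs >= 2 trials)."""
    bins = list(zip(*trials))
    means = [statistics.fmean(b) for b in bins]
    sems = [statistics.stdev(b) / math.sqrt(len(trials)) for b in bins]
    return (max(means) - min(means)) / max(sems)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


times = [0.0, 0.1, 0.2, 0.3]  # hypothetical bin times in seconds
per_target = [[0, 5, 1, 0], [0, 1, 2, 6]]  # hypothetical PETHs peaking at 0.1 s and 0.3 s
print(peak_time_variation(per_target, times))  # spread of peak times, about 0.1 s
print(peth_snr([[0, 1, 2], [0, 3, 2]]))
```

  With noisy, low-SNR responses, the argmax in each target's PETH can jump between bins by chance, inflating the spread; that mechanism is why a negative rank correlation between the two measures is expected.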
  
  One could argue that the likelihood calculations illustrated in Figure 8 are biased higher for neurons within each region since they were used for defining the likelihood for that region. I think these likelihood calculations should be done for separate neurons other than the ones used to compute the mixture model for each region.
  
  We agree with the point about bias: the by-area GMM in Figure 8 is biased toward cells within the area, though the effect is probably quite mild given the large numbers of neurons and modest number of parameters. However, this model was intended to make the point that even if you give an area an unfair advantage, you still can’t cleanly isolate it. This was intended to help motivate the following analysis of subpopulations, and we have now made this logic clearer. Doing it this way has the advantage that the GMM components are identical between Figures 8 and 9, while if we held out the test neurons it would not be possible to make them the same without some complicated version of bagging on the GMM components. The reviewer is right that we should make this bias explicit, though, and we have now done so:
  
  “This mapping approach is explicitly biased toward finding feature differences between areas, allowing for a direct test of the hypothesis that response profile distributions are area-specific.”
  
  To me, the last Results section (Spatial overlaps between subpopulations indicate intermingled members) does two things: it shows you get the same results when you map each cell to a subpopulation independently of its area, and it shows that defining the subpopulations with cells from each area gives you essentially the same results, arguing against spatial variation of properties within subpopulations. I worry that these two points are getting merged together or not made clearly enough here, especially the first one. In general, the logic of this section does not seem well conveyed.
  
  Thank you for the feedback. Your first point is addressed by Figure 9-figure supplement 4, in which we fit an area-agnostic GMM to all modulated cells in the five main areas. Your second point is one of the two main goals of the last Results section, along with showing the spatial distributions of cells after hard-clustering them into subpopulations. We have tried to clarify these main points through substantial edits to the Results section for Figure 10.
  
  One set of ideas that is highly relevant and should be raised concerns an ethological organization of the motor cortex. Since the observations of Graziano, there has been a steady stream of results describing ethological organization in rodents as well. This literature is briefly reviewed in Kristl et al., Nature Communications, 2025. For example, because of the potential for a differential involvement of grasping movements across different target locations, some of the variation in neuronal tuning described in the present manuscript may stem from a region preferentially involved in grasping.
  
  We agree that the Graziano literature, and the substantial literature in rodent that was inspired by Graziano’s work, is highly relevant to understanding the organization of motor areas. Kristl 2025 handles these issues very thoughtfully. The challenge here is that there are many possible ways to reconcile the stimulation results with ours, and some serious unresolved obstacles to doing so. To name a few:
  
  Our subpopulations and high-gradient boundaries both give quite different pictures than microstimulation does in rodent motor and sensory cortices. In particular, microstim produces more subregions that evoke different movements than we identify, and the borders don’t generally line up. This implies that the mapping between the two approaches is probably complicated.
  
  There is also a completely different possible explanation for the Graziano-like results: microstimulation is thought to preferentially activate axons, and some of these projections reach the medullary motor regions. Given that the medullary motor regions have known topography in the movements they evoke (Yang et al 2023) – but may or may not be driving the movements during flexible behavior – the two approaches may not be reconcilable. Alternatively, reconciling them may require a much deeper understanding of the medulla as driving the primary movement, with cortex acting as a residual controller. This is an exciting set of ideas, but our understanding of it is as yet very underdeveloped.
  
  We don’t know if the subpopulation structure exists at all in L5, or in the PT cells, and if it does whether it differs. This is crucial given the frequent targeting of deep layers by ICMS stimulation protocols.
  
  As we caution in the Discussion, it is possible that our subpopulation findings are at least partly specific to the task we used.
  
  Although it is beyond the scope of this paper and will be addressed thoroughly in separate work, we have spent significant time with encoding models for joint angles and high-level target encoding in these same data. Given those results, we are fairly confident that the reviewer’s reasonable guess, that tuning variation arises from intersections between body parts, is not the main driver of the subpopulation structure we find.
  
  After careful thought and discussion amongst the authors, we did not think that including this discussion in the paper was likely to improve interpretability of the present results for most readers. We very much agree with the point, though, and when we can narrow down the possible explanations in the future (likely in our next paper on this topic, which will address encoding) we plan to address it. We thank the reviewer for encouraging us to think through this.
  
  Minor:
  
  (1) Page 3: "densely shared" - perhaps "broadly shared"? Dense implies most/all the neurons get the same signals, which may not be true.
  
  Changed to “widely”.
  
  (2) Page 4: "data-driven approaches" - could be more specific - isn't everything we do data-driven?
  
  Changed to “bottom-up”.
  
  (3) Page 4: "spanned areas" - perhaps "spanned multiple cortical areas", since everything spans an area.
  
  Changed to “spanned multiple areas” (we mention cortex just a few words earlier).
  
  (4) Page 5: "intervals were generally fast" - awkward, "short" perhaps.
  
  Agreed, changed.
  
  (5) Page 5: "which asks whether the activity for a neuron changes over time consistently in relation to any target" - Rephrase to disambiguate between consistent temporal variation in firing for all targets and variation across targets in the firing patterns. In other words, are we talking about cells that are just modulated during reaching, or cells whose firing patterns differ across targets?
  
  Changed ending to “to any given target”. The ZETA measure really does simply ask whether there is a change in firing rate over time that is consistent across trials, for each target independently. A neuron that exhibits an identical bump for all targets would register as modulated. We chose this measure in part because of the number of temporally-modulated but untuned cells. This wasn’t very clear as we had written the text, so we now note this explicitly in the Methods. Thank you for pointing out that this wasn’t clear.
  
  “For all analyses, only neurons modulated by the relevant locking event were included. Note that this measure looks for modulation over time to any target; it is indifferent to whether the neuron exhibits tuning across targets.”
  
  (6) Figure 1: It seems like some of the abbreviations used in 1A have not been defined yet in the paper.
  
  Yes. It’s a long list, and we wanted to put the citations for the description of each area together with the definition of the acronym. Moreover, we wanted all this info together with the description of how we aligned these area descriptions from others’ work with one another on the Allen atlas. This was impractical in the caption, and would be a long digression for what is intended as a simple point in the Results, which is why we refer to the Methods here.
  
  (7) Page 8: "Given that these areas have known spatial organization within them and structure was apparent by eye in the spatial scatterplot of modulated neurons (Fig. 3A)," - it is not clear what spatial structure we are supposed to see in 3A.
  
  Good point. We have changed the parenthetical to: “(for example, the less modulated band along the M1/M2 border in Fig. 3A)”
  
  (8) Page 8-10: The region-wide onset analysis breaks up the flow from PETHs to the metrics used to quantify them. I suggest moving this section (Onset of neural activity varied with somatotopy and subregion) to later in the manuscript.
  
  We appreciate the reviewer’s input on organization. We went back and forth many times on how to organize the many results in this paper. The reviewer is right that this analysis breaks the flow, but we had three reasons for placing it where we did. First, it uses an easily understood metric to introduce the reader to how we made maps from single-neuron features. Second, it readily demonstrates the power of such maps. Finally, it makes clear that if we are not careful with how we handle time in the feature design, timing will dominate.
  
  All that said, this suggestion inspired us to add a result in which we re-examine timing broken down by subpopulation (Figure 9-figure supplement 2C). It shows that the timing distributions of subpopulations appear more distinct than those of areas. However, substantial heterogeneity in timing remains, and it is explained by location in cortex rather than by subpopulation membership alone.
  
  (9) Page 12, Target tuning linearity: This metric should be clarified in the Results. It is not clear how the 2D arrangement of targets is reduced to 1D. Also, the plot in the figure has correlation on the y-axis, and it is not clear how each target location gets its own correlation value. The phrase "optimized anchor target" is unclear.
  
  Agreed this needed to be clearer. The text in the Results now reads: “To quantify how linearly a neuron’s activity related to target location in physical space, we correlated the 15D vector of mean activity of the neuron for each target with the 15D vector of the targets’ ordinal distances from the neuron’s preferred target (Methods).” In agreement with your suggestion, we have dropped use of the phrase “anchor target” in favor of “preferred target”, which should be clearer. We have also revised the Methods text accordingly to clarify.
  
  To directly answer your question, we turn the targets from 3D positions into 1D by computing the ordinal distance of each target from a preferred target. (Note that the preferred target is actually the one that maximizes the resulting correlation; this is detailed in the Methods). There therefore aren’t 15 correlations; we’re correlating two 15D vectors, where each has one element per target and the “ordinal distance” vector has a zero for the preferred target. Hopefully the new description makes this clearer.
  
  The figure schematic was unclear, thank you for catching that. We have updated the Y axis to read “mean activity” and the X axis now reads “dist. to pref. target.”
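To make the target tuning linearity computation concrete, the following is a minimal sketch. It rests on several assumptions:

- Target positions are 3D coordinates.
- "Ordinal distance" is the dense rank of each target's Euclidean distance from the candidate preferred target. Ties, after rounding to 1e-9, share a rank.
- The correlation is Pearson's r against the negated ordinal distances, so that activity falling off with distance scores near +1.
- The preferred target is the candidate that maximizes that correlation.

The function names are hypothetical, and the Methods may define ranks, ties, or the sign convention differently.

```python
import math
import statistics


def ordinal_distances(positions, pref):
    """Dense rank of each target's Euclidean distance from target `pref`
    (0 for `pref` itself). Distances are rounded so that equal spacings tie."""
    d = [round(math.dist(p, positions[pref]), 9) for p in positions]
    levels = sorted(set(d))
    return [levels.index(x) for x in d]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def tuning_linearity(mean_activity, positions):
    """Return (score, preferred_target): the best correlation between mean
    activity per target and negated ordinal distance, over candidate targets."""
    best = (-math.inf, None)
    for pref in range(len(positions)):
        neg_dist = [-k for k in ordinal_distances(positions, pref)]
        r = pearson(mean_activity, neg_dist)
        if r > best[0]:
            best = (r, pref)
    return best


# toy layout: five collinear targets (illustrative only)
targets = [(float(i), 0.0, 0.0) for i in range(5)]
score, pref = tuning_linearity([0, 1, 2, 1, 0], targets)
```

On this toy layout, activity of [0, 1, 2, 1, 0] leads the sketch to pick the middle target (index 2) as preferred, with a score of 1. This reflects a correlation between two vectors with one element per target, not one correlation per target.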
  
  (10) Page 12, paragraph beginning "We also compared our metric maps simply using the top 20 PCs." - This paragraph is unclear, since both sentences refer to using the metrics. I would guess the authors mean that the metric maps were compared with and without PCA and basis rotation, but this is not clearly stated.
  
  Thank you, this was unclear as written. We have changed it to:
  
  “We also compared our metric maps with maps generated from the top 20 PCs of the PETHs (Methods), rotated using VARIMAX to identify a sparser basis (Musall et al. 2019).”
  
  (11) Page 18: "These results make clear that the working hypothesis - of areas with well-separated feature distributions - is incorrect." This is the clearest statement of the impact of the results. The authors could consider including this in the Abstract or Introduction.
  
  Thank you for pointing this out. We agree, and have added a similar phrase to the Abstract.
  
  (12) Figure 9: It would be great to also just see the average PETHs for each of the four clusters to get a better sense of how their time series differ.
  
  Good idea. The feature computations are a many-to-one mapping, so it’s not possible to literally generate a PETH from the mean of the cluster, but we have added PETHs from well-modulated neurons that are near the means of their subpopulations (Figure 9-figure supplement 1).
  
  (13) Figure 9B: Colorbar has no label.
  
  Fixed, thanks.
  
  (14) Figure 9C: Need a colorbar - need to see the difference in density for locations.
  
  The color map is the same as in Figure 8B, which is now noted in the caption for Figure 9C. The scaling of likelihoods is almost totally uninformative. Likelihoods are not well-behaved like probability distributions, which is why even the Figure 8B labels are simply “max likelihood” and “min likelihood”. The important pieces of information here are that these are log likelihoods (noted in the Figure 8 caption), and the visualization of the color map itself (from the color bar). Given these considerations, we have elected to keep the maps themselves a little larger by not trying to squeeze a minimally informative colorbar into every plot, but thank you for noting that the reference to 8B was needed.
  
  (15) Page 22: "additional spatial structure could be present" - The nature of the additional spatial structure here is a bit opaque. The authors could clarify what additional structure may be present.
  
  Good idea. This paragraph now reads:
  
  “The overlaps in the subpopulation likelihood maps above imply that members of different subpopulations are spatially intermingled, but it is less clear whether each subpopulation has homogeneous response profiles across space. In particular, the use of likelihoods mixes two properties: the fraction of neurons in a given neighborhood that are members of each subpopulation, and the heterogeneity of response profiles amongst members of that subpopulation. These properties could vary systematically with respect to one another, and the spatial structure shown by the likelihood map does not disentangle them.”
  
  (16) Figure 10E, legend: "GMM component" - I think this should be "GMM subpopulation" to avoid confusion with the previous use of "component" above, referring to the components of the GMM models for each region.
  
  Thank you – good catch. Changed to “Likelihood map”.
  
  (17) Page 24: "Note that this consistency also validates the use of clustering to combine components and identify the subpopulations in the first place." - I don't totally get this, and how this result validates the method of combining components, as opposed to just clustering all the cells from all regions at once. Perhaps the implied opposing strategy is not clear here.
  
  We have changed this sentence to:
  
  “Note that this consistency mirrors the low Bhattacharyya distances between corresponding GMM components in Figure 9B, and further validates the use of clustering to combine components from different areas.”
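For readers less familiar with the Bhattacharyya distance used to compare GMM components, here is a minimal sketch for two Gaussians with diagonal covariances. The diagonal restriction is a simplifying assumption; for full covariances, the same expression uses the inverse and determinant of the averaged covariance matrix. The function name is hypothetical. Identical components give a distance of 0, and the distance grows with both mean separation and mismatch in spread.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, diag(var1)) and N(mu2, diag(var2))."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # per-dimension variance of the averaged covariance
        dist += (m1 - m2) ** 2 / (8.0 * v)  # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # covariance-mismatch term
    return dist
```

For example, two unit-variance components whose means differ by 2 along one axis are at a distance of 0.5.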
  
  Regarding the reviewer’s larger point, we have three thoughts. First, we also show the result of fitting the GMM to all cells together (Figure 9-figure supplement 4). The result is similar, but the Anterior subpopulation is lost because its membership is low, so the ICL criterion cannot justify a fourth cluster. Second, because we imaged more neurons in some areas than others, fitting the GMMs to each area separately put their representations on a more equal footing. Finally, doing the analysis this way allowed us to compare our two hypotheses most directly, as illustrated in Figures 8A and 9A.
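The integrated completed likelihood (ICL) criterion penalizes a model both for its number of parameters and for ambiguous cluster assignments. That is why a sparsely populated extra component can fail to justify itself. Below is a minimal sketch of a common BIC-based approximation of ICL. It is written in the lower-is-better convention: BIC plus twice the entropy of the posterior assignment probabilities. It also assumes full-covariance components. The function names are hypothetical, and this is not the authors' code.

```python
import math


def gmm_n_params(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance GMM:
    k - 1 mixing weights, k * d means, k * d * (d + 1) / 2 covariance terms."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def icl(loglik, n_params, n_obs, responsibilities):
    """ICL in the lower-is-better convention: BIC plus twice the entropy of
    the soft cluster assignments (one row of probabilities per observation)."""
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    entropy = -sum(t * math.log(t) for row in responsibilities for t in row if t > 0)
    return bic + 2.0 * entropy
```

When comparing, say, a 3- and a 4-component fit, the model with the lower `icl` value would be preferred under this convention. With perfectly confident assignments, the entropy term vanishes and ICL reduces to BIC.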
  
  (18) Page 25: "in the zones where different subpopulations overlapped" - I would omit this, since "intermingled" seems to mean exactly this.
  
  We included this phrase to prevent quickly-skimming readers from incorrectly concluding that the subpopulations overlapped entirely and were therefore intermingled everywhere. The reviewer is right that it’s unnecessary for a careful reader, but we aimed to prevent misinterpretation by readers that might skip to the Discussion for a results summary.
  
  (19) Page 25: "content of the activity, but also its format" - the difference between content and format is not entirely clear. Metaphor not quite metaphoring here.
  
  Agreed. We have added examples to clarify.
  
  “This makes clear that there are potentially important differences not just in the content of the activity (e.g., encoding target vs. movement commands (Grier et al. 2026)), but also its format (e.g., linear encoding vs. nonlinear, persistent vs. brief responses).”
  
  (20) Page 30, bottom: In the description of the behavior, more details should be provided, especially since the paradigm is new. For example, it says the block size was reduced - what was the ultimate block size?
  
  Targets were cued randomly in the behavior performed during neural recordings. Blocked trials were used during training and were phased out incrementally as performance improved. This and various other details have been added. Please let us know if there are other specific details you would like to see in the final version.
  
  (21) Page 39, citation of An, Mulcahey et al.: There is a biorxiv version with a different author list that could be cited.
  
  This was an error with our citation manager, and has been corrected. Thanks for catching it.
  
  Reviewer #2 (Recommendations for the authors):
  
  Overall, this is a remarkable study with well-designed in-depth analyses, and I only have some minor suggestions that could help improve the clarity of the paper.
  
  Thank you!
  
  General:
  
  It is not immediately clear to me why the GMM approach used in this study is more interesting than a clustering approach based on single-neuron response patterns (see Esmaeili et al., Neuron 2021 or Oryshchuk et al., Cell Reports 2024). My impression, though, is that both lead to the same observation: most clusters are widely distributed across cortical areas, in different proportions, but a few clusters are quite specific to a few areas. One noticeable difference is perhaps the number of clusters (or response profiles), which seems particularly low (only 4) in the current study. Could the authors clarify and comment on that?
  
  The reviewer brings up an interesting point: at heart, these works ask related questions, albeit about different effectors, tasks, recording modalities, and types of information encoded. Those differences probably mean that results cannot be directly compared, but we can certainly discuss the methodological tradeoffs. The two papers mentioned take a more traditional first step, using PCA on the vectorized PETHs to reduce dimensionality, then layer on a spectral approach to improve clusterability. These are good methods; we use something similar as our alternate method, applying VARIMAX to the PCs instead of spectral methods to preserve linearity of transforms. For the kinds of responses both they and we have, PCA will tend to most strongly pick up two aspects of the responses: tuning and timing. This is because vectorized PETHs will have large values in the rows corresponding to the target/condition and time points where the high activity is, and the alignment of these profiles with those of the other neurons will capture a large fraction of the variance. For data like either theirs or ours, this would tend to cluster apart left-tuned cells from right-tuned, and (more importantly here for revealing spatial structure) early-response cells from later response cells. That intuition is consistent with what those papers report, and examining our VARIMAX’ed PC plots closely (which have sharpened in the latest version thanks to improved normalization), we can see that they break apart sub-regions largely based on timing. In our feature approach, we intentionally chose our features to be largely invariant to both tuning preferences and timing. Instead, we chose our features to pick up on what we call the single cell “response format”: response duration; peak time variation (but not absolute timing); and tuning sharpness, persistence, and linearity. These different methods pick up on different aspects of responses.
  
  To double-check that the PCA-then-spectral approach reveals similar structure to our use of VARIMAX on the PCs, we tried applying the suggested method to our data. We applied spectral clustering to the N x 20 PETH PC feature matrix, then fit an area-agnostic GMM to the spectral features. We plot the likelihood map for the components of a GMM with 10 modes. The GMM components did not display clear spatial structure beyond that observed in the VARIMAX’ed PCs (Figure 5-figure supplement 1) and were less interpretable than those identified by area-agnostic clustering of our response features (Author response image 1). As noted, the number of subpopulations identified by the clustering of our hand-engineered features is lower than what would be obtained from clustering the PCs of the PETHs. This is likely the result of the substantial heterogeneity in activity onset and preferred target that is preserved by PCA. Because our central approach is largely agnostic to these two sources of variation, the number of identified clusters reflects the dominant patterns of variation beyond these two sources.
  
  Author response image 1.
  
  GMM fit to spectrally transformed PETH PCs, agnostic to anatomical areas. One GMM was fit to the spectrally-embedded PC feature vectors of cells from all 5 main areas. Each component of a 10 component model is shown.
  
  Also, I think it would greatly help the reader to return to PETHs at some point, if possible, to show the response profiles of each identified neuronal subgroup (page 20). To what extent are they similar or different across the cortical areas (for the same neuronal subgroup)?
  
  This is a good idea. We have added a figure to address this question and the related question by R1 (Figure 9-figure supplement 1). In short, given the wide variety of PETHs we observed, there is of course still substantial variation within subpopulation, and some mild but systematic differences in the distribution of what we observe across areas. We now discuss the conclusions from this plot in the Results:
  
  “As a qualitative depiction of the response profiles identified with each subpopulation, we plotted the two highest-likelihood cells for each area/subpopulation combination (Figure 9-figure supplement 1). These examples reveal stereotypy in the subpopulation responses across areas, but also show variation across areas, especially for the two somatomotor subpopulations.”
  
  Specific:
  
  (1) Figure 2B and M&M: the 3D spatial organization of the target locations is not immediately clear. What is the spacing between target locations? What is the 'final azimuthal spacing'?
  
  Added, thanks. The pairwise horizontal distances between targets ranged from 1.72 to 6 mm, and the vertical spacing within a column was 1 mm. “Final azimuthal spacing” simply referred to the fact that the targets were closer together during training and were gradually spaced apart to their final locations. We have also added some relevant details about the training.
  
  (2) Figure 2C: It would help to have a scale bar (mm).
  
  Added, thanks.
  
  (3) Figure 2C: It would be easier to appreciate the variability of the trajectories across trials to plot an overlay of trajectories to one target only (could be a Supplementary Figure).
  
  The reviewer has a good point: the variability and accuracy of aiming were hard to ascertain from the plot. We experimented with a few options for making this clearer. We have now added Figure 2-figure supplement 1, whose panel A (third subpanel) highlights the finger centroid trajectories to one of the 15 targets for the mouse shown in Figure 2C (mouse 3). The centroid trajectories for all other mice are shown as well, to illustrate similarities and differences across animals as well as the overall variability. As noted elsewhere, we have also included an analysis of the variability of the centroid trajectories, showing that reaches to a given target were more similar than reaches to different targets. We think this provides a fuller picture of the behavior and intend to provide still more detail in future work. Thank you for suggesting additional detail here!
  
  (4) Figure 4: It would be nice to also show the amplitude-normalized grand-average PETHs for the different areas.
  
  This is an interesting suggestion. After careful consideration, we think that this analysis is less effective for depicting overall timing and modulation profiles than the current ones, given the strong target selectivity and response time heterogeneity (now more visible in the revised Figure 4A). When computing the grand mean of all cells within each area, the dominant features distinguishing areas are onset time and response duration. Because responses within each area are so heterogeneous, the analyses of Figures 4 and 5 better support the differences across areas in these two features. We thank the reviewer for encouraging this exploration; more complicated spin-offs will likely inform additional timing analysis in the next paper on these data.
  
  (5) Figure 7C: figure legend - although it is quite self-explanatory, please explicitly indicate which pattern corresponds to the 'Three contour levels (98%, 95%, 90%)'.
  
  We have now added this as a legend on the figure panel itself (here and on similar plots). Thanks for pointing this out.
  
  (6) Figure 8: Is there also an interesting asymmetry between sensory and motor areas, with neurons in sensory areas being more likely associated with motor areas (B and C), whereas neurons in motor regions are less likely to arise from the distribution of sensory areas (dark blue color in frontal regions in D, E, and F)?
  
  This is an interesting observation, but we understand it to be an artifact of colormap scaling. As mentioned above, likelihoods are not well-behaved like probability distributions are: for example, they are not bounded at 1, and their sums over a dataset can have any positive value. The only things that can be interpreted are their relative values. This makes their scaling functionally arbitrary – you’ll notice we used “min likelihood” and “max likelihood” instead of numbers, which would be nearly meaningless – and therefore presents a problem for scaling the colormaps. We don’t know of a principled way around this problem. To deal with it, we simply put the ends of our colormap at the extreme pixel values. It so happens that both the M1 and M2 maps had a handful of neurons in a less-sampled spot at the bottom of M2 that were very low-likelihood, which results in what you noticed. We debated removing those neurons for this purpose, but we had no basis on which to do that kind of manipulation, so we left it as the most honest representation of the data we could produce.
  
  To clarify this, we now mention in the caption “The ends of the colormap were set to the maximum and minimum likelihood values for each map.”
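The colormap scaling described in the caption amounts to a per-map min-max normalization of the log-likelihoods. The minimal sketch below shows why a handful of very low-likelihood neurons can shift how every other neuron on the same map is colored. The function name is hypothetical, and returning the midpoint for a constant map is our own choice.

```python
def colormap_positions(log_likelihoods):
    """Rescale one map's log-likelihoods to [0, 1], using that map's own
    minimum and maximum as the ends of the colormap."""
    lo, hi = min(log_likelihoods), max(log_likelihoods)
    if hi == lo:  # constant map: place everything mid-scale (our choice)
        return [0.5] * len(log_likelihoods)
    return [(v - lo) / (hi - lo) for v in log_likelihoods]
```

For example, `colormap_positions([-10, -5, 0])` returns `[0.0, 0.5, 1.0]`. Adding a single outlier at -100 moves the -5 value to about 0.95, pushing the remaining neurons toward the top of the scale.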
  
  (7) Figure 9B: 'S1-hl: 1' appears twice, in the two bottom rows of the distance matrix. I suppose one of them should be 'S1-tr: 1' instead?
  
  Fixed, thanks for catching it.
  
  (8) Page 20: 'This hinted at a second hypothesis: that some of the 'modes' (groups of neurons) discovered separately in each area might correspond.' ???
  
  We had meant “mode” as in “multimodal”, but it was very unclear. We have rewritten the sentence:
  
  “This hinted at a second hypothesis: that a peak in the multimodal distribution from one area might correspond to a peak in the multimodal distribution of a different area.”
  
  (9) Figure 9S2: Please indicate for which area each map is computed.
  
  The caption was not clear enough about what we were doing here: we fit the GMM on all neurons together, ignoring which area they came from. We have now clarified it in the caption:
  
  “One GMM was fit to the feature vectors of cells from all 5 main areas. Each map plots the likelihood for all cells to each of the three components of this area-agnostic GMM.”
  
  (10) M&M, Subjects and surgical procedures: 'ambient temperature of 71.5 °F', please use international units.
  
  Done.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.10.02.680044v2
www.biorxiv.org

Two time scales of adaptation in human learning rates

1. Public_Reviews 11 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  It was nice to see that the authors could distinguish differences between the OFC signals that they observed and those in the visual regions based on changes through the session. However, the linkage between these brain activations and a functional role in generating behavior was left unexplored. Without further exploration, it is hard to tell exactly what role the signals might be playing, if any, in the behavior of interest.
  
  To link the behavioral with the fMRI data, we have now correlated fMRI decoding accuracy with behavioral performance. We studied behavioral performance in two ways: the difference in high versus low noise environment learning rates, and mean accuracy (i.e., absolute prediction error). We correlated both measures with the decodability of the environment in the central OFC. Each correlation was calculated either in the full experiment, or only the second half. However, none of these correlations were significant (all p > .1). Given the difficulty of interpreting this result, and our lack of statistical power for doing individual difference analyses, we decided not to report these analyses in the final paper.
  
  Reviewer #2 (Public review):
  
  (1) The authors make the distinction between meta-learned "global" learning rates and within environment learning rate adaptation in response to "local" fluctuations/observations. Though the experimental paradigm is novel, there are certainly links to prior work - for instance, though change point structures don't entail revisiting unique environments, they do require meta-learning from environmental statistics that is distinct from transient local adaptation to prediction errors. This tendency to increase one's learning rate after large prediction errors is appropriate in change point environments, though, as is true in this study, the amount of increase should be dependent on. This represents a similar kind of slower-timescale learning or reuse of more "global" parameters, and can be seen to different extents in prior work. It might benefit readers if the authors were to link the current work to previous research more explicitly to draw clearer connections between the approaches and findings.
  
  We thank the reviewer for their very helpful literature suggestions and now contextualize and discuss our findings in light of relevant literature.
  
  (2) Throughout much of the paper, the authors refer to the distinctions between environments primarily as differences in "initial learning rates" or "environment-specific learning rates." This is particularly prominent when discussing fMRI results. Though the optimal initial learning rate did differ across environments, this was the result of differences in underlying task statistics. It will be important to clarify this throughout the text: because of the confounds between task statistics and initial learning rate (and, to some extent, position on the screen), it is not possible to separate the impact of these specific variables. This is also relevant to understanding the justification for using methods like RSA to test whether brain regions represent task states similarly. If the main hypothesis is that neural activity reflects the (initial) learning rate itself, then a univariate analysis approach would seem more natural.
  
  We agree that task statistics are not the same as differences in learning rates. However, we do not consider this a confound: the point of the differences in task statistics is precisely to generate differences in learning rates. With our paradigm, we deliberately tried to dissociate variations in learning rate that were induced by learned environmental differences from those induced by local task statistics. We have tried to make this dissociation clearer, especially when discussing the fMRI results.
  
  (3) For the neuroimaging results in particular, the specificity of some of the results (e.g. ventral striatum showing an effect of prediction error only in the low noise condition in the second half of task experience, only on the first trial) is a bit surprising. Additional justification of or context for these results would be useful to help readers gauge how expected or surprising these findings are.
  
  We agree some of these findings were unexpected. We now also highlight that while we expected the ventral striatum to be involved in prediction error processing, we had no strong a priori expectations regarding these further modulations by time and environment. We also tried to contextualize these interactions more.
  
  (4) There are some methodological details that are unclear (e.g., how were the positions of the crabs selected relative to the location they emerged from? Looking at Figure 1C, it looks like the crabs spread out unevenly, and that the single position they emerge from is not necessarily at the center of the crab locations.) Additional detail and clarity would help address some unanswered questions (more details below).
  
  We clarified the experimental procedure at several places, and now added a video that helps illustrate the trial timeline better.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) With regards to the primary weakness mentioned above, it would be nice to have some link between the brain signals of interest and upcoming behavior. For example, can you read something out of OFC that enables you to better predict what the participant will do next? Or even better, do so beyond any behavioral variability that is explained by the computational model?
  
  To link the behavioral with the fMRI data, we now correlated fMRI decoding accuracy with behavioral performance. We studied behavioral performance in two ways: the difference in high versus low noise environment learning rates, and mean accuracy (i.e., absolute prediction error). We correlated both measures with the decodability of the environment in the central OFC. Each correlation was calculated either in the full experiment, or only the second half. However, none of these correlations were significant (all p > .1; see plots in Author response image 1). Given the difficulty of interpreting this result, and our lack of statistical power for doing individual difference analyses, we decided not to report this analysis in the paper.
  
  Author response image 1.
  
  (2) A number of the learning analyses are based on splitting the session into halves. As a first pass, this seems like a reasonable thing to do, but I certainly wonder what the dynamics of the meta-learning actually look like, and it seems like the data collected would be sufficient to gain some insight into those dynamics through some sort of sliding window analysis.
  
  We thank the reviewer for this interesting suggestion, which was also raised by Reviewer 2. We now calculated the learning rate in a sliding window of 20 trials (i.e., trial x to x + 19), and provide revised figures for each experiment separately (Fig. 2E and Fig. 4E, respectively).
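  To make the sliding-window analysis concrete, here is a minimal sketch. It assumes (our assumption, not a statement of the exact pipeline) that the model-free learning rate on each trial is the change in cage placement divided by the preceding prediction error, and that the window statistic is a NaN-ignoring mean over trials x to x + 19. The function and argument names are hypothetical.

  ```python
  import numpy as np

  def trialwise_learning_rates(cage_positions, outcomes):
      """Model-free learning rate per trial: update divided by prediction error.

      cage_positions[t] is the (hypothetical) placement on trial t and
      outcomes[t] the observed crab location on that trial.
      """
      cage = np.asarray(cage_positions, dtype=float)
      out = np.asarray(outcomes, dtype=float)
      prediction_error = out[:-1] - cage[:-1]   # outcome minus prediction on trial t
      update = cage[1:] - cage[:-1]             # change in placement after feedback
      with np.errstate(divide="ignore", invalid="ignore"):
          lr = update / prediction_error
      lr[prediction_error == 0] = np.nan        # undefined when there was no error
      return lr

  def sliding_window_mean(values, window=20):
      """Mean over trials x .. x + window - 1 for every admissible start x (NaNs ignored)."""
      values = np.asarray(values, dtype=float)
      n_windows = len(values) - window + 1
      return np.array([np.nanmean(values[x:x + window]) for x in range(n_windows)])
  ```

  Plotting the output of sliding_window_mean against the window start x gives a learning-rate trajectory per environment, which can then be compared with the simpler first-half versus second-half split.
  
  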
  
  (3) The model selection procedures described make sense, but it would still be useful if the authors justified them by showing that they work in synthetic data (i.e., generate a confusion matrix). I may be confused about what delta-SE is, but I'm confused about why two models with very different fits have the same value (211) for that metric.
  
  We have now run model recovery on synthetic data, which yielded recovery rates of 100%, and added these results to our Methods section. To clarify the Reviewer’s second point, ∆SE is the standard error of the difference between a model’s LOOIC and the top-ranked model’s LOOIC. There is no one-to-one mapping between the ∆SE and a model’s LOOIC.
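  Why two models can share a ∆SE while differing in fit can be seen from how the quantity is computed. The sketch below uses one common convention: ∆SE is derived from the spread of the per-observation differences in leave-one-out log predictive density, not from their sum. It works on the summed log-density scale; on the conventional LOOIC scale the difference is multiplied by −2 and its standard error by 2. Inputs and names are hypothetical.

  ```python
  import numpy as np

  def elpd_difference(pointwise_best, pointwise_other):
      """Summed pointwise elpd difference and its standard error.

      Both inputs are per-observation leave-one-out log predictive densities,
      evaluated on the same observations, for the top-ranked model and a competitor.
      """
      best = np.asarray(pointwise_best, dtype=float)
      other = np.asarray(pointwise_other, dtype=float)
      diff = other - best                         # per-observation differences
      n = diff.size
      delta = diff.sum()                          # how much worse the competitor fits overall
      se = np.sqrt(n * np.var(diff, ddof=1))      # depends only on the spread of diff
      return delta, se
  ```

  Shifting every pointwise difference by the same constant changes delta but leaves se untouched. That is why a model's ∆SE cannot be read off its LOOIC.
  
  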
  
  (4) Was the central OFC anatomical ROI overlapping with the cluster surviving in the whole brain analysis? I didn't see this mentioned in the text, and it certainly would be important for interpreting the two results together.
  
  The central OFC ROI indeed overlapped with the cluster surviving the whole-brain analysis, which we report on pages 17-18.
  
  (5) The authors found regions that reflected learning rate at the "island presentation" phase of the task; it could be worth distinguishing this analysis and its meaning from other work that has focused on representations of learning rate at the time of feedback.
  
  We agree that this is an important distinction worth emphasizing. Therefore, we added the following lines to our discussion paragraph:
  
  “Importantly, previous studies examined neural correlates of learning rates during outcome evaluation, where learning rates may be adjusted online as a function of locally experienced prediction errors (e.g., Behrens et al., 2007; Browning et al., 2015; Nassar et al., 2012). In contrast, our RSA analysis targeted neural activity at island presentation, before any outcome information was available. At this moment, learning rates cannot be updated based on current feedback and instead reflect the retrieval of previously learned, environment-specific learning-rate settings. This difference reflects our hypothesis that the OFC represents the latent states in a cognitive map of the task (Knudsen & Wallis, 2022; Moneta et al., 2024; Schuck et al., 2018; Wilson et al., 2014), which are expected to activate as soon as the agent can infer which task state it is in. Several studies have identified such “partially observable” task states in the medial OFC (Bradfield et al., 2015; Schuck et al., 2016; Tan et al., 2025; Wimmer & Büchel, 2019), in line with the region identified here (but see, e.g., Ongur & Price, 2000, for important anatomical distinctions between medial and lateral OFC, and Tan et al., 2025, for an example of related functions in lateral OFC). Our finding extends this notion by suggesting a link between OFC and meta-learning, wherein meta-learned information becomes encapsulated in task states (Hattori et al., 2023; Moneta et al., 2024).”
  
  (6) "Specifically, it showed a more negative response to larger (location) prediction errors, which is consistent with its documented role in showing a more positive response to more positive reward prediction errors (Calderon et al., 2021) - keeping in mind that being closer to the centre of where the crabs appeared (i.e., smaller location prediction errors) is less negatively or more positively surprising (i.e. smaller negative or larger positive reward prediction errors)."
  
  I found this sentence very hard to parse. Do PE responses in the high noise environment get "compressed" in their representation over time (i.e., does it take a larger error to get the same BOLD response)? If so, this relates to claims made in Diederen 2016... but see also Mah 2024 Cell Reports, who fail to see learning rate encoded in the DA system in the striatum of rodents that appear to adjust their learning rates.
  
  Thank you for pointing this out. We agree that this sentence was hard to parse, so we have now split it into three revised sentences. We also agree with the Reviewer’s interpretation, and thank the Reviewer for the useful literature suggestions, which we have now added to our discussion.
  
  (7) Figure 7 should use a different color scheme because many of the activations just appear black, and I can't tell whether they are positive or negative. It was also notable in Figure 7A that some regions are not visible, including ACC, which is typically thought to encode prediction errors in such paradigms. It would probably be useful for the authors to include a table of all clusters exceeding multiple comparisons correction and to comment on differences from other work examining absolute prediction errors. ACC does appear on the second trial, which made me wonder whether there were changes in the prediction error coding from first to subsequent trials.
  
  Thank you for pointing this out. We have now revised our color scheme, which we agree makes the figure much clearer. Although the ACC is frequently implicated in prediction error–related signals (e.g., Behrens et al., 2007), models suggest that ACC responses more strongly reflect unsigned prediction errors, surprise, or the need for control and model updating (Alexander & Brown, 2019; Hayden et al., 2011; Silvetti et al., 2018). In our task, ACC activity only emerged on the second trial, when participants had formed an initial estimate and prediction errors could meaningfully signal the need to update internal models or control settings. We have now added a paragraph to the Discussion highlighting this distinction and relating our findings to prior work emphasizing prediction errors and control-related signals in ACC.
  
  (8) The authors suggest that fast learning would presumably occur in a neural activation space, whereas slow learning would occur through weight adjustments. This makes sense, but activity-based dynamics have been suggested to make rapid adjustments by encoding a "latent state" (Razmi 2022 J Neurosci), and such a latent state has been shown in OFC (Schuck etc.), but here OFC is more implicated in the slow learning. I am curious whether the authors could elaborate on this a bit in the discussion.
  
  Thank you for bringing up this interesting question. We can only speculate, but a crucial factor may be the level of resolution at which task states operate. On the one hand, “detailed” trial-level states are needed that map a specific sensory input onto a specific latent state and its value. Such states would change quickly, possibly through activation dynamics, in line with how they have been operationalized by Razmi et al. or Schuck et al. On the other hand, successful task performance also requires “higher-level” states that describe entire task phases or full tasks, as in the present experiment. Given the different speeds of learning, it appears plausible that these would be learned through synaptic changes. We expand on this in the Discussion as follows:
  
  “Our finding extends this notion by suggesting a link between OFC and meta learning, wherein meta-learned information becomes encapsulated in task states (Hattori et al., 2023; Moneta et al., 2024). Consistently, OFC has been shown to represent task states (Moneta et al., 2024; Stalnaker et al., 2015; Wilson et al., 2014). While earlier evidence shows that the OFC represents concrete aspects of task states, such as task-relevant stimulus features (Schuck et al., 2016), we hypothesized that the OFC also represents more abstract aspects, such as learned, environment-specific learning rates. Indeed, we showed that the central OFC gradually came to represent these environment-specific learning rates (or the environment-specific statistics that drive them). While previous work speculated that these different levels could have different neural underpinnings (Sharpe et al., 2019), our findings indicate OFC might signal states on multiple levels. This does not imply identical learning dynamics; fast-changing trial-specific states might be learned through activity dynamics, while higher-level contextual states could involve synaptic plasticity.”
  
  (9) Also, as a more minor point in the same section, the sentence about blocking synaptic plasticity in OFC sounded interesting, but should have a reference.
  
  Thank you for noticing, we now added the reference (Hattori et al., 2023).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Additional links to prior literature: In terms of prior work in which there is something akin to more "global" adaptation, some examples of potentially relevant prior work include:
  
  McGuire, Nassar, Gold, & Kable (2014) Neuron
  
  D'Acremont & Bossaerts (2016) Cerebral Cortex
  
  Lee, Gold, & Kable (2020) Decision
  
  Bakst & McGuire (2021) JEP: General
  
  Bakst & McGuire (2023) Cognition
  
  We would like to thank the reviewer for pointing us to these different literature suggestions which we agree help us contextualize and discuss some of our findings better. We now refer to McGuire et al. (2014) when discussing the fMRI results, and d'Acremont & Bossaerts (2016) when discussing potential alternative strategies in the high noise environment (the Reviewer’s last point). Finally, we integrated the clearly relevant works of Bakst & McGuire (2021; 2023) and Lee et al. (2020) in our discussion of meta-learning different adaptive strategies.
  
  (2) Individual differences: Though not always the focus of work on predictive inference, one common finding has been that there are pronounced individual differences in behavior (see, e.g., coefficients in Figure 2 in Nassar et al. 2019 eLife, or Figure 2 in McGuire et al. 2014 Neuron, or Bakst & McGuire 2023 Cognition). There appears to be substantial variability between individuals in your data as well (i.e., Figure 2B, 4B, and the modeling figures). It would be interesting to see some direct exploration of this variability: baseline learning rate appears to differ between participants to a large extent, does their rate of adaptation (across trials within a block) also differ? Does their meta-learning occur at different rates (in fact, do some participants not show evidence of appropriate meta-learning at all)?
  
  Relatedly, your computational modeling approach fits the six candidate models hierarchically, and therefore the reported results show the overall best fit for the group. It might be worthwhile to determine whether individuals have different best-fitting models. This could be another way to characterize the variability between individuals.
  
  In concert with this, it could be a useful complement to determine whether either the strength of the OFC neural similarity results or their time course reflects aspects of behavior. Put another way, is it the case that not only does OFC activity and behavior both come to reflect task structure, but that these changes happen to a similar extent and over a similar time course across individuals?
  
  We agree it would be highly interesting to investigate meaningful individual differences in both fast and slow adaptations in learning rate. However, our sample was not designed for, and is underpowered to conduct, such analyses. In response to a similar suggestion by Reviewer 1, we did run correlational analyses between differences in learning rate, performance accuracy, and the responsiveness of the OFC. However, none of these analyses yielded a significant effect. We decided not to include these results in the paper, for reasons of statistical power, but we report them in Author response image 1.
  
  (3) fMRI:
  
  (3a) The primary finding in OFC is restricted to the central OFC. The manuscript would benefit from additional explanation regarding this specific subregion.
  
  Thank you for bringing up this important distinction. In the discussion we now clarify as follows:
  
  “This difference reflects our hypothesis that the OFC represents the latent states in a cognitive map of the task (Wilson et al., 2014; Schuck et al., 2018; Knudsen & Wallis, 2022; Moneta et al., 2023), which are expected to activate as soon as the agent can infer which task state it is in. Several studies have identified such “partially observable” task states in the medial OFC (Schuck et al., 2016; Bradfield et al., 2015; Wimmer et al., 2019; Tan et al., 2025), in line with the region identified here (but see e.g., Öngur & Price, 2000, for important anatomical distinctions between medial and lateral OFC and Tan et al., 2025, for an example of related functions in lateral OFC).”
  
  (3b) Though the main clusters visible in Figure 6 are the occipital and OFC clusters, there appear to be others. Did other clusters indeed rise to statistical significance in the whole-brain analysis? If so, is there a reason they aren't included or discussed?
  
  All clusters visible in Figure 6C survived FDR correction. However, we refrained from interpreting these other clusters, because we had no prior hypotheses about them like we did for the OFC.
  
  (3c) Why do you posit that the ventral striatum becomes less sensitive to RPE on the second trial over time? And why is the ventral striatum only sensitive to RPE in the low noise environment generally?
  
  We reasoned the ventral striatum should be more responsive to more positive reward prediction errors. While we further assumed this response could be modulated by both time and environment, we would like to emphasize that we had no specific hypotheses about the direction of this modulation. We now also make this clearer in the manuscript. This being said, we believe both the pattern that its responsiveness to the second trial decreases over time, and the pattern that it was most sensitive to the low noise environment, can be considered fitting with its broader involvement in coding behaviorally relevant reward prediction errors. Namely:
  
  First, we believe that as the participants learn more about the global reward structure of the task, they should obtain a better understanding of the fact that, per round, all crabs always center around a fixed mean. Therefore, the first RPE is most behaviorally relevant, and every later RPE has an exponentially decreasing relevance. As participants obtain more experience with this aspect of the task over time, the VS should show a lower responsiveness to the second RPE over time.
  
  Second, as participants learn more about the local differences between the three environments, they should learn that RPEs are most behaviorally informative in the low noise environment. That is, in this environment it makes most sense to have a high learning rate and thus to let the RPEs substantially inform the placement of the cage on the next trial. Accordingly, the ventral striatum was most responsive to RPEs in this environment.
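  The first point above can be illustrated with an idealized observer. Assume (for illustration only) that all crab locations in a round are noisy draws around one fixed mean and that the observer starts without a prior. In that case a delta rule with learning rate 1/t reproduces the running sample mean. The step size applied to each successive prediction error therefore shrinks with trial number, and the first RPE moves the estimate most. The function name is hypothetical, and this is not the Bai model used in the paper.

  ```python
  import numpy as np

  def running_mean_updates(outcomes):
      """Delta-rule updates with learning rate 1/t, which reproduce the running sample mean.

      Returns the estimate after each trial and the learning rate applied to
      each prediction error.
      """
      estimate = 0.0
      estimates, rates = [], []
      for t, outcome in enumerate(outcomes, start=1):
          alpha = 1.0 / t                          # shrinking learning rate
          prediction_error = outcome - estimate    # RPE analogue on trial t
          estimate += alpha * prediction_error
          estimates.append(estimate)
          rates.append(alpha)
      return np.array(estimates), np.array(rates)
  ```

  For outcomes [4, 6, 8, 2], the estimates are 4, 5, 6 and 5 (the running means), while the learning rates fall from 1 to 1/4.
  
  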
  
  (4) Methods
  
  (4a) This section could generally benefit from some proofreading.
  
  We have now proofread the Methods section.
  
  (4b) The main results text states that 49 participants performed Experiment 1, while the methods section reports 50 participants. Which is correct?
  
  (4c) Following this, on page 8, statistical results are reported with a df = 49 (which would be appropriate only if n=50).
  
  The correct sample size is 50; we have corrected the text and the degrees of freedom accordingly (note: only the text changes appear in track changes, but the degrees of freedom were also corrected).
  
  (4d) Additionally, I am a bit surprised by the Experiment 1 findings that learning rates on the second trial were significantly different between low and high noise conditions, in that the effect size found using all trials was stronger than both the first half of trials (no significant effect) and the second half (significant but weaker than all trials). Are these all the same type of statistical test? Double-checking the statistics might be worthwhile.
  
  It is not the effect size that is larger across the full experiment, but the t-statistic. This is possible because a t-statistic depends on both effect size and noise estimate, and the latter is smaller with more data.
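  A small numerical illustration of this point follows. It uses hypothetical per-participant learning-rate differences (low minus high noise) with the same mean difference in both cases. Estimates based on more trials are assumed to be less noisy. Halving the spread of the estimates doubles the one-sample t statistic, even though the mean difference does not change.

  ```python
  import numpy as np

  def one_sample_t(differences):
      """t statistic for a mean difference against zero (equivalent to a paired t-test)."""
      d = np.asarray(differences, dtype=float)
      return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

  # Hypothetical differences for 50 participants: identical mean (0.10),
  # but estimates from all trials are assumed to be half as noisy.
  pattern = np.tile([-1.0, 1.0], 25)            # symmetric spread around zero
  diff_half_data = 0.10 + 0.20 * pattern        # fewer trials, noisier estimates
  diff_all_data = 0.10 + 0.10 * pattern         # more trials, less noisy estimates

  t_half = one_sample_t(diff_half_data)
  t_all = one_sample_t(diff_all_data)
  ```

  Here t_all is exactly twice t_half, while the mean difference stays at 0.10 in both cases.
  
  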
  
  (4e) The methods and results both state that the five crabs always emerged from one position in the sand. How were the locations of the crabs selected relative to this position? Looking at Figure 1C, it looks like the crabs spread out unevenly, and that the single position they emerge from is not necessarily at the center of the crab locations.
  
  The crabs did indeed spread out evenly. However, we can see how the graphic in Figure 1C can be confusing, as two crabs are shown to be caught, which breaks the symmetry of the dispersion (because some crabs can run away after the even spreading phase, see Methods). We emphasized the even spreading more clearly in the new version of the paper. We think the flow of events will be much clearer with our newly added animation (Video 1).
  
  (4f) The methods section states that the crabs "spread out to cover the same proportion of the screen width as the cage (18.75%)" (page 23). The corresponding visual in Figure 1C appears to show something different.
  
  This looks different because the graphic illustrates the last 500 msec, where crabs can run away (see also response to 4e, and the novel animation that was added).
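  The even spreading phase can be sketched as follows. It assumes the five crabs end up equally spaced and symmetric around their emergence point, spanning 18.75% of the screen width (the proportion given in the Methods), before any crab runs away in the final 500 msec. The function name and pixel values are hypothetical.

  ```python
  import numpy as np

  def crab_end_positions(emergence_x, screen_width, n_crabs=5, coverage=0.1875):
      """Equally spaced horizontal end positions, centred on the emergence point.

      coverage is the fraction of the screen width spanned by the crabs
      (the same proportion as the cage, 18.75%).
      """
      span = coverage * screen_width
      offsets = np.linspace(-span / 2.0, span / 2.0, n_crabs)   # symmetric, even spacing
      return emergence_x + offsets
  ```

  On a hypothetical 1920-pixel-wide screen with emergence at x = 960, this gives positions 780, 870, 960, 1050 and 1140, so the emergence point is the centre of the dispersion.
  
  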
  
  (4g) Information on the timing of the trials would be useful to include in Figure 1C or similar.
  
  The reader can find this information in the Methods section. We chose not to include it in the caption to avoid information overload.
  
  (4h) The methods section specifies that there was a 3-7s ITI after the first and second trials of each block. How was the ITI selected for each trial? Were there ITIs between the other trials? If so, what were they?
  
  The ITIs were selected from a truncated exponential distribution. This selection was not random; rather, a distribution was carefully constructed for each environment (and event of interest: boat presentation, first trial of each block, second trial of each block) separately to ensure that enough longer ITIs were selected for each environment (and event of interest). Of course, the order in which the ITIs were used across blocks was random. The same approach was used to determine the duration of the presentation of the boat at the start of each block. There were no ITIs after later trials.
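  One way to build such a non-random ITI set is sketched below. Instead of drawing ITIs at random, it places them at evenly spaced quantiles of an exponential distribution truncated to the 3–7 s range, so every set contains the same mix of short and long intervals, and then shuffles only their order. The quantile construction and the scale of 1.5 s are assumptions for illustration, not the exact procedure used.

  ```python
  import numpy as np

  def truncated_exponential_quantiles(n, low=3.0, high=7.0, scale=1.5):
      """n ITIs (s) at evenly spaced quantiles of an exponential truncated to [low, high]."""
      probs = (np.arange(n) + 0.5) / n                  # mid-point quantiles in (0, 1)
      mass = 1.0 - np.exp(-(high - low) / scale)        # probability mass inside [low, high]
      return low - scale * np.log(1.0 - probs * mass)   # inverse CDF of the truncated exponential

  def shuffled_itis(n, rng, **kwargs):
      """The same fixed ITI set, used in a random order across blocks."""
      return rng.permutation(truncated_exponential_quantiles(n, **kwargs))
  ```

  Constructing one such set per environment and event of interest keeps the ITI distributions matched across conditions, while shuffling keeps their order unpredictable.
  
  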
  
  (4i) Please provide a link to the data and analysis materials on OSF in the text.
  
  We now provide a link to the data and analysis materials in our methods section.
  
  (4j) In the methods section, there are some references to information provided "below" (page 26: "The two approaches resulted in different posterior densities (see below) for estimate uncertainties, but in similar posterior densities (see below) for learning rates..."). Where in the paper is this referencing?
  
  We did not detail this further, as we considered it not relevant to our main study, and we have now removed the references to “below”.
  
  (4k) The methods section specifies using uniform priors between the lower and upper bounds of the relevant parameters. This seems likely to be 0 and 1, but should be listed explicitly.
  
  Thank you for noticing. We now added this to our manuscript.
  
  (4l) For parameter recovery, correlations are provided to indicate effective recovery. These correlations are indeed high and suggest excellent recovery, but correlations wouldn't reveal if there was systematic over- or underestimation occurring. It might be useful to provide some visualizations of the parameters and their estimates to speak to this potential issue.
  
  We now visualize the parameter recovery results in Author response image 2, which show that, indeed, there was a slight underestimation of the decay rates, but not the learning rates. Importantly, our main analyses and results all pertain to the learning rates, and we never made hypotheses or conclusions about the decay rates.
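  The reviewer's point can be demonstrated directly. A recovery in which every estimate is shrunk by the same factor still yields a perfect correlation, so bias and the regression slope of recovered on true values are useful complements. The sketch below computes all three. The inputs are hypothetical, for example true and recovered decay rates from simulated participants.

  ```python
  import numpy as np

  def recovery_summary(true_params, recovered_params):
      """Correlation alone misses systematic bias; report bias and regression slope too."""
      true_p = np.asarray(true_params, dtype=float)
      rec = np.asarray(recovered_params, dtype=float)
      r = np.corrcoef(true_p, rec)[0, 1]                 # Pearson correlation, true vs recovered
      bias = np.mean(rec - true_p)                       # < 0 means underestimation on average
      slope, intercept = np.polyfit(true_p, rec, 1)      # slope < 1 means compression of estimates
      return {"r": r, "bias": bias, "slope": slope, "intercept": intercept}
  ```

  If each recovered value is 80% of its true value, the summary reports r = 1 together with a negative bias and a slope of 0.8, which is exactly the pattern a correlation alone would hide.
  
  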
  
  Author response image 2.
  
  (4m) The methods section ends with a reference to a reward localizer (page 32). This localizer doesn't appear to be mentioned/used elsewhere.
  
  Indeed. We implemented the localizer because we wanted to independently identify reward processing areas. However, this localizer did not succeed in localizing a reward area (no significant results), possibly because (1) it was performed at the end of the experiment, when participants may have been fatigued, and (2) there was no learning component in this localizer task. For these reasons, we did not use it after all.
  
  (5) Analysis:
  
  (5a) Did you consider fitting a Bai model that only allowed for environment-specific initial learning rates (with a non-environment-specific decay rate)? Given that the data (e.g., Figure 2, Figure 4) seems to support differences in initial learning rate but not necessarily a difference in the rate of change, it might be worthwhile to see whether a model like that fits best.
  
  We have now fitted this extra model, which we call the semi-environment-specific Bai model. See Author response tables 1 and 2 for the results in Experiments 1 and 2, respectively. This new model has the best (in Experiment 2) and second-best (in Experiment 1) LOOIC. In a way, this is not surprising, because the model formulation is entirely based on the data. We think that we can draw the same substantive conclusions with or without this extra model, so for simplicity we did not include this new model in the paper itself.
  
  Author response table 1.
  
  Note. Models are ranked in descending order according to how well they fit the data. LOOIC refers to a model’s approximated expected log pointwise predictive density. Higher values indicate higher out-of-sample predictive fit. SE refers to the standard error of a model’s LOOIC. ∆LOOIC refers to the difference between a model’s LOOIC and the top ranked model’s LOOIC. ∆SE refers to the standard error of the difference between a model’s LOOIC and the top ranked model’s LOOIC.
  
  Author response table 2.
  
  Note. Models are ranked in descending order according to how well they fit the data. LOOIC refers to a model’s approximated expected log pointwise predictive density. Higher values indicate higher out-of-sample predictive fit. SE refers to the standard error of a model’s LOOIC. ∆LOOIC refers to the difference between a model’s LOOIC and the top ranked model’s LOOIC. ∆SE refers to the standard error of the difference between a model’s LOOIC and the top ranked model’s LOOIC.
  
  (5b) If part of the goal is to investigate whether there is a distinct local change in LR between conditions (dependent on prediction errors), then there might be more direct ways of doing so as a complement to the modeling approach. One potential way could be to visualize the LR or change in LR as a function of PE.
  
  We agree that it is beneficial to use a direct (model-free) approach to represent learning rate as a function of condition; that is also part of our approach. For example, Figures 2 and 4 show learning rate as a function of condition in a model-free manner. We think learning rate as a function of prediction error is less informative, because a prediction error can (in Kalman-filter terminology) be indicative of either noise variance or process variance, and participants are able to distinguish between them. This is also why we constructed the conditions such that, on the very first trial, prediction errors were on average the same across conditions. The fact that participants responded appropriately to prediction errors on the very first trial (i.e., larger updates or learning rates in the low noise condition) suggests that they are able to attribute the prediction error to process variance (in the low noise condition) versus noise variance (in the high noise condition).
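  The Kalman-filter intuition can be made concrete with a textbook one-dimensional Kalman update. This is a generic sketch, not the Bai model fitted in the paper, and the variances below are hypothetical. The learning rate (gain) is the share of the prediction error attributed to a genuine change in the latent mean. For the same prediction error and the same prior uncertainty, a larger noise variance yields a smaller gain, so the update is smaller.

  ```python
  def kalman_step(mean, var, outcome, process_var, noise_var):
      """One update of a 1-D Kalman filter tracking a (possibly drifting) latent mean."""
      prior_var = var + process_var                 # uncertainty grows with process variance
      gain = prior_var / (prior_var + noise_var)    # learning rate: PE share attributed to a real shift
      prediction_error = outcome - mean
      new_mean = mean + gain * prediction_error
      new_var = (1.0 - gain) * prior_var            # posterior uncertainty shrinks after the update
      return new_mean, new_var, gain
  ```

  With a prior variance of 10 and a prediction error of 5, a noise variance of 1 gives a gain of 10/11 (an update of about 4.5). A noise variance of 25 gives a gain of 10/35 (an update of about 1.4).
  
  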
  
  (5c) In addition to looking at the evolution of LR across trials within a block separated by task epoch (i.e., Figure 2C-D & Figure 4C-F), the structure of the task would lend itself very nicely to visualizing the evolution of the second trial LR on its own across instances. This could provide additional insight into the meta-learning process.
  
  We thank the reviewer for this interesting suggestion, which was also raised by Reviewer 1. We now calculated the learning rate in a sliding window of 20 trials (i.e., trial x to x + 19), and provide revised figures for each experiment separately (Fig. 2 and 4, respectively).
  
  (6) The environment-specific Bai model appeared to become less good at capturing participant behavior with increased environmental noise. Why do you think this is?
  
  We thank the reviewer for raising this point. In this environment, individual outcomes are considerably less indicative of the latent mean, which may reduce the usefulness of the trial-by-trial, prediction-error–driven learning-rate adjustments that we see in the other environments. Under such extreme conditions of variability, people may rely less on delta-rule updating and more on alternative strategies (D'Acremont & Bossaerts, 2016; Reynders et al., 2026), such as exploratory adjustments or heuristics that are not explicitly captured by the Bai model but also outside the scope of the present paper.
  
URL: biorxiv.org/content/10.1101/2025.06.05.658048v3
www.biorxiv.org

Cytoplasmic circular dsDNA is a key constituent of stress granules

1. Public_Reviews 11 Jun 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, Demeshkina and Ferré-D'Amaré showed that extrachromosomal circular DNA (eccDNA) and chromatin-associated proteins are present in stress granules, based on proteomic and sequencing analyses. Using HCR-FISH combined with imaging, the authors showed the colocalization of eccDNA with stress granule proteins. Furthermore, they found that CRISPR machinery targeting the eccDNA component of stress granules disrupts stress granule assembly, and that this effect is largely independent of Cas9 endonuclease activity. Notably, expression of cytoplasmic chromatin factors restores stress granule formation in the presence of CRISPR machinery in yeast. This also rescues the growth defect caused by hypoxic stress, which correlates with impaired stress granule formation. Together, this manuscript provides insight into the presence of eccDNA in cytoplasmic membraneless organelles, specifically stress granules, and suggests a functional role for eccDNA within these structures under stress conditions.
  
  Strengths:
  
  The authors used a panel of nucleases to demonstrate that stress granule cores isolated from yeast and HEK293 cells are resistant to plasmid-safe DNase, an enzyme that does not degrade circular double-stranded DNA. To further support the presence of extrachromosomal circular DNA (eccDNA) in stress granules, they performed Circle-Seq on stress granule cores. The gel electrophoresis and sequencing experiments complement each other well, providing consistent evidence for eccDNA within these granules. Overall, this study provides insight into potential cytoplasmic roles for eccDNA, an area that remains largely unexplored.
  
  Weaknesses:
  
  (1) Figure 1F suggests that stress granule cores are susceptible to DNase I but not to plasmid-safe DNase (psDNase). However, its smearing pattern in the psDNase condition appears similar to that in the DNase I treatment shown in Figure 1E, although psDNase produces more discrete bands. The authors should comment on these differences between Figures 1E and 1F, or consider revising Figure 1F to improve consistency with Figures 1E and 1D.
  
  We suggest that the appropriate comparisons are between the DNase I and psDNase treatments within each figure panel, not between panels (e.g., Figure 1E vs. Figure 1F). The electrophoretic gels in the different panels were run for different lengths of time, so a comparison between gels would be spurious. In Figure 1E, DNase I treatment results in a characteristic smear, whereas psDNase treatment yields discrete bands (lanes 2–3 vs. 4–5). Electrophoretic conditions for this figure were optimized to minimize diffusion and allow quantitative evaluation. The gel shown in Figure 1F, which compares yeast and mammalian stress granule core nucleic acids, was run for a longer period, as evidenced by the greater migration distance from the loading wells, yet it still clearly shows the same qualitative difference between DNase I (smear, lane 3) and psDNase (discrete bands, lanes 1–2) treatments for the yeast samples. The apparent discrepancy noted by the referee therefore simply reflects the different electrophoretic conditions used for the gels in the two figure panels.
  
  (2) The authors should clearly define "colocalization". Does it refer to complete spatial overlap between two signals (i.e., VCP and T30), or partial overlap (i.e., AHNAK DNA and G3BP)? Figure 3 and the associated text are descriptive. Quantitative analysis would strengthen the conclusions. For example, the authors could analyze the fraction of molecules localized to stress granules or provide Pearson's correlation coefficient or similar measurements.
  
  In our considered opinion, categorizing colocalization as either "partial" or "complete" implies a level of molecular precision that is physically unattainable at the resolution limits of any current light microscopy modality, and would therefore be misleading. Our approach employs super-resolution confocal laser scanning microscopy (Airyscan) with hybridization chain reaction fluorescence in situ hybridization (HCR-FISH) or with immunofluorescence. The detection method used offers higher spatial resolution and signal-to-noise ratio than single-point detector/physical pinhole confocal (or widefield epifluorescence) microscopy used in most prior stress granule studies. Despite these enhancements, the system retains inherent diffraction-imposed limits: a lateral (XY) resolution of ~130 nm and an axial (Z) resolution of ~350–400 nm, defining the minimum separable distance between two fluorescent signals. Structures smaller than these thresholds remain unresolved within a single point spread function (PSF) maximum – a volume sufficiently large to simultaneously accommodate multiple stress granule cores or tens of thousands of individual proteins (such as G3BP) and dozens of nucleic acid molecules several thousand nucleotides in length. Consequently, any detected fluorescence signal may represent the superimposition of a large and indeterminate number of individual molecules or particles. True molecular interaction analysis remains for future studies using technologies with angstrom resolution (e.g., cryo-electron tomography, cryo-EM, X-ray crystallography, smFRET, EPR, NMR, etc.). Metrics such as Pearson's correlation coefficient report solely on the degree of signal overlap at the PSF scale (hundreds of nanometers) and would not provide any insight beyond what is already conveyed by our data.
  
  (3) The authors used a CRISPR-based approach to target the Ty1 LTR retrotransposon, an abundant stress granule eccDNA, and they observed a loss of stress granule formation. However, this phenotype may be specific to Ty1 eccDNA rather than representative of all eccDNA species present in granules. In particular, the title "Cytoplasmic circular DNA is a key constituent of stress granules" implies a broader role. To support this claim, the authors should consider approaches that more globally deplete eccDNA rather than targeting a single eccDNA.
  
  We respectfully disagree with the referee that further depletion of eccDNA would alter our conclusions. A central finding of our study is that stress granules can be abrogated cytoplasmically by co-expressing two components. The first is a Cas9 endonuclease, either active or inactivated by point mutations (D10A/H840A). The second is a gRNA, which is itself a fusion of the crRNA and tracrRNA; these are natively separate RNAs in the source bacterium. We show in Figure 4 that when the gRNA targets Ty1 sequences, co-expression of the endonucleolytically active holoenzyme in the cytoplasm results in loss of the corresponding eccDNAs, as assayed by sequencing of the relevant cytoplasmic fractions. Critically, when a catalytically inactive Cas9 protein (dCas9) is co-expressed with the gRNA instead of the wild-type endonuclease, the eccDNAs containing Ty1 sequences are no longer depleted (Figures 4D and 4E). Stress granule formation, however, is still abrogated (Figure 4C).
  
  In our manuscript, we indicated (as "data not shown") that co-expressing Cas9 with a gRNA "targeting" a sequence absent from the S. cerevisiae genome still abrogates stress granule formation. These data are shown in Author response image 1. The gRNA is targeted to the sequence 5’-agaatcgatgcattt, which is absent from the genome of the yeast strain used.
  
  Author response image 1.
  
  It follows from our experiments that stress granule abrogation (1) does not require the catalytic activity of Cas9; (2) does not require that the gRNA direct Cas9 to a sequence present in the genome; but (3) does require the presence of a CRISPR holoenzyme (as defined above) in the cytoplasm.
  
  To reiterate, abrogation of stress granules occurs when a Cas9-gRNA complex is present in the cytoplasm. This holds whether or not Cas9 retains nuclease activity, and whether or not the gRNA targets a sequence present in the genome. Importantly, the holoenzyme is required for this phenomenon: the endonuclease or the gRNA alone does not abrogate stress granule formation (Figure S5).
  
  This unexpected observation led us to hypothesize that some activity of the Cas9-gRNA complex drives the suppression of stress granule formation, and that this activity is not sequence-specific, gRNA-targeted endonucleolytic cleavage. The best documented such activity is DNA sequence sampling (1-dimensional diffusion). We propose that 1-dimensional diffusion of the Cas9-gRNA holoenzyme displaces interactors from cytoplasmic eccDNA, and that the association of these interactors with the DNA is required to drive stress granule assembly.

  Two completely unrelated proteins can themselves suppress the stress granule-suppressive effect of cytoplasmic Cas9-gRNA expression. These are CHD1 and GCN5, whose only shared feature is action on chromatin. This strongly supports our hypothesis (Figures 4G, 4H and S6; see also the response to point 4, below). It also confirms that cytoplasmic eccDNA is packaged by histones in a conformation that both CHD1 and GCN5 can recognize.
  
  (4) The authors should provide additional experimental evidence to support the claim that eccDNA is packaged in a chromatin-like state. The rescue of stress granule formation by ectopic expression of modified chromatin-associated proteins (CHD1NES and GCN5NES) following CRISPR treatment does not necessarily demonstrate that eccDNA is packaged like chromatin under basal conditions.
  
  We would like to reiterate the temporal order of our experimental design (detailed in full in Methods and summarized in Results). Cas9NES-gRNA and CHD1NES (or GCN5NES) were expressed simultaneously (not sequentially) in the cytoplasm. This was intentional: it gave each player ample opportunity to engage its preferred substrate under non-stress conditions, prior to the brief oxidative stress. The referee appears to believe that cytoplasmic eccDNA was first exposed to Cas9NES-gRNA, and that the bound endonuclease was then challenged with chromatin-modifying enzymes.
  
  Our experimental design accounts for the contrasting substrate specificities of CRISPR and chromatin-modifying enzymes. The Cas9-gRNA holoenzyme binds nucleosome-free DNA with a dissociation constant (Kd) of about 0.1–1 nM, but its association with chromatinized DNA is impeded 5- to 100-fold (Isaac et al., 2016; Yarrington et al., 2018; Strohkendl et al., 2021). In contrast, CHD1 binding to DNA is strictly nucleosome-dependent. Its chromodomains actively block engagement with protein-free DNA (Hauk et al., 2010). Its productive binding (Kd 10–200 nM) relies on obligate multivalent contacts with the histone octamer, the H4 tail, and wrapped DNA (Farnung et al., 2017; Sundaramoorthy et al., 2018).
  
  When CHD1NES was co-expressed with Cas9NES-gRNA, stress granule formation following oxidative stress was unperturbed. The most parsimonious interpretation is that CHD1NES outcompetes the CRISPR machinery for cytoplasmic binding to eccDNA. This happens because eccDNA exists in a histone-bound state that CHD1 recognizes as chromatin, which simultaneously favors CHD1NES engagement and impedes Cas9 access. In effect, our experiment uses stress granule formation as a readout for differential binding to chromatin or chromatin-like eccDNA.
  
  Farnung, L., Vos, S.M., Wigge, C., and Cramer, P. (2017). Nucleosome-Chd1 structure and implications for chromatin remodelling. Nature, 550(7677), 539–542.
  
  Hauk, G., McKnight, J.N., Nodelman, I.M., and Bharat, T.A.M. (2010). The chromodomains of the Chd1 chromatin remodeler regulate DNA access to the ATPase motor. Mol Cell, 39(5), 711–723.
  
  Isaac, R.S., Jiang, F., Doudna, J.A., Lim, W.A., Narlikar, G.J., and Bhatt, D.L. (2016). Nucleosome breathing and remodeling constrain CRISPR-Cas9 function. Nature Struct Mol Biol, 23(12), 1097–1103.
  
  Strohkendl, I., Saifuddin, F.A., Gibson, B.A., Bhatt, D.L., Russell, R., and Bharat, T.A.M. (2021). Inhibition of CRISPR-Cas9 by bacteriophage-encoded proteins. Mol Cell, 81(8), 1665–1679.
  
  Sundaramoorthy, R., Hughes, A.L., Singh, V., Wiechens, N., Ryan, D.P., El-Mkami, H., Petoukhov, M., Svergun, D.I., Treutlein, B., Sproll, P., and Owen-Hughes, T. (2018). Structural reorganization of the chromatin remodeling enzyme Chd1 upon engagement with nucleosomes. eLife, 7, e35720.
  
  Yarrington, R.M., Verma, S., Schwartz, S., Trautman, J.K., and Carroll, D. (2018). Nucleosomes inhibit target cleavage by CRISPR-Cas9 in vivo. PNAS, 115(38), 9450–9455.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors report the presence of extrachromosomal circular DNAs (eccDNAs) within the core of stress granules purified from both yeast and mammalian cells.
  
  Strengths:
  
  This study is important for understanding the molecular mechanisms underlying stress granules containing eccDNAs and is likely to have a major impact on future research. A major strength of the study is the extensive experimental validation performed in yeast cells. In particular, cytoplasmic CRISPR-mediated targeting of eccDNAs suppresses stress granule formation and impairs recovery from hypoxic stress in yeast cells.
  
  Weaknesses:
  
  The conclusions would be further strengthened by validating the functional findings in an additional model system, such as mammalian cells.
  
  Comments:
  
  (1) Section: "Stress granule cores contain eccDNA"
  
  (a) The presence of eccDNAs would be more convincingly demonstrated using an orthogonal validation approach, such as DNA FISH targeting MYC and Centromere 8 (CEN8) on metaphase spreads from HEK293T cells (as performed in PMID: 34819668).
  
  The relationship between eccDNA dynamics and stress granule assembly across distinct cell cycle phases remains an important and poorly explored question. To our knowledge, no published data currently describe how stress response mechanisms are regulated during mitotic division, particularly in metaphase. Our identification of eccDNA as a component of stress granule cores can provide a first tractable framework to investigate this relationship. However, a systematic and in-depth characterization of this phenomenon warrants a dedicated future investigation.
  
  (b) The study would also benefit from assessing the presence of eccDNAs in the extracellular medium. For example, DNA could be extracted from conditioned media and analyzed by PCR using primers spanning eccDNA breakpoint junctions (as performed in PMID: 40074906; PMID: 36123406).
  
  We agree with the referee that eccDNA biology represents a fascinating and rapidly evolving area of research, particularly given the emerging role of eccDNA in oncogenesis. In this context, our identification of eccDNA as a core structural component of stress granules opens a novel avenue for exploring the connection between stress-dependent translational regulation and disease-associated eccDNA dynamics. While we acknowledge the importance of this direction, a rigorous investigation of this relationship requires extensive multifaceted experimentation that falls beyond the scope of the current study.
  
  (2) Section: "eccDNA-CRISPR abrogates stress granules"
  
  These findings should be further validated under additional stress conditions, such as drug-induced stress (like methotrexate) or nutrient deprivation in the cell medium. In addition, the same set of experiments should be performed in HEK293T cells to support the broader relevance of the observations.
  
  We agree with the referee that characterizing the composition and dynamics of stress granules arising from different stressors is an important endeavor. However, many stressors are documented to induce stress granule formation, so such studies fall well beyond the scope of this manuscript. Nevertheless, the presence of eccDNA in the stress granules of both yeast and human cells is strong evidence for conservation of function(s). Stress granules were first observed in heat-shocked tomato plants. We think that exploring the role of eccDNA in stress granule formation across the kingdoms of life, cell cycle stages, stressors, etc. will be an important research program for the future.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Figures 3D and 3I: The use of magenta and red makes it difficult to distinguish between the two labeled signals. Consider using more contrasting colors to improve visual clarity.
  
  We appreciate the comment regarding color choices in the figures. In our view, magenta and red are sufficiently distinguishable as nucleic acid labels, particularly when combined with the green signal representing G3BP in these panels.
  
  (2) Figures 3F and 3G: Do the authors have an explanation for why AHNAK or MAPT DNA (white) does not colocalize with the anti-DNA immunofluorescence signal?
  
  Immunofluorescence (IF) is standard for detecting protein antigens. It has limitations, however, when the target is a non-protein molecule such as DNA, because cellular DNA is compacted into chromatin. Anti-DNA antibodies can miss a significant fraction of their targets because the DNA backbone remains largely inaccessible. DNA-FISH overcomes this limitation by hybridizing probes directly to denatured DNA sequences with high specificity. The fixation step required for both IF and FISH imaging can also introduce steric barriers, and these restrict antibody access far more than they restrict small nucleic acid probes. Even under optimized conditions, the anti-DNA IF signal therefore reflects only a subset of the total cellular DNA content.
  
  (3) Adding a subtitle on page 12 ("The abundant histones in purified stress granule...") would improve the overall structure and readability of the manuscript.
  
  We think that an additional subtitle would not substantially improve the readability of what is, admittedly, a very dense manuscript that employs a diversity of experimental approaches.
  
  (4) It would strengthen the analysis if statistical significance were included for the different time points in Figure 5C.
  
  We appreciate the reviewer’s suggestion. Figure 5C shows the largest difference at 40–45 hours after stress recovery. At this time point, the difference between Cas9NES-gRNA (or dCas9NES-gRNA) and Cas9NES or gRNA only is statistically significant (two-tailed Student’s t-test, *, p ≤ 0.05). All primary experimental data are publicly available (FigShare), so interested readers can perform further analyses.
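
  The sketch below illustrates how the two-tailed Student's t-test mentioned above could be rerun on the FigShare data. It computes only the t statistic and degrees of freedom, under the pooled-variance (equal-variance) assumption of the classic Student's test. The two-tailed p-value would then be read from the t distribution, for example with a statistics library. The function name and input values are hypothetical placeholders, not data from Figure 5C.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled (equal) variance.

    Returns (t, degrees_of_freedom). The two-tailed p-value is P(|T| >= |t|)
    for T following a t distribution with that many degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two replicates")
    df = n_a + n_b - 2
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean(group_a) - mean(group_b)) / se, df


# Placeholder replicate values for two conditions (not data from Figure 5C).
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df}")  # prints: t = -3.674, df = 4
```

  The pooled variance matches the equal-variance assumption of Student's test. If replicate variances differ markedly between conditions, Welch's variant would be the usual alternative.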
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.12.711345v1
www.biorxiv.org

Exploration of the structural and functional diversity in the metamorphic RfaH subfamily

1. EMBOpress 11 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  RESPONSE TO REVIEWERS
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  Summary:
  
  This is an interesting and ambitious study by Tabilo-Agurto and co-workers. It combines deep learning structure prediction (AlphaFold2), targeted molecular dynamics simulations, and in vivo functional assays to probe structural, functional, and evolutionary aspects of the metamorphic protein RfaH. More broadly, the work addresses an important question: whether intermediate structural states may exist along evolutionary trajectories of metamorphic proteins. A particular strength of the study is the integration of computational and experimental approaches. The manuscript is generally well written and clearly organized.
  
  Major comments:
  
  A key aspect of the study is the classification of predicted structures into three classes based on the conformation of the C-terminal domain (CTD): the autoinhibited alpha-helical fold, a beta-barrel fold, and a mixed alpha/beta fold. These classes are further described as corresponding to metamorphic (alpha fold), mixed alpha/beta, and monomorphic (beta fold) proteins.
  
  While I can see how this organizational scheme is helpful in some respects, it may also overstate what can be concluded from the data. As the authors are well aware, AlphaFold2 tends to predict a single conformation even for genuine metamorphic proteins, and therefore does not, on its own, distinguish between monomorphic and fold-switching proteins. I note in particular that the functional data indicates that the "monomorphic" variants studied in the in vivo assays behave similarly to the RfaH E48A mutant. However, E48A is known to remain metamorphic, populating both alpha and beta folds with roughly equal probability. This suggests that the sequences in this class may retain some degree of fold-switching capability, even if the underlying regulatory mechanism differs from that of wild-type RfaH. In other words, the presented data does not fully support these sequences as monomorphic. I am not suggesting that the authors must revise their classification scheme. However, it may strengthen the manuscript if the authors explicitly acknowledge this alternative interpretation and moderate the corresponding claims.
  
  We appreciate the comment from the reviewer, which can be seen from two different perspectives.
  
  On the one hand, it might be reasonable to think that the ‘monomorphic’ RfaH orthologs have lower transcription elongation activity than E. coli RfaH. Other highly divergent orthologs of E. coli RfaH (Salmonella enterica serovar Typhimurium, Klebsiella pneumoniae, Yersinia enterocolitica and Vibrio cholerae) show similar in vitro recruitment and pausing at the C45 nucleotide of the ops element. They also restore RfaH-dependent hemolytic activity, in an E. coli strain lacking chromosomal rfaH, to levels similar to the wild-type strain (doi: 10.1128/jb.186.9.2829-2840.2004). However, V. cholerae RfaH (43% sequence identity to E. coli RfaH) exhibits diminished antitermination in in vitro transcription assays, closer to the levels observed in the absence of RfaH (doi: 10.1128/jb.186.9.2829-2840.2004). This is despite the protein also being predicted in the alpha-folded state by AF2 (10.1016/j.csbj.2022.10.024). A notable observation from this RfaH complementation work concerns V. cholerae RfaH at concentrations similar to those used for E. coli RfaH, which produce transcription elongation defects in in vitro transcription assays. Increasing the concentration of V. cholerae RfaH alleviates these defects. Similar transcription elongation defects could plausibly arise in vivo and, in turn, reduce luciferase translation in our in vivo translation assays for the ‘monomorphic’ proteins.
  
  On the other hand, it is possible that these so-called ‘monomorphic proteins’ still populate the alpha-folded state, but that their predominant fold in solution is the one corresponding to the active beta-fold. This can be biophysically tested using circular dichroism to distinguish their alpha or beta propensity, as proposed in a remarkable work from Porter et al (10.1038/s41467-022-31532-9).
  
  In both cases, the protein titers obtained from attempts to purify the ‘monomorphic’ RfaH orthologs would need to be quantified. This would tell us whether the differences in activity are due to differences in expression levels. It would also show whether sufficient amounts of stable, well-folded protein can be obtained for these orthologs. If so, their circular dichroism spectra could then be measured to determine their secondary structure propensity.
  
  We are currently attempting to recombinantly express these proteins to determine their titers and their solubility in the supernatant, which will indirectly indicate their expression levels. We will then test the solubly expressed proteins biophysically by circular dichroism. If the circular dichroism experiments prove unsuccessful due to problems with the solubility of the purified proteins, we strongly believe that the discussion above should be included in the manuscript to acknowledge the limitations of the methods used in our work.
  
  Therefore, we will add the following paragraph in the discussion, while we work on ascertaining the feasibility of the circular dichroism assays:
  
  “It is worth noting that, in the absence of an RBS (Figure 5C-F), the putative monomorphic RfaH orthologs have similar or lower in vivo activity than the E. coli RfaH E48A mutant; a similar mutant (E48S) exhibits a 1:1 equilibrium between the autoinhibited and active states (Burmann et al, 2012). This observation can be partly explained by two factors. First, sequence divergence and expression levels may limit functional compatibility with the host machinery. The highly divergent V. cholerae RfaH ortholog shares only 43% sequence identity with E. coli RfaH but is predicted to fold into the autoinhibited state (Artsimovitch & Ramírez-Sarmiento, 2022). In the ∆rfaH E. coli strain, it maintains both ops-dependent recruitment and hemolysin secretion. Yet it exhibits transcription elongation defects in vitro, requiring a 5-fold higher concentration than E. coli RfaH to match the increased elongation rates of E. coli RNAP (Carter et al, 2004). Low in vivo protein titers or structural mismatches between the monomorphic orthologs and E. coli RNAP may prevent higher luciferase expression relative to the E48A mutant. This limitation is supported by the fact that IPTG-induced overexpression rescues activity when an RBS is present (Figure 5B). Second, these proteins may be predominantly folded in the active state while still transiently populating the autoinhibited state. Confirming this conformational equilibrium would require overexpression and purification of these proteins followed by biophysical assays, such as circular dichroism (Porter et al, 2022).”
  
  Reviewer #1 (Significance (Required)):
  
  An intriguing, but speculative, aspect of the study is the finding that some sequences are predicted to adopt a CTD with mixed alpha/beta secondary structure, and that such structures also appear in targeted molecular dynamics simulations. If this idea holds up, it could represent an intermediate along the evolutionary pathway between the alpha-helical and beta-barrel folds of RfaH. Although the evidence is only computational, it is a compelling idea and it would benefit from further investigation.
  
  It is indeed very compelling, and this is something that we should address immediately in a revised version of our manuscript. We had missed an article published in 2025 that studied the structural interconversion of the isolated CTD using NMR (doi: 10.1073/pnas.2506441122). That study found at least three intermediate states along the fold-switching pathway of RfaH. One of these intermediate states is also among the most highly populated (~23%). It corresponds to an ensemble of largely unfolded structures that include transient alpha-helix α5 (corresponding to helix α2 in our article) and beta-hairpin (β1/β2) secondary structure elements. These elements fold into a compact ensemble of structures in which the beta-hairpin lies on top of the alpha-helix. This is fully consistent with our predictions of a mixed alpha/beta state in full-length RfaH.
  
  We will add this external experimental validation of the mixed alpha/beta secondary structure of the CTD of RfaH in the discussion of our final manuscript:
  
  “Interestingly, a recent nuclear magnetic resonance spectroscopy study of the E. coli RfaH CTD (Cai et al, 2025) aimed to uncover transient states potentially populated during αCTD interconversion. It described an intermediate state (populated in ~23% of the captured ensembles) in which a β-hairpin formed by β-strands β1-β2 lies on top of a transient α-helix α5, which corresponds to helix α2 in our article. This finding is fully consistent with the mixed α/β CTD structures found both in our TMD simulations and in our AF2 predictions of divergent RfaH orthologs.”
  
  In summary, the work is a valuable contribution to the field of protein fold switching. The combination of computational tools with experimental validation makes it interesting and the results should be of broad interest. The manuscript should be well positioned for publication in a high-impact journal.
  
  We are very thankful for the reviewer’s comments on our manuscript.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Summary:
  
  In their paper "Exploration of the structural and functional diversity in the metamorphic RfaH subfamily," Tabilo-Agurto et al. use AlphaFold2 to predict the structures of ~3,900 RfaH homologs. They sort the predicted C-terminal domains into α-helical (autoinhibited), β-barrel (NusG-like), and mixed α/β topologies, and find that about 14% of homologs come out predominantly in the β-barrel state. They then take nine representative homologs and run them through a heterologous E. coli DH5α ΔrfaH reporter assay. The putative monomorphic candidates behave a lot like the constitutively active E48A variant (active across every ops context and even without an RBS), while the mixed α/β candidates barely show activity. Targeted MD simulations of E. coli RfaH, run through AF2Rank, also pull out the mixed α/β state as its own distinct cluster, hinting that it sits somewhere along the fold-switching transition path.
  
  This is a genuinely interesting piece of work that pulls together structure prediction, in vivo activity, and genomic context to make a concrete case for extant monomorphic βRfaH proteins - a long-hypothesized but until now unseen intermediate in the proposed stepwise evolution of RfaH from NusG. The experimental design is thoughtful, especially the five-construct ops/RBS matrix, and comparing the monomorphic candidates against the E48A benchmark is a nice touch as a positive control. Overall, I think the paper deserves to be published, but a few things would need shoring up before acceptance.
  
  Major comments:
  
  The paper would be a lot stronger with at least one biophysical measurement (a CD spectrum, say) on a purified monomorphic candidate. I get that this might be outside the planned scope, but even a single CD trace showing β-rich content for an isolated full-length protein would move the claim from "putative" to "demonstrated."
  
  We agree with the reviewer, and we are currently attempting to recombinantly express these proteins. First, we will determine their solubility after purification, which will largely determine our ability to characterize them by circular dichroism. We will then perform circular dichroism experiments if the solubility and concentration of these ‘monomorphic’ homologs are sufficient. If this proves unfeasible, we will include the solubility analysis in our revised version of the article. We will also add a discussion of this topic, and of why the activity of the ‘monomorphic’ proteins resembles that of the E. coli RfaH E48A mutant, which coexists between two folds, as indicated in our response to the major comment from reviewer #1.
  
  Only nine homologs were tested - three per category. The conclusions about monomorphic behavior generalizing across the whole βRfaH clade are basically resting on three proteins. Bringing in even one or two phylogenetically distant βRfaH candidates would help guard against the possibility that what they're seeing is just a genus-specific quirk. If new experiments aren't on the table, the limitation should at least be called out explicitly in the Discussion.
  
  We agree with the reviewer that drawing conclusions from a single clade of RfaH could raise concerns about bias. We note, however, that the tested putative monomorphic candidates were selected before a phylogenetic tree was constructed. We propose to perform a phylogenetic analysis of the InterPro sequences and to examine their genomic neighborhoods as well, replicating what was done in the manuscript for the Genomic Cluster group. We hope this will provide more compelling evidence that these extant monomorphic RfaH orthologs differ from metamorphic ones in their predictions, phylogeny and gene organization.
  
  The classification thresholds (α > 32.5% / β 30.0% / α
  
  We thank the reviewer for raising this concern. As recommended by the reviewer, we performed a sensitivity analysis by nudging the cutoffs by ±5%. We observed minimal changes in the number of structures in each class. We have added a short paragraph describing this sensitivity test:
  
  “To assess whether these values were adequate for our analysis, we performed a sensitivity test by changing the thresholds by ±5% across all structures predicted from all databases. The predictions of RfaH orthologs with a monomorphic CTD, or with mixed secondary structure in their CTD, were robust. Only the number of metamorphic RfaH orthologs decreased, with a corresponding increase in uncategorized structures (Supplementary Figure S13).”
  
  The Discussion notes that uncontrolled, ops-independent RfaH recruitment could be lethal, since RfaH outcompetes the much more abundant NusG. But if monomorphic RfaH proteins really are extant and stably maintained in these genomes, there has to be something keeping them from interfering with NusG's essential functions - maybe very low expression, restricted induction, or compensating differences in NusG affinity. The paper would benefit from tackling this directly, even speculatively.
  
  We agree with the reviewer on this point. After careful consideration, we believe that we did not emphasize it appropriately in the manuscript. We included Figure 7 to present our perspective on how RfaH may have evolved. However, we did not emphasize that this perspective stems from a previous work that we discussed thoroughly in the introduction (doi: 10.1038/emboj.2008.268). That work explicitly states that the low solubility of the dissociated NTD and CTD could be a factor restricting RfaH action to cis operons. We have included this in our revised version of the manuscript as follows:
  
  “Our findings are in line with the previous hypothesis regarding the emergence of RfaH within the universally conserved family of NusG transcription factors (Belogurov et al, 2009). Under that model, a gene duplication event produced an intermediate variant (NusG2 in Figure 7) that lost its Rho-binding capability and acquired a deletion in the NTD that reduced the protein's overall size and remodeled its hydrophobic profile. Crucially, this intermediate retained an exposed, hydrophobic RNAP-binding region, a feature shared by monomorphic RfaH and the ancestor of all RfaH orthologs (NusGSP in Figure 7). This increased hydrophobicity would have reduced solubility, restricting its regulatory activity to the site of synthesis, i.e. in cis. Indeed, when structural alignment is used to identify conserved NTD residues that bind to RNAP, orthologs contain more than 70% hydrophobic residues (Supplementary Figure S11) at those positions. This percentage is much closer to that of RfaH (80%) than NusG (57.14%). The protein only regains solubility and the ability to operate in trans when its CTD refolds into a helical conformation. Ultimately, our results strengthen this evolutionary model by demonstrating that several extant RfaH orthologs appear to resemble this insoluble, cis-acting ancestral state.”
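
  The hydrophobicity comparison above amounts to counting hydrophobic residues at the RNAP-contacting NTD positions identified by structural alignment. The sketch below is a minimal illustration. It assumes those aligned residues have already been extracted as a string, and it assumes a simple hydrophobic set (A, C, F, I, L, M, P, V, W); the manuscript's own residue classification may differ. The function name and example residues are hypothetical.

```python
from typing import Iterable

# Assumed hydrophobic residue set; the definition used in the manuscript may differ.
HYDROPHOBIC = frozenset("ACFILMPVW")


def hydrophobic_fraction(aligned_residues: Iterable[str]) -> float:
    """Fraction of hydrophobic residues among residues at aligned RNAP-contact positions.

    Gap characters ('-') are skipped so that they do not count toward the denominator.
    """
    residues = [r.upper() for r in aligned_residues if r != "-"]
    if not residues:
        raise ValueError("no residues at the selected positions")
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)


# Hypothetical residues at seven aligned contact positions (not real sequence data).
print(f"{hydrophobic_fraction('LVAKIEF'):.2%}")  # prints: 71.43%
```

  The main design choice is which residues count as hydrophobic. A different set (for example, one that also includes G or Y) would shift the percentages, so the set should match the definition underlying Supplementary Figure S11.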
  
  Minor comments:
  
  Table 1 should show percentages alongside the raw counts. 7/7 LPS-in-operon for monomorphic candidates is striking, but with n=10, the small denominator really deserves to be flagged.
  
  We agree with the reviewer that the higher raw counts of metamorphic orthologs may undersell the message the article conveys. In Table 1, we added percentages next to the raw counts, calculated relative to the “Total” and “Next to operon” categories. We also modified the legend as follows:
  
  “A summary of the genomic contexts of RfaH orthologs classified according to the AF2 predictions. The numbers indicate how many rfaH genes are next to an operon and whether the operon contains lipopolysaccharide biosynthesis genes. Each percentage is calculated relative to the previous category, i.e., ‘Next to operon’/‘Total’ and ‘LPS in operon’/‘Next to operon’.”
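
  The two percentages in the revised legend use different denominators, so a short sketch of the arithmetic may help readers interpret Table 1. It is a minimal illustration only. The function name and the example counts are hypothetical and are not the values reported in Table 1.

```python
import math
from typing import Dict


def table1_percentages(total: int, next_to_operon: int, lps_in_operon: int) -> Dict[str, float]:
    """Percentages as described in the revised Table 1 legend.

    'Next to operon' is expressed relative to 'Total', while 'LPS in operon'
    is expressed relative to 'Next to operon' (not relative to 'Total').
    """
    if total <= 0 or not 0 <= lps_in_operon <= next_to_operon <= total:
        raise ValueError("expected 0 <= LPS in operon <= next to operon <= total, with total > 0")
    return {
        "next_to_operon_pct": 100 * next_to_operon / total,
        # Undefined (NaN) when no gene in the class is next to an operon.
        "lps_in_operon_pct": 100 * lps_in_operon / next_to_operon if next_to_operon else math.nan,
    }


# Hypothetical counts for a single structural class.
print(table1_percentages(total=10, next_to_operon=7, lps_in_operon=7))
```

  With these illustrative counts, 70% of genes would be next to an operon, and 100% of those operons would contain LPS genes. This shows how a small denominator can yield a striking percentage, which is the reviewer's concern.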
  
  In Figure 3, the sequence logo on top is informative - consider adding the number of sequences per dataset to the axis labels so readers can interpret the boxplot widths.
  
  We believe that this would be rather confusing for readers. The plot counts every structure, out of the five predicted by AlphaFold2 for each sequence in each dataset, that fits each classification. The same sequence can therefore yield structures that are monomorphic, metamorphic or have mixed secondary structure in their CTD. As a result, the counts summed across the box plots are higher than the number of sequences per dataset. For example, one sequence from InterPro can be present in more than one box plot, because different AlphaFold2 models can predict different states from the same sequence. We believe the figure is less confusing as currently presented.
  
  There's some redundancy between the Results (pp. 14-17) and the Discussion that could probably be trimmed, particularly the recap of the ops/RBS construct logic.
  
  Thanks for the recommendation. We reduced this redundancy in the new version of the manuscript, mainly on page 16:
  
  “The orthologs classified as monomorphic, and thus expected to be constitutively active, exhibited activity across all tested ops contexts, including in the absence of RBS (Figure 5B-F). Notably, their activity levels were comparable to the ops-independent E. coli RfaH E48A mutant, in which the key salt bridge at the NTD:CTD interface is disrupted. All monomorphic orthologs were found to lack a few key residues that make contacts to ops DNA in RfaH, as well as the conserved residues in loop 2 that mediate contacts with Rho in NusG (Supplementary Figure 11). This mosaic architecture enables these orthologs to promote the expression of the long lux operon even when the RBS is absent. Our study provides the first indirect evidence of putative, constitutively active RfaH proteins, which are predicted to have monomorphic NusG-like fold, in other bacteria.”
  
  Reviewer #2 (Significance (Required)):
  
  If the central claim holds up, this is a meaningful contribution to the metamorphic-protein and bacterial-transcription literatures: it identifies what appear to be extant evolutionary "way-stations" in the NusG→RfaH transition, and it does so using a tractable computational pipeline that could be applied to other suspected fold-switch families. The work is timely given the ongoing discussion about how AF2 and its descendants handle conformational heterogeneity. With the strengthening suggested above - particularly any direct biophysical confirmation of a monomorphic candidate - I would expect this to be a well-cited paper in its niche.
  
  We are very thankful for the reviewer’s comments on our manuscript.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  Summary:
  
  The manuscript by Tabilo-Agurto et al. uses in silico and experimental methods to elucidate the diversity of the metamorphic RfaH protein family. Of particular note is the sophisticated usage of AlphaFold2 to reconstruct the evolutionary tree of RfaH as well as the in vivo luminescence assays to substantiate the different structural states of the RfaH-CTD. Overall this is a well-written manuscript providing deeper insight into the structural and functional diversity of RfaH proteins, potentially relevant for other metamorphic proteins as well.
  
  Minor comments:
  
  3rd paragraph of the introduction: The sentence starting with "To date, ...and nuclear magnetic resonance of these ancestors.." seems incomplete; this reviewer believes the authors wanted to say "..and structural characterization by nuclear magnetic resonance spectroscopy of these ancestors..."
  
  Thanks for the attention to these details, we will amend this paragraph appropriately.
  
  4th paragraph of the introduction: "..., that binds Rho or the ribosome (Mooney et al. 2009b). Whereas this citation is correct for NusG-Rho interactions it does not indicate ribosome binding. The direct interaction of NusG with the ribosome was shown in Burmann et al. Science 2010 and this reference should be added here.
  
  Thanks for the recommendation, we will include both citations in this section of the manuscript.
  
  More a curiosity question, did the author also test for a subset of the RfaH variants the AlphaFold3 predictions and obtain similar or different results?
  
  Thanks for the comment. We did not use AlphaFold3 predictions to check the variability of the results because much more is known about the use of AlphaFold2, and its limitations, in the study of metamorphic proteins, whereas a deep understanding of the advantages and limitations of AlphaFold3 for studying metamorphic proteins is still under development.
  
  Referees cross-commenting:
  
  Overall there is an agreement among all reviewers that the present MS is an interesting and timely study. The point raised by reviewer 2 to add simple biophysical characterization, if feasible, would be clearly an excellent addition and likely make the MS stronger. In general all three reviewers mainly point to minor changes and additions to improve the MS in a rather short timeframe.
  
  We agree with this comment, which is why we commit to attempting the recombinant expression and purification of the RfaH orthologs and to performing circular dichroism assays if the solubility of the obtained proteins allows such experiments.
  
  Reviewer #3 (Significance (Required)):
  
  The present MS is an interesting large-scale application of the AlphaFold2 algorithm to reconstruct the evolutionary tree of the specialized transcription elongation factor RfaH. It reveals differing degrees of this evolution across a diverse set of bacterial strains, indicating its evolutionary distance from the cognate NusG transcription elongation factor. Of particular note is the experimental verification of the in silico findings by in vivo luminescence approaches.
  
  We are very thankful for the reviewer’s comments on our manuscript.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.03.16.712203
www.biorxiv.org

Targeting ALC1 can safely expand the therapeutic utility of PARP inhibitors across high-grade serous ovarian cancers

1. EMBOpress 10 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  Manuscript number: RC-2026-03474
  
  Corresponding author(s): Priyanka, Verma
  
  
  1. General Statements [optional]
  
  Point-by-point rebuttal is presented below. Reviewer’s comments are in BLACK; author’s response is in BLUE and figure numbers corresponding to the manuscript are in RED.
  
  2. Point-by-point description of the revisions
  
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  ALC1 suppression has been shown to potentiate PARP inhibitor lethality in HR-deficient cells. Rather than revisiting the underlying mechanism, which has been characterized and remains an active area of investigation, this study aims to define the clinical contexts in which combined ALC1 and PARP inhibition may be beneficial. The clinical efficacy of PARP inhibitors, and their FDA approval, is largely restricted to HR-deficient tumors. This study dissects the combined effects of ALC1 and PARP suppression across a panel of HRD ovarian cancer cell lines, multiple classes of PARP inhibitor, and cells harboring distinct PARPi resistance mechanisms. In doing so, the authors delineate both the potential utility and the limitations of combined ALC1 and PARP inhibitor treatment in HRD ovarian cancers. The most impactful finding of the study, however, is likely the demonstration that ALC1 suppression sensitizes HR-proficient, CCNE1-amplified high-grade serous ovarian cancers to PARP inhibitors. These tumors are associated with particularly poor outcomes owing to the current absence of effective targeted therapies, making this observation of considerable clinical relevance.
  
  We thank the reviewer for appreciating the significance of our work in “HR-proficient, CCNE1-amplified high-grade serous ovarian cancers to PARP inhibitors” which is a critical unmet need.
  
  Of note, the study relies on genetic rather than pharmacological depletion of ALC1, a choice likely reflecting the current lack of a commercially available ALC1 inhibitor. While genetic suppression may not fully recapitulate the effects of combined drug treatment, it offers the advantage of not being tied to any specific compound, allowing the authors to establish more general principles. I have only a few comments.
  
  We are grateful to the reviewer for providing the unique perspective on our genetic study that “it offers the advantage of not being tied to any specific compound, allowing the authors to establish more general principles.”
  
  We have included this in our discussion to strengthen the study.
  
  The effect of ALC1 KO on PARPi sensitivity is less pronounced in OVSAHO cells (BRCA2-mutated) than in BRCA1-mutated cells. In these cells, it looks like there is an additive effect rather than synergy. 1- The authors should calculate, if possible, whether there is synergy or additive effect of ALC1-KO lethality (BLISS).
  
  We thank the reviewer for recognizing the limitations on performing a BLISS score analysis here, as our experiments were conducted at a single level of total protein depletion. Ideally, synergy assessments require a range of depletion levels to generate a full response matrix. Nevertheless, to address the reviewer's concern regarding the impact of ALC1 on olaparib response in BRCA1- and BRCA2-mutant cells, we calculated BLISS scores under the conservative assumption that total ALC1 depletion alone has no effect on cell viability, using the following formula:
  
  $\text{BLISS score} = E_{\mathrm{obs}} - (E_A + E_B - E_A \times E_B)$
  
  where $E_{\mathrm{obs}}$ is the viability of ALC1-depleted cells at a given drug concentration, i.e., the observed impact of combined ALC1 loss and olaparib treatment;
  
  $E_A$ is the impact on viability of ALC1 depletion alone, which was set to zero;
  
  $E_B$ is the impact on viability of the drug in ALC1-WT cells, i.e., the impact of the drug alone.
  
  The BLISS score was calculated at all non-saturating drug concentrations and then averaged to obtain a final BLISS value. We used the following cutoffs:
  
  > 10: Synergistic (the interaction is considered significant);
  
  -10 to 10: Additive (no significant interaction);
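  The averaging procedure above can be written as a short routine. The minimal Python sketch below rests on several assumptions. Each effect term is taken as percent inhibition, i.e. 100 minus percent viability, so that positive scores indicate synergy under the cutoffs above. The product term is divided by 100 because effects are on a 0-100 scale. The saturation threshold used to drop concentrations is hypothetical. With $E_A = 0$ the score reduces to $E_{\mathrm{obs}} - E_B$.

```python
def bliss_score(e_obs: float, e_a: float, e_b: float) -> float:
    """Bliss excess; all effects are percent inhibition on a 0-100 scale."""
    # Expected combined effect under Bliss independence, rescaled for percentages.
    expected = e_a + e_b - (e_a * e_b) / 100.0
    return e_obs - expected


def averaged_bliss(viability_drug: list[float], viability_combo: list[float],
                   e_a: float = 0.0, saturation: float = 90.0) -> float:
    """Average Bliss excess over non-saturating concentrations.

    viability_drug: % viability of ALC1-WT cells at each drug concentration.
    viability_combo: % viability of ALC1-depleted cells at the same concentrations.
    e_a: effect of ALC1 depletion alone (zero, as assumed in the response).
    saturation: HYPOTHETICAL cutoff; concentrations where either arm reaches
    this % inhibition are treated as saturating and skipped.
    """
    if len(viability_drug) != len(viability_combo):
        raise ValueError("both arms need the same concentrations")
    scores = []
    for v_drug, v_combo in zip(viability_drug, viability_combo):
        e_b = 100.0 - v_drug     # drug alone (ALC1 WT)
        e_obs = 100.0 - v_combo  # drug combined with ALC1 loss
        if max(e_b, e_obs) >= saturation:
            continue
        scores.append(bliss_score(e_obs, e_a, e_b))
    if not scores:
        raise ValueError("no non-saturating concentrations")
    return sum(scores) / len(scores)


# Hypothetical viability curves (% of untreated) at four concentrations.
wt = [90, 70, 40, 5]
sg_alc1 = [70, 45, 20, 2]
print(round(averaged_bliss(wt, sg_alc1), 2))  # 21.67
```

  Here the fourth concentration is excluded as saturating, and the three remaining scores (20, 25 and 20) average to about 21.67.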
  
  Averaged BLISS scores for ALC1 loss combined with each drug:
  
  Cell line    Olaparib   Rucaparib   Niraparib   Veliparib   Cisplatin
  UWB1.289     22.34      25.21       13.24       14.95334    0.26
  JHOS-4       37.27      47.14       26.3        27.94       -0.37
  OVSAHO       19.34      27.6        23.2        19.15       7.04
  Kuramochi    11.38      11.98       -3.56       6.79        -0.39
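  To make the cutoffs concrete, the sketch below applies them to the averaged scores in the table. The values are copied from the table. The label returned for scores below -10 is a placeholder, because the response defines no category for that range.

```python
# Averaged BLISS scores as reported in the table.
BLISS_SCORES = {
    "UWB1.289": {"Olaparib": 22.34, "Rucaparib": 25.21, "Niraparib": 13.24, "Veliparib": 14.95334, "Cisplatin": 0.26},
    "JHOS-4": {"Olaparib": 37.27, "Rucaparib": 47.14, "Niraparib": 26.3, "Veliparib": 27.94, "Cisplatin": -0.37},
    "OVSAHO": {"Olaparib": 19.34, "Rucaparib": 27.6, "Niraparib": 23.2, "Veliparib": 19.15, "Cisplatin": 7.04},
    "Kuramochi": {"Olaparib": 11.38, "Rucaparib": 11.98, "Niraparib": -3.56, "Veliparib": 6.79, "Cisplatin": -0.39},
}


def classify(score: float) -> str:
    """Apply the stated cutoffs: > 10 synergistic, -10 to 10 additive."""
    if score > 10:
        return "synergistic"
    if score >= -10:
        return "additive"
    return "below stated cutoffs"  # placeholder: no category defined for < -10


for line, scores in BLISS_SCORES.items():
    synergistic = [drug for drug, s in scores.items() if classify(s) == "synergistic"]
    print(f"{line}: {', '.join(synergistic) or 'none'}")
# UWB1.289: Olaparib, Rucaparib, Niraparib, Veliparib
# JHOS-4: Olaparib, Rucaparib, Niraparib, Veliparib
# OVSAHO: Olaparib, Rucaparib, Niraparib, Veliparib
# Kuramochi: Olaparib, Rucaparib
```

  Under these cutoffs, olaparib and rucaparib fall in the synergistic range in all four lines, while cisplatin is additive throughout.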
  
  We observe that ALC1 loss synergistically enhances the olaparib and rucaparib response in both BRCA1- and BRCA2-mutant cells. However, as correctly noted by the reviewer, the BLISS score is higher in BRCA1-mutant cells than in the BRCA2-mutant OVSAHO cells.
  
  In the revised manuscript, we have also included data for another BRCA2-mutant cell line, KURAMOCHI (Fig. 1d; Supp. Fig. 1b). We chose this cell line because, despite carrying a BRCA2 mutation, it is highly resistant to PARP inhibitors and cisplatin owing to KRAS amplification. Notably, we observe that ALC1 loss synergistically enhances the response of Kuramochi cells to olaparib and rucaparib.
  
  We have included a statement in the manuscript that the impact of ALC1 loss was more profound in BRCA1- than in BRCA2-mutant settings. However, if acceptable to the reviewer, we would prefer not to include the BLISS values in the manuscript, as these calculations were not performed using the standard approach of titrating multiple levels of protein depletion.
  
  2- Another BRCA2-mutated cell line should be included.
  
  As discussed above, we have now included data from another BRCA2-mutant cell line, Kuramochi. Consistent with data in other BRCA-mutant cell lines, loss of ALC1 enhances olaparib and rucaparib sensitivity in these cells (Fig. 1d; Supp. Fig.1b).
  
  Minor comments:
  
  Figure key is missing for S2C (I assume it's grey DMSO, blue olaparib).
  
  We apologize for this oversight. Figure key has now been included.
  
  Page 8: "BRCA1-mutant ovarian cancer cells eventually develop chemoresistance when exposed to PARPi for a prolonged period. Mechanistically, this is due to rewiring of ATR signaling, which enables RAD51 loading at DNA breaks and reversed forks independent of BRCA1 protein(25)." This sentence suggests that this is the only existing resistance mechanism, which should be corrected. Modify to "mechanistically, this CAN be due to" or "this is OFTEN due to".
  
  We thank the reviewer for suggesting this important correction. This has now been fixed.
  
  Reviewer #1 (Significance (Required)):
  
  ALC1 inhibitors have been developed and clinical trials are starting. The significance of this manuscript lies in establishing the clinical potential of combined ALC1-PARP inhibition in high-grade serous ovarian cancer. In particular, the authors demonstrate that combining ALC1 suppression with PARP inhibition efficiently kills HR-proficient CCNE1-amplified ovarian cancers, which represent 20% of ovarian cancers and are resistant to current therapies.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  The manuscript by Lindsey et al. explores the role of ALC1 (Amplified in Liver Cancer 1) loss in enhancing sensitivity to PARPi in ovarian carcinomas, including BRCA1/2-mutated tumors (both sensitive and resistant to platinum) as well as cyclin E-amplified settings. The data are interesting, but in some cases the results are overinterpreted. I have listed my major concerns below.
  
  We appreciate that the reviewer finds our data interesting. We also appreciate the reviewer's insightful comments and have addressed them below.
  
  Figure 1. Could the authors demonstrate that OVSAHO cells are BRCA2-mutated? I had always thought they were BRCA wild type (10.1016/j.ygyno.2015.08.017).
  
  OVSAHO cells have a homozygous deletion in the BRCA2 gene (PMID: 23839242), which could explain why a mutation was not detected in the study referred to by the reviewer (PMID: 26321251). We have now included the Domcke et al. (2013) reference in the manuscript. The loss of BRCA2 expression in OVSAHO is also evident in our blots (Fig. 1a), as well as in data from Protein Atlas analysis.
  
  While the cisplatin data suggest that ALC1 loss indeed does not impact cisplatin sensitivity, I disagree with the statements that "the correlation between dispensability of ALC1 in platinum response suggests that this chromatin remodeler likely does not contribute to MMEJ" (page 6) or that it "is dispensable for HR" (page 7). It has to be stressed that cisplatin-induced DNA damage (interstrand crosslinks) is also a substrate for nucleotide excision repair, which has a key role in repairing these lesions.
  
  We agree with the reviewer that transcription-coupled NER is the key pathway for the resolution of cisplatin-induced damage. We have therefore revised this statement in the manuscript to read: “Our data showing the dispensability of ALC1 in cisplatin response, in both BRCA1- and BRCA2-mutant settings, are consistent with previous reports demonstrating the dispensability of this remodeler for MMEJ or transcription-coupled nucleotide excision repair.” We have cited previous work showing that ALC1 is dispensable for MMEJ or TC-NER. Similarly, we have modified the text on page 7 to read: “Furthermore, ALC1 loss did not impact sensitivity to cisplatin in HRP cyclin E1-high cells. This observation is consistent with previous studies showing its dispensability for HR repair.”
  
  Figure 2. Please explain better why niraparib is not active in cyclinE1-high cells.
  
  Our comprehensive studies examining the impact of ALC1 depletion on PARPi response reveal a general theme: targeting ALC1 is most effective in enhancing sensitivity to olaparib and rucaparib, which have moderate PARP1/2 trapping ability, compared with niraparib and talazoparib, which are strong trappers. One possible explanation is that moderate PARP1/2 trappers are more amenable to combination strategies because their effects do not reach full saturation, preserving a dynamic range that allows for additive or synergistic enhancement. This has been included in the discussion section of the manuscript.
  
  It is not clear to me if the authors consider a cyclin E "gain" an overexpressing tumor (i.e. OVCAR8). The authors need to show the response to PARPi in one (possibly two) cell lines with very low expression of cyclin E and knock-down of ALC1.
  
  We have presented data in multiple BRCA1-WT cell lines with very low expression of cyclin E compared with OVCAR8. These include the FT282 cell line (Fig. 4), two FT282 clones of BRCA1-/+ FT cells (Fig. 5), and full-length BRCA1 add-back UWB1.289 cells (Fig. 3c). Additionally, we have added immunoblotting data showing that in OVCAR8, cyclin E1 protein levels and activity, as assessed by pCdk2, are comparable to those in OVCAR3 and OVCAR4, two CCNE1-amplified lines (Fig. S2d). In contrast, FT282 and UWB1.289 BRCA1 add-back cells have low levels of cyclin E and thus low pCdk2.
  
  Does deletion of ALC1 interfere with tumor take and tumor growth? This is not clear from the in vivo experiments.
  
  Tumor take: We injected OVCAR8 cells into mice three days after transduction with sgALC1. Depletion of ALC1 is only achieved 14 days post-transduction, which explains why tumor take is not impacted. We do not observe a significant impact of ALC1 loss on tumors derived from OVCAR8 cells, consistent with the dispensability of ALC1 for the proliferation of HR-proficient cells (PMID: 33333017; PMID: 33462394). We have added text to the manuscript to clarify this point.
  
  Is intraperitoneal injection of OVCAR8 cells not associated with the formation of ascites?
  
  We thank the reviewer for bringing up this important point. The objective of this study is to examine how ALC1 loss can enhance PARPi responses, and we therefore chose an earlier time point (~50 days) to assess the impact on tumor growth. Ascites formation upon intraperitoneal injection of OVCAR8 cells has primarily been reported at late stages of disease development. For example, Anirban Mitra et al. (2015) (PMID: 26050922) reported consistent ascites formation, but only at extended time points (up to ~90 days post-injection). Similarly, Yong-Tae Shen et al. (2019) (PMID: 31117198) injected 5-10 × 10⁶ cells and observed ascites emerging around day 49, with progressive accumulation toward the endpoint, indicating that fluid buildup coincides with advanced peritoneal dissemination. In contrast, studies using comparable inoculation doses (e.g., 1 × 10⁶ cells) and shorter observation periods (~6 weeks), such as Luis Hernandez et al. (2016) (PMID: 27235858), did not report detectable ascites. Taken together, these findings suggest that, while OVCAR8 cells can generate ascites, this phenotype typically manifests at later stages of disease progression and is not expected within shorter experimental windows. Therefore, the absence of ascites in our model is consistent with the study design and timeframe, rather than indicative of a failure of tumor establishment.
  
  We have added relevant discussion in the results section to clarify this point.
  
  How was tumor weight calculated?
  
  Tumor burden was quantified by direct collection and measurement of peritoneal tumor nodules. For the sacrificed mice, all visible tumor nodules within the peritoneal cavity were carefully excised, counted, and pooled per animal. The total tumor weight was then determined by weighing the combined mass of all collected nodules using an analytical balance. Thus, “tumor weight” represents the cumulative mass of macroscopic peritoneal implants per mouse. No estimations or indirect calculations were used. This has now been elaborated on in the methods section.
  
  It seems that tumors grow as a solid mass, but how were nodules …
  
  All mice at endpoint exhibited disseminated peritoneal disease, characterized by multiple tumor nodules and invasion into the peritoneal wall. Tumor nodules were quantified by direct visual inspection during necropsy. Small nodules (…)
  
  Why were survival curves not shown?
  
  Survival analysis was not included because the study was designed with a predefined experimental endpoint to enable controlled comparison of tumor burden across groups. Animals were therefore euthanized at the same timepoint rather than followed longitudinally to survival. As a result, Kaplan–Meier analysis was not applicable to this experimental design. We agree that survival is an important outcome and would be valuable in future studies specifically powered and designed for that purpose.
  
  The dose of 50 mg/kg every third day is a very low olaparib dose. In vivo dosing is generally 100 mg/kg, 5 days a week for 4 weeks (doi: 10.1158/1535-7163.MCT-21-0420; 10.1158/2767-9764.CRC-22-0423).
  
  We agree that higher doses of olaparib (e.g., 100 mg/kg, 5 days/week) are commonly used and have demonstrated single-agent efficacy in vivo. In this study, however, our objective was to specifically evaluate the combinatorial effect of olaparib with genetic knock-out of ALC1. To enable this, we intentionally employed a reduced dosing regimen (50 mg/kg every third day) to minimize single-agent activity. This approach allowed us to establish a condition in which olaparib in sgAAVS1 control tumors had limited impact on tumor burden, thereby providing a dynamic range in which to detect potential sensitization effects mediated by sgALC1. Using a fully efficacious dose would likely mask such interactions by producing a near-maximal response in the control group. Thus, the selected dosing strategy reflects a deliberate experimental design to assess potentiation effects rather than to model maximal therapeutic efficacy of olaparib as a monotherapy.
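  For scale, the cumulative doses of the two regimens can be compared with a quick calculation. The Python sketch below assumes that dosing starts on day 0 and compares both schedules over the same 28-day window. It does not model the study's own ~50-day timeline, and it ignores pharmacokinetics entirely. The helper name is hypothetical.

```python
def cumulative_dose(dose_mg_per_kg: float, dosing_days) -> float:
    """Total dose (mg/kg) delivered over the given dosing days."""
    return dose_mg_per_kg * len(list(dosing_days))


WINDOW_DAYS = 28  # ASSUMPTION: compare both schedules over the same 4-week window

every_third_day = range(0, WINDOW_DAYS, 3)                         # days 0, 3, ..., 27
five_days_per_week = [d for d in range(WINDOW_DAYS) if d % 7 < 5]  # 5 on, 2 off

low = cumulative_dose(50, every_third_day)
high = cumulative_dose(100, five_days_per_week)
print(len(every_third_day), len(five_days_per_week))  # 10 20
print(low, high, high / low)  # 500 2000 4.0
```

  Under these assumptions, the reduced regimen delivers about one quarter of the cumulative olaparib dose of the standard schedule.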
  
  Figure 4. I could not find the data on the minimal impact of ALC1 in UWB1.289 cells. What are the authors referring to? Do they mean that ALC1 deletion did not alter cell growth, or something else? Where are these data?
  
  The minimal impact being referred to was PARPi responses in BRCA1-proficient UWB1.289. We have now fixed the statement to read: “The minimal impact of ALC1 in BRCA1-proficient UWB1.289 cells on PARPi responses suggested that targeting this remodeler may have minimal impact on normal healthy cells.” and included the relevant figure number (Fig.3c) for clarity.
  
  The modest increase in pRPA in hTERT-FT282 cells is statistically significant and not very different from that observed in UWB1.289 cells, suggesting that ALC1 deletion could indeed impact normal cells. These data should be interpreted more conservatively.
  
  The increases in pRPA levels upon ALC1 loss in hTERT FT282 BRCA1-het cells and UWB1.289 cells are 1.2-fold and 1.4-fold, respectively. This is consistent with the literature showing that BRCA1-/+ het cells have a compromised replication stress response. Unresolved replication stress is processed into double-strand breaks (DSBs). Consistent with the proficiency of hTERT FT282 BRCA1-/+ het cells in DSB repair, ALC1 deficiency does not increase γH2AX in these cells. Hence, despite an increase in pRPA S33 signal in hTERT FT282 BRCA1-het cells, these cells can resolve downstream breaks. In contrast, a profound 1.7-fold increase in γH2AX signal was observed upon ALC1 loss in BRCA-mutant UWB1.289 cells, reinforcing that ALC1 loss has a more profound effect in BRCA-mutant cancer cells.
  
  To align with the reviewer's suggestion, we have removed the word “modest” and have retained the fold differences in the median values.
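  The reported fold differences (1.2, 1.4 and 1.7) are ratios of median values. The minimal Python sketch below shows that calculation. It assumes per-cell signal intensities for control and ALC1-depleted populations. The helper name and the intensities are hypothetical, and the response does not specify how the signal was measured.

```python
from statistics import median


def median_fold_change(control: list[float], treated: list[float]) -> float:
    """Ratio of the treated median to the control median per-cell signal."""
    control_median = median(control)
    if control_median == 0:
        raise ValueError("control median is zero; fold change undefined")
    return median(treated) / control_median


# Hypothetical per-cell pRPA intensities (arbitrary units).
sg_control = [8.0, 10.0, 12.0]
sg_alc1 = [11.0, 14.0, 20.0]
print(median_fold_change(sg_control, sg_alc1))  # 1.4
```

  Using medians rather than means keeps the ratio less sensitive to a few very bright cells, which is a common reason for summarizing per-cell signals this way.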
  
  Figure 6. OS is a questionable endpoint in this heterogeneous patient population (treated in the front-line and recurrent settings); in my opinion, OS, much more than PFS, is influenced by the many different treatments these patients underwent. Why not consider PFS after or on PARPi treatment? The authors should clarify the patient population. Indeed, 48 patients were treated with PARPi and were platinum-sensitive and possibly HRD. Which patients are the HRP patients, and how many were there? It is not clear whether the HRP and high-replication-stress cohort were treated with PARPi. How many of these were cyclin E-amplified or had high cyclin E levels? Besides UWB+BRCA1, Figure 6F should also include other tumor cells without cyclin E overexpression and without BRCA mutation or HRD. The limitations should be discussed to strengthen the manuscript.
  
  We thank the reviewer and agree that PFS is often preferred for evaluating treatment-specific effects. However, in this cohort, PFS was not a reliable endpoint for several reasons. Tumor samples were obtained at diagnosis, whereas PARPi was administered later, in either the frontline maintenance or recurrent setting, introducing temporal and prognostic heterogeneity that limits the interpretability of PFS. These factors confound attribution of PFS specifically to PARPi response. We therefore selected OS from the time of PARPi exposure as a more consistently defined endpoint across this heterogeneous cohort, while acknowledging its limitations.
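  The endpoint described above starts the clock at PARPi exposure rather than at diagnosis. The minimal Python sketch below shows one way to define that time origin for a single patient, with right-censoring for patients alive at last follow-up. The function name and dates are hypothetical, and this is not a reproduction of the authors' survival analysis.

```python
from datetime import date
from typing import Optional


def os_from_parpi(parpi_start: date, death: Optional[date], last_follow_up: date) -> tuple[int, bool]:
    """Return (days from PARPi start, event observed) for one patient.

    Patients alive at last follow-up are right-censored (event = False).
    """
    end = death if death is not None else last_follow_up
    if end < parpi_start:
        raise ValueError("end date precedes PARPi start")
    return (end - parpi_start).days, death is not None


print(os_from_parpi(date(2020, 1, 1), date(2021, 1, 1), date(2022, 1, 1)))  # (366, True)
print(os_from_parpi(date(2020, 1, 1), None, date(2020, 3, 1)))              # (60, False)
```

  Anchoring every patient at PARPi start gives each survival time the same origin, whether PARPi was given as frontline maintenance or at recurrence.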
  
  Reviewer #2 (Significance (Required)):
  
  The manuscript by Lindsey et al. explores the role of ALC1 (Amplified in Liver Cancer 1) loss in enhancing sensitivity to PARPi in ovarian carcinomas, including BRCA1/2-mutated tumors (both sensitive and resistant to platinum) as well as cyclin E-amplified settings. The data are interesting, but in some cases the results are overinterpreted.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  The manuscript by Aubuchon, Wong et al. presents strong insights into the value of ALC1 as a novel target for sensitization strategies against PARPi. The authors show that PARPi resistance is reversible when ALC1 is knocked down and convincingly highlight the genetic contexts for these approaches. The authors also point out that the weak PARP trappers olaparib and rucaparib in particular could benefit from concomitant ALC1 inhibition, and that high levels of replication stress, indicated by elevated p-T21 RPA2, could serve as a biomarker in clinical settings. Furthermore, the authors show that benign fallopian tube cells are not affected by ALC1-kd, which is an important finding for in vivo approaches.
  
  We thank the reviewer for acknowledging that our work provides “strong insights” and makes “important finding for in vivo approaches”.
  
  As the manuscript covers a broad experimental field, I would only suggest a few additional experiments to further strengthen the overall story:
  
  How does an ALC1 knock-down affect the expression of PARP1 and if so, how does this contribute to the effects seen by ALC1-kd? The authors could add Western Blot experiments for cell lines belonging to the respective groups that are distinguished in the manuscript: BRCA wt, BRCA mutated and Cyclin E1-high cancer cells and also a benign fallopian tube cell line.
  
  This was an interesting point brought up by the reviewer. To address it, we compared total PARP1 protein levels between ALC1-WT and ALC1-depleted cells in BRCA1 add-back UWB1.289, BRCA1-mutant UWB1.289, cyclin E1-high OVCAR8, and FT282 cells. We do not observe any consistent alteration in PARP1 levels upon ALC1 depletion (Supp. Fig. 6a, b).
  
  In some of the Western Blot data, it also looks like BRCA1 expression is affected by ALC1 kd. The authors could provide some quantified protein expression or qPCR data if there is a correlation between both expressions.
  
  To address the reviewer's question, we quantified changes in BRCA1 levels upon ALC1 loss across all cell lines used in this study. As expected, BRCA1 levels were higher in UWB del 11q and Cyclin E1-overexpressing cell lines. In contrast, cell lines harboring heterozygous BRCA1 mutations or BRCA1 promoter methylation were among those with the lowest BRCA1 expression. This trend gives us confidence in the reliability of our immunoblot quantification. Although minor fluctuations in BRCA1 protein levels were observed following ALC1 depletion, no consistent trend towards either an increase or a decrease was evident (Supp. Fig. 6c). Likewise, when cell lines were grouped according to their sensitivity to PARP inhibition upon ALC1 loss, no clear pattern emerged (Supp. Fig. 6d). Together, these data suggest that ALC1 depletion does not substantially affect BRCA1 protein levels, consistent with our previous RNA-seq and functional studies indicating that this chromatin remodeler is dispensable for transcriptional regulation or homologous recombination (PMID: 33462394).
  
  To further strengthen the hypothesis that the effects of strong PARP-trappers are not improved by ALC1 kd, the authors should add data regarding the viability of the cells presented in Figure 3b upon treatment with niraparib and talazoparib in sgALC1 cells (versus vector control). Also, the authors should add cell viability data using talazoparib for the sgALC1 OVCAR cell lines (versus vector control) in Figure 2 and Supplement Figure 3.
  
  Sensitivity to niraparib and talazoparib upon ALC1 depletion has now been added to Figure 3b and, for the OVCAR lines, to Supplementary Figure 3. As correctly pointed out by the reviewer, we consistently observe that the impact of ALC1 loss is more profound on olaparib and rucaparib than on niraparib and talazoparib.
  
  Some minor points I noticed while reading the manuscript:
  
  We apologize for the oversight and thank the reviewer for pointing this out.
  
  in Figure 3b, both graphs have the same title. I think the right one should be "SYr14" instead of "SYr12" again
  
  Fixed.
  
  In the heading of Figure 2, an "in" is missing.
  
  Fixed.
  
  Some citations across the manuscript seem to use a different citation style (superscript numbers) rather than numbers in brackets.
  
  Fixed.
  
  Reviewer #3 (Significance (Required)):
  
  The most important aspect of this manuscript is that ALC1 inhibitors could improve the response to some PARPi without damaging healthy cells. The authors also mention the limitations of ALC1 as a target and offer a potential biomarker for combination approaches. This study offers a very detailed insight into the potential role of ALC1 as a target for sensitization approaches under the different genetic conditions that can occur in HGSOC. These novel insights could further broaden the therapeutic options for PARPi in clinical settings if the results are confirmed in in vivo trials.
  
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2025.12.04.692458
dhq.digitalhumanities.org

DHQ: Digital Humanities Quarterly: The Landscape of Digital Humanities

1. nathanielelder 10 Jun 2026
  
  in Public
  
  The individual term digital humanist may be problematic because it may seem both too general in not relating to a specific discipline or competence (thus deemphasizing the discipline-specific or professional) and too specific in emphasizing the “digital” part of the scholarly identity (if you are scholar) or giving too much prominence to the humanities part of your professional identity (if you are a digital humanities programmer or a system architect)
  
  I agree with the authors' critique of the term digital humanist, specifically the point about the term being too specific in emphasizing the "digital" part, as I don't believe we exist in a digital vacuum where something could be purely digital. The world we live in has a lot of digital technology at play, so it is almost redundant to emphasize that something comes from a digital perspective.
  
  I found an additional article (URL below) that also discusses this paradox in the term digital humanist. https://www.sciencedirect.com/science/article/pii/S2666659624000015?via%3Dihub
2. AnnaRendon 07 Jun 2026
  
  in Public
  
  In a conceptual and disciplinary map of the digital humanities, the encounters described above and the examples cited earlier would be distributed over a rather diverse territory. One important, distinguishing parameter is how different perspectives and initiatives relate to information technology and the digital. For example, as we have seen, traditional humanities computing tends to have a rather instrumental relationship to information technology, which serves primarily as a tool, whereas a cultural or media studies-based approach is more likely to focus on digital culture and the cultural construction of information technology as a study object.
  
  Interdisciplinarity solves some of these problems, I think. Rather than viewing the "digital" and the "humanities" as separate entities that we are trying to reconcile, perhaps there is value in treating them as partners in addressing a unique problem according to its inherent complexities. The interesting question, then, becomes not which discipline owns the problem, but what kinds of relationships are required to understand it.
  
  This perspective aligns closely with the work of Julie Thompson Klein. In her foundational text, Interdisciplinarity: History, Theory, and Practice, Klein (1990) argues that “Interdisciplinarity is a means of solving problems and answering questions that cannot be satisfactorily addressed using single methods or approaches” (p. 196). Currently, this is the perspective that I maintain. Interestingly, this article and work on interdisciplinarity seem to have emerged around the same time. Perhaps this represents a societal as well as an academic turning point. As social, technological, and cultural problems became increasingly complex, traditional disciplinary boundaries may have become less capable of addressing them independently, creating a need for collaborative and interdisciplinary approaches to knowledge preservation and production.
  
  References
  
  Klein, J. T. (1990). Interdisciplinarity: History, theory, and practice. Wayne State University Press.

Annotators

AnnaRendon

nathanielelder

URL

dhq.digitalhumanities.org/vol/4/1/000080/000080.html
hsigstad.github.io

Paper Draft — Judicial Convictions and Political Accountability

1
1. econ_siggi 10 Jun 2026
  
  in Public
  
  Sample definitions are in Section 3.3. Panel A of Table 3 is a politician-level panel with one observation per politician per reference election. We use four reference elections (2008/2012/2016/2020) and include every politician who is a career politician at the reference election, defined as either currently holding elected office (mayor or councillor incumbents) or having won at least one prior election, regardless of whether they ran at the reference election themselves. Including sat-out career politicians (e.g. term-limited mayors taking a forced break before running again) avoids conditioning on having chosen to run in a given cycle, which would mechanically exclude precisely the politicians the filing event may push out of the cycle. First-time / never-won candidates are excluded: an improbidade allegation against such a politician cannot be tied to a documented office tenure. The mayor subsample additionally excludes politicians who are term-limited at the next election (consecutive two-term mayoral incumbents ending at the reference cycle), since they are mechanically barred from running for mayor again at lead and their Running=0 is not behavioural. Panel B is the case-level Disposition-OLS sample: cases filed in the one-to-four year window before the politician’s next election, restricted to politicians who held office at filing time (mayor or councillor at the pre-filing election) or have any prior electoral win. The lower bound at one year excludes filings landing in the election year itself, when no pre-election disposition is feasible. The upper bound matches Panel A’s window and keeps the case-filing event term-aligned. For Panel A, filed before election_i equals one if politician i had at least one improbidade case filed against them in the one-to-four year window before the corresponding next election.
  
  For Panel B, defined at the case level, convicted before election_ij and acquitted before election_ij equal one if case j naming politician i was decided as a conviction (resp. acquittal) before the politician’s next election, and zero otherwise. The two indicators are mutually exclusive within a case; their common zero category is the case still being pending as of the next election (cases decided after the election cannot affect voters by definition and are equivalent to pending cases for this analysis).
  
  I think all of this can be dropped. It just repeats Section 3.3 (Sample construction). Instead, this paragraph should get straight to the point, saying something like: "To estimate the association between filing and electoral outcomes, we estimate:"
  
  Then introduce the equations, and explain the variables, with emphasis on the variable of interest.
  
  I also think it is better to explain the equation for Panel A first, and then the one for Panel B, potentially splitting them into two paragraphs. All variables and all outcomes should be explained.
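  To make the variable definitions in the quoted passage concrete, here is a minimal Python sketch of how the Panel A and Panel B indicators could be coded. The `Case` type, the function names, and the calendar-year reading of the "one-to-four year window" are assumptions for illustration, not the draft's own code; the sample restrictions (career politicians, term limits, office at filing time) are omitted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass
class Case:
    """One hypothetical improbidade case naming a politician."""
    filed: date
    outcome: Optional[str] = None   # "conviction", "acquittal", or None if undecided
    decided: Optional[date] = None  # decision date, if the case has been decided


def filed_before_election(filing_dates: Iterable[date], next_election: date) -> int:
    """Panel A indicator: 1 if any filing falls in the one-to-four year window.

    Assumption (not from the draft): the window is measured in calendar years,
    covering filing years next_election.year - 4 through next_election.year - 1,
    so filings in the election year itself are excluded.
    """
    first, last = next_election.year - 4, next_election.year - 1
    return int(any(first <= d.year <= last for d in filing_dates))


def disposition_before_election(case: Case, next_election: date) -> Tuple[int, int]:
    """Panel B indicators (convicted, acquitted) for one case.

    Undecided cases, and cases decided on or after the election date, get
    (0, 0): they are treated like pending cases, as the quoted text describes.
    """
    if case.decided is None or case.decided >= next_election:
        return (0, 0)
    return (int(case.outcome == "conviction"), int(case.outcome == "acquittal"))


election = date(2016, 10, 2)  # hypothetical next-election date
print(filed_before_election([date(2013, 5, 1)], election))  # 1
print(filed_before_election([date(2016, 3, 1)], election))  # 0
print(disposition_before_election(Case(date(2013, 5, 1), "conviction", date(2015, 8, 1)), election))  # (1, 0)
print(disposition_before_election(Case(date(2013, 5, 1), "acquittal", date(2017, 1, 1)), election))  # (0, 0)
```

  Returning the two Panel B indicators as a pair makes the mutual exclusivity explicit: at most one of them can be 1, and (0, 0) is the shared pending category.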

Annotators

econ_siggi

URL

hsigstad.github.io/ficha/paper/index.html
www.biorxiv.org

A pilot study for whole proteome tagging in C. elegans

1
1. Public_Reviews 10 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve efficiency of gene tagging.
  
  Strengths:
  
  This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.
  
  Weaknesses:
  
  Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans, while inserting two fluorescent proteins and a co-CRISPR marker into three loci, and Paix et al. 2015 demonstrated simultaneous insertion of two fluorescent tags. The current work is a valuable, incremental advance. In general, I applaud the authors' willingness to strategize about how whole-proteome tagging might be accomplished. I predict that the advance here will be one of many small advances that will get the field to that goal. In my view, the title oversells the advance presented, since it seems like one among many key advances, and the first sentence of the Discussion seems a more apt summary of the key advance here.
  
  Some injections targeted genes on the same chromosome together, which will create unnecessary complications for the genetic crosses that some future experiments will require. This made me wonder whether injecting 3 together really is helpful vs. targeting each gene separately, since only 5 worms need to be injected. It cuts time down by 2/3, but perhaps avoiding targeting the same chromosome with two tags would be useful.
  
  The limited utility of current blue fluorescent proteins makes me wonder whether they are worth using at this stage, before better blue fluorescent proteins (or, better yet, far-red ones) are available, to avoid issues with live imaging under phototoxic UV or near-UV illumination.
  
  These comments are a repeat of the original comments, and we refer the reader to our response to the original comments.
  
  Reviewer #2 (Public review):
  
  Original Review:
  
  The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?
  
  As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.
  
  As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community.
  
  Finally, the interpretation of the patterns observed in the created lines leaves much to be desired. A Table with all the observations must be included and can replace the tedious (and often wrong) descriptions of the observations with the different lines. It would be too much to point out every mistaken expectation of protein expression. Two examples include:
  
  The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis) is naïve - there are multiple paralogs of this protein (look at WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.
  
  The expectation that HXK-1 is ubiquitously expressed is similarly naïve. There are three paralogous enzymes that are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787). Moreover, single cell RNA-seq data (PMID: 38816550) also shows enrichment of hxk-1 in gonadal sheath cells.
  
  The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4 and adult (all published) - tissues in which expression is observed in the lines presented by the authors.
  
  Other points:
  
  (1) We would encourage the authors to provide systematic validation of the reported insertions. The manuscript reports that 24 of 30 tags were isolated and visible but does not clearly state whether each isolated line was confirmed by sequence‑level validation to be correctly in‑frame and free of unintended mutations at the target locus.
  
  (2) The manuscript presents aggregated success counts (e.g., 8/10 mTagBFP2 tags, 9/10 mStayGold, 7/10 mScarlet3) and useful narrative descriptions of injection outcomes. We also suggest including per‑locus success rates.
  
  (3) For pools that required re‑injection after initial failures, we would like to see a description of the specific changes that were made to the injection mixes or procedures (e.g., new repair template prep, different Cas9 reagent lot, guide redesign). This will be useful troubleshooting information for others.
  
  (4) The authors state that the fluorophore sequences are codon-optimized for C. elegans. We suggest they provide the exact donor/tag sequences used and specifically state whether the fluorophore sequences contain any synthetic/artificial introns, and whether other sequence modifications (e.g., silent PAM‑disrupting mutations) were included in the donor templates.
  
  (5) Page 3: Include a reference for "The C. elegans genome encodes around 20,000 genes"
  
  We hope these comments are useful.
  
  Comments on Revised Version:
  
  Overall, we found the responses to be quite recalcitrant.
  
  We have one remaining composite concern about the comparison between observed expression patterns with the new strains versus published data.
  
  First, the authors only report patterns for one stage, although it should not take much effort to image the different life stages. However, since this is a revision, we are not formally requesting that they do this.
  
  Second, in the now-provided Table (thank you), 'observed expression' (last column) is lacking for 9 of the 30 proteins, and for 6 of these the procedure was not successful. Why not report patterns for the other three? This is also confusing because on page 5, the authors say that "overall, 24 of 30 tags ...all of which were visible with fluorescence stereomicroscopy"; are we missing something? Also, they then said that they "obtained 6/9 of the originally failed tags"; why are the corresponding patterns not included in Table 1, and are 9 proteins still labeled as "no" in the "success?" column?
  
  We appreciate the chance to clarify this matter: There are only 6 “no” in the “success” column. In two cases, HAT-1 and CBP-1, expression was dim at F1 but still sufficient to pick positive worms and quantify success rate at the locus. We noted these as “dim” on the table to indicate that if expression was lower, we likely would not have been able to isolate them at F1. In one case, COX-6B, expression was too dim at F1 to be isolated but was sufficient at F2 to be visualized and isolated from parents that were positive for the other two tags. We now clarified this distinction in the table and accompanying text: “Fluorescent signals of HAT-1::mScarlet3 and CBP-1::mScarlet3 in F1 progeny were dim but still sufficiently visible for quantification of knock-in efficiency, indicating that they are at the lower end of detectability for mScarlet3.”
  
  We imaged worms that had multiple tags as proof of principle and are happy to provide strains to those who would like to image/study them. At this point we are not convinced that imaging more worms would add to the conceptual framework.
  
  Third, we strongly feel that the response to our comments about expression patterns is not adequate. On page 5 the authors say that "all proteins were expected to be ubiquitously expressed" and that "scRNA-seq indicated that transcript abundance was ubiquitous and without strong tissue-specific enrichment with few exceptions". However, in their rebuttal, the authors now argue for tissue-specific expression of proteins with paralogs, reversing their own argument! Moreover, their Table indicates that many genes show tissue-enriched expression by RNA-seq while many of their tagged proteins exhibit ubiquitous expression.
  
  We respectfully disagree that there is contradiction. In our response, the discussion on paralogs was added as a clarification in response to the referee’s original comments (e.g., regarding ACDH-10): “There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline).” We wanted to make it clear that we were not concluding fatty acid metabolism (or other processes) does not occur in other tissues.
  
  We wish to stress that we never argued that paralogs could not fulfil the same essential function across tissues. The proteins were selected because their biological functions (e.g., glycolysis, fatty acid β-oxidation, translation) are broadly required, and because scRNA-seq generally predicted broad expression with few exceptions, as detailed in the text. Paralogs with similar activities (e.g., hxk-1, -2, -3) may overlap broadly in expression, or individual paralogs may carry out the process in different tissues, provided one of them carries out the reaction in each tissue. For acdh-10 and hxk-1 specifically, both appear broadly expressed across tissues by scRNA-seq, with no consistent enrichment or depletion across datasets. So, our central point is that, for a specific gene involved in an essential process, transcript data alone are not sufficient to accurately predict tissue-specific enrichment, not that the processes fail to occur in tissues where one paralog is absent. The possibility that a paralog may compensate for lack of expression is in no way contradictory to our conclusion.
  
  The table does not generally show tissue-enriched expression: it simply lists three tissues with the highest quantitative value in the respective dataset. For instance, taking the first gene from the list (Y82E9BR.3) and looking at the Ghaddar dataset, the top 3 tissues (log2(TPM)) are: pharyngeal muscle (13.4), gonadal sheath (12.9), marginal cells (12.9). The next 3 tissues are: body wall muscle (12.9), pharyngeal epithelium (12.8), and intestine (12.3). Even when there were apparent enrichments among the top 3 tissues, there were significant disagreements between datasets, and beyond top 3 even greater disagreements (the datasets agreed on the top tissue only 4 times over the 30 genes). These indicate that much of the variation is attributable to experimental noise rather than true predicted enrichment. The referee points to HXK-1 being correctly gonadal sheath enriched in one scRNA dataset; however, the other two datasets actually show different sites as being highest, and the same dataset misses effects in other cases. This is precisely why protein level data is needed.
  
  We further clarified this issue in the text: “We thus selected 30 genes across a variety of bulk transcript expression ranges which are generally predicted to be broadly expressed based on molecular function or, where molecular function was unknown (e.g., ZK632.9), single cell RNA sequencing (scRNA-seq) data (Table 1, Fig. 2A, B) (Gao et al., 2024; Ghaddar et al., 2023; Taylor et al., 2021).”
  
  Overall, this indicates that both the overall accomplishment of generating tagged protein strains and analyzing their expression is oversold.
  
  We have tried to make clear that our contribution is not a handful of new tagged strains added to the many that already exist. Rather, as stated in the abstract and elsewhere, we propose a strategy and provide proof-of-concept for scaling up tagging efforts. We believe the importance of this cannot be oversold.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors argue that establishing the expression pattern and sub-cellular localisation of an animal's proteome will highlight hypotheses for further study. This claim is probably accepted by many in the community. This manuscript seeks to confirm the feasibility of establishing such a resource, by using current transgenic methods to knock in DNA encoding different colored fluorescent tags into C. elegans genes.
  
  Strengths:
  
  The authors make the points above. For example, they provide evidence that the C. elegans germline harbors two populations of mitochondria that differ qualitatively in the proteins they express. They also confirm that labelling the whole proteome is an achievable goal with relatively limited resources and time.
  
  Weaknesses:
  
  The work is somewhat incremental in that it uses existing transgenic technology. Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy such as diSPIM, STED and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit. However, they do not use these technologies to characterize their transgenic strains.
  
  Reviewer #4 (Public review):
  
  Summary:
  
  Tagging the entire proteome of a metazoan would be a landmark achievement, providing a powerful complement and extension to existing "omic" catalogs in model systems. Here, Eroglu and Hobert argue that efficiently tagging multiple loci in a single "batch" would make the community-based achievement of this goal realistic. They provide rigorous evidence that such an approach is indeed feasible, exploring issues related to efficiency, design and screening strategies, disruption of gene function, and the potential for endogenously tagged alleles to reveal unexpected aspects of protein expression and localization. While the work has some minor gaps that matter for rigorously assessing the feasibility of the proposed effort, the detailed and valuable insights that emerge should provide impetus to the community to coordinate efforts to make this ambitious goal a reality.
  
  Strengths:
  
  The work has numerous strengths. The authors provide compelling evidence that:
  
  Three distinct loci can be efficiently targeted with three distinct fluorescent tags in a single injection.
  
  Thoughtful targeting design can reduce the likelihood of disruption of function by the tag.
  
  Systematic design principles based on expression level and predicted localization/function can be used to optimize tagging strategies.
  
  The resulting tags can provide unexpected insight into patterns of protein production and subcellular localization.
  
  Not all of these advances are novel in themselves, but taken together, they represent an important technical and conceptual advance. The most important strength comes from the exceptionally high value of the goal itself, in that the work has the potential to spur a community-wide effort toward achieving the ambitious goal of proteome-wide tagging.
  
  We appreciate the referee’s enthusiasm and hope that this will engage members of the community in a collective effort.
  
  Weaknesses:
  
  The work's shortcomings are minor.
  
  One concern has to do with the feasibility of the proposed screening strategies. The experimental design cleverly coinjects tags for three loci in different gene expression 'zones'; this expression level determines which tag will be used. As the authors allude to, among genes with the same overall FPKM value there is an important distinction between those that are expressed broadly and those focally expressed in a specific tissue. The proposed strategy claims that there are a sufficient number of highly expressed genes "to be used as visible markers" for recovering successfully edited animals. It would be useful for the authors to discuss the issue of broad vs. focused expression among this set of genes a bit more thoroughly, with an eye toward how likely it is that these genes could indeed consistently be used as visible markers, particularly for those at the low end of this limit.
  
  To give two examples, this principle aided us with screening F54C8.1 and HAT-1. We added additional discussion on this to the first paragraph of the discussion: “For instance, we could clearly visualize F54C8.1::mScarlet3 in adult sperm by fluorescence stereomicroscopy despite a bulk FPKM of 16. Similarly, nuclear localized proteins will likely be easier to detect even at low expression levels, given the concentration of signal in small subcellular compartments. Indeed, this helped us detect HAT-1::mScarlet3 (56 bulk FPKM), which may have been too dim if distributed more broadly within cells.”
  
  What fraction of the proteome (on a per-gene basis) is secreted proteins? How difficult will it be to screen these for successful tags? Are there specific tags that would be more optimal for secreted proteins? (The authors mention the use of an SL2 or T2A cassette to label the cells in which these proteins are expressed but note that there are technical challenges associated with doing this at scale.)
  
  We added some of these points to the discussion: “Moreover, around 17% of the C. elegans genome (3,484 genes) may encode for secreted proteins (Suh and Hutter, 2012). Endogenous tagging of a substantial fraction of these proteins could reveal spatial patterns of secretion, distinguishing components that remain near their cell of origin from those that disperse to distal sites (Keeley et al., 2020). Tagging secreted proteins can also reveal sites of secretion – such as apical or basolateral membranes, or neurites – as has been observed for specific insulins (Sural et al., 2025) and for neuropeptides that localize selectively to synaptic regions (Toker et al., 2025).”
  
  Various tags have been used for secreted proteins, including Venus, TagRFP, and mNeonGreen. The pH of secretory vesicles is ~5.0-5.5, so chosen FPs should have a pKa below this range to avoid acid quenching of their fluorescence. All 3 fluorophores used here (mStayGold, mScarlet3 and mTagBFP2) have pKa values below this range and would likely be fluorescent within secretory vesicles.
  
  For secreted and/or weakly expressed genes, it would be useful for the authors to estimate for what fraction of these would successful insertions need to be screened by PCR, and what resources (time and money) this would likely entail.
  
  We think that the bulk of ECM proteins would likely be visualizable without PCR due to their broad and stable expression, and as mentioned a good portion of these have already been tagged. However, it is likely that most of the secreted small peptides will have to be screened by PCR. We use homemade Taq, which keeps the material cost of the reagents minimal. A pair of genotyping primers costs ~$8 (~$27,872 for all secreted genes).
  
  Hands-on time for lysis of 48-96 worms is approximately 20-30 minutes, with around 5-10 minutes per target to set up PCR and 10 minutes to load a gel. In a given pool, two of the three targets could be putative secreted proteins; thus, the same lysed population would enable screening for two targets at once. Collectively, around 40-60 minutes of hands-on time would be required for two genes (around 20-30 minutes per gene). Given that 18 targets are injected per day, if 12 are screened by PCR, the screening could be done within 6 hours per day without affecting throughput. Most of the time spent on PCR would replace fluorescence screening time and would not overlap with the rate-limiting injection step, which is performed by a separate specialist.
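  The cost and time figures in the two paragraphs above can be checked with a short calculation. This is a minimal Python sketch using only the numbers stated in the response; the function names are hypothetical, and it assumes one gel load per lysed population and that the 12 PCR-screened targets pair up into 6 lysates (two secreted targets per pool).

```python
# Back-of-the-envelope check of the PCR-screening estimates quoted above.
# Inputs are the figures stated in the response; function names are illustrative.

PRIMER_PAIR_COST_USD = 8   # ~$8 per pair of genotyping primers
SECRETED_GENES = 3_484     # putative secreted genes (Suh and Hutter, 2012)


def primer_cost(n_genes: int, cost_per_pair: int = PRIMER_PAIR_COST_USD) -> int:
    """Total primer cost if every gene needs one genotyping primer pair."""
    return n_genes * cost_per_pair


def hands_on_minutes(targets_per_lysate: int, lysis: int,
                     pcr_setup_per_target: int, gel: int) -> int:
    """Hands-on minutes to screen one lysed worm population for several targets.

    Assumes a single gel load per lysate, regardless of the number of targets.
    """
    return lysis + targets_per_lysate * pcr_setup_per_target + gel


# Lower and upper bounds for screening two targets from one lysed population.
low = hands_on_minutes(2, lysis=20, pcr_setup_per_target=5, gel=10)
high = hands_on_minutes(2, lysis=30, pcr_setup_per_target=10, gel=10)

# 12 of the 18 daily targets screened by PCR, two per lysate -> 6 lysates.
lysates_per_day = 12 // 2

print(primer_cost(SECRETED_GENES))                              # 27872
print(low, high)                                                 # 40 60
print(lysates_per_day * low / 60, lysates_per_day * high / 60)  # 4.0 6.0
```

  Under these assumptions the daily PCR workload comes to roughly 4-6 hours of hands-on time, consistent with the 6-hour figure given as an upper bound.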
  
  For how many genes would a single tag not capture all predicted isoforms?
  
  Around 25% of C. elegans genes are thought to undergo alternative splicing (PMID: 21177968), with, on average, ~2 isoforms per transcript. Among our selected genes, we only had one case where a single tag would not capture all isoforms (flad-1). We examined an additional 30 random genes and found no more examples by chance. So, in our view, this will be rare, though we recognize that in some cases a practical decision will need to be made, which could involve consideration of expression levels of each terminal exon.
  
  Finally, some readers might object to the authors' assertion in the abstract that this work is "a first step in this direction" (presumably referring to designing a strategy for whole-proteome tagging). There is no concern that the authors are disregarding the extensive work of other groups, as they explicitly mention the contributions of other groups to the foundation that enables the present work. However, the spirit of the abstract could be misinterpreted by a well-intentioned reader.
  
  We appreciate the referee’s perspective and have reworded this phrase in the abstract to: “As proof-of-principle for scalable pooled tagging, we undertook a pilot study in the nematode C. elegans, in which we set out to tag 30 different genetic loci with three different fluorophores, with 3 tags being introduced at a time.”
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.02.09.704846v3
www.biorxiv.org

A systematic interactome of SET1C expands its functional landscape and identifies candidate regulatory connections

1
1. Public_Reviews 10 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This study uses the yeast two-hybrid assay to identify proteins that may interact with yeast Set1 and other subunits of COMPASS/Set1C, the histone H3K4 methyltransferase, providing also some evidence for Set1 sumoylation and a role of SET1C methylating other factors in vitro. The results are valuable, and they should contribute to understanding the functions of the conserved SET1C complex, as they suggest potential functional connections with RNA biogenesis, chromatin remodeling, and non-histone methylation, whose implications would yet need to be explored. Nevertheless, apart from the fact that only a small subset of the Y2H interactions is further examined, the validating experiments are only partial or inconclusive, the strength of evidence being at this point incomplete.
  
  We present a systematic SET1C interaction map that provides a structured resource for generating and testing new hypotheses on SET1C function. We emphasise that these interactions represent a hypothesis-generating resource rather than a set of validated protein–protein interactions. To reflect this, the manuscript has been carefully revised to distinguish clearly between observation and interpretation, and to avoid overstatement of the data. Accordingly, we have revised the title and the abstract. Selected examples are explored further to illustrate how candidates from the dataset can be followed up, but the primary contribution of this work is to provide a structured framework and resource that can guide future mechanistic studies of SET1C function.
  
  We thank the reviewers for their thoughtful comments. We have followed their recommendations by modifying the structure of the manuscript, removing distracting results and relocating some figures to the supplementary materials to improve the readability of the manuscript. At the same time, the reviewers acknowledge that the dataset is extensive and that aspects of the validation work are valuable.
  
  The changes made to the manuscript's structure in accordance with the reviewers' recommendations are as follows:
  
  (1) Figure 1 is accompanied by a table (Table S2) with the raw data describing all the interactions from the ten Y2H screens. This table also lists common interactors found in the independent screens. Unfortunately, Table S2 was omitted from the initial submission of the manuscript.
  
  (2) Figure 2 has been modified to include an AlphaFold model of a seven-subunit Set1C complex (Set1–Bre2–Sdc1₂–Swd1–Swd3–Spp1) together with Kap104. Figure 2D has been moved to a new Figure S2.
  
  (3) The original Figure S2, which was problematic, has been removed, along with the accompanying text.
  
  (4) Figure 3 of the original paper has been moved to the supplementary material and is now shown as a new Figure S3.
  
  (5) Figure 5 in the original paper becomes Figure 3 in the revised version.
  
  (6) Figure S3 (Co-IP between Set1 and Prp22), which serves as validation data, has been moved to the main figures and is now presented as Figure 4.
  
  (7) Figure 6 in the original paper becomes Figure 5 in the revised version.
  
  (8) Figure 4 from the original paper has been repositioned as the first figure (new Figure 6) of the biochemical characterization of the interaction between Snf2 and Set1C.
  
  (9) Figure 7 has been removed from the manuscript. We have kept the original Figure 7E as a new Figure S6.
  
  (10) Figures 8, 9, 10 become Figures 7, 8, 9.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  We thank Reviewer 1 for the careful and thoughtful evaluation of our manuscript. We fully agree that yeast two hybrid screening provides candidate interactions that require cautious interpretation, and we recognise that our original version did not always make this sufficiently explicit.
  
  In the revised manuscript, we have made substantial changes to address this central concern. All Y2H interactions are now consistently presented as candidate or potential interactions, and speculative statements have been either removed or explicitly framed as hypotheses. Our intention is that the reader can clearly separate the dataset itself from any proposed biological implications.
  
  Second, we have refocused the manuscript to better reflect its primary contribution. We now present the Y2H screens as a comprehensive resource that defines a set of candidate interactions for SET1C, rather than as a set of validated functional relationships. In line with this, we have reduced the emphasis on speculative models and removed sections where the connection to experimental evidence was not sufficiently strong. This includes the removal of Fig. S2 and Fig. 7 and the associated text, as well as the relocation of several figures to the supplementary material. Where appropriate, we have added statements highlighting the limitations of the approaches used and the need for future work to establish physiological relevance.
  
  More generally, we agree with the reviewer that the value of Y2H data lies in generating testable hypotheses rather than establishing conclusions. We have therefore revised the manuscript throughout to ensure that the interpretation remains proportionate to the strength of the evidence.
  
  We hope that these changes address the reviewer’s concerns and result in a clearer and more appropriately balanced presentation of the data.
  
  The manuscript by Luciano et al. is a collection of experiments about the yeast histone 3 lysine 4 methyltransferase, Set1, starting with 10 yeast two-hybrid screens (Y2H). Y2H screens were briefly popular 20+ years ago, but the persistently unfavourable false-to-true positive ratios limited their utility, and the conclusion emerged that Y2H is an unreliable approach for gathering protein-protein interaction data. Y2H outcomes are candidate interaction lists at best, strongly contaminated by false positives. Here, the authors employed a company (Hybrigenics) to perform the Y2H screens.
  
  The primary data is not presented, and the outcomes are summarized using the Hybrigenics in-house quality scoring system in Figure 1A. It is not possible to evaluate these data, and the manuscript presents cartoon summaries that the reader must accept as valuable.
  
  Hybrigenics brings extensive experience from conducting numerous screens, enabling the team to recognize recurring false positives that commonly arise in screening assays. In their detailed analysis, Hybrigenics reports the number of clones recovered and the extent of overlap among interaction regions, both of which contribute to the confidence scores they assign. Table S2, provided in the revised version, more accurately reflects the raw data obtained by Hybrigenics. Nevertheless, we agree that false positives contaminate the list of potential interactors. Some interactions may also be indirect through a common interactor and do not reflect a physiological interaction.
  
  (1) Based on the extensive knowledge about Set1C/COMPASS acquired from genetics and biochemistry by many labs (including the Geli lab), the results presented here from the 10 Y2H screens are notably patchy. Of the 7 subunits of this complex, only one (Spp1) was identified using Set1 as bait. Conversely, as baits, Swd2, Spp1, Shg1, captured Set1, and the Bre2-Sdc1 interaction was reciprocally identified. These interactions were scored at the highest confidence level, which lends some confidence to the screens. However, the missing interactions, even at the third confidence level, indicate that any Y2H conclusions using these data must be qualified with caution. The authors do not appear to be cautious in their lengthy evaluations of these candidate interactions, which are illustrated with cartoons in Figures 2 and 3, with some support from the literature but almost without additional evidence. Snf2 is a particularly interesting candidate, which the authors support with pull-down experiments after mixing the two proteins in vitro (Figure 4). After Y2H, this is the least convincing evidence for a protein-protein interaction, and no further, more reliable evidence is supplied.
  
  We thank the reviewer for raising this important point regarding the strength of the evidence supporting the Set1–Snf2 interaction. We agree that the current data do not establish a definitive physiological interaction. In the discussion, we explicitly note the limitations of the current data.
  
  For Figure 2, as recommended by referee 2, we performed AlphaFold modeling of a seven-subunit Set1C complex (Set1–Bre2–Sdc1₂–Swd1–Swd3–Spp1) together with Kap104. Consistent with the Y2H data, the model recapitulates binding of the Kap104 SID to the PY-NLS region of Set1 (residues 40–90).
  
  We have moved Figure 3 to the supplementary materials.
  
  (2) Figure 5 continues the cartoon summary of extrapolations from the Y2H screens, again without supporting evidence, except that the authors state.
  
  Figure 5 is now Figure 3. We have added the statement in the text: “It is not feasible to validate all of these interactions within the limits of this manuscript, and their validity should therefore be interpreted with caution. Nonetheless, these findings provide a useful basis for future research”.
  
  "We have refined the interaction region between Set1, Prp8 and Prp22, showing that Prp8 and Prp22 interact strongly with Set1-F4 (n-SET). Prp22 interacts in addition with Set1-F1 (Figure S2)." However, Figure S2 does not show this evidence and is incoherent.
  
  When we say that we have refined the interaction region between Set1, Prp8, and Prp22, we mean that we have restricted the interaction regions according to Y2H criteria. Indeed, we did not show the spots illustrating these results. This statement has been deleted, as has Fig. S2.
  
  The figure legends for Figure S2B and C do not correspond to the figure.
  
  (B) Expression of the F1-F5 fragments in yeast cells. Fusion proteins were detected with an anti-GAL4 monoclonal antibody. TOTO yeast cells (Hybrigenics) were transformed with the different pB66-Set1-F1 to F5 plasmids and subsequently with either P6, pP6-Snf2 762-968, pP6-Prp8 37-250, or pP6-Prp22 379-763 that were identified in the Y2H screens. Transformed cells were incubated for 3 days at 30 °C on SD-LEU-TRP and then restreaked on SD-LEU-TRP-HIS with 3AT. Cell growth was monitored after 2 days at 30 °C.
  
  (C) Solid and dotted arrows indicate that TOTO cells transformed with pB66-Set1-F1 to F5 and the indicated prey (Snf2, Prp8, and Prp22) are growing in the presence of 20 mM and 5 mM of AT, respectively.
  
  Figure S2D is two almost featureless dark grey panels accompanied by the figure legend D) Control experiment showing that TOTO cells transformed with p6 and pB66-Set1-F4 are not gowing (sic) in the presence of 5 mM or 20 mM AT.
  
  We agree that the legend for Figure S2 was unclear and does not accurately describe the panels shown in the figure. Fig. S2 has been deleted in the revised version. The results shown in the original Fig. S2 add limited information and may detract from the clarity of the main points.
  
  In the revised version, we have moved the CoIP analysis demonstrating the interaction between Set1 and Prp22 (previously shown in Figure S3) into the main figures (now Figure 4) to further support and validate the two-hybrid screening results presented there.
  
  Line 343. Interestingly, the two-hybrid screens reveal that Set1 1-754 interacted with Gag capsid-like proteins of Ty1 (Figure S5), raising the possibility that Set1 binding to Ty1 mRNA is linked to the interaction of Set1 1-754 with Gag.
  
  This is another example of the primary mistake repeatedly made by the authors: Y2H interactions are candidate results and not conclusive evidence.
  
  This statement is supported by our previous findings showing that Set1 binds Ty1 mRNA independently of its dRRM domain and represses Ty1 mobility at a post-transcriptional stage (Luciano et al., Cell Discovery, 2017; PMID: 29071121). One possible explanation for Set1 association with Ty1 mRNA is its interaction with the Gag capsid-like protein. In this context, the observed interaction between Set1(1–754) and Gag capsid-like proteins is consistent with this model.
  
  To further illustrate this point, the authors highlight the candidate interaction between Nis1 and 3 Set1C subunits.
  
  While we agree that the Nis1-Set1C interaction has not been demonstrated beyond doubt, we feel that our Y2H and in vitro binding experiments provide reasonable evidence that the interactions may be relevant. It is important to consider that any interaction assay can produce false negative (and false positive) results, including Y2H, in vitro binding and mass-spec analysis of purified complexes from cells. We feel that it is not appropriate to only trust protein interactions that are strong and stable enough to be demonstrated via purified complexes. It is clear that some protein interactions occur in a transient and weak manner and are therefore not compatible with biochemical purification approaches. This is indeed the strength of alternative methods like Y2H and in vitro binding assays: interactions can be identified and tested even if the physiological context of the interaction may be more complex.
  
  (3) After multiple speculations based on the Y2H candidates, the authors changed to focus on sumoylation of Set1, which has previously been reported to be sumoylated. Evidence identifying two sumoylation sites in Set1, in the N-SET and SET domains, is valuable and adds important progress to the role of sumoylation in the regulation of H3K4 methyltransferase, relevant for all eukaryotes. This illuminating part of the manuscript is only tenuously connected to the preceding Y2H screens and concomitant speculations.
  
  We thank Referee 1 for their comment. While it is true that there is only a modest connection between Set1 interactors involved in direct or indirect sumoylation and the characterization of Set1 SUMOylation sites, we believe that this does not constitute a weakness of the manuscript.
  
  (4) The manuscript then describes a red herring exercise involving Set1 methylation of Nrm1. In an already speculative and difficult manuscript, it is exasperating to read a paragraph about a failed idea. Apart from panel E, Figure 7 is a distraction, and I believe it should not be shared.
  
  (5) However, despite the failure with Nrm1, Line 443 - The H3K4-like domain in Nrm1 raised our attention to other yeast proteins that carry such sequences.
  
  This line of thinking is even less connected to the Y2H screens than the sumoylation work.
  
  However, the authors present a reasonable evaluation of the yeast proteome screened for six amino acids similar to the known H3K4 motif ARTKQT (Figure 7e).
  
  (6) However, this evaluation goes nowhere and has no connection with the next section of the manuscript, which is entirely speculation about the regulation of metabolism and stress responses based on the Y2H results and selected evidence from the literature.
  
  In response to comments 4 and 5, we have removed Fig. 7 and the paragraph titled “The transcriptional corepressor Nrm1 interacts with SET1C.” Part of this paragraph and the section describing the screen of the yeast proteome for six–amino acid sequences resembling the H3K4 motif (ARTKQT) have been retained as Fig. S6.
  
  In the abstract, we have removed the sentence: “We demonstrate that the transcriptional corepressor Nrm1 is methylated by SET1C in vitro suggesting that H3K4-like domains may represent a class of non-histone substrates for SET1C.”
  
  At the end of the introduction, we have deleted “the transcriptional corepressor Nrm1” in the sentence: “In addition, we demonstrate that the transcriptional corepressor Nrm1 and the Snf2 AT-hook are both methylated by SET1C in vitro.”
  
  (7) The manuscript then describes more failed experiments regarding lysine methylation of Snf2 by Set1C, which unexpectedly reports arginine methylation rather than lysine. The manuscript does not currently meet the standard expected for this type of paper - the composition is somewhat incoherent and there are no previous reports of arginine methylation by SET domain proteins.
  
  We have integrated extensive in vitro reconstitution experiments with complementary in vivo studies, all conducted according to the rigorous standards expected by leading journals. These approaches have allowed us to reach the conclusions presented in this manuscript. While some of these findings are unexpected, they are supported by the data. We have carefully discussed the results and their limitations to provide a comprehensive interpretation.
  
  The manuscript presents a very experienced grasp of the literature and a sophisticated appreciation of the forefront issues, but a surprising failure to eliminate uninformative failures and peripheral distractions. The overinterpretation of Y2H results is a dominating failure. There are some valuable parts within this manuscript, and hopefully, the authors can reformat to eliminate the defects and appropriately qualify the candidate data.
  
  We thank Referee 1 for these insightful comments. In the revised version, we have followed the advice to remove non-informative failures and peripheral distractions. Additionally, we exercise greater caution to avoid over-interpreting the Y2H results.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper starts with a large-scale yeast two-hybrid (Y2H) screen using Set1 (full-length and smaller parts) and other Set1C/COMPASS subunits as bait. There are hundreds of possible interactions identified, but only a small number are given any follow-up. While it's useful to document all the possible interactions, the unfocused and preliminary nature of the results makes the paper feel scattered and incomplete.
  
  Strengths:
  
  The Y2H screen was very comprehensive, producing lots of interesting possible leads for further experiments.
  
  Weaknesses:
  
  The results are useful but incomplete because only a small subset of the Y2H interactions is further examined. Even in the case of those that were further tested, the validating experiments are only partial or inconclusive.
  
  Referee 2’s comments align in some respects with those of Referee 1. In the revised version, we have followed Referee 2’s detailed suggestions to reduce the scattered nature of the manuscript. In addition, we include an AlphaFold model of the interaction between the Set1 N-terminal region (1–754) and the SID domain of Kap104, which involves the proposed Set1 PY-NLS sequence.
  
  Reviewer #3 (Public review):
  
  The SET1C/COMPASS complex is the histone H3K4 methyltransferase in Saccharomyces cerevisiae, where it plays pivotal roles in transcriptional regulation, DNA repair, and chromatin dynamics. While its canonical function in histone methylation is well-established, its full interactome remains poorly defined. Moreover, whether SET1C methylates non-histone substrates has been an open question. In this study, Luciano et al. employ systematic yeast two-hybrid (Y2H) screening to uncover novel interactors and functions of SET1C. Their findings reveal potential functional connections to RNA biogenesis, chromatin remodeling, and non-histone methylation.
  
  The authors performed multiple Y2H screens using Set1 (full-length, N-terminal, and C-terminal fragments) and each of its seven subunits as baits. They identified high-confidence interactors that link SET1C to diverse cellular processes, including chromatin regulation (e.g., the SWI/SNF complex via Snf2), DNA replication (e.g., Mcm2, Orc6), RNA biogenesis (e.g., spliceosome components Prp8 and Prp22; polyadenylation factors Pta1 and Ref2), tRNA processing (e.g., Trm1, Trm732), and nuclear import/export (e.g., importins Kap104 and Kap123). Some of these interactions were further validated by immunoprecipitation or in vitro assays.
  
  Given the interaction of Set1 with Slx5 and Wss1 - proteins involved in SUMO-dependent processes - the authors investigated and convincingly demonstrated that Set1 is sumoylated. This modification may influence the function and regulation of the SET1C complex.
  
  Finally, the authors provide evidence that SET1C methylates proteins beyond histone H3K4, notably Nrm1, a transcriptional corepressor, and Snf2, the catalytic subunit of the SWI/SNF chromatin remodeling complex. Although Nrm1 contains a domain resembling the H3K4-methylated sequence (H3K4-like domain), this region does not appear to be required for its methylation. The search for other proteins containing similar domains as potential methylation candidates (p.12, first paragraph) seems less justified, given the lack of evidence supporting the requirement for the H3K4-like domain in methylation.
  
  This study offers valuable insights into the interactome of SET1C, suggesting potential links between the complex and a wide range of cellular processes. However, the functional implications of the Y2H interactions remain to be explored further. Additionally, the study provides intriguing information on the possible regulation of Set1 by sumoylation. The discovery of Nrm1 and Snf2 as methylation substrates could significantly expand the known targets and functions of SET1C.
  
  The results are supported by high-quality data.
  
  We thank referee 3 for their positive comments.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Restructure the manuscript into at least two papers.
  
  We thank the reviewer for this suggestion. In the revised manuscript, we have addressed this concern by substantially restructuring and streamlining the presentation. We consider the dataset, validation experiments, and functional observations to be closely integrated, and we believe that presenting them together provides the most coherent and impactful account of the work.
  
  Minor points
  
  There are several basic flaws in the manuscript that I feel indicate the co-authors have not proofread the manuscript sufficiently - 4 examples from early in the manuscript are listed below.
  
  (1) The reference for Hybrigenics is (73) - obviously from an earlier version that used a different referencing system that has not been corrected.
  
  Thank you. This has been corrected.
  
  (2) Line 194 - 197. These screens have proven their power and effectiveness. In particular, they identified ...... the CTD of Rpb1 as an interactor of the N-terminal region of Set1 (Bae et al, 2020) (Figure S1). Rpb1 interaction is not identified in the screens presented here, and Figure S1 is a cartoon and not primary evidence.
  
  The interaction between the CTD of Rpb1 (Rpo21) and Set1 is reported in Table S2. The detailed characterization presented in Bae et al. (2020) was subsequently carried out as a direct follow-up to this screen.
  
  (3) Line 205-211. The highly confident interactors of the seven SET1C subunits are shown in Figure 1C-E. We found that Spp1, Shg1 and Swd2 interact alone with Set1 (Figure 1C). The minimum Set1 region for which an interaction is found for each of these 3 subunits is shown in Figure 1C. The high confidence interactors of the seven SET1C subunits are shown in Figure 1C-E. We found that Spp1, Shg1 and Swd2 display Y2H interactions with Set1 (Figure 1C). The high confidence interactors of Spp1, Shg1 and Swd2 are indicated in Figure 1D (see also Table S2).
  
  It is possible that Table S2 was omitted from the original submission, as it was requested during the production stage.
  
  (4) Line 335. We have classified all Set1 and subunit interactors according to these SET1C roles (Figure S5). However, this refers to Figure S4 - many further references to Figure S5 are also to Figure S4.
  
  Thank you. This has been corrected.
  
  Reviewer #2 (Recommendations for the authors):
  
  General recommendations:
  
  (1) Figures 1, 2, 3, and 5 and their associated main text are essentially just lists of interactors, put in graphic form and grouped to allow speculation about possible biological functions for the interactions. But almost none of the ideas are tested, so these sections take much more space than warranted. Having so much preliminary Y2H data actually distracts attention from the follow-up experiments that are shown. I would move most or all of this to the supplement, consolidating the Y2H results into fewer figures (or even just the Table).
  
  As mentioned earlier, the manuscript has been reorganized and Table S2 is provided.
  
  (2) The Snf2 interaction gets the most follow-up, so separating Figure 4 from Figures 8-10 broke the flow of that story. I would group these figures together since all are related to the Snf2 AT hook story.
  
  This was done accordingly.
  
  (3) I understand that it's impossible to validate all the possible interactions, particularly if resources are limited. However, at least for the interactions that get further attention, it could be very useful to try some AlphaFold multimer predictions. A high confidence AlphaFold score would provide a second orthogonal piece of evidence to support the Y2H results.
  
  We generated an AlphaFold model (Figure 2C) that recapitulates the key predictions for the Set1-Kap104 Y2H interaction.
  
  Comments on specific sections:
  
  (1) Y2H results. The text says Figure 1 shows all the high-confidence interactors. But the Set1 NTD interaction with the Rpb1 CTD is not shown here (it's in the supplement).
  
  In Table S2, an interaction is observed between full-length Set1 and the Rpb1-CTD (14 repeats), where Rpb1 is referred to as Rpo21.
  
  Figure 2 shows additional high-confidence interactors that do not appear in Figure 1, while others (like the Shg1–Mog1 interaction) are shown in both Figures 1 and 2. It's confusing to scatter the data like this, which is why I recommend consolidating into a single figure or table.
  
  In Figure 2, the high-confidence interactors of Set1 (1–754) are highlighted in red and green (Snf2, Gbp2, and Kap104), and all are also present in Figure 1. Dbp1, identified as a high-confidence interactor of Spp1, likewise appears in Figure 1. Table S2 summarizes all of these interactions.
  
  (2) Line 219. How does a "high confidence" Set1-Kap104 Y2H interaction suggest the interaction is direct? Couldn't an indirect interaction also be tight and reproducible? This is an example where it would be worth seeing if AlphaFold also predicts an interaction and, if so, whether it involves the proposed NLS sequences.
  
  Y2H screening indicated that Kap104 binds to the N-terminal region (aa 1–754) of Set1 via its Set1 interaction domain (SID). To validate this, we used AlphaFold to model the seven-subunit Set1C complex (Set1–Bre2–Sdc1(×2)–Swd1–Swd3–Spp1) with Kap104. The resulting model showed borderline confidence for the overall fold (pTM = 0.53) and low confidence in subunit positioning (ipTM = 0.5). Visualization in PyMOL confirmed Kap104 SID binding to Set1(1–754), consistent with Y2H results. The structure highlights Kap104 SID interaction with Set1’s PY-NLS at residues 40–90; the second PY-NLS is neither visible nor engaged in this model.
  
  (3) In the discussion of nuclear import interactors, what does it mean to say the Shg1-Mog1 interaction is "along the same line" as Set1-Kap104?
  
  We meant that the interaction between Shg1 and Mog1 represents another example of an interaction between a Set1C subunit and a protein involved in nuclear import. The phrase “along the same line” has been deleted in the revised version.
  
  (4) To follow up on the Swd1-Nrm1 Y2H interaction, the paper shows that Nrm1 is methylated by Set1 in vitro (Figure 7), but it's not clear whether this has any biological significance. Without any in vivo follow-up, this figure is probably more appropriate for the Supplement.
  
  As noted above, Figure 7 has been removed; only panel E of Figure 7 is retained in the revised version (as Figure S6).
  
  (5) Figures 6 and S8 show that Set1 is SUMOylated. Although it's not clear what this does to Set1 function or which E3 is responsible, the modification data looks convincing. The legend to Figures 6A and B says the Elutes samples are purified on nickel columns. Why are the Myc-Set1 and GB-Set1 proteins without the his-SUMO modification also binding to the nickel column? That's not happening in panels C and D. In the blots on the right for his-SUMO, is there any way to show that one of those bands is Set1? Maybe IP for MYC and then probe for the His tag?
  
  We thank the reviewer for this observation. His-SUMO purification using nickel beads was used to purify His-SUMOylated proteins. Purified proteins were analyzed by Western blot using anti-MYC or anti-GAL4 antibodies to detect SET1-His-SUMO, as well as anti-His antibodies to confirm the presence of purified His-SUMOylated proteins. As mentioned by the reviewer, we detected unmodified MYC-Set1 and GAL4-Set1 in both the (-) and (+) His-SUMO eluates. This is most likely due to the stickiness of unmodified Set1 to the beads. This is commonly observed in this type of biochemical assay, particularly when analyzing large proteins such as Set1 (124 kDa). Such stickiness has been observed in similar SUMOylation assays, e.g., for Hpr1 (88 kDa) (Bretes H, 2014. PMID: 24500206), Nup1 (114 kDa), and Nup2 (78 kDa) (Folz H, 2019. PMID: 30837289). It was not observed when using Set1 fragments (panels C and D), most likely because the stickiness to the beads is a characteristic of full-length Set1 only. We mention this point in the legend of the new Figure 5.
  
  (6) The Snf2 interaction gets the most follow-up. The GST pulldown validation of Set1 interaction with Snf2 AT-hook looks pretty good. However, the RGG repeats are necessary for the Set1 interaction with recombinant Snf2 proteins, but not for the co-IP of in vivo material. Again, AlphaFold could lend further support here.
  
  Thank you for this helpful suggestion. We agree that structural modelling could, in principle, provide an additional and orthogonal line of support for the Set1-Snf2 interaction. We did explore this using AlphaFold. However, both Set1 and Snf2 contain extensive intrinsically disordered regions, including the regions implicated in the interaction, and none of the models we obtained provided interpretable structural insight into the interaction interface. In particular, the predicted complexes showed low confidence in relative domain positioning, which limits their usefulness for supporting or refining the interaction model. One possible explanation is that additional components are required to stabilise a meaningful interaction in silico. While we modelled Set1 within a seven-subunit Set1C complex, Snf2 was necessarily included in isolation from its native context. Given that Snf2 functions as part of multiple, heterogeneous chromatin remodelling complexes, the absence of its physiological binding partners may prevent AlphaFold from resolving a relevant interaction interface. In light of these limitations, we have not included the AlphaFold models in the manuscript, as we felt they would not provide reliable or informative support. Instead, we have focused on the experimental evidence presented. We have clarified this point in the revised discussion to acknowledge both the potential and the current limitations of structural prediction approaches in this context.
  
  (7) The Snf2 methylation by Set1 is less convincing, and its biological significance is still unclear. I think it's pretty unlikely that Set1 could methylate arginine. The mass spectrometry is used for in vivo validation (mass spec), but mutating the lysines (Figure S11, S12) or Set1 deletion (Figure S14) doesn't seem to affect the signal. Could there be quantitative differences? Is there any way to quantitate the mass spec data to estimate the modified/unmodified ratio?
  
  We thank the reviewer for highlighting the unexpected nature of the methylation results. We agree that the observation of arginine methylation in this context is surprising, particularly given that SET domain proteins are classically associated with lysine methylation. This is why we performed multiple in vitro and in vivo experiments; careful interpretation of the data led us to conclude that Set1C methylates the arginines within the ARTSTRGR motif of the AT-hook. We agree that the biological significance of this modification remains unclear. We obtained data showing that deletion of the SID domain of Snf2 impairs yeast growth on lactate, whereas this mutant grows normally on glucose and galactose, in contrast to the Snf2Δ mutant, which exhibits poor growth on both glucose and galactose. In comparison, deletion of the RG motif of Snf2 does not affect growth on lactate. These results provide insight into the interaction between Set1 and Snf2 but do not shed light on the potential importance of methylation of the RG motif. We therefore chose not to include them. In the discussion, we acknowledge the limitations of the current evidence. Our intention is to retain these findings as potentially interesting observations while ensuring that their interpretation remains appropriately cautious.
  
  Minor comments:
  
  (1) Lines 153 and 163: Stress response is listed twice, but with different references. Maybe these need to be further defined or else combined?
  
  We have deleted “stress response” at line 163 and moved the references “Deshpande et al, 2022 and Nadal-Ribelles et al, 2015” to line 153.
  
  (2) Line 193: better to say the proteins were fused to the C- or N-terminus (rather than upstream/downstream). It would be worth mentioning if there was a reason why Swd2 was fused to the N-terminus, unlike all the others.
  
  This has been done accordingly. In our hands, C-terminal fusions of Swd2 are not functional.
  
  (3) Is the scoring scheme (highest, high, good) that produces the colors in Figure 1 shown in the table? It doesn't say what the tan color (two of the Bre2 interactors) means.
  
  This was a mistake: Tea1 should be blue and Swi1 should not appear here. This has been fixed.
  
  (4) Line 206. It's not clear what it means to say that three of the subunits "interact alone with Set1". It can't mean they only interact with Set1, since other interactors are shown in Figure 1B. If it meant to say the interactions don't require other COMPASS subunits? I don't see how you can tell that from the Y2H assay. Please clarify.
  
  It means that these 3 subunits interact directly with Set1 without the need for another subunit, unlike the other subunits.
  
  (5) Line 252. While discussing the Set1 - Snf2 interaction, the paper cites Hirschhorn et al. That paper talks about Swi-Snf, but doesn't mention Set1 anywhere. Maybe the authors meant to cite a different paper?
  
  We agree; this reference is not appropriate. It has been deleted.
  
  (6) Figure S2 panels A and C are redundant and could easily be combined.
  
  Figure S2 has been deleted.
  
  (7) Figure S4: Should the green category also include transcription? Ssl1 is a TFIIH subunit, which could be involved in either transcription initiation or NER. Sen1 and Nrd1 are transcription termination factors, although Sen1 may also function in R-loop resolution.
  
  We agree, but the figure is already complicated as it is.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.23.690026v2
www.biorxiv.org

Oxytocin neurons signal state-dependent transitions from rest to thermogenesis and behavioral arousal in social and non-social settings

1
1. Public_Reviews 10 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors identify and investigate a specific population of PVNOT neurons (oxytocin neurons of the paraventricular hypothalamus) that seem to be involved in both behavioral and autonomic thermoregulation. These cells are activated by social thermoregulatory behaviors, but can influence thermoregulation in both social and nonsocial contexts, specifically during transitions and when mice are at low core body temperature (Tb).
  
  Strengths:
  
  The manuscript has many strengths.
  
  This is a novel study, with a clear question that is addressed using an array of well-designed experiments employing integrative methods. Most of the figures are well-developed, and the analysis is generally rigorous and well-detailed. The authors are clearly very experienced in this field, and indeed, their scholarly introduction and discussion sections are to their credit.
  
  We are grateful for the reviewer’s careful reading and positive assessment, including their remarks on the clarity of the question, experimental design, and analysis.
  
  The link between thermoregulation and the oxytocin system is well established, as is the link between social behavior and the same broad system. However, the link between these three things is novel, if it can be well substantiated. I am not persuaded that was achieved here, but I do think this manuscript has many novel and useful offerings.
  
  We thank the reviewer for this thoughtful comment and for recognizing the novelty of the study. We wish to clarify the central goal of the manuscript: while social thermoregulation provided the initial impetus for studying PVNOT neurons, our principal finding is that PVNOT activity during rest-to-arousal transitions is independent of social context. As stated in the manuscript, "To our surprise, these peaks were observed in both social and non-social contexts." Thus, our study demonstrates a broader role for PVNOT neurons in state-dependent thermoregulatory transitions, one that includes, but is not limited to, social contexts. We have revised the text to make this emphasis clearer throughout.
  
  We also added a short passage on this point to the Discussion: the fourth and final paragraph of the section called “State-dependent PVNOT activity during thermo-behavioral transitions.”
  
  The authors use a cooling floor, and only go down to 10 degrees Celsius. This is fine, but I would like to see the effects using ambient temperature also. This is not a crucial issue, as it is not necessary for the authors' interpretations, but it could improve measurement sensitivity.
  
  Both Reviewer 1 and Reviewer 2 raise important and related points: manipulating floor temperature provides a thermal stimulus that is distinct from manipulating whole-chamber ambient air temperature, and these modalities could engage partially different sensory pathways and circuits. (Note this response is copy-pasted to other relevant comments).
  
  We intentionally used floor cooling/heating because it provides a reliable, well-controlled stimulus that elicits thermoregulatory behaviors while keeping the experimental environment stable (e.g., avoiding changes in airflow/humidity that can accompany ambient cooling). To prevent conflation of these modalities, we revised the manuscript to consistently describe the manipulation as “floor temperature” (and not “ambient temperature”), and we added a statement to the Discussion acknowledging that conductive floor temperature changes may differentially recruit peripheral thermoreceptors compared to ambient air temperature.
  
  While extending these experiments to whole-chamber ambient temperature changes could be informative in future work, it is not required for the central interpretations here, which focus on PVNOT activity dynamics during thermoregulatory behavior under controlled thermal conditions.
  
  Through an elegant behavioral experiment in Figure 1, the authors identify c-Fos patterns in the PVN that are activated by active social huddling, and they show that at the RNA level these cells overlap with oxytocin, indicating that they are oxytocin-producing cells. But this is not well discussed or indeed quantified.
  
  We thank the reviewer for catching this; Reviewer 2 made a similar comment. A typo in the figure legend led to this confusion. Figure 1I is in fact a quantification of the percent Oxytocin:Fos colocalized cells (not Fos:DAPI, as was written) in dorsal and ventral subregions of the PVN during active huddling and quiescent huddling. We have corrected the legend and clarified the quantification in the revised manuscript.
  
  The authors engage in a deep analysis of fiber photometry experiments, first by observing PVNOT neuron overall activity during a variety of different behaviors in the context of three different temperatures. Activity was associated with nesting, quiescence, and both types of huddling (when social opportunities exist). Social situations did not strongly affect this, nor did temperature conditions. These analyses indicate that the PVNOT neurons are involved in mediating specific behavioral outputs.
  
  With more detailed analysis, the authors investigated how PVNOT neuronal activity relates to behavioral state transition. They found that the probability of peak PVNOT neural activity strongly predicts the offset of quiescence or quiescent huddling, and therefore can be argued to signal an increase in physical activity, and as such, increased metabolism. However, the opposite pattern was observed for huddling and nesting (onset being associated with PVNOT activity), again arguing for increased thermogenesis as a function.
  
  What is particularly compelling is that these peaks of activity tend to occur during low Tb, again arguing for the function in increasing body warmth.
  
  The authors then employ an impressive setup where they image brown adipose tissue (BAT) in tandem with DeepLabCut (DLC) based animal tracking. Crucially, BAT activity and surface temperature correlated with the calcium peak of PVNOT neurons.
  
  Lastly, optogenetic activation of PVNOT neurons increased Tb when it was in the lower range, but not when in the higher range. It also affected BAT and rump temperature, again at low Tb. However, there is no real effect on behavior, except a trend in activity.
  
  The authors do some interesting tracing work at the end, though this is not functionally explored. That is not a criticism, as it does seem like this would be a whole follow-up study.
  
  Weaknesses:
  
  While novel and valuable, the manuscript feels incomplete in its current form.
  
  The main evidence lacking is a loss-of-function experiment. Ideally, the authors would chronically and/or acutely inhibit PVNOT neurons to establish their necessity. I know this seems obvious, but I think it is important.
  
  We agree with the reviewer that loss-of-function experiments are a valuable component of circuit mapping, and we appreciate this suggestion. For transparency, we did attempt a chronic chemogenetic inhibition experiment using DREADDs in PVNOT neurons. However, the results were inconclusive, primarily owing to the confounding effects of pharmacological injections: both drug- and vehicle-treated animals exhibited stress-induced hyperthermia following injection, and because inhibition could not be delivered while animals were asleep/resting, the experimental conditions did not recapitulate the low-Tb quiescent state during which PVNOT peaks naturally occur. Given these confounds, we do not believe these data meet the standard required for inclusion in this manuscript.
  
  We did consider acute optogenetic inhibition. However, our model did not yield as clear a prediction for inhibition. Our photometry data identified a clear, testable hypothesis for activation: PVNOT peaks precede the exit from quiescence, so activation during quiescence should promote this transition, which it did (Figures 5 and 6).
  
  That said, new analyses of our data, driven by these reviews, have now uncovered what might be inhibition of PVNOT neurons during the approximately 60 seconds before entry into resting states (i.e., quiescence and quiescent huddling); see the new Fig. S3I-L. This raises the possibility that appropriately timed photoinhibition of PVNOT neurons could facilitate the establishment of resting states. In light of our chemogenetic and optogenetic activation experiments, we believe that an appropriately designed inhibition experiment would require a real-time, closed-loop setup that is not currently available in our laboratory.
  
  We have added a caveat to the Discussion acknowledging the lack of loss-of-function data as a limitation, and we identify this as an important direction for future investigation.
  
  The relative lack of behavioral analysis following optogenetic activation of PVNOT neurons is puzzling. The authors must surely want to study what this intervention does to behavioral state transitions. I feel that the current level of analysis limits the overall conclusions of this study to a large extent.
  
  We appreciate this concern and wish to clarify two points.
  
  First, our decision to perform optogenetic activation in isolated (solo-housed) animals was driven by our initial finding that PVNOT activity profiles are mostly social-context independent during the transition from rest to arousal (Figures 2 and 3). By studying isolated animals, we could test the fundamental relationship between PVNOT activation and the rest-to-active transition without confounding social feedback. Additionally, we encountered technical challenges when using the SGBS thermographic model in paired contexts: the high thermal intensity at the point of contact between huddling mice created a thermal merging artifact that prevented accurate segmentation of individual body regions (BAT vs. rump).
  
  Second, we did examine the post-stimulation behaviors of solo-housed animals (Fig. S5B). While PVNOT activation significantly increased the probability of exiting quiescence, it did not trigger a singular, stereotyped behavioral output. Instead, it facilitated a generalized transition to an active state, within which animals engaged in various context-appropriate actions (nesting, grooming, locomotion). We note in the Discussion that “Analysis of manually annotated behaviors suggested that PVNOT stimulation did not activate a specific motor pattern output but instead resulted in combined increases in the time spent in nesting (linear mixed model estimate coefficient of ChR2+ stimulation: +38.0 sec), locomotion (+54.0 sec), and grooming (+14.5 sec), but not in eating/drinking (-0.4 sec) (Fig. S4B).”
  
  That photostimulation had relatively larger effects on nesting and locomotion is consistent with our model.
  
  Finally, in the Discussion we acknowledge that future experiments should seek to disentangle the effects of PVNOT light stimulation in the non-social vs. social context (last paragraph of the Discussion section called “State-dependent PVNOT activity during thermo-behavioral transitions”).
  
  A broader criticism is that the social dimension of this manuscript seems overplayed. Naturally, oxytocin signaling can be implicated in social behavior based on a large literature. However, the focus on social thermogenesis seems like a crude integration of social behavior and thermogenesis. Given that the authors see their effects in both social and nonsocial cases of thermoregulation, I am not sure the attempts at integrating social functions and thermogenic functions of PVNOT neurons are warranted. That is, unless the authors have further experiments or analysis that can convincingly justify this link.
  
  We thank the reviewer for this comment. We understand the concern and wish to reframe our position. We argue that the equivalence of PVNOT signals across social and non-social contexts is itself a central finding. While the oxytocin system is widely regarded as a mediator of social bonding, and therefore a candidate mechanism underlying huddling, our data demonstrate that PVNOT neurons provide a signal for state-dependent thermoregulatory transitions that is unbiased by social context. Rather than overplaying the social dimension, we believe our study contextualizes the social function within a broader homeostatic role: PVNOT neurons facilitate transitions from rest to thermogenesis and arousal regardless of whether the resting state involves social huddling or solitary quiescence.
  
  While the thermoregulatory transitions are present in both contexts, we note that social context appears to modestly enhance some PVNOT downstream effects. Specifically, peak probability and frequency were slightly higher in the paired compared to solo context (Fig. 3F-I, Fig. S2D), and peaks were associated with a somewhat stronger increase in physical activity when a cagemate was present (Fig. 3B-E). Additionally, quiescent huddling (paired) bouts were associated with stronger body temperature regulation compared to solo quiescence (Fig. S3Q-V). This nuance supports the view that the social dimension is not overplayed but rather situated within a broader homeostatic function.
  
  We have revised the manuscript to ensure that this framing is consistent and clear. We emphasize that our goal was to uncover neural mechanisms underlying physiological transitions across behavioral and arousal states, using our social thermoregulation assay as a starting point (based on our previous publication). Counter to our initial hypothesis, the PVNOT signals generalized beyond the social setting.
  
  In addition, the analysis of virgin females and lactating mothers seems out of place in Figure 4.
  
  This point was echoed by Reviewers 2 and 3, and we have taken several actions to address it. (Note this response is copy-pasted to the other reviewers.)
  
  We agree with the reviewers that the rationale for the lactation data should be made more explicit. The primary purpose of this experiment was to validate the identity of oxytocinergic neurons of the PVN.
  
  Our efforts to use IHC to validate the identity of AAV-transfected cells were inconclusive, and we have now added new data to illustrate this point. We have added Fig. S4 that includes quantitative data on expression specificity. We observed significant variability in co-staining (OT+/GCaMP+) across brain slices, likely reflecting the dynamic nature of oxytocin peptide synthesis and storage, particularly with respect to processes lining the third ventricle. This finding is in accordance with other studies that are now cited in the text.
  
  We now emphasize that, because IHC provided variable co-localization, we employed the lactation model as an independent physiological validation of the identity of the recorded neurons.
  
  It is well established that PVNOT neurons undergo dramatic changes in firing dynamics and synchrony during lactation to support milk ejection (Yaguchi et al., 2023; Yukinaga et al., 2022). Conversely, AVP and CRF cell populations in the PVN do not appear to display synchronized pulsatile bursting during lactation (see our response to Reviewer 2, comment 2, in ‘Recommendations for the authors’ and our updated Discussion). Observing these characteristic changes in our recorded population provides high-confidence functional evidence that we are targeting oxytocin neurons. We have revised the text to clarify that Figure 4 serves primarily as a functional verification of genetic targeting.
  
  We also acknowledge in the Discussion the possibility that our Cre line may capture a small percentage of non-oxytocinergic neurons, while noting that the dramatic shift in calcium dynamics during lactation (Figure 4I–L) strongly suggests the recorded population is dominated by oxytocin neurons.
  
  The c-Fos/oxytocin overlap needs to be quantified.
  
  We thank the reviewer for catching this; Reviewer 2 made a similar comment. A typo in the figure legend led to this confusion. Figure 1I is in fact a quantification of the percent Oxytocin:Fos colocalized cells (not Fos:DAPI, as was written) in dorsal and ventral subregions of the PVN during active huddling and quiescent huddling. We have corrected the legend and clarified the quantification in the revised manuscript. (Note this response is copy-pasted to other relevant comments).
  
  The methods section could be improved by explaining how the authors exclude animals that exhibit both types of huddling, if they occur within a 90-minute time window. This seems like it could cause significant confounds.
  
  We have clarified in the Methods that animals were not excluded if they exhibited both active and quiescent huddling during the recording session. Importantly, a prerequisite for inclusion in the FOS study was that animals had to be continually engaged in the target behavior for a minimum of 15 consecutive minutes from behavior onset, an established approach for behavior-driven immediate early gene mapping. The 90-minute window was then counted from that same onset for FOS IHC. Because active huddling frequently transitions directly into quiescent huddling (and vice versa), excluding such animals would have eliminated the majority of recordings. The heterogeneity of behavioral states within the FOS integration windows is precisely why we turned to fiber photometry, a technique with the temporal resolution necessary to dissociate neural signals associated with each behavioral state.
  
  The computer vision model is not well-explained. The authors need to be far more explicit here about how it was validated.
  
  We thank the reviewer for this comment and agree that the original manuscript did not sufficiently detail the validation framework. We have revised both the Methods and Results to explicitly detail how SGBS was evaluated.
  
  First, we now clearly describe model validation on a held-out dataset (20% of manually annotated images not used for training), reporting standard segmentation metrics (per-class IoU and Dice/F1) and directly comparing SGBS to an unmodified Mask R-CNN trained under identical conditions (same backbone initialization, dataset split, and training schedule). As shown in Fig. 5D, the skeleton-guided model converged more rapidly and achieved a lower final loss than the baseline network, demonstrating improved segmentation performance in occlusion-rich thermographic recordings.
  
  Second, we more explicitly describe an independent physiological validation step. SGBS-derived surface temperature trajectories were temporally aligned with simultaneously recorded implanted thermologger measurements, which were not used during model training. As shown in Fig. 5E, SGBS-derived signals strongly corresponded with core body temperature dynamics and reproduced expected thermophysiological relationships (e.g., BAT warming preceding core temperature rise). This establishes external validity beyond pixel-level segmentation metrics.
  
  The authors should cite and consider this preprint: https://www.biorxiv.org/content/10.1101/2024.09.17.613378v1
  
  We have cited this preprint (Raam et al., 2024) in the revised manuscript and integrated relevant findings into the Discussion, in the section called “Limitations and caveats”.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This is a very interesting study from Vandendoren and colleagues examining the role of PVN oxytocin neurons during thermoregulatory behaviors, in particular during thermoregulatory huddling. The findings are important and compelling, and have implications for the thermoregulation field as well as the social/naturalistic behavior field.
  
  Strengths:
  
  The study is very creative and tackles a challenging task to examine how natural and social behavior influences neural circuits for a homeostatic system such as thermoregulation. The authors use a combination of state-of-the-art tools (photometry, optogenetics, automated behavior tracking, thermal imaging, and core body temperature measurement), often in combination with each other, to produce a rigorous and high-dimensional dataset. Carrying out tightly temperature-controlled experiments and examining natural behavior, neural activity, and body physiology simultaneously is quite a feat. I applaud the authors for taking this on in a rigorous and detailed manner. This paper will be valuable for both the thermoregulation field as well as for researchers interested in naturalistic social behaviors. The conclusions are supported by the data.
  
  We appreciate the reviewer’s careful read and positive assessment of our integrated behavioral, neural, and physiological measurements and their relevance to both thermoregulation and social behavior.
  
  Weaknesses:
  
  I have a number of questions and suggestions for clarification that would help improve the interpretation of the findings.
  
  (1) Figure 1D-F: It would be helpful to include representative images of cFos expression in the PVN, LS, and DMH during both quiescent and solo huddling conditions, to better illustrate the reported differences.
  
  We have now addressed this in the revised manuscript. We had originally shown active huddle FOS expression in Fig. 1D-F and quiescent huddle in Fig. S1A-C. We have now added solo groom FOS expression to Fig. S1D-F.
  
  (2) Figure 1C: The data suggest a general suppression of neural activity during sleep-associated quiescent huddling, which somewhat complicates the interpretation of what specifically the active huddling cells are responding to. A more informative control might have been a comparison between huddling and a more generic form of social engagement (e.g., dyadic sniffing) to assess whether huddling-responsive neurons are broadly tuned to social stimuli. While it may not be feasible to add this experimentally at this time, a brief discussion of this limitation in the main text would be valuable.
  
  We thank the reviewer for this thoughtful suggestion. We agree that comparing huddling-responsive neurons with a more generic social engagement is an important consideration.
  
  We first note that the FOS study required animals to be continuously engaged in the target behavior for a minimum of 15 consecutive minutes, ensuring that FOS expression reflects sustained behavioral engagement rather than brief social contact. Furthermore, we believe the FOS association with active huddling in Figure 1C is likely driven by preceding bouts of quiescent huddling. Because these experiments were conducted during the light phase, active huddling bouts were almost always preceded by bouts of quiescent huddling.
  
  Given that FOS protein often integrates neural activity over ~60-90 minutes, the FOS signal during active huddling may reflect cumulative PVNOT activity during the quiescent-to-active transition, rather than active huddling by itself. This interpretation aligns with our fiber photometry data, which show that PVNOT peaks are concentrated at the offset of quiescent states and the onset of active states. Moreover, a broad-scale analysis of the calcium data, prompted by these reviews, now shows a local minimum of PVNOT activity during the transition into quiescent states and a local maximum during the offset of resting states and the onset of nesting and active huddling (Fig. S3I-L).
  
  To directly address whether PVNOT neurons are broadly tuned to social engagement or specifically associated with thermoregulatory state transitions, we examined neural activity during "Contact Initiated" (ConI) and "Contact Received" (ConR) events—brief social interactions (e.g., dyadic sniffing) that occur outside the context of huddling. These interactions, which typically last less than one second, did not trigger the large-amplitude calcium peaks observed during rest-to-arousal transitions. Specifically, there was no significant association between ConI or ConR events and PVNOT peak frequency or amplitude (Fig. S2H; Table S1; p = 0.505, p = 0.575, respectively). This reinforces our conclusion that PVNOT peaks are not a generic response to social stimuli but are specifically aligned with the coordinated autonomic and behavioral transitions required to exit a low-temperature quiescent state. We have added a clarifying paragraph to the Discussion.
  
  (3) Figure 2H-J vs. Figure 1: The fiber photometry data suggest increased PVN activity during quiescent huddling vs. active huddling, which appears to contrast with the cFos results from Figure 1. It would be helpful for the authors to comment on possible reasons for this discrepancy, e.g., methodological differences, temporal resolution, or cell-type specificity.
  
  We agree that this apparent contrast deserves explicit discussion. The difference arises from the dramatically different temporal resolutions of the two techniques. Fiber photometry captures real-time neural dynamics at subsecond resolution, revealing that PVNOT neurons exhibit high-amplitude bursts primarily during the offset of quiescence (and to a lesser extent the onset of post-quiescence behaviors) (Figs. 3 and 5). Because these peaks occur while the animal is categorized as "quiescent," they appear as quiescence-associated activity in the photometry ethogram.
  
  Conversely, FOS integrates neural activity over ~30–90 minutes. In retrospect, and in light of our photometry data, an animal categorized as "Active Huddling" in the FOS study is one that has likely experienced PVNOT bursts and subsequently transitioned to an active state. The higher FOS signal in active animals therefore likely represents the cumulative activity of the transition itself and sustained activity in the active state.
  
  We have added a clarifying statement to the Discussion, in the section called “State-dependent PVNOT activity during thermo-behavioral transitions”.
  
  (4) Figure 2O: A comparable linear regression for active huddling would be informative to assess whether the observed relationships extend across behavioral states.
  
  We agree. We have added linear regression analyses for active huddling and nesting to Fig. S2K-N, including R² values, to complement the resting analyses in Figures 2L and 2O.
  
  This analysis shows that active huddling peak counts are also positively correlated with active huddle duration (but not nesting duration). The text has been updated accordingly.
  
  (5) Temperature manipulation: The use of floor temperature changes presents a distinct physiological and sensory experience from, for example, manipulation of ambient temperature. A discussion of how this choice may affect neural circuit engagement or interpretation of thermoregulatory responses would be beneficial.
  
  Both Reviewer 1 and Reviewer 2 raise important and related points: manipulating floor temperature provides a thermal stimulus that is distinct from manipulating whole-chamber ambient air temperature, and these modalities could engage partially different sensory pathways and circuits. (Note this response is copy-pasted to other relevant comments).
  
  We intentionally used floor cooling/heating because it provides a reliable, well-controlled stimulus that elicits thermoregulatory behaviors while keeping the experimental environment stable (e.g., avoiding changes in airflow/humidity that can accompany ambient cooling). To prevent conflation of these modalities, we revised the manuscript to consistently describe the manipulation as “floor temperature” (and not “ambient temperature”), and we added a statement to the Discussion acknowledging that conductive floor temperature changes may differentially recruit peripheral thermoreceptors compared to ambient air temperature.
  
  While extending these experiments to whole-chamber ambient temperature changes could be informative in future work, it is not required for the central interpretations here, which focus on PVNOT activity dynamics during thermoregulatory behavior under controlled thermal conditions.
  
  (6) Correlations with behavior: Across the manuscript, it would be informative to see correlations between huddle duration and neural activity (e.g., cFos expression, calcium signal magnitude). Similarly, do longer huddles produce greater thermogenic effects?
  
  This is a great suggestion. The first point about huddle duration and neural activity echoes the Reviewer’s comment (4) above. For this point, we now show that the duration of active huddling is positively correlated with PVNOT peak count (Fig. S2K), which is similar to what we had shown for quiescence and quiescent huddling (Fig. 2K-P).
  
  Next, the point about huddle duration and thermogenic effects is also helpful. We have now added new analyses and panels to address this (Fig. S3M-R). We find that the duration of quiescent huddle bouts is negatively correlated with Tb (Fig. S3V). The other behaviors examined did not show correlations between duration and Tb. This finding supports our previous demonstration that quiescent huddling is an energy-saving state in mice (Landen et al., 2024).
  
  Finally, we note that longitudinal correlations between bout length and peak counts are already reported in Fig. S3A-H.
  
  (7) Lactating vs. virgin mothers: The inclusion of maternal data is intriguing but feels somewhat disconnected from the central huddling-thermoregulation narrative. If these experiments are to remain, additional explanation of their rationale and how they fit into the broader story would help clarify their relevance.
  
  This point was echoed by Reviewers 1 and 3, and we have taken several actions to address it.
  
  We agree with the reviewers that the rationale for the lactation data should be made more explicit. The primary purpose of this experiment was to validate the identity of oxytocinergic neurons of the PVN.
  
  Our efforts to use IHC to validate the identity of AAV-transfected cells were inconclusive, and we have now added new data to illustrate this point. We have added Fig. S4 that includes quantitative data on expression specificity. We observed significant variability in co-staining (OT+/GCaMP+) across brain slices, likely reflecting the dynamic nature of oxytocin peptide synthesis and storage, particularly with respect to processes lining the third ventricle. This finding is in accordance with other studies that are now cited in the text.
  
  We now emphasize that, because IHC provided variable co-localization, we employed the lactation model as an independent physiological validation of the identity of the recorded neurons.
  
  It is well established that PVNOT neurons undergo dramatic changes in firing dynamics and synchrony during lactation to support milk ejection (Yaguchi et al., 2023; Yukinaga et al., 2022). Conversely, AVP and CRF cell populations in the PVN do not appear to display synchronized pulsatile bursting during lactation (see our response to Reviewer 2, comment 2, in ‘Recommendations for the authors’ and our updated Discussion). Observing these characteristic changes in our recorded population provides high-confidence functional evidence that we are targeting oxytocin neurons. We have revised the text to clarify that Figure 4 serves primarily as a functional verification of genetic targeting.
  
  We also acknowledge in the Discussion the possibility that our Cre-line may capture a small percentage of non-oxytocinergic neurons, while noting that the dramatic shift in calcium dynamics during lactation (Figure 4I–L) strongly suggests the recorded population is dominated by oxytocin neurons.
  
  (8) Optogenetic manipulation: Have the authors tested the effect of PVN OT neuron stimulation or inhibition during huddling? Even a negative result would be of interest to the field. If these data exist (main or supplementary), I apologize for missing them. If not, the authors might consider including them or commenting briefly on any attempts or challenges in carrying out these experiments.
  
  We thank the reviewer for this question. We have not performed optogenetic manipulation during huddling. Our decision to perform optogenetic activation in solo-housed animals was driven by our fiber photometry finding that PVNOT activity profiles during the rest-to-arousal transition are social-context independent (Figures 2 and 3). Had the GCaMP data suggested that PVNOT peaks were specific to social huddling, optogenetic manipulation during huddling would have been the natural next experiment. However, because peaks aligned with thermoregulation broadly, rather than social behavior specifically, we designed our functional experiments to test the circuit's role in driving the autonomic and behavioral arousal transition.
  
  We also note that our experience with chemogenetic manipulation suggests that pharmacological approaches to study rest-arousal transitions during huddling are not currently feasible. As described in our response to Reviewer 1, our DREADD inhibition experiments were confounded by stress-induced hyperthermia following injection, and because drug delivery could not occur while animals were asleep and resting, the experimental conditions failed to recapitulate the low-Tb quiescent state during which PVNOT peaks naturally occur. We share this experience because we believe it will be informative for others in the field considering similar approaches.
  
  Additionally, as described above (Reviewer 1, #5), the SGBS thermographic model encounters artifacts in paired contexts due to thermal merging between huddling mice. We have added a note in the Discussion addressing this, in the section called “Limitations and caveats”.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors aimed to elucidate the relationship between physiological state (i.e., behavioral status and thermogenic sympathetic activity) and the activity of hypothalamic paraventricular oxytocin (PVNOT) neurons in female mice. They studied this by combining automated classification of mouse behavior via video-based analysis with calcium imaging of PVNOT neuron activity. Sympathetic thermogenesis was inferred from surface temperature changes captured by infrared thermography, and the authors provided their custom analysis scripts in the manuscript. Notably, they found that a strong, pulsatile activation of PVNOT neurons was "occasionally" observed immediately before the animals transitioned from a resting to an active state. This pulsatile activity was observed in both pair-housed and individually housed animals. While PVNOT neurons are often associated with social behaviors, this finding suggests that the oxytocinergic system is also engaged during naturalistic behaviors, even in the absence of social interactions. If experiments were more convincingly performed and presented, the results would point to a broader physiological role of central oxytocin, including in the regulation of fundamental brain states and homeostatic processes, and offer a new perspective on the functional significance of central oxytocin signaling.
  
  Strengths:
  
  The oxytocinergic neural system is believed to subserve a wide range of physiological functions, and elucidating these roles requires monitoring PVNOT neuronal activity under various behavioral contexts, as well as manipulating this activity to establish causal links. In the present study, the authors show a technically sound experimental framework that integrates behavioral tracking in both individually and group-housed mice with the observation and manipulation of PVNOT neuron activity. This experimental setup represents a valuable methodological resource for researchers investigating the physiological functions of oxytocin.
  
  We thank the reviewer for the thoughtful review and for recognizing the value of our integrated framework for monitoring and manipulating PVNOT neuronal activity across behavioral contexts.
  
  Weaknesses:
  
  While this study successfully established a new experimental setup for simultaneous analyses of behavior and PVNOT neuronal activity, there are several concerns regarding the interpretation of the results and the robustness of the conclusions, which should be more thoroughly addressed.
  
  (1) The study relies on the assumption that calcium imaging and optogenetic manipulation were restricted only to PVNOT neurons. However, the specificity of AAV-mediated gene expression was not verified quantitatively. A fair number of cell bodies in the PVN expressed GCaMP8s, but not OT, indicating potential off-target expression (see Figure S2A, B). The lack of quantitative validation weakens confidence in the causal interpretation of the results.
  
  This point was echoed by Reviewers 1 and 3, and we have taken several actions to address it.
  
  We agree with the reviewers that the rationale for the lactation data should be made more explicit. The primary purpose of this experiment was to validate the identity of oxytocinergic neurons of the PVN.
  
  Our efforts to use IHC to validate the identity of AAV-transfected cells were inconclusive, and we have now added new data to illustrate this point. We have added Fig. S4 that includes quantitative data on expression specificity. We observed significant variability in co-staining (OT+/GCaMP+) across brain slices, likely reflecting the dynamic nature of oxytocin peptide synthesis and storage, particularly with respect to processes lining the third ventricle. This finding is in accordance with other studies that are now cited in the text.
  
  We now emphasize that, because IHC provided variable co-localization, we employed the lactation model as an independent physiological validation of the identity of the recorded neurons.
  
  It is well established that PVNOT neurons undergo dramatic changes in firing dynamics and synchrony during lactation to support milk ejection (Yaguchi et al., 2023; Yukinaga et al., 2022). Conversely, AVP and CRF cell populations in the PVN do not appear to display synchronized pulsatile bursting during lactation (see response to Reviewer-2 comment-2 in ‘Recommendation for authors’ and our updated Discussion). Observing these characteristic changes in our recorded population provides high-confidence functional evidence that we are targeting oxytocin neurons. We have revised the text to clarify that Figure 4 serves primarily as a functional verification of genetic targeting.
  
  We also acknowledge in the Discussion the possibility that our Cre-line may capture a small percentage of non-oxytocinergic neurons, while noting that the dramatic shift in calcium dynamics during lactation (Figure 4I–L) strongly suggests the recorded population is dominated by oxytocin neurons.
  
  (Note, we have updated Figure S2A,B to more accurately reflect the extent of co-localization in this image).
  
  (2) The study focuses on the transition from rest to active states following pulsatile activity of PVNOT neurons. However, the physiological significance of this pulsatile activity remains unclear. According to the authors, pulsatile activity occurred with an approximately 20% probability within 100 seconds prior to the end of the resting state. This implies that, in the remaining 80% of rest-to-active transitions, pulsatile PVNOT activity did not occur, suggesting that it is not essential for initiating the transition. A comparative analysis of behavioral and thermogenic changes between transitions with and without pulsatile PVNOT activity would help to further clarify the functional relevance of this phenomenon and strengthen the authors' interpretation of the findings.
  
  These are excellent points, and here we address them separately.
  
  (1) Probability of transitions
  
  We agree that our wording could be misread and we have revised the text for clarity. The “~20%” value is not the fraction of rest-to-active transitions that exhibit pulsatile PVNOT activity within a 100-s window. Instead, Fig. 3F,H report an instantaneous (per-second) probability of observing a calcium peak as a function of time-to-bout offset (logistic regression). In other words, the probability of a peak increases sharply as the animal approaches rest offset (e.g., from ~2–3%/s near onset to ~14%/s for quiescence and ~25%/s for quiescent huddling near offset), indicating a strong state-dependent increase in peak likelihood rather than an all-or-none trigger.
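  To illustrate the kind of per-second model described above, here is a minimal Python sketch. It is not the authors' analysis code. It assumes that each rest bout is divided into 1-s bins labeled 1 if a calcium peak occurs in that bin and 0 otherwise, and it fits a single-predictor logistic regression of that label on time relative to bout offset. The function names (`fit_logistic`, `simulate_bouts`) and the simulated toy data are hypothetical, and any random effects or extra covariates in the published model are not represented.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iters=15):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p             # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                 # Fisher information (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^-1 g for the 2x2 information matrix I
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate_bouts(n_bouts, a, b, window=100, seed=0):
    """Hypothetical toy data: one Bernoulli 'peak' draw per 1-s bin, t = -window..0 s from offset."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_bouts):
        for t in range(-window, 1):
            x.append(float(t))
            y.append(1 if rng.random() < sigmoid(a + b * t) else 0)
    return x, y


if __name__ == "__main__":
    x, y = simulate_bouts(200, a=-1.4, b=0.025)
    b0, b1 = fit_logistic(x, y)
    for t in (-100, -50, 0):
        print(t, "s:", round(sigmoid(b0 + b1 * t), 3), "per second")
```

  In this toy example the fitted per-second probability rises from roughly 2%/s at 100 s before offset to roughly 20%/s at offset. Because the model is per second, such a curve says nothing directly about what fraction of transitions contain a peak.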
  
  We further clarify in the Discussion that we do not claim PVNOT peaks are essential for initiating every transition; rather, PVNOT activity biases or enhances the probability of transition toward thermogenesis and behavioral arousal (added to section called “State-dependent PVNOT activity during thermo-behavioral transitions”).
  
  (2) Effect of peaks on transitions
  
  This is a very helpful suggestion and we agree that directly comparing transitions with vs. without pre-offset pulsatile PVNOT activity could strengthen interpretation of the functional relevance of these events. We have therefore added a new transition-aligned analysis of thermogenic dynamics at rest-to-active transitions (new Fig. 3P&S; and corresponding text in the Results and Statistics sections).
  
  Briefly, we extracted peri-transition body temperature (Tb) traces (−300 to +300 s) aligned to the offset of quiescence and quiescent-huddling bouts and classified each transition as Peak+ if it contained one or more calcium peaks in the 100 s preceding bout offset, and Peak− otherwise. To account for inter-individual differences in “balance point,” Tb was z-scored within mouse. We then quantified the post-offset thermogenic rise for each transition as the change in scaled temperature from a pre-offset baseline (−60 to 0 s) to the post-offset interval (0 to 300 s) and tested Peak+ vs Peak− differences using linear mixed-effects models. This revealed that Peak+ transitions exhibited significantly larger post-offset increases in scaled Tb than Peak− transitions for both quiescence offsets and quiescent-huddling offsets.
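  The transition classification and rise measure described above can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline. It assumes 1-Hz Tb samples, defines Peak+ as at least one peak in [−100, 0) s relative to bout offset, and takes the rise as the difference between mean scaled Tb over 0 to 300 s and over −60 to 0 s (the use of means is an assumption). The function names are hypothetical. The linear mixed-effects comparison of Peak+ vs Peak− is not reproduced here; it could be fit with, for example, statsmodels' `mixedlm` using mouse as the grouping factor.

```python
from statistics import mean, pstdev


def zscore_within_mouse(tb_by_mouse):
    """Scale Tb separately for each mouse: (value - mouse mean) / mouse SD."""
    scaled = {}
    for mouse, samples in tb_by_mouse.items():
        mu, sd = mean(samples), pstdev(samples)
        scaled[mouse] = [(v - mu) / sd for v in samples]
    return scaled


def classify_transition(peak_times, window=100):
    """'Peak+' if any calcium peak time (s, relative to offset) lies in [-window, 0)."""
    return "Peak+" if any(-window <= t < 0 for t in peak_times) else "Peak-"


def thermogenic_rise(trace, t_start=-300):
    """Mean scaled Tb over [0, 300] s minus mean over [-60, 0) s.

    `trace` is assumed to be sampled at 1 Hz, with trace[0] at t_start seconds.
    """
    times = range(t_start, t_start + len(trace))
    baseline = [v for t, v in zip(times, trace) if -60 <= t < 0]
    post = [v for t, v in zip(times, trace) if 0 <= t <= 300]
    return mean(post) - mean(baseline)


if __name__ == "__main__":
    # Toy step trace: scaled Tb is 0 before offset and 1 after it.
    trace = [0.0 if t < 0 else 1.0 for t in range(-300, 301)]
    print(classify_transition([-42.0]), thermogenic_rise(trace))
```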
  
  Together, these results indicate that while pulsatile PVNOT activity is not present prior to every rest-to-active transition, when it occurs it is associated with a stronger thermogenic rise, consistent with a probabilistic modulatory role in promoting the transition rather than being strictly required to initiate it.
  
  We are grateful for this suggestion as this new data is very informative in the context of our model.
  
  (3) The study identifies a correlation between pulsatile activity of PVNOT neurons and rest-to-active transitions, and tests for a causal relationship using optogenetic stimulation. However, since PVNOT neurons are known to co-release other neurotransmitters such as glutamate, it remains unclear whether the observed effects are mediated specifically through oxytocin receptor signaling. To address this question, functional intervention experiments using oxytocin receptor antagonists or receptor knockout mice are necessary.
  
  We agree with the reviewer that PVNOT neurons co-release glutamate and that isolating the specific contribution of oxytocin signaling versus co-transmitted signals is an important question. However, our study was designed to identify the functional role of the PVNOT cell type during thermoregulatory state transitions, not to dissect the molecular mechanism of signaling at downstream targets. By demonstrating that the endogenous activity of this specific population aligns with the rest-arousal window and that their activation is sufficient to drive the phenotype, we provide an anatomical and functional framework for future mechanistic investigations.
  
  We also note that we provide anatomical evidence supporting a possible peptidergic mechanism: PVNOT neuron projections to the rostral medullary raphe (rMR), a key thermogenic control site, alongside oxytocin receptor mRNA expression in this region (Fig. S5). This anatomical link suggests a plausible pathway for oxytocinergic modulation of thermogenesis, but of course does not rule in/out glutamatergic signaling. We acknowledge this limitation in the Discussion and frame pharmacological and receptor knockout studies as important next steps.
  
  We address these points in the Discussion, in the section called “Limitations and caveats.”
  
  (4) The authors attempted to detect BAT thermogenesis and skin vasomotion using infrared thermography. This technique measures only skin hair temperatures (since the skin was not shaved), but does not measure "BAT temperature" or "vasomotor tone". As seen in Figure 5E, the temperatures of the body surface areas ("BAT", "Rump", and "Dorsal surface") mostly changed in parallel, indicating that these temperatures are strongly affected by body core temperature. Therefore, the thermographic measurements in this study did not provide convincing information on BAT thermogenesis or skin vasomotion. To avoid misleading reports, the authors need to use other techniques to directly measure temperatures, such as telemetry.
  
  We agree that infrared thermography measures surface radiance rather than internal tissue temperature. We have revised the manuscript to use more precise language (e.g., "surface temperature over the interscapular BAT region" rather than "BAT temperature"). However, surface measurements are not merely passive reflections of core temperature. Here we add background and explanation about our thermography data:
  
  Background on our approach
  
  Infrared thermography provides a non-invasive readout of heat emission over the interscapular region and has been validated as reporting UCP1-dependent BAT thermogenesis in mice under adrenergic stimulation (Crane et al., 2014). That said, there are known confounds (insulation/adiposity, blood flow, protocol variability), and standardized protocols are needed (Law et al., 2018). Direct telemetry or implanted thermocouples offer superior precision for measuring BAT temperature, provided the probe is sutured to BAT itself or to Sulzer’s vein; this is technically challenging because probes tend to drift over time (e.g., Dodson et al., 2024).
  
  Our BAT findings in context
  
  Using SGBS, we demonstrate that the interscapular BAT region is significantly warmer than the adjacent rump surface (Fig. 5C). If surface temperature were purely a reflection of uniform core temperature, this consistent regional hotspot would not be observed.
  
  Our cross-correlation analysis of the fiber-photometry cohort (Fig. 5E) shows that the rise in BAT surface temperature precedes changes in other body regions by approximately 90 seconds, suggesting that BAT acts as a primary heat source during rest-to-arousal transitions rather than passively following core temperature. This finding is consistent with a study using telemetric probes placed in BAT, which found that episodic increases in BAT temperature began 3 minutes before increases in body temperature (Ootsuka et al., 2009).
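  A lead/lag estimate of this kind can be illustrated with a simple normalized cross-correlation. The sketch below relies on several assumptions and is not the method used for Fig. 5E. It assumes two evenly sampled 1-Hz series, computes the Pearson correlation between x[t] and y[t + lag] over a range of lags, and reports the lag with the highest correlation, where a positive lag means x leads y. The names and the toy random-walk data are hypothetical. Choices such as detrending, differencing, or per-mouse averaging are not specified in the text and are left out.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = math.fsum(a) / len(a), math.fsum(b) / len(b)
    num = math.fsum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(math.fsum((u - ma) ** 2 for u in a) *
                    math.fsum((v - mb) ** 2 for v in b))
    return num / den


def xcorr_lag(x, y, max_lag):
    """Lag (in samples) maximizing corr(x[t], y[t + lag]); positive lag means x leads y."""
    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]      # pair x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[: len(y) + lag]     # pair x[t + |lag|] with y[t]
        r = pearson(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


if __name__ == "__main__":
    # Hypothetical 1-Hz toy series: a random walk (x) and a copy delayed by 90 s (y).
    rng = random.Random(1)
    x = [0.0]
    for _ in range(599):
        x.append(x[-1] + rng.gauss(0.0, 1.0))
    y = [x[0]] * 90 + x[:-90]
    print(xcorr_lag(x, y, max_lag=150))
```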
  
  Based on this Reviewer’s comment here and the subsequent one (5), we have now added a new analysis of the temporal patterning of arousal and thermogenesis in the optogenetic cohort of animals; see below for details.
  
  Vasomotor tone
  
  We agree that infrared thermography does not directly measure vasomotor tone. We have revised the text to remove language implying that our measurements directly quantify vasomotor tone, vasodilation or vasoconstriction.
  
  We note that the established approach for non-invasive assessment of vasomotion uses glabrous skin of the tail and ears (Garami et al., 2011; Meyer et al., 2017; Škop et al., 2020). Rump surface temperature measured over hairy, non-glabrous skin correlates more closely with core body temperature than with cutaneous vasomotor tone (Meyer et al., 2017; Škop et al., 2020) and is used in the literature as a reference point for calculating BAT thermogenesis.
  
  In our data, rump surface temperature decreased following PVNOT calcium peaks while BAT and dorsal surface temperatures increased (Fig. 5L–M). This pattern is consistent with sympathetically driven thermogenesis in which peripheral heat loss is reduced while BAT drives core temperature upward. We now acknowledge that our rump measurements do not isolate vasomotor contributions. We have revised the manuscript accordingly, replacing references to rump vasoconstriction with language describing the observed thermal pattern while avoiding attribution to a specific thermoeffector mechanism.
  
  Finally, we note that telemetry would strengthen deep-body temperature interpretation, but telemetry does not itself quantify vasomotor tone; the same distal heat-loss readouts described above would be required regardless of core Tb methodology.
  
  In sum, infrared thermography enables non-invasive, simultaneous tracking of multiple thermal features in freely moving, undisturbed animals, which is a requirement for studying the naturalistic state transitions central to this study. We have added a section to the Discussion acknowledging the limitations of surface infrared thermography.
  
  (5) Photostimulation of PVNOT neurons increased Tb after 400 sec (6.6 min) (Figure 5). This latency is too long to conclude that the neuronal stimulation elicited BAT thermogenesis. A more reasonable explanation is that the increase in Tb was caused by the induction of physical activity (Figure S4C), which slowly generates heat and contributes to the elevation of Tb. However, this view contradicts the authors' claim. To address this concern, the authors should directly measure BAT thermogenesis and compare it with the rate of Tb elevation. If BAT thermogenesis occurs, the rate at which the BAT temperature increases must exceed the rate at which Tb rises.
  
  We thank the reviewer for this thoughtful critique. With this response we first provide additional context about the timeline of temperature increases, and second add a new analysis addressing the relative contributions of activity and BAT-surface to Tb changes.
  
  (1) Additional context on the temporal progression
  
  First, the observed timescale does not, per se, rule out a contribution of BAT thermogenesis. While the kinetics of BAT activation and associated Tb increases can operate on a fast timescale in anesthetized animals, in vivo activation of BAT thermogenesis pathways can take several minutes to yield a statistically detectable difference. For example, activation of DMH→rMR glutamatergic signaling, a canonical thermogenic command pathway, takes several minutes to produce a significant increase in both Tb and BAT using telemetric temperature probes (Kataoka et al., 2014).
  
  This timescale could also be consistent with peptidergic neuromodulation by PVNOT neurons, which are more likely to be modulators (and not drivers) of the canonical thermogenic pathway. Oxytocin is known to act via volume transmission and metabotropic receptor signaling, which operate on slower timescales than ionotropic neurotransmission (Ludwig and Leng, 2006). Downstream recruitment of sympathetic outflow and BAT thermogenesis is likewise a multistep autonomic process, not an immediate synaptic event.
  
  Next, the thermal dynamics reported in Figure 5 and Figure S4 are not consistent with activity-induced heat production alone. Specifically:
  
  - Thermal increases were spatially localized to interscapular/dorsal regions corresponding to BAT depots before generalized surface warming.
  
  - Importantly, photostimulation-induced warming was observed even during behavioral states characterized by low baseline activity, suggesting that thermogenic activation was not simply a byproduct of movement.
  
  While we did not directly measure BAT sympathetic nerve activity, our surface thermography approach was designed specifically to resolve regional temperature dynamics over the interscapular BAT area. The spatial specificity and temporal profile of the warming are consistent with BAT thermogenesis rather than uniform muscle-generated heat.
  
  We acknowledge that direct measurement of BAT sympathetic activity or oxygen consumption would provide additional mechanistic resolution. However, given (i) the known role of PVN oxytocin neurons in autonomic regulation, (ii) the spatially localized dorsal temperature increase, and (iii) the temporal dissociation between stimulation onset and gradual systemic Tb rise, we conclude that BAT thermogenesis remains the most parsimonious explanation.
  
  We have revised the Discussion to acknowledge these temporal dynamics more explicitly, clarifying that the thermogenic response to photostimulation likely follows the timescales of peptidergic neuromodulation.
  
  (2) New analysis
  
  We have added a new analysis to address the relationship between Tb, BAT-surface temperature, and locomotion in the optogenetic cohort. In short, we show that across all mice, changes in BAT-surface temperature typically precede changes in Tb, and that the effect of optogenetic stimulation on core Tb cannot be explained by physical activity (nor by BAT-surface temperature).
  
  First, cross-correlation of derivatives suggested that changes in BAT surface temperature typically precede changes in dTb/dt across mice, whereas changes in physical activity did not consistently precede dTb/dt. This result, now shown in Fig. S5G, is consistent with our cross-correlation analysis of the fiber-photometry cohort.
  
  Next, we used a lagged regression analysis to test whether photostimulation-evoked increases in core temperature are fully mediated by physical activity. Specifically, we modeled the derivative of core Tb (dTb/dt) using an impulse-response representation of photostimulation, while controlling for distributed lags (0–120 s) of physical activity and of the BAT surface temperature derivative, with random effects for mouse and trial. Photostimulation remained a significant predictor of dTb/dt after controlling for activity and BAT-surface temperature (likelihood ratio test, χ² = 7.66, p = 0.0056), indicating that the relationship between stimulation and Tb is not fully explained by activity.
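  Two ingredients of such a lagged regression can be sketched in plain Python: distributed-lag covariates and an impulse-response regressor for photostimulation. The sketch also computes the p-value of a likelihood ratio statistic. This is a hypothetical illustration, not the authors' model. The function names, the choice to drop rows where a lag is undefined, the 10-s lag spacing, and the exponential example kernel are all assumptions, and fitting the mixed-effects model itself (random effects for mouse and trial) is not shown. Assuming the full and reduced models differ by a single photostimulation term (one degree of freedom), the p-value is P(χ²₁ > statistic) = erfc(√(statistic/2)). For χ² = 7.66 this gives approximately 0.0056, in line with the value reported above.

```python
import math


def distributed_lag_rows(series, lags):
    """Row for time t is [series[t - L] for L in lags].

    Rows with t < max(lags) are dropped so that every lag is defined (an assumption).
    """
    start = max(lags)
    return [[series[t - L] for L in lags] for t in range(start, len(series))]


def convolve_impulse(stim, kernel):
    """Causal convolution of a 0/1 stimulation train with an impulse-response kernel."""
    return [
        sum(kernel[k] * stim[t - k] for k in range(len(kernel)) if t - k >= 0)
        for t in range(len(stim))
    ]


def lrt_pvalue_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    lags = list(range(0, 121, 10))          # hypothetical 10-s spacing over 0-120 s
    activity = [0.0] * 300                  # placeholder 1-Hz activity series
    print(len(distributed_lag_rows(activity, lags)))  # 180 rows
    tau = 60.0                              # hypothetical kernel time constant (s)
    kernel = [math.exp(-k / tau) for k in range(300)]
    stim = [1.0 if 10 <= t < 20 else 0.0 for t in range(300)]
    regressor = convolve_impulse(stim, kernel)
    print(round(max(regressor), 3), round(lrt_pvalue_df1(7.66), 4))
```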
  
  Recommendations for the authors:
  
  Editors note:
  
  We suggest including key statistical support for the claims in the main text (e.g., results or figure legends).
  
  We have added statistical support for key claims in the main text results. We have also added references to Table S1 where appropriate (e.g., where there is a long list of statistical results); we hope this aids the readability of the report.
  
  Reviewer #1 (Recommendations for the authors):
  
  See above - the authors should decide what to prioritize, but I only mention significant concerns above. The manuscript could be improved to 'Convincing' or even 'Compelling' with sufficient effort.
  
  Thanks for the careful reading of the manuscript. We’ve addressed many of these points, and feel the manuscript has been strengthened as a result.
  
  There were also some text errors here and there.
  
  Several text errors were identified and fixed. Thank you.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Figure 1I: The quantification shown here is a bit unclear from the figure and legend - are the authors reporting the percentage of cFos+ cells within the OXT+ population, or within the general DAPI+ population? If the latter, including a co-localization analysis to estimate the proportion of OXT+ cells activated would strengthen the interpretation.
  
  We thank the reviewer for catching this; Reviewer 1 made a similar comment. A typo in the figure legend led to this confusion. Figure 1I is in fact a quantification of the percent Oxytocin:Fos colocalized cells (not Fos:DAPI, as was written) in dorsal and ventral subregions of the PVN during active huddling and quiescent huddling. We have corrected the legend and clarified the quantification in the revised manuscript. (Note this response is copy-pasted to other relevant comments).
  
  (2) PVN cell types: It would be useful to briefly discuss the potential involvement of other PVN populations (e.g., CRF, AVP neurons) in huddling, given their known roles in social behavior, stress, and thermoregulation.
  
  Thank you for the insightful comment. We address these points in two parts.
  
  (1) PVN cell types and huddling
  
  Regarding the specific connection between these cell types and huddling: to our knowledge, no study has directly tested the effect of PVN CRF or PVN AVP neuron manipulation on huddling behavior. The most relevant data come from Bendesky et al. (2017), who found that intracerebroventricular administration of AVP in Peromyscus inhibited nest building but had no effect on huddling, licking, or pup retrieval (though this pharmacological approach does not isolate PVN AVP neurons specifically). Their chemogenetic manipulation of PVN AVP neurons in Mus musculus confirmed the nest-building effect but did not assess huddling. For CRF, the available evidence suggests an opposing role to OT in social care contexts: chemogenetic activation of PVN CRF neurons impairs maternal behavior in postpartum mice (Melón et al., 2018), and intracerebroventricular CRF administration suppresses maternal care and can induce pup-killing in virgin rats (Pedersen et al., 1991).
  
  That said, PVN AVP neurons do promote wakefulness via lateral hypothalamic orexin neurons (Islam et al., 2022), and a recent preprint has implicated PVN AVP neurons in temperature-dependent maternal thermoregulatory behaviors, including co-nesting and shepherding, via projections to the central amygdala (Adahman et al., 2025). Notably, while that study focused on AVP neurons, its c-Fos data also revealed significant temperature-dependent modulation of PVNOT neurons (Fig. 3B), with suppressed activity at thermoneutrality relative to cooler conditions, a pattern suggesting that OT neurons are active under conditions where thermoregulatory effort is required. These data are consistent with our findings on PVNOT neuron involvement in rest-to-arousal transitions driven by thermoregulatory need.
  
  Additionally, Inada et al. (2025) used an elegant series of viral-genetic experiments to demonstrate that PVN AVP neurons facilitate paternal caregiving behaviors via AVP-to-oxytocin-receptor crosstalk in the preoptic area. Critically, their fiber photometry and circuit mapping data showed that chemogenetic activation of PVN AVP neurons did not recruit PVN OT neurons (Fig. 4), indicating that these populations operate independently in this context. We believe this finding is consistent with our interpretation that the thermoregulatory signals we observe reflect a cell-type-specific property of PVNOT neurons. Future work examining how PVNOT, AVP, and CRF populations interact during thermoregulatory state transitions would be valuable.
  
  (2) PVN cell types and stress and thermoregulation
  
  PVN CRF and AVP neurons have established roles in stress responses and social behavior, and future studies examining their involvement in huddling would be valuable. However, their direct roles in thermoregulation are limited. PVN CRF neurons are primarily stress-axis regulators whose thermoregulatory influence is mediated indirectly through downstream targets such as the DMH (reviewed in Morrison and Nakamura, 2019). AVP's thermoregulatory role is principally as an endogenous antipyretic acting via preoptic area neurons (Tabarean, 2021), rather than through PVN magnocellular AVP neurons.
  
  Importantly, the synchronized pulsatile bursting pattern that is characteristic of OT neurons during lactation (which serves as a key validation benchmark for our PVNOT calcium peaks) appears to be specific to OT neurons and does not generalize to other PVN populations. One study (Popescu et al., 2019) directly demonstrated that lactation-induced IPSC burst upregulation occurs selectively in OT magnocellular neurons, with no change in VP neurons within the same nucleus. VP neurons do exhibit phasic bursting, but these patterns are asynchronous, of longer duration, and serve antidiuretic rather than neuroendocrine-pulsatile functions (De Mota et al., 2004; Poulain et al., 1977; Wakerley et al., 1978). To our knowledge, no studies have reported synchronized burst activity in PVN CRF neurons during lactation or at rest. We have added a brief discussion of these points to the manuscript.
  
  (3) Figure 2B: Several behavioral abbreviations (e.g., LMA) are not intuitive and are missing from the legend. Spelling them out or including schematic illustrations would improve clarity.
  
  We have expanded the figure legends to define all behavioral abbreviations: LMA (Locomotor Activity), EaDr (Eating or Drinking), Groom (Grooming), Nest (Nesting or Nest Building), Quies (Quiescence), Sta (Stationary), ConI (Contact Initiated), ConR (Contact Received), AHud (Active Huddle), QHud (Quiescent Huddle).
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Figures 1D-F and S1A-C: The current magnification is insufficient to clearly resolve the distribution of FOS signals. FOS fluorescence is generally expected to be localized within cell nuclei. However, particularly in Figure 1F, the signals exhibit punctate or fibrous staining in addition to nuclear localization.
  
  This raises concerns about the quality of the tissue staining and the reliability of subsequent analyses. Including higher-magnification images would strengthen the credibility of the data presented.
  
  Thanks for the careful observation. We used a well-validated FOS protocol (see Methods; c-Fos (9F6) Rabbit mAb, Cell Signaling, 14609, 1:1000 dilution in block solution).
  
  To address this issue, in Figure 1 we have included better images of the regions of interest (DMH, LS, and PVN). We also show an inset with DAPI and the FOS IHC. These inset images show that the FOS signal does co-localize with nuclei.
  
  The reviewer notes that there is fibrous staining in the PVN. We also noted this type of staining, which arose from clusters of bright dots present in the PVN but not in other regions. This pattern was reproducible across several histological experiments. Fortunately, these bright dots were easily removed in our image-processing routine using a selective median filter (pixel radius < 2.0 and pixel intensity > 50).
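  One plausible reading of the selective median filter described above is sketched below. It is similar in spirit to ImageJ/Fiji's "Remove Outliers" command: a pixel is replaced by the median of its circular neighborhood only when it exceeds that median by more than a threshold. This reading is an assumption. In particular, it is not known whether the authors' intensity criterion (> 50) was applied relative to the local median or as an absolute cutoff, and the function name and toy images are hypothetical. The example shows how such a filter removes an isolated bright dot while leaving the center of a larger, nucleus-like blob intact.

```python
from statistics import median


def remove_bright_outliers(img, radius=2.0, threshold=50):
    """Replace pixels brighter than their local median by more than `threshold`.

    `img` is a list of rows of numbers; the neighborhood is a disc of `radius` pixels.
    """
    h, w = len(img), len(img[0])
    r = int(radius)
    # Offsets inside a disc of the given radius (includes the center pixel).
    disc = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]
    out = [row[:] for row in img]  # leave the input image unmodified
    for y in range(h):
        for x in range(w):
            # Neighbors that fall outside the image are ignored.
            neigh = [img[y + dy][x + dx] for dy, dx in disc
                     if 0 <= y + dy < h and 0 <= x + dx < w]
            m = median(neigh)
            if img[y][x] - m > threshold:
                out[y][x] = m
    return out


if __name__ == "__main__":
    dot = [[10] * 7 for _ in range(7)]
    dot[3][3] = 200                      # isolated bright dot
    blob = [[10] * 9 for _ in range(9)]
    for yy in range(3, 6):
        for xx in range(3, 6):
            blob[yy][xx] = 120           # 3x3 nucleus-like blob
    print(remove_bright_outliers(dot)[3][3], remove_bright_outliers(blob)[4][4])
```

  With this rule, pixels at the corners of small blobs can also be replaced, so the radius and threshold interact with object size.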
  
  (2) Figures 2A, 4C, and 6A: As mentioned in the Public Review, the specificity of AAV-mediated gene expression is critical for the strength of the conclusions. Quantitative data demonstrating the expression specificity should be included.
  
  This point was echoed by Reviewers 1 and 3, and we have taken several actions to address it. (Note this response is copy-pasted to the other reviewers).
  
  We agree with the reviewers that the rationale for the lactation data should be made more explicit. The primary purpose of this experiment was to validate the identity of oxytocinergic neurons of the PVN.
  
  Our efforts to use IHC to validate the identity of AAV-transfected cells were inconclusive, and we have now added new data to illustrate this point. We have added Fig. S4 that includes quantitative data on expression specificity. We observed significant variability in co-staining (OT+/GCaMP+) across brain slices, likely reflecting the dynamic nature of oxytocin peptide synthesis and storage, particularly with respect to processes lining the third ventricle. This finding is in accordance with other studies that are now cited in the text.
  
  We now emphasize that, because IHC provided variable co-localization, we employed the lactation model as an independent physiological validation of the identity of the recorded neurons.
  
  It is well established that PVNOT neurons undergo dramatic changes in firing dynamics and synchrony during lactation to support milk ejection (Yaguchi et al., 2023; Yukinaga et al., 2022). Conversely, AVP and CRF cell populations in the PVN do not appear to display synchronized pulsatile bursting during lactation (see response to Reviewer-2 comment-2 in ‘Recommendation for authors’ and our updated Discussion). Observing these characteristic changes in our recorded population provides high-confidence functional evidence that we are targeting oxytocin neurons. We have revised the text to clarify that Figure 4 serves primarily as a functional verification of genetic targeting.
  
  We also acknowledge in the Discussion the possibility that our Cre-line may capture a small percentage of non-oxytocinergic neurons, while noting that the dramatic shift in calcium dynamics during lactation (Figure 4I–L) strongly suggests the recorded population is dominated by oxytocin neurons.
  
  (3) Figure 2D: The authors should show an expanded view of a representative "PVNOT peak" from the spikes presented.
  
  We have added a representative peak to Fig. 2D.
  
  (4) Figure 2E-J: All the abbreviations of the behavioral states must be defined in the figure or legend.
  
  We added these abbreviations to the legend, and a text box reading “See legend for abbreviations” to the schematic.
  
  (5) Figure 2F, G, I, and J: The units on the y-axis should be indicated to facilitate interpretation.
  
  We have added these units. Thanks.
  
  (6) Figure 3A: Three large PVNOT peaks occurred between 01:30 and 02:00. However, these peaks did not cause an obvious transition in behavioral states or an increase in Tb within several minutes. Therefore, statements such as "PVNOT neurons predict transitions towards thermogenesis and behavioral arousal" in the text and subheading (pages 7 and 9) are questionable.
  
  We thank the reviewer for this careful observation. The three peaks between 01:30 and 02:00 that do not immediately lead to a behavioral transition illustrate a key aspect of our findings: the relationship between PVNOT activity and state transitions is probabilistic and state-dependent, not deterministic. Our logistic regression analysis (Fig. 3F, H, J, L) demonstrates that peaks increase the probability of a transition (up to ~20% per second) rather than acting as an obligatory "on switch." While individual variability exists in any single trace, the group-level analysis reveals a statistically significant increase in physical activity following PVNOT peaks (Fig. 3B–E).
  
  We therefore use ‘predict’ in a probabilistic sense: PVNOT peaks increase the conditional probability of impending state transitions in a manner that depends on behavioral context, rather than acting as an obligate trigger in every instance. We have taken care not to claim that PVNOT neurons are a necessary causal factor for transitions towards thermogenesis and arousal.
  
  We have updated the figure legend to clarify that Figure 3A shows an individual example trace, and revised the subheading on page 7 to more accurately reflect the probabilistic nature of this relationship: "PVNOT neurons predict increased likelihood of transitions towards thermogenesis and behavioral arousal in social and non-social contexts".
  
  We qualified the word “predicts” with “probabilistically” in the third paragraph of this section.
  
  Finally, this comment is related to the Reviewer’s comment-2 in the Public Reviews. To address that comment, we added a new analysis (now Fig. 3P&S) which shows that the presence of a peak in a bout of rest increases the thermogenic trajectory compared to bouts without a peak.
  
  (7) Figure 3F and H: If PVNOT peaks contribute to the initiation of transitions into the active state, the probability of peak occurrence should reach its maximum prior to the quiescence offset. However, the figures do not present the probability trajectory after the offset, which limits the ability to evaluate the authors' interpretation. Reanalysis extending to 150 seconds post-offset would be needed to clarify this issue.
  
  Thank you for this suggestion. We agree that examining PVNOT dynamics around the period following quiescence (and quiescent huddling) offset can further inform how PVNOT activity relates to rest-to-active transitions, and this has led to new insights within the manuscript.
  
  For background, in the original analysis (Fig. 3F,H), we used logistic regression to quantify how peak probability differs between bout onset and the period near bout offset. We focused these analyses on the timeframe of the bouts themselves (plus a small margin) because, in freely behaving animals, the pre-onset and post-offset periods are heterogeneously composed of multiple potential subsequent behaviors (e.g., brief re-entry into quiescence, nesting, active huddling, locomotion, etc.). These would confound a single post-offset probability trajectory unless offsets were stratified by the identity of the subsequent behavioral state, which is beyond the scope of this paper.
  
  To address this concern, we now expand our peri-event baseline calcium analysis to include three minutes before and three minutes after both bout onset and bout offset for all four behaviors (new Fig. S3I–L). These extended traces show that for the two resting states (quiescence and quiescent huddling), baseline PVNOT calcium reaches a minimum near bout onset and a maximum near bout offset, whereas for the two active states (nesting and active huddling) baseline calcium shows the opposite pattern (maximum near onset, minimum near offset). Thus, the expanded post-offset analyses provide a more complete view of PVNOT calcium dynamics across the requested post-offset epoch and further support the conclusion that PVNOT activity is aligned with (and elevated around) behavioral transitions in a state-dependent manner. We have updated the Results text accordingly and now explicitly reference these new extended peri-event baseline analyses.
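  The extended peri-event baseline analysis could be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: it assumes a baseline calcium trace sampled at a fixed rate (`fs_hz`), bout onsets or offsets given as sample indices, and a ±3 min window (`window_s=180`). The function names and the choice to drop, rather than pad, events whose window runs past the recording edges are hypothetical.

```python
from statistics import mean


def peri_event_windows(trace, event_indices, fs_hz, window_s=180.0):
    """Cut a +/- window_s segment of `trace` centred on each event sample.

    Events whose window would run past either end of the recording are
    skipped (an illustrative choice; padding is another option).
    """
    half = int(round(window_s * fs_hz))
    windows = []
    for idx in event_indices:
        start, stop = idx - half, idx + half + 1
        if start < 0 or stop > len(trace):
            continue  # incomplete window: excluded rather than padded
        windows.append(trace[start:stop])
    return windows


def peri_event_average(trace, event_indices, fs_hz, window_s=180.0):
    """Sample-by-sample mean across aligned windows; the centre sample is t = 0."""
    windows = peri_event_windows(trace, event_indices, fs_hz, window_s)
    if not windows:
        return []
    return [mean(samples) for samples in zip(*windows)]


# Toy usage: a ramp sampled at 1 Hz with a +/- 2 s window.
# The event at index 1 is dropped because its window starts before sample 0.
print(peri_event_average(list(range(20)), [5, 10, 1], fs_hz=1, window_s=2))
# -> [5.5, 6.5, 7.5, 8.5, 9.5]
```

  Onset-aligned and offset-aligned averages would simply be two calls with different event lists.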
  
  (8) Figures 4H and I: Figure 4H shows that the waveform in the PPD2-7 group has a narrower FWHM than the Virgin group, which is the opposite of the group data in Figure 4I. Presenting scaled waveforms in parallel would allow for a clearer comparison across groups.
  
  Thank you for pointing out the inconsistency between the representative waveform in Fig. 4H and the group summary in Fig. 4I. You were correct: the PPD2–7 and Virgin waveforms in Fig. 4H had been mislabeled. We have corrected the labeling and verified that the underlying data are correct.
  
  As suggested, to enable visual comparison of waveform width across groups independent of amplitude differences, we derived peak-normalized average waveforms by normalizing every peak before averaging. Specifically, for each peak we (1) baseline-subtracted the trace by subtracting the mean fluorescence in a pre-peak baseline window, and then (2) divided the baseline-subtracted waveform by its own maximum value to scale the event amplitude to 1. We then computed the mean ± SEM of these peak-normalized waveforms across events within each group.
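  The per-peak normalization described above can be sketched as follows. The baseline window length (`baseline_len`, taken as the first samples of each extracted peak) and the function names are assumptions for illustration. The steps mirror the description: subtract the pre-peak baseline mean, divide by the waveform's own maximum, then take the mean ± SEM across events.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_peak(waveform, baseline_len):
    """Baseline-subtract a single peak, then scale it so its maximum equals 1."""
    baseline = mean(waveform[:baseline_len])  # pre-peak baseline window
    shifted = [x - baseline for x in waveform]
    peak = max(shifted)
    if peak <= 0:
        raise ValueError("waveform never rises above its baseline")
    return [x / peak for x in shifted]


def mean_sem(waveforms):
    """Sample-wise mean and standard error of the mean across events."""
    n = len(waveforms)
    columns = list(zip(*waveforms))
    means = [mean(col) for col in columns]
    sems = [stdev(col) / sqrt(n) if n > 1 else 0.0 for col in columns]
    return means, sems


# Two toy events with different baselines and amplitudes.
events = [[1, 1, 3, 5, 3, 1], [2, 2, 4, 6, 2, 2]]
normalized = [normalize_peak(w, baseline_len=2) for w in events]
group_mean, group_sem = mean_sem(normalized)
print(group_mean)  # -> [0.0, 0.0, 0.5, 1.0, 0.25, 0.0]
```

  Because every event is scaled to a peak of 1 before averaging, differences in the averaged traces reflect width and shape rather than amplitude.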
  
  We believe these changes resolve the discrepancy and improve the clarity of the figure, consistent with your suggestion.
  
  (9) Figure 5: In studies of thermoregulatory processes, tail blood flow or temperature is commonly used as an indicator of vasomotor responses. Is it feasible to track tail temperature using the SGBS system? If not, it may be helpful to acknowledge this as a technical limitation.
  
  We agree that tail temperature is a commonly used indicator of vasomotor responses. While SGBS could in principle be trained to segment the tail, the current model was optimized for dorsal body regions viewed from an overhead perspective. Reliable tail tracking presents substantial technical challenges in our homecage recording configuration. The tail’s thin geometry and rapid, multidirectional movement frequently result in partial or complete occlusion (e.g., beneath bedding or the animal’s body). In addition, during vasoconstriction the tail temperature approaches ambient floor temperature, reducing thermal contrast and making segmentation unreliable at the thermal resolution of our camera. We have acknowledged this as a technical limitation in the Discussion, in the section called “Thermal tracking and validation of PVNOT recording specificity”.
  
  (10) Figure S5: Please describe the reason and histological background for the intravenous injection of FluoroGold.
  
  Intravenous injection of FluoroGold (FG) was used to histologically differentiate between magnocellular and parvicellular oxytocin neurons in the PVN. Because the posterior pituitary is located outside the blood-brain barrier, i.v. FG is selectively taken up by terminals of magnocellular neurons and retrogradely transported to their cell bodies. This allows us to infer the neuroanatomical identity (magno- vs. parvicellular) of the PVNOT neurons of interest. We have updated the Methods with a detailed description of the FG injection protocol as follows:
  
  “To distinguish between peripheral-projecting magnocellular and central-projecting parvicellular neurons, mice received a 15 µL intravenous injection of 4% Fluoro-Gold (Fluorochrome) diluted in 100 µL of sterile saline. Prior to injection, mice were given an analgesic dose of carprofen (20 mg/kg, s.c.). Mice were briefly restrained using a modified 50 mL conical tube, in which holes were drilled to allow for proper air flow and respiration. Mouse tails were interposed between two heating pads to enhance visibility of the tail vein. Tails were wiped down with 70% ethanol and FG was administered via either the right or left lateral tail vein using a 0.5 mL 28G syringe. Mice were sacrificed 24–48 hours post-FG administration.”
  
  The following are minor points.
  
  (11) Figure 2E-G, Figure 3F,G, Figure S2G,I, Figure S3A: "quiesence" > "quiescence". This typo may appear elsewhere in the manuscript as well.
  
  Thanks. These edits have been made.
  
  (12) Page 7, line 14: Peaks were NOT significantly increased at 29 °C in Figure 2N.
  
  Thanks for the very careful read. By way of explanation: this difference had been significant in an earlier draft; however, when we added more replicates, the difference went away. We have corrected this sentence.
  
  (13) There are mislabeled figure numbers in the main text. The authors should carefully check this throughout the manuscript.
  
  We found mislabeled figure numbers and have corrected them.
  
  (14) Page 13, lines 1- 2: To make the description clearer, it might be better to rephrase the part that says, "some blue light stimulations occurred." As it stands, it could give the impression that the stimulations happened spontaneously. Using a phrase like "were delivered" would more clearly indicate that these were intentional, experimenter-controlled events.
  
  Agreed. Thanks. The edit has been made.
  
  Additional comments:
  
  The oxytocin system is thought to support a wide range of physiological and behavioral functions, and the circuits involving oxytocin neurons are likely to be regulated in complex and dynamic ways. As oxytocin research continues to expand, the growing body of evidence not only deepens our understanding but also highlights the system's complexity. In this context, the development of an approach that enables the observation of oxytocinergic neuron activity in parallel with naturalistic behavior represents a promising methodological contribution. It is likely that similar experimental frameworks will become increasingly common in future studies. While reading this manuscript, as a reader rather than a reviewer, I was wondering how OXT neurons detect or define the "rest balance-point," and how they might contribute to shifting the brain toward an "awake balance-point" (Figure 7). Given that eLife allows authors to include an "Ideas and Speculation" subsection within the Discussion, it would be appreciated, though not essential, if the authors could briefly share their perspective on this point. I believe such mechanistic insight would make the manuscript more intellectually stimulating.
  
  This is a great suggestion. We have added a new “Ideas and Speculation” section of the Discussion.
  
  References
  
  Adahman Z, Ooyama R, Gashi DB, Medik ZZ, Hollosi HK, Sahoo B, Akowuah ND, Riceberg JS, Carcea I. 2025. Hypothalamic Vasopressin Neurons Enable Maternal Thermoregulatory Behaviors. DOI: https://doi.org/10.1101/2025.01.23.634569
  
  Bendesky A, Kwon Y-M, Lassance J-M, Lewarch CL, Yao S, Peterson BK, He MX, Dulac C, Hoekstra HE. 2017. The genetic basis of parental care evolution in monogamous mice. Nature 544:434–439. DOI: https://doi.org/10.1038/nature22074
  
  Crane JD, Mottillo EP, Farncombe TH, Morrison KM, Steinberg GR. 2014. A standardized infrared imaging technique that specifically detects UCP1-mediated thermogenesis in vivo. Molecular Metabolism 3:490–494. DOI: https://doi.org/10.1016/j.molmet.2014.04.007
  
  De Mota N, Reaux-Le Goazigo A, El Messari S, Chartrel N, Roesch D, Dujardin C, Kordon C, Vaudry H, Moos F, Llorens-Cortes C. 2004. Apelin, a potent diuretic neuropeptide counteracting vasopressin actions through inhibition of vasopressin neuron activity and vasopressin release. Proceedings of the National Academy of Sciences 101:10464–10469. DOI: https://doi.org/10.1073/pnas.0403518101
  
  Dodson AD, Herbertson AJ, Honeycutt MK, Vered R, Slattery JD, Goldberg M, Tsui E, Wolden-Hanson T, Graham JL, Wietecha TA, O’Brien KD, Havel PJ, Sikkema CL, Peskind ER, Mundinger TO, Taborsky GJ, Blevins JE. 2024. Sympathetic Innervation of Interscapular Brown Adipose Tissue Is Not a Predominant Mediator of Oxytocin-Induced Brown Adipose Tissue Thermogenesis in Female High Fat Diet-Fed Rats. Current Issues in Molecular Biology 46:11394–11424. DOI: https://doi.org/10.3390/cimb46100679
  
  Garami A, Pakai E, Oliveira DL, Steiner AA, Wanner SP, Almeida MC, Lesnikov VA, Gavva NR, Romanovsky AA. 2011. Thermoregulatory Phenotype of the Trpv1 Knockout Mouse: Thermoeffector Dysbalance with Hyperkinesis. The Journal of Neuroscience 31:1721–1733. DOI: https://doi.org/10.1523/JNEUROSCI.4671-10.2011
  
  Inada K, Hagihara M, Yaguchi K, Irie S, Inoue YU, Inoue T, Miyamichi K. 2025. Vasopressin-to-oxytocin receptor crosstalk in the preoptic area underlying parental behaviors in male mice. Nature Communications 16:10844. DOI: https://doi.org/10.1038/s41467-025-66908-0
  
  Islam MT, Rumpf F, Tsuno Y, Kodani S, Sakurai T, Matsui A, Maejima T, Mieda M. 2022. Vasopressin neurons in the paraventricular hypothalamus promote wakefulness via lateral hypothalamic orexin neurons. Current Biology 32:3871-3885.e4. DOI: https://doi.org/10.1016/j.cub.2022.07.020
  
  Kataoka N, Hioki H, Kaneko T, Nakamura K. 2014. Psychological Stress Activates a Dorsomedial Hypothalamus–Medullary Raphe Circuit Driving Brown Adipose Tissue Thermogenesis and Hyperthermia. Cell Metabolism 20:346–358. DOI: https://doi.org/10.1016/j.cmet.2014.05.018
  
  Landen JG, Vandendoren M, Killmer S, Bedford NL, Nelson AC. 2024. Huddling substates in mice facilitate dynamic changes in body temperature and are modulated by Shank3b and Trpm8 mutation. Communications Biology 7:1186. DOI: https://doi.org/10.1038/s42003-024-06781-7
  
  Law J, Chalmers J, Morris DE, Robinson L, Budge H, Symonds ME. 2018. The use of infrared thermography in the measurement and characterization of brown adipose tissue activation. Temperature 5:147–161. DOI: https://doi.org/10.1080/23328940.2017.1397085
  
  Ludwig M, Leng G. 2006. Dendritic peptide release and peptide-dependent behaviours. Nature Reviews Neuroscience 7:126–136. DOI: https://doi.org/10.1038/nrn1845
  
  Melón LC, Hooper A, Yang X, Moss SJ, Maguire J. 2018. Inability to suppress the stress-induced activation of the HPA axis during the peripartum period engenders deficits in postpartum behaviors in mice. Psychoneuroendocrinology 90:182–193. DOI: https://doi.org/10.1016/j.psyneuen.2017.12.003
  
  Meyer CW, Ootsuka Y, Romanovsky AA. 2017. Body Temperature Measurements for Metabolic Phenotyping in Mice. Frontiers in Physiology 8:520. DOI: https://doi.org/10.3389/fphys.2017.00520
  
  Morrison SF, Nakamura K. 2019. Central Mechanisms for Thermoregulation. Annual Review of Physiology 81:285–308. DOI: https://doi.org/10.1146/annurev-physiol-020518-114546
  
  Ootsuka Y, de Menezes RC, Zaretsky DV, Alimoradian A, Hunt J, Stefanidis A, Oldfield BJ, Blessing WW. 2009. Brown adipose tissue thermogenesis heats brain and body as part of the brain-coordinated ultradian basic rest-activity cycle. Neuroscience 164:849–861. DOI: https://doi.org/10.1016/j.neuroscience.2009.08.013
  
  Pedersen CA, Caldwell JD, McGuire M, Evans DL. 1991. Corticotropin-releasing hormone inhibits maternal behavior and induces pup-killing. Life Sciences 48:1537–1546. DOI: https://doi.org/10.1016/0024-3205(91)90278-J
  
  Popescu IR, Buraei Z, Haam J, Weng F, Tasker JG. 2019. Lactation induces increased IPSC bursting in oxytocinergic neurons. Physiological Reports 7:e14047. DOI: https://doi.org/10.14814/phy2.14047
  
  Poulain DA, Wakerley JB, Dyball REJ. 1977. Electrophysiological differentiation of oxytocin- and vasopressin-secreting neurones. Proceedings of the Royal Society of London. Series B. Biological Sciences 196:367–384. DOI: https://doi.org/10.1098/rspb.1977.0046
  
  Škop V, Guo J, Liu N, Xiao C, Hall KD, Gavrilova O, Reitman ML. 2020. Mouse Thermoregulation: Introducing the Concept of the Thermoneutral Point. Cell Reports 31:107501. DOI: https://doi.org/10.1016/j.celrep.2020.03.065
  
  Tabarean IV. 2021. Activation of Preoptic Arginine Vasopressin Neurons Induces Hyperthermia in Male Mice. Endocrinology 162:bqaa217. DOI: https://doi.org/10.1210/endocr/bqaa217
  
  Wakerley JB, Poulain DA, Brown D. 1978. Comparison of firing patterns in oxytocin- and vasopressin-releasing neurones during progressive dehydration. Brain Research 148:425–440. DOI: https://doi.org/10.1016/0006-8993(78)90730-8
  
  Yaguchi K, Hagihara M, Konno A, Hirai H, Yukinaga H, Miyamichi K. 2023. Dynamic modulation of pulsatile activities of oxytocin neurons in lactating wild-type mice. PLOS ONE 18:e0285589. DOI: https://doi.org/10.1371/journal.pone.0285589
  
  Yukinaga H, Hagihara M, Tsujimoto K, Chiang H-L, Kato S, Kobayashi K, Miyamichi K. 2022. Recording and manipulation of the maternal oxytocin neural activities in mice. Current Biology 32:3821-3829.e6. DOI: https://doi.org/10.1016/j.cub.2022.06.083
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.22.619715v4
www.biorxiv.org

Anterior cingulate cortex in complex associative learning: monitoring action state and action content

1. Public_Reviews 10 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  We thank the reviewers for their additional feedback. Below, we provide detailed responses to each reviewer’s major concerns. In addition, we identified an error in the previously submitted Fig. 6C and have corrected the X-axis labels accordingly.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Motion-related signal in ACC: the new Fig. 2E looks good, but it is hard to visualize how it is just a reordering of the old Fig. 5C.
  
  We thank the reviewer for this feedback. Fig. 2E and the original Fig. 5C do bear resemblance. The primary difference is the temporal window and organization of the data: the original Fig. 5C spanned only ± 5 sec, whereas Fig. 2E spans ± 30 sec. The main point we aim to highlight is that ACC neurons show both activation and inhibition in response to shuttles over an extremely prolonged timescale, up to 30 sec. The data are sorted to separate inhibited from activated neurons, illustrating that sustained activity persists in both populations.
  
  All categories in the new Fig. 4D appear to respond to shuttle initiation, with less than 1s latency. For example, type 2a/2b consists of 40% of the population and their response to movement onset is apparent. Thus, it is not clear whether most neurons respond to shuttle crossing as described in the manuscript.
  
  We thank the reviewer for drawing attention to this discrepancy. It was not our intention to draw a comparison between shuttle initiation and shuttle crossing responses across neurons, and we do not dispute that ACC responds to both events. While shuttle initiations and crossings provide a consistent temporal alignment point, they do not define the temporal focus of much of our analyses. Given that most shuttle responses terminate within ~2 sec, the extended windows analyzed (i.e., ± 5 sec; Fig. 4) largely reflect post-action ACC activity. Overall, although ACC neurons show mixed responses to initiations or crossings, the most consistent feature is prolonged modulation that persists beyond shuttle termination. We have revised the text to reflect this focus.
  
  Given this and the reviewer’s feedback, we further examined whether ACC activity is more strongly aligned with shuttle initiation, crossing, or termination. To determine which shuttle event (initiation, crossing, or termination) captured the most acute changes in ACC neuronal firing, we conducted an event-locked modulation analysis (Fig. S4). Our results showed that shuttle crossing was associated with the largest fraction of significantly modulated ACC neurons (Fig. S4). These findings suggest that shuttle crossing represents the most prominent event for ACC engagement during shuttle behaviors.
  
  Could the authors use relatively simple analysis, such as comparing spike rate before and after crossing, or before and after initiation, to quantify the response properties of each neuron? This could also help validate the classification analysis performed in Fig. 4.
  
  As mentioned above, we have added a new supplemental figure to directly address this question (Fig. S4).
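  A per-neuron before-versus-after comparison of the kind the reviewer suggests could be sketched as below. This is an illustrative stand-in, not the analysis behind Fig. S4. It assumes spike and event times in seconds, symmetric windows of `window_s` on either side of each event (initiation, crossing, or termination), and a two-sided paired sign-flip permutation test chosen here for simplicity. The window length, the test, and the function names are all hypothetical.

```python
import random
from statistics import mean


def rates_around_event(spike_times, event_times, window_s):
    """Per-trial firing rate (Hz) in [t - w, t) and [t, t + w) around each event t."""
    before, after = [], []
    for t in event_times:
        n_before = sum(t - window_s <= s < t for s in spike_times)
        n_after = sum(t <= s < t + window_s for s in spike_times)
        before.append(n_before / window_s)
        after.append(n_after / window_s)
    return before, after


def paired_sign_flip_p(before, after, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation p-value for the mean paired difference."""
    diffs = [a - b for a, b in zip(before, after)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each trial's difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy neuron: three spikes just after each of eight events, none just before.
events = [10.0 * k for k in range(1, 9)]
spikes = [t + dt for t in events for dt in (0.1, 0.2, 0.3)]
before, after = rates_around_event(spikes, events, window_s=1.0)
p = paired_sign_flip_p(before, after)
print(before[0], after[0], p < 0.05)
```

  Repeating this per neuron and per event type, with a multiple-comparison correction across neurons, would give the fraction of significantly modulated neurons for each alignment event.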
  
  Reviewer #2 (Public review):
  
  I think the authors did a very admirable job revising the manuscript. It is much improved. However, I believe a formal analysis of action-state versus action-content neurons on A-->B versus B-->A crossing is still warranted. I appreciate the fact that this analysis may not be as reliable with smaller ensemble sizes, but with careful pseudo-ensemble and resampling approaches, such an analysis would go a long way towards increasing the strength of evidence.
  
  At present, we are not sure what the reviewer means by “formal analysis”. Below is our best effort to address this concern.
  
  First, in our first revised manuscript, we implemented a generalized linear model-based classification of action-content and action-state neurons using direction-specific regressors. Specifically, this analysis classified neurons as action-content or action-state based on coefficient contrasts (Δβ), with appropriate statistical testing and multiple comparison correction (see Methods; Fig. 7C–E). Neurons were classified as action-content neurons if the corrected p-value for Δβ was significant and the absolute effect size exceeded a predefined threshold (|Δβ| > 0.5). Neurons were classified as action-state neurons if Δβ was not significant but both β1 and β2 were individually significant after correction. We believe our generalized linear model-based classification offers a sophisticated and formal classification of these two neuron classes.
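  The decision rule described above can be written out as a small function. This sketch assumes the GLM has already been fitted and that the p-values passed in are already multiple-comparison corrected. It also assumes Δβ = β1 − β2 (the two direction-specific coefficients) and α = 0.05, neither of which is stated here. The `NeuronFit` container and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NeuronFit:
    """Hypothetical per-neuron GLM summary; all p-values are already corrected."""
    beta1: float    # coefficient for one shuttle direction (e.g., A->B)
    beta2: float    # coefficient for the other direction (e.g., B->A)
    p_delta: float  # corrected p-value for the contrast delta-beta = beta1 - beta2
    p_beta1: float  # corrected p-value for beta1 alone
    p_beta2: float  # corrected p-value for beta2 alone


def classify(fit, alpha=0.05, min_effect=0.5):
    """Apply the action-content / action-state rule to one neuron."""
    delta_beta = fit.beta1 - fit.beta2
    if fit.p_delta < alpha and abs(delta_beta) > min_effect:
        return "action-content"  # direction matters, with a large enough effect
    if fit.p_delta >= alpha and fit.p_beta1 < alpha and fit.p_beta2 < alpha:
        return "action-state"    # modulated in both directions, no direction preference
    return "unclassified"


print(classify(NeuronFit(1.2, 0.1, 0.001, 0.001, 0.40)))  # -> action-content
print(classify(NeuronFit(0.8, 0.7, 0.600, 0.010, 0.02)))  # -> action-state
print(classify(NeuronFit(0.3, 0.1, 0.010, 0.200, 0.50)))  # -> unclassified
```

  Neurons meeting neither criterion, such as the third example with a significant but small contrast, fall outside both classes under this rule.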
  
  Subsequently, we trained an SVM decoder to distinguish A→B from B→A shuttles. Decoding accuracy depended on action-content neurons: their removal drastically decreased decoding accuracy, whereas removal of non-action-content neurons had no effect, further strengthening the conclusion that these populations encode distinct information.
  
  In the updated revision, we performed an additional SVM decoding analysis while controlling for unequal neuronal population sizes between action-state and action-content neurons (Fig. S8). Specifically, we constructed pseudo-ensembles by randomly resampling neurons within each category and training SVM decoders on size-matched ensembles. Decoder performance was evaluated across repeated resamples to generate distributions of accuracy. We found that only decoders using action-content neuronal activity predicted shuttle content with high accuracy (>95%), whereas decoders trained using non-action-content neurons performed at chance levels (Fig. S8).
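  The size-matched resampling logic can be sketched as below. To keep the example dependency-free, a leave-one-out nearest-centroid classifier stands in for the SVM used in the study. The sketch assumes a trials × neurons matrix of firing rates in which every neuron has a value for every trial (for example, after pooling trials of each direction across sessions). The function names, the resample count, and the decoder swap are assumptions.

```python
import random
from statistics import mean


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid decoder (stand-in for an SVM).

    Assumes at least two trials per label so every class keeps a centroid.
    """
    labels = sorted(set(y))  # fixed order, so distance ties break deterministically
    correct = 0
    for i, test_row in enumerate(X):
        centroids = {}
        for lab in labels:
            rows = [row for j, row in enumerate(X) if j != i and y[j] == lab]
            centroids[lab] = [mean(col) for col in zip(*rows)]

        def sq_dist(lab):
            return sum((a - b) ** 2 for a, b in zip(test_row, centroids[lab]))

        correct += min(labels, key=sq_dist) == y[i]
    return correct / len(X)


def size_matched_accuracies(rates, y, neuron_cols, ensemble_size, n_resamples=100, seed=0):
    """Decode from repeated random subsets of `ensemble_size` neurons drawn from one category."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_resamples):
        chosen = rng.sample(neuron_cols, ensemble_size)
        X = [[row[c] for c in chosen] for row in rates]
        accuracies.append(loo_nearest_centroid_accuracy(X, y))
    return accuracies


# Toy data: columns 0-3 are direction-tuned, columns 4-7 ignore direction.
y = ["AB", "BA"] * 5
rates = [([10.0] * 4 + [5.0] * 4) if lab == "AB" else ([1.0] * 4 + [5.0] * 4) for lab in y]
content = size_matched_accuracies(rates, y, [0, 1, 2, 3], ensemble_size=2, n_resamples=20)
other = size_matched_accuracies(rates, y, [4, 5, 6, 7], ensemble_size=2, n_resamples=20)
print(mean(content), mean(other))  # -> 1.0 0.5
```

  Drawing the same `ensemble_size` from each category makes the two accuracy distributions comparable. In the toy data, the untuned neurons give identical rows, so the decoder always guesses the first label and sits at chance.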
  
  Reviewer #3 (Public review):
  
  The only remaining comment that was not addressed pertains to anatomy and recording details. Some electrodes appear to be clearly in M2 (Fig 2A), and the tetrodes were driven each day. I would strongly suggest that this be included as a further limitation, particularly given the statement on line 178.
  
  We thank the reviewer for this feedback. In the previous revision, we added a supplemental figure showing tetrode locations for each mouse (Fig. S2) and described recording details in the Methods (Lines #481–488). We agree that this should also be noted as a limitation, and we have now added this to the Discussion (Lines #384–388).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.01.29.635442v3
www.biorxiv.org

Cardiolipin deficiency disrupts electron transport chain and drives steatohepatitis

1. Public_Reviews 09 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  As the reviewers noted, the evidence we provide is strongest for the mechanistic link between hepatic cardiolipin deficiency and electron leak from the electron transport chain. This narrative is supported by our assessment of site-specific electron leak as well as by reconstitution of exogenous cardiolipin in small unilamellar vesicles deficient in CL. On the other hand, as pointed out by Reviewer 2, the mechanistic link between cardiolipin and MASLD/MASH is less robust. At this moment, we have not experimentally demonstrated that the MASLD/MASH induced by CLS deletion can be rescued by replacement of mitochondrial CL in vivo. Taken together, our current narrative leaves the loop between CL deficiency, electron leak, and MASLD/MASH incomplete. Nevertheless, as indicated by all the reviewers, this manuscript highlights a previously undescribed role that CL potentially plays in MASH pathology, particularly given the data showing that human MASH coincides with a reduction in liver mitochondrial CL. We focused this revision primarily on additional descriptive experiments in CLS-LKO mice that were requested by the reviewers. Although it is not a component of the current manuscript, we have recently developed mice with hepatocyte-specific CLS overexpression and have begun experiments to test whether CL deficiency is causal for MASLD/MASH, which we hope to complete in a few years. We are hopeful that the MASLD/MASH research community will still find the evidence on CL contained in this manuscript plausible, and that it provides critical information for our understanding of the mechanisms of MASH pathogenesis.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript by Brothwell and colleagues describes a central role for hepatic cardiolipin deficiency in MASH. The authors identify cardiolipin as a mediator of two long-standing problems in the field: how dysregulated lipid metabolism relates to altered mitochondrial metabolism during MASLD, and what the innate changes are in the steatotic liver that cause the increased respiration. The authors identified reduced liver cardiolipin in humans with MASH and in a variety of mouse models with MASH. When they knocked out hepatic cardiolipin synthesis, mice developed steatosis and inflammation. These mice also recapitulated the elevated hepatic oxidative metabolism and oxidative stress found in obese humans with MASLD. Some of the in vivo functional data related to glucose homeostasis and substrate metabolism could be stronger, and interpretation of the in vitro flux data needs some clarification, but in both cases, the data are not essential to the main conclusions of the manuscript. Overall, the study offers compelling evidence that cardiolipin is reduced in MASLD and that impaired cardiolipin synthesis is sufficient to recapitulate many features of MASLD.
  
  We thank Reviewer 1 for the positive feedback emphasizing the novel and important findings in our manuscript.
  
  Strengths:
  
  The main strengths of the study are:
  
  (1) The identification of reduced cardiolipin levels in the liver of humans with MASLD and in a variety of mouse models of MASLD.
  
  (2) The finding that loss of cardiolipin synthesis recapitulates steatosis and inflammation in MASH.
  
  (3) The finding that loss of cardiolipin increases mitochondrial respiration, ROS production, and fat oxidation (in a separate hepatocyte cell line), again recapitulates several previous studies in obese humans with MASLD.
  
  (4) Evidence, though less definitive, that cardiolipin deficiency promotes electron leak by disrupting respiratory supercomplexes and preventing CoQ reduction.
  
  Weaknesses:
  
  (1) Figure 3A-D tries to make the point that liver CLS KO causes defects in substrate handling in vivo, based on glucose and pyruvate tolerance tests. The KO mice have a blunted response to a glucose tolerance test, but the pyruvate tolerance test showed very little (almost no) effect on glucose levels in either WT or LKO mice. The small blunting of the response in the LKO is impossible to interpret (if it's real), since the ability to clear glucose is also increased, and no tracers were used. It might be useful to monitor pyruvate and lactate levels during the experiment. However, this reviewer doesn't think the data is essential to prove the authors' main points.
  
  Thank you for pointing this out. We have now revised our manuscript to correctly reflect our findings on GTT and PTT. In our initial submission, we failed to clearly articulate that CLS deletion appeared to increase systemic glucose handling, which is the opposite of what one might expect in liver with steatosis. We agree that additional experiments would be helpful to better understand the systemic substrate handling in the CLS-LKO mice. As the reviewer indicates, we decided to focus this particular manuscript on intracellular and mitochondrial metabolism because of cardiolipin’s known localization to mitochondria, and the central role that this organelle plays in the pathogenesis of MASLD.
  
  (2) After presenting convincing evidence that respiration is elevated in isolated mitochondria from CLS KO liver, the authors follow up the findings by investigating whether 13C-palmitate and 13C-glucose oxidation are altered by CLS knockdown in murine Hepa1-6 cells (Figure 4).
  
  A few comments are worth mentioning about Figure 4:
  
  (a) It is not clear why the authors chose to use a hepatoma cell line rather than primary hepatocytes from LKO mice. The latter would be more convincing, since there could be important differences in metabolism between hepatoma cells and hepatocytes (e.g., preference for fatty acids vs glucose). Nevertheless, I think the approach is sufficient to test the general effect of loss of CLS on substrate metabolism.
  
  We appreciate the sentiment and agree that primary hepatocytes would have been a better model. We simply have not had prior expertise in culturing primary hepatocytes and do not have the system working. We completely agree that it is important to discuss the limitations of Hepa1-6 cells as a hepatoma cell line, and we now discuss this in our manuscript.
  
  (b) The authors use the M+2 enrichments of TCA cycle intermediates to infer rates of oxidation of [U-13C] palmitate or [U-13C] glucose. It is important to note that this kind of data reports fractional carbon sources (i.e., substrate preference) rather than rates of oxidation. For example, data from the 13C-palmitate experiment indicates that the CLS KD cells increase the fractional contribution from 13C palmitate (compared to glucose, for example) to the TCA cycle, but the actual rate of palmitate oxidation is not implicit in the data. However, it is reasonable to suggest that, in combination with the increased rates of O2 consumption observed in isolated mitochondria, this data supports increased fat oxidation.
  
  We agree with the reviewer that the nuances are important: M+2 enrichments from [U-13C] palmitate or [U-13C] glucose reflect the fractional contributions of labeled substrates to the TCA cycle rather than rates of oxidation. We have now revised the text to clarify that the data represent carbon incorporation patterns.
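  The distinction can be made concrete with a small calculation. Assume raw peak areas for the M+0 to M+n isotopologues of one TCA cycle intermediate, with natural-abundance correction omitted for brevity. The M+2 value is then a fraction of the metabolite pool, and because it is a ratio, scaling the whole labeling pattern leaves it unchanged. It therefore cannot by itself give an oxidation rate; that would need additional information such as pool sizes and flux measurements. The function names and numbers below are hypothetical.

```python
def isotopologue_fractions(peak_areas):
    """Normalize raw M+0..M+n peak areas to fractions summing to 1.

    Natural-abundance correction is deliberately omitted in this sketch.
    """
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return [area / total for area in peak_areas]


def m2_fraction(peak_areas):
    """Fraction of the pool carrying exactly two 13C atoms (M+2)."""
    return isotopologue_fractions(peak_areas)[2]


# Hypothetical citrate isotopologue areas (M+0 ... M+6).
small_pool = [600, 50, 300, 30, 20, 0, 0]
large_pool = [2 * a for a in small_pool]  # twice the material, same labeling pattern
print(m2_fraction(small_pool), m2_fraction(large_pool))  # -> 0.3 0.3
```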
  
  (c) I have some concern that the [U-13C] glucose experiment is more complicated to interpret than the description implies. I'm not sure what happens in this cell line, but in the liver, most labeling from pyruvate (i.e., originating from glucose in this case) enters the TCA cycle via pyruvate carboxylase, with smaller amounts entering via PDH (depending on the nutritional state). Since one could expect pyruvate carboxylase to contribute M+3 labeled TCA cycle intermediates initially, and M+2 on the first turn of the cycle, it's hard to conclude what the data indicates about glucose oxidation. The authors could generalize the conclusion by framing the TCA cycle enrichment data as the contribution of glucose carbons and noting in Figure 4A that pyruvate carbons can enter the TCA cycle via PDH or pyruvate carboxylase, without attempting to assign their relative contributions. There are better ways to do it, but it's a small nuance here since the authors aren't making a critical point about the pathways.
  
  This expert comment is much appreciated. We have revised the text to describe more broadly the entry of glucose carbons into the TCA cycle through pyruvate dehydrogenase (PDH) and pyruvate carboxylase (PC). We also revised the schematic to reflect this point.
  
  Reviewer #2 (Public review):
  
  In this study, the authors show that alterations in the lipid composition of the inner mitochondrial membrane, particularly changes in cardiolipin (CL) content, lead to defects in electron transport, supercomplex formation, and oxidative stress. Using liver-specific CLS knockout mice, which are characterized by dysfunctional capacity for cardiolipin synthesis, the authors highlight an underappreciated role for CL in MASH pathology. Overall, this is an interesting study highlighting the importance of functional/physiological electron transport (and in this context, electron leakage) in MASH pathophysiology. Despite that, this manuscript has several weaknesses that require attention.
  
  We thank Reviewer #2 for the constructive criticism and for identifying areas of weakness where additional data or explanations can improve the manuscript.
  
  (1) For all LKO studies, it is stated that the decrease in hepatic CL is causal for the observed phenotype. However, it is evident that many other lipids are impacted by CLS KO, including a marked increase in hepatic PG. In this respect, the authors show no evidence that the observed metabolic phenotype is indeed due to the reduction in CL and not to other accompanying changes.
  
  Thanks for this comment. We agree that because deletion of CLS changes mitochondrial lipids other than CL, we cannot conclusively attribute the phenotypes we observed to CL rather than to other lipids such as PG. In our experience, rescuing mitochondrial phospholipids by exogenous supplementation is problematic: the supplemented lipids are almost certainly not delivered exclusively to the tissue or organelle of interest, and they are often metabolized into other lipids, making the data difficult to interpret. We now have mice that conditionally overexpress CLS, which could be used to address this question, but that study is at an early stage and is outside the scope of the current work.
  
  The one relevant experiment we performed is ex vivo CL supplementation by fusing small unilamellar vesicles (SUVs) with mitochondria, which rescues electron leak. Although this experiment does not establish a role for CL in every phenotype of the CLS-LKO mice, it suggests that the bioenergetic phenotype associated with CLS deletion is likely due to the reduction in CL. We now provide this additional discussion in lines.
  
  (2) In the results, the authors highlight that 'MASLD has been shown to alter the total cellular lipidome in liver.' Given that this study focused on CL, it would be useful to include specific studies that pointed to changes in hepatic CL content in MASLD/MASH/fibrosis.
  
  We now provide citations for these studies (PMID: 30042157, PMID: 34257827).
  
  (3) The initial human mitochondrial lipidomics studies show a reduction in mitochondrial CL and PG content. What was the content/expression of CL synthase and PGP synthase in these samples? If this cannot be assessed, is there any association of CLS or PGPS expression and MASLD/fibrosis (etc) in publicly available databases (e.g., GEP liver) that may explain the reduction in mitochondrial PG and CL content?
  
  Thanks for this suggestion. Quantifying the mitochondrial lipidome requires a substantial amount of tissue, and we do not have sufficient biomaterial left to quantify gene expression. Our survey of publicly available databases (including GepLiver) did not find that human MASLD was associated with an increase in CLS or other enzymes of CL biosynthesis compared to healthy controls.
  
  (4) The validation of MASH in patients (Figure 1B) is not convincing (i.e., no quantification/scoring provided). NAS/fibrosis scoring (according to Kleiner) would help to define if all patients have indeed MASH, and what subset has fibrosis. Could the reduction in CL/PG content be (also) associated with fibrosis? In addition, Masson's Trichrome should be added to Figure 1B.
  
  The diagnosis was based on obvious bridging fibrosis and/or regenerative nodules on H&E staining (see additional zoomed-out images in Figure 1 – figure supplement 1). Due to the severity of these cases, formal NAS scoring was not applied. We do not have Trichrome staining available, but all MASH samples had fibrosis. Thus, it is possible that reduced CL/PG is related to fibrosis. We have now added more description of this point.
  
  (5) In human lipidomics, the authors suggest that reductions are observed in tetralinoleoyl CL (Figure 1C). However, Figure 1C only shows the combined FA acyl chain length + unsaturation, therefore not allowing for FA-specific ID (unless such data are available from the LC/MS analysis).
  
  Thanks for pointing this out. Following lipidomic nomenclature guidelines, we report combined FA acyl chain length + unsaturation when MS2 is not performed. We have validated that our 72:8 peak corresponds to TLCL, but we do not perform MS2 on every lipid species in every sample. We now clarify this point in the manuscript.
  
  (6) Figures 1 J/K/I. It is obvious that the background in all murine immunoblotting analysis has been altered. The authors should provide unaltered images for these immunoblots.
  
  We apologize for the confusion. In Figure 1J/K/L/M, each panel actually represents two western blots rather than one (similar to Figure 3H). The upper blot was probed with the OXPHOS antibody cocktail (CV, CIII, CIV, CII, and CI), and the lower blot with citrate synthase (CS). We therefore did not manipulate parts of the western blot to make them look different. To avoid confusion, we now outline each western blot to clearly demarcate the individual blots (new Figure 1J-M).
  
  (7) For Figure 1, it is unclear what is meant by 'we performed all mitochondrial lipidomic analyses by quantifying lipids per mg of mitochondrial proteins'. Was the murine lipidomics carried out on fractionated mitochondria or whole liver? If whole liver, then how were the data corrected, particularly given that PG is not a mitochondria-specific lipid?
  
  The data are all from lipidomic analyses performed in isolated mitochondria.
  
  (8) While total CL content seems indeed decreased across the different mouse models, this is mostly due to 1-2 CL species showing a pronounced reduction, with the remainder being unaltered. This should at least be acknowledged in the results. This is similarly the case in the LKO livers.
  
  Thanks for pointing this out. We now provide additional clarification in the text.
  
  (9) Figure 2. A secondary biochemical analysis of changes in lipid content should be provided, e.g., total triglyceride content, particularly given that the histology analysis does not show any major changes in hepatic lipid droplets/steatosis. In addition, the Masson's Trichrome staining shows almost no collagen deposition.
  
  We now provide a quantification of triglycerides in Figure 2J.
  
  (10) Figure 3. 'CLS deletion modestly reduced glucose handling' should be reworded. The LKO mice show improved glucose tolerance (despite the MASH phenotype), which is not evident from the above wording.
  
  We modified our text accordingly.
  
  (11) Looking at the mechanism behind the increase in hepatic steatosis, the authors state that lipid accumulation can occur due to increased lipogenesis, or dysfunctional VLDL secretion or beta oxidation, and subsequently assessed the relevant proteins/pathways. What about fatty acid uptake, which is also one of the four major pathways impacted in MASLD? This should be included in this assessment in Figure 3.
  
  Thank you for this comment. We now provide data for genes involved in fatty acid uptake, which were not reduced with CLS deletion (Figure 3E).
  
  (12) For Figure 5A, it is simply stated 'CLS deletion promotes liver fibrosis in standard chow-fed condition', and it is unclear what is highlighted within the selected EM images and what the arrows refer to. The authors should clarify this within the text.
  
  We have modified the text accordingly.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Mitochondrial oxphos causes lipid accumulation, leading to MASH, although the mechanism has been poorly understood. In this study, Funai and colleagues identify that reductions in cardiolipin in the mitochondria cause disruptions in the electron transport chain. Knockout of cardiolipin synthase was sufficient to drive MASH phenotypes, increase respiratory capacity, and cause electron leak at complexes II and III. It is well established that loss of cardiolipin increases ROS. Studies to date have been performed on whole tissue lysates, but to rule out which changes in mitochondrial lipids are driven by changes in mitochondrial number versus lipid synthesis/turnover, the authors uniquely purified mitochondria from human and mouse livers in MASH and NASH models for this study. This study provides critical information to the field that will inevitably help us better understand the mechanisms underlying MASH and NASH onset. The evidence provided is both convincing and compelling. With further suggested revision experiments, this study has the potential to change our understanding of MASH and NASH pathogenesis.
  
  We thank Reviewer #3 for the highly encouraging feedback.
  
  Strengths:
  
  The authors use a unique approach of lipidomics on purified mitochondria. They also analyze many distinct MASH models and provide a unique resource for the field of comprehensive lipidomics analysis of the different ways in which MASH can be induced. The use of human tissue elevates the impact/significance of the findings.
  
  Weaknesses:
  
  The data on the super complexes was the least compelling, and frankly, I do not think the authors needed those data to make a compelling argument! The authors should shift their focus more to the compelling electron leak data they have collected. If possible, it would also strengthen the work to include cardiolipin rescues on more of the experiments. Finally, expanding their explanations of the model systems would be very helpful for the readership.
  
  Thank you for this comment. We have now revised our argument to highlight the electron leak data and to place less emphasis on the supercomplexes.
  
  Reviewer #4 (Public review):
  
  Summary:
  
  Here, the authors wish to shed light on factors that contribute to the development of liver disease in what used to be called 'the metabolic syndrome'. This is a human-health problem of considerable significance, and the insights they provide, namely the implication of a defect in mitochondrial cardiolipin (CL) content to the progression from metabolic dysfunction associated steatotic liver disease to steatohepatitis, are plausible.
  
  We thank Reviewer #4 for the encouraging feedback.
  
  Strengths:
  
  The experimental evidence proffered is derived from the observation of lower levels of CL in mitochondria from the liver of patients undergoing liver transplant or resection due to end-stage steatohepatitis compared with mitochondria derived from livers of patients with other conditions. This correlation is buttressed by observations made in mice with liver-selective compromise in CL synthesis and which suggest a pathological environment associated with mitochondrial dysfunction and enhanced oxidative stress, features deemed to play a role in the progression from steatotic liver disease to steatohepatitis.
  
  The paper is well written, and the findings are well explained and superficially convincing.
  
  Weaknesses:
  
  It is unclear how much can be learned from compromising a key enzyme that produces a key mitochondrial lipid in a busy metabolic organ like the liver - isn't the discovery of a mitochondrial defect in such a context rather trivial? And how reliably can these findings be related to the human observations? Most importantly, the chain of causality implied by the title is unproven: the key question of whether or not (somehow) preventing the drop in cardiolipin content affects the course of steatohepatitis remains unanswered.
  
  We agree with the reviewer that the current manuscript does not directly provide evidence that reduction in CL causes MASLD in humans, which, as the reviewer describes, must be tested by rescuing CL content in the context of MASLD. We have now obtained mice that conditionally overexpress CLS and have begun these experiments, but findings from these mice are beyond the scope of the current study. We have modified our title to “Cardiolipin deficiency disrupts electron transport chain AND drives steatohepatitis” to reduce the implication of causality.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The manuscript states that loss of mitochondrial respiration is expected in MASLD. For example, line 187 "MASLD is known to be associated with reduced mitochondrial oxidative capacity". A more accurate statement is that "MASH" is known to be associated with reduced mitochondrial oxidative capacity and increased ROS production in humans. As you correctly cite later for an ex vivo human mitochondrial respiration study, early MASLD, especially with obesity, is associated with elevated mitochondrial respiration (40). Since those measurements are maximal respiration rates, which might not reflect actual in vivo flux, you might also make readers aware that your data is consistent with in vivo human studies that found increased hepatic oxidative flux (TCA cycle flux) in obese subjects with moderate steatosis (PMID: 22152305), which appears to wane with severe steatosis and/or inflammation (PMID: 31012869, PMID: 40272888).
  
  Thank you for these suggestions. We have made the suggested changes to the text.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Throughout the manuscript, the authors refer to the inner mitochondrial membrane, although they never perform assays to distinguish the inner vs outer mitochondrial membrane. It would be better to just refer to the cardiolipin being measured as "mitochondrial."
  
  Thank you. We made these changes.
  
  (2) In figures showing changes in cardiolipins, not all of them change; only a handful of them are reduced in NASH. Could the authors add commentary in the manuscript about what is known about these different cardiolipin species, and speculate as to why certain CLs are changing while others are not?
  
  Thank you. Reviewer #2 had similar comments, and we have provided additional discussion.
  
  (3) In the human tissues, what do the other mitochondrial inner membrane lipids (PC, PE, PI, PS, LPC, LPE) look like in the healthy vs NASH patients (Figure 1A-D)?
  
  Thank you for this request. We did not include these data in the manuscript because we have a separate ongoing study (the second author is the lead author on that paper) following up on hepatic mitochondrial PS and PE, which we found to be decreased in human MASH samples compared to healthy livers. This turned out to be a convoluted story, so we decided not to include it in this paper.
  
  (4) The descriptions of the different MASLD/MASH models are a little sparse. Especially needing more detail is the model for carbon tetrachloride injection, causing NASH. The authors should explain how each of these models typically induces MASLD/MASH.
  
  We now provide these details.
  
  (5) In figures 2E and F, total body mass is unchanged in CLS-LKO mice, but liver mass is decreased; yet on the chow diet, there appears to be lipid accumulation in the liver as well; I am wondering what the authors' reasoning is for this decreased liver mass.
  
  It is difficult to say conclusively, but we suspect it is due to cell death evidenced by fibrosis. It’s important to note that while there is lipid accumulation in the liver, steatosis is relatively mild and the increase in liver triglyceride is quite marginal (Figure 2J).
  
  (6) The lipidomics analysis and comparison of livers in these different models is a wonderful dataset that needs far more depth in terms of unpacking and describing the findings. For example, all the models of MASH show similar changes in most of the lipid species analyzed. NASH appears to be quite different than MASH. This, among other trends, is certainly worth highlighting as it will be of interest to the field.
  
  Thanks for this comment. We agree that while the CL phenotype was common to mouse and human MASH samples, we observed changes in other lipids that may be biologically significant. As described above, we have an ongoing study pursuing mitochondrial PS in the liver.
  
  (7) Figure 2B - It is interesting that the CLS KO only impacts certain CLs. The 72:8 CL, which is regulated by CLS, is also a CL that appears to change in the human patient samples. The information on the specific CL that is changing seems critical to the mechanism of the role of the CL in the disease. Throughout the manuscript, it is important to specify which specific CL is being referred to, instead of broadly characterizing the changes to cardiolipins, especially since most of the cardiolipins shown do not change; only a handful of them do.
  
  Thank you for this suggestion. We have included additional discussions on 72:8 CL in the manuscript.
  
  (8) One potential non-specific mechanism whereby CLS knockout can cause MASH would be if the mice change their overall food consumption. It is an important control to test if the total food intake is different in WT vs KO mice to formally rule out this possibility.
  
  Food intake was not different between the groups (Figure 2E).
  
  (9) To determine the extent to which de novo cardiolipin synthesis underlies the change in MASH/fatty liver observed in the HFD, GAN, and CCl4 models in Figure 1, the authors should also put the CLS KO mice on these diets and perform liver histology, analysis of inflammation markers, and analyze immune cell infiltration. Alternatively, the authors could try to rescue the CLS KO model by supplementing cardiolipin in the diet or by injection.
  
  Thank you. We have an ongoing experiment to examine the effect of hepatocyte-specific CLS overexpression on protection from GAN-induced MASLD.
  
  (10) Figure 3F shows a decrease in UQCRC2 by RNA but no change at the protein level in Figure 3H. The authors should comment a bit more on this disparity, and the data in Figure 3F don't mean much for the main point of the study if the levels of the proteins are unchanged.
  
  The reviewer is correct. We initially performed RNA-seq to broadly capture how CLS knockout influences liver health, and it suggested that the transcriptional program for mitochondrial proteins was downregulated. Nevertheless, gold-standard measurements of mitochondrial content (mitochondrial protein or mtDNA) did not show a change in abundance with CLS deletion.
  
  (11) The increase in respiration and spare respiratory capacity upon CLS KO shown in Figure 3J is extremely interesting! The explanation of the experiment and its meaning should be significantly expanded upon.
  
  Thank you. We included additional discussion on this point.
  
  (12) Figure 4 - It is interesting that the fraction of the TCA cycle metabolites labeled is increasing with the palmitate tracer and decreasing with the glucose tracer. This implies a "fuel switch," such that more of the TCA cycle carbons originate from fatty acids than glucose upon loss of CLS. The authors should make note of this point. Also, to understand if the total molar quantity of labeling in the TCA cycle from palmitate and glucose is changing, the authors should also report the relative abundance (instead of just the fraction labeled) of the labeled metabolites and unlabeled metabolites.
  
  Thanks for this suggestion, we have now added this discussion.
  
  (13) In Figure 5C-F, the authors show that CLS deletion can activate the caspase pathway, but do not see any change in cytochrome c localization. Can the authors clarify if CLS deletion is sufficient to induce apoptosis?
  
  CLS deletion certainly causes cell death that induces tissue fibrosis. Activation of the caspase pathway suggests that the cell death may be apoptotic, but we did not see changes in cytochrome c localization. Our lab is currently performing additional experiments to test the possibility that CLS deletion induces ferroptosis.
  
  (14) Figure 6A-C- The authors discuss the I + III2 + IV supercomplex substantially and consistently decreasing in the CLS-KO mice, however, the quantifications do not look statistically significant. Can the authors confirm if these changes are or are not significant and adjust the text accordingly?
  
  The reviewer is correct. Abundances of I+III2+IV supercomplexes are decreased in CLS-LKO mice compared to control mice when quantified with the supercomplex antibody cocktail or with the UQCRFS1 (complex III subunit) antibody, but not with complex I antibodies. The reason for this discrepancy is not entirely clear, but it likely reflects a combination of antibody sensitivity and the difficulty of solubilizing high-molecular-weight protein complexes.
  
  (15) The most compelling data to indicate electron leakage increasing upon CLS knockout is in Figures 7A-E. I would suggest the authors decrease their emphasis on the rearrangement of the supercomplexes and focus their discussion on the very compelling results of Figure 7.
  
  Thanks for this suggestion. We have modified our text.
  
  (16) Figure 7D shows that a major site of electron leak is from site II, and these results also fit with the profound succinate-induced respiration observed in earlier experiments. It would be nice if the authors could test the ability of cardiolipin to rescue these phenotypes, similar to the assay in Figure 5I. Assessing this rescue on the CoQ redox state would also strengthen the claims.
  
  Thank you for this comment. We are encouraged by your suggestions. We thought about this quite extensively during the preparation of the manuscript, but we refrained from making conclusive statements regarding complex II because the increase in electron leak was of similar magnitude at complexes II and III. It is true that CLS deletion increases succinate-induced respiration, but this might also be because succinate elicits the highest increase in respiration even in wildtype mice (see values in Figure 3K and L compared to other substrates). It would be intriguing to examine the influence of CLS deletion on complex II/III electron leak, as well as on succinate-induced respiration, in tissues where succinate is not a preferred substrate. We attempted cardiolipin rescue with SUVs, but unfortunately we could not get this assay to work for site-specific electron leak measurements.
  
  (17) In Figure 7G-H, it would be nice to see a ratio of oxidized to reduced CoQ, in the CLS deletion mice and in human NASH livers, if samples are available.
  
  Thanks for this suggestion. These data are now shown in Figure 7 – figure supplement 1P-S.
  
  (18) CoQH2 can also deliver electrons to complex II (via its reversal). Complex II shows a remarkable contribution to the electron leak phenotype (Figure 7D), and the complex II monomer showed much larger changes in the native gels of Figure 6 than the complexes involving complex III. A more likely model is that oxidized CoQ accumulates in the CLS knockout model because of increased CoQH2 leak via complex II.
  
  Perhaps. We also considered this, but we are not sure it fits with the observation that CLS deletion increases succinate-induced respiration, which suggests increased succinate-to-fumarate conversion, a notion that may not be congruent with increased CoQH2-driven reversal of complex II. Overall, we think we lack the tools or evidence to conclusively determine whether CLS deletion primarily acts on complex II or III. Nevertheless, we appreciate the reviewer’s enthusiasm for these topics as we perform additional experiments on the mechanism of interactions between CL and the ETC.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.10.617517v4
www.biorxiv.org

Methylation Clocks Do Not Predict Age or Alzheimer's Disease Risk Across Genetically Admixed Individuals

1
1. Public_Reviews 09 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer’s Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group.
  
  Thank you for this summary.
  
  Strengths:
  
  This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer’s disease, of clear relevance for studies evaluating age-related biomarkers.
  
  Thank you.
  
  Weaknesses:
  
  While the authors tackle an important question using a diverse cohort, the current manuscript is lacking some detail that may diminish the potential impact of this paper. For example:
  
  (1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).
  
  Thank you for pointing out this omission. The distributions are now presented in Supplementary Figure 1. While median age varies somewhat across cohorts (73.1 to 79.3 years), the age ranges are similar. These small differences do not explain the differences in accuracy between cohorts; e.g., the median age of the African Americans (76.4) is lower than that of the White cohort (77.7).
  
  (2) The authors compare correlations between chronological age and epigenetic age in subgroups within this study to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for subgroup analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?
  
  Our conclusions about the reduced accuracy of the clock in admixed individuals are based on comparisons within the MAGENTA cohorts, not on a comparison of MAGENTA to previously published studies. We find significantly reduced accuracy in the admixed cohorts compared to the White MAGENTA cohort. To further support this conclusion beyond the MAGENTA cohort, we analyzed three independent whole blood methylation datasets: two focused on African American individuals (the Grady Trauma Project, n = 422, and the GENOA study, n = 1,394) and one focused on White Swedish individuals (n = 729). As observed in MAGENTA, the Horvath clock had significantly lower accuracy for the African American cohorts (Figure 3) than for the White Swedish cohort.
  
  When comparing results across studies, the reviewer is correct that lower correlations are generally seen for older cohorts. Indeed, other studies applying the Horvath clock to older cohorts have reported correlations similar to those observed in MAGENTA (Horvath, 2013; Marioni et al., 2015; Shireby et al., 2020). We now also include the chronological age distributions of the cohorts in this study, along with their means and standard deviations (Supplementary Figure 1). These show that the distribution of chronological ages for White individuals is similar to those of the cohorts where the clocks did not perform as well. Finally, as suggested, we correlated chronological and epigenetic age for the Horvath clock including the AD cases in each cohort. The significantly lower performance of the clock for Puerto Ricans and African Americans, relative to White individuals, remains even after including all individuals in each cohort. Thus, combining cases and controls did not qualitatively change the performance of the clock for African Americans and Puerto Ricans relative to Whites (Supplementary Figure 3).
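  The response does not state which test underlies "significantly lower accuracy" when correlations are compared between cohorts. One standard option for two independent samples is Fisher's z-test, which transforms each Pearson r with z = atanh(r) and uses the approximate variance 1/(n - 3). The sketch below shows that test under this assumption; it is not presented as the authors' method, and the function name `compare_correlations` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test of H0: rho1 == rho2 for independent samples.

    Each r is transformed with atanh, whose sampling variance is roughly
    1 / (n - 3). Returns (z statistic, two-sided p-value).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical inputs, not values from this study: a reference cohort with
# r = 0.80 versus a second cohort with r = 0.55, about 100 people each.
z, p = compare_correlations(0.80, 100, 0.55, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

  Because the standard error shrinks as n grows, the same gap in r can be significant for large cohorts and not for small ones, which is one reason subgroup sizes matter when interpreting these comparisons.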
  
  (3) The correlation between chronological age and epigenetic age, while helpful is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.
  
  We used correlation because it is commonly used to evaluate the performance of epigenetic age clocks, but we agree that other error quantification metrics provide a complementary perspective. We now include MAE and MSE comparisons across sub-groups in the revision (Supplementary Table 1). We find that across all accuracy metrics, the African American and Puerto Rican cohorts perform worse than the White and Peruvian cohorts. Interestingly, the Cubans show relatively high error despite a high correlation between predicted and chronological age. However, there are only 21 non-demented Cuban controls. In addition, we evaluated the same metrics in three replicate datasets (two African American cohorts and one for White Swedish individuals) and found the same patterns of lower accuracy across metrics in African ancestry individuals, albeit with some variation in accuracy between cohorts (Supplementary Table 2). Notably, as discussed above, this is not driven by differences in chronological age distributions: when we subset to older individuals (≥ 55 years old) in order to facilitate comparisons to MAGENTA study individuals, the median age for the White Swedish individuals (70 years old) is higher than that of the GENOA (62.7 years old) and Grady (58 years old) individuals. Despite the difference in median ages, the clock performs better on White Swedish individuals across all accuracy metrics than the African ancestry cohorts with younger individuals.
  
  (4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins-Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.
  
  Thank you for pointing out the need for additional context on data generation. We have added details to the Methods. All omics data from the MAGENTA study were generated using standard protocols that ensure minimal technical artifacts and batch effects. Samples were randomized across plates and chips to ensure that ancestry, age, and sex were not confounded with each batch. We also performed a principal components analysis of the normalized methylation data used as inputs for all MAGENTA analyses. We found that the samples did not stratify by sample plate, cohort, ethnicity, or ascertainment center along the principal components (Supplementary Figure 2).
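  
  A PCA-based batch check of this kind can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline: it runs PCA (via NumPy's SVD) on a hypothetical samples × CpGs matrix of simulated methylation values and reports, for each of the top PCs, the fraction of variance explained by plate labels (η², a one-way ANOVA effect size). The helper names `pc_scores` and `eta_squared`, the plate labels, and the simulated data are all hypothetical. A plate effect shows up as a large η² on a leading PC; small values on all PCs are consistent with samples not stratifying by plate.

```python
import numpy as np


def pc_scores(values, n_pcs=5):
    """Scores on the top principal components of a samples x CpGs matrix."""
    centered = values - values.mean(axis=0, keepdims=True)   # center each CpG
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def eta_squared(scores, labels):
    """Fraction of variance in one PC explained by a categorical label (e.g., plate)."""
    labels = np.asarray(labels)
    grand_mean = scores.mean()
    total = np.sum((scores - grand_mean) ** 2)
    between = sum(
        np.sum(labels == g) * (scores[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )
    return between / total


rng = np.random.default_rng(0)
n_samples, n_cpgs = 60, 500
plate = np.repeat(["P1", "P2", "P3"], n_samples // 3)    # hypothetical plate labels

# Simulated methylation values with no plate effect
clean = rng.normal(0.5, 0.1, size=(n_samples, n_cpgs))
# The same values with an artificial +0.1 shift on every CpG for plate P2
shifted = clean + np.where(plate == "P2", 0.1, 0.0)[:, None]

for name, data in [("no batch effect", clean), ("plate P2 shifted", shifted)]:
    scores = pc_scores(data)
    etas = [eta_squared(scores[:, k], plate) for k in range(scores.shape[1])]
    print(name, "eta^2 per PC:", np.round(etas, 3))
```

  In the shifted simulation the plate effect dominates PC1, whereas in the clean simulation no PC is strongly associated with plate. The same check could be repeated with cohort, ethnicity, or ascertainment center as the label.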
  
  We also thank the reviewer for their suggestion to apply the principal component clock to account for potential technical variation. As outlined in the new section “Principal component versions of the methylation clocks also have lower age prediction accuracy for genetically admixed individuals,” using the principal component version of the Horvath clock did not result in consistent improvement in age prediction accuracy or generalization across MAGENTA cohorts (Supplementary Figures 4 and 5). The lower accuracy for age prediction in individuals with substantial African ancestry was present for the PC clock in the replication cohorts, just as in the MAGENTA cohorts (Supplementary Figure 6).
  
  (5) Marioni et al. (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r∼0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study as reflecting issues of portability of epigenetic biomarkers.
  
  We agree that previous links between DNAm Age and AD or cognitive function have been relatively small in magnitude. For example, the PhenoAge paper (Levine et al., 2018) and a study using the Horvath clock (Levine et al., 2015) found age acceleration of less than a year in AD patients relative to non-demented individuals. Similar results have also been observed in studies with smaller sample sizes (e.g., 700 for Levine et al. 2015 and 604 for Levine et al. 2018). Given these small effect sizes, we agree that accounting for statistical power is essential for interpretation of our results. We performed power calculations based on an effect of the size observed in previous studies (0.5-year acceleration). We have 86% power in the full MAGENTA data set to detect an effect of this size. Stratifying by cohort, we have 75% power for the African Americans, 72% for the Puerto Ricans, 72% for the Whites, 65% for the Peruvians, and 47% for the Cubans. Thus, we believe we have enough power that the consistent lack of association outside of the White cohort in MAGENTA is likely meaningful. Based on these calculations, there is only a 1% chance that we would not observe an effect in any of the other cohorts if the effect were present across cohorts. Nonetheless, we have added caveats about power and the small sample size to our suggestion that the reduced accuracy of the clocks contributes to the lack of AD association outside of Whites.
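  
  The roughly 1% figure can be checked from the per-cohort power estimates. The response does not state how it was computed; the minimal Python sketch below assumes that the tests in the four non-White cohorts are independent, so the chance that all of them miss a true 0.5-year effect is the product of their miss rates (1 − power).

```python
from math import prod

# Per-cohort power to detect a 0.5-year age acceleration, as reported in the text
power = {
    "African American": 0.75,
    "Puerto Rican": 0.72,
    "Peruvian": 0.65,
    "Cuban": 0.47,
}

# Assuming independent tests, P(every non-White cohort misses a true effect)
# is the product of the per-cohort miss rates (1 - power).
p_all_miss = prod(1.0 - p for p in power.values())
print(f"P(no detection in any non-White cohort) = {p_all_miss:.4f}")
```

  This gives about 0.013, i.e. roughly 1%, consistent with the value stated in the response under the independence assumption.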
  
  (6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do the authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm).
  
  We agree. Thank you for this suggestion. We have added Figure 6 in the main text to address this gap. In short, we analyzed additional whole blood methylation data from individuals with African ancestry and found that a substantial proportion of the CpGs in methylation clocks are differentially methylated in African ancestry individuals relative to European ancestry individuals. In the case of the Horvath clock, we find that 84/353 (23.8%) of the clock CpGs are differentially methylated between ancestries. In parallel, we found that 56 of these differentially methylated clock CpGs are also affected by meQTL, many of which are at different frequencies between populations. We also investigated whether the meQTL-affected clock CpGs are associated with increased clock error in the MAGENTA individuals. We found 56 clock CpGs whose methylation levels are associated with increased clock error, and 42 of these have at least one meQTL. Thus, while meQTL are not the only factor to affect the portability of methylation clocks across global populations, we suggest that they are a significant contributor, especially in the case of the Horvath clock.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.
  
  Strengths:
  
  The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.
  
  Thank you for this summary.
  
  Weaknesses:
  
  One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress measures of prediction accuracy on non-European local ancestry fraction. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort.
  
  Thank you for this suggestion. To model portability across genetic ancestry as a spectrum, we regressed the Horvath clock error on the proportions of African ancestry in the genomes of the MAGENTA individuals, adjusting for chronological age. The proportion of African ancestry is significantly associated with increased Horvath clock error (p = 0.039), with the clock making less accurate age predictions by 1.46 years for individuals with full African ancestry compared to no African ancestry. We have added this new analysis to the Results.
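  
  A model of this kind can be sketched as below, under stated assumptions. Clock error is taken here as the absolute difference between predicted and chronological age (the response does not define it), and the model is ordinary least squares with an intercept, the African ancestry fraction, and chronological age as covariates, so the ancestry coefficient is the modelled difference in error between 100% and 0% African ancestry. The function name `ancestry_effect_on_error` and the simulated data (with an arbitrary true effect of 2 years) are hypothetical; only NumPy is used.

```python
import numpy as np


def ancestry_effect_on_error(chron_age, pred_age, afr_fraction):
    """OLS of absolute clock error on African ancestry fraction, adjusting for age.

    Returns (coefficient, standard error) for the ancestry term; the coefficient is
    the modelled change in error from 0% to 100% African ancestry.
    """
    chron = np.asarray(chron_age, dtype=float)
    err = np.abs(np.asarray(pred_age, dtype=float) - chron)   # assumed error definition
    X = np.column_stack([np.ones_like(chron), np.asarray(afr_fraction, dtype=float), chron])
    coef, *_ = np.linalg.lstsq(X, err, rcond=None)
    resid = err - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                       # OLS coefficient covariance
    return coef[1], np.sqrt(cov[1, 1])


# Simulated cohort (hypothetical): error grows by 2 years from 0% to 100% African ancestry
rng = np.random.default_rng(1)
n = 600
chron = rng.uniform(55, 90, n)
afr = rng.uniform(0, 1, n)
true_abs_err = 3.0 + 2.0 * afr + rng.normal(0, 0.5, n)
pred = chron + rng.choice([-1.0, 1.0], n) * true_abs_err       # over- or under-prediction

slope, se = ancestry_effect_on_error(chron, pred, afr)
print(f"ancestry coefficient = {slope:.2f} years (SE {se:.2f})")
```

  A p-value for the ancestry term would come from comparing slope / SE with a t distribution on n − 3 degrees of freedom (e.g., using SciPy); it is omitted here to keep the sketch dependency-light.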
  
  The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e., “disruptive variants”) and genetic variants influencing methylation sites (i.e., meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is composed of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries.
  
  In the revision, we now additionally report ancestry-specific allele frequencies to demonstrate the rarity of clock CpG-disrupting variants (Supplementary Figure 9). The global allele frequencies were so low that even if they all occurred in individuals of non-European ancestries, they would still be extremely rare.
  
  It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability.
  
  We agree that the meQTL likely influence the clocks in different ways and that the ascertainment of the meQTLs in different populations makes direct comparisons challenging. To more directly link meQTL to clock performance, we identified 56 Horvath clock CpG sites whose methylation levels significantly associate with increased clock error in the MAGENTA study individuals. Of these, 42 (75%) are affected by an meQTL, including nine that are affected by an African ancestry-differentiated meQTL. As such, meQTL, and specifically meQTL that were likely not present in the training data of the Horvath clock, are associated with both the methylation of CpG sites and clock error. However, as the reviewer suggests, determining causality among these factors is challenging. Given our incomplete knowledge of meQTL in different ancestries, we have added caveats to our conclusions about the effect of meQTL on clock portability.
  
  The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit.
  
  We agree that the signal is not strong in the white cohort; however, it is similar in magnitude to previous studies. As outlined in response to Reviewer 1’s Point 5, we have now added power calculations that indicate reasonable power (≥72%) to detect small effect sizes (0.5 year increase) in the white, Puerto Rican and African American cohorts. We now interpret the AD association tests in the context of these power calculations and multiple testing correction.
  
  Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.
  
  We entirely agree and have now clarified the scope of our analyses and importance of environmental factors in the revision. We intersected clock CpGs with environmental-factor-associated CpGs from multiple epigenome-wide association studies (EWAS) and found overlaps that suggest an environmental contribution to differences in clock CpG methylation. However, given the lack of environmental data on the MAGENTA study individuals, as well as the lack of datasets for replication, we cannot directly compare the environmental and genetic contributions to clock accuracy. Nevertheless, the new analyses in the revision highlight the contribution of both genetic and environmental factors to lack of portability for certain methylation clocks.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Line 64: An association between methylation patterns and genetic ancestry does not presuppose that meQTLs vary in frequency between genetic ancestries; environmental factors could also play a role. It would be nice to comment on this further in the Introduction.
  
  We agree that environmental factors likely play a role in the decrease in methylation clock performance in admixed populations. We have added text highlighting this in the revised Discussion. Regarding meQTL, we agree that associations between methylation patterns and genetic ancestry do not necessarily imply that meQTL will vary in frequency between genetic ancestries. However, our new analyses in the revision find African-ancestry differentiated meQTL that associate with Horvath clock CpG methylation levels and overall clock error (Figure 6E-F and Supplementary Figure 13).
  
  (2) Line 116 implies Puerto Ricans have “substantial amounts of African ancestry” but the median ancestry is 15% (which is not much more than the Peruvian and Cuban cohorts).
  
  Thank you for pointing this out. We have clarified this statement in the text. While the median proportion of African ancestry in Puerto Ricans is 15% (vs. 6% and 2% for the Peruvian and Cuban individuals in MAGENTA), there are many individuals with substantially higher African ancestry. The upper quartile is >25% and several Puerto Ricans have >50% African ancestry.
  
  (3) In Figure 2B, Puerto Ricans have worse accuracy than Peruvians but a higher proportion of inferred CEU ancestry, which is interesting and defies intuition - is there any hypothesis for why this might be the case?
  
  In light of our new meQTL analyses, we hypothesize that the African ancestry-differentiated meQTL that affect Horvath clock CpGs drive the increase in clock error for these individuals, despite their having more European ancestry across their genome. Given that the Peruvians (and Cubans, for that matter) carry very little African ancestry, and also very few of the African-differentiated meQTL, this could explain some of the large difference in clock errors between the cohorts.
  
  (4) Figure 2C would be improved with confidence intervals.
  
  We thank the reviewer for this suggestion and have added confidence intervals for Figure 2C.
  
  (5) It’s interesting that the correlation with Cubans is positive in Figure 3B (for one clock, significantly so). Is there any rationale for this?
  
  We noticed this as well, but have not been able to come to a definitive conclusion. It is possible that environmental factors contribute. However, the Cuban cohort is the smallest in MAGENTA (22 cases and 21 controls) and none of the differences are statistically significant, so more investigation in a larger cohort is required.
  
  (6) Line 231: Which population(s) is allele frequency estimated in?
  
  This is the global frequency reported in gnomAD, which is calculated across all populations in gnomAD v3.0. As noted above, we now also report allele frequencies by gnomAD population (Supplementary Figure 9).
  
  (7) Were the meQTLs pruned? How many independent variants are there per methylation site? It would be nice to see a distribution for the sites in the Horvath clock.
  
  We now report the distribution of meQTL across clock CpG sites. The mean number of variants is 108; the median is 36; and the maximum is 1,699. We have now included a plot of the distribution for all 271 (out of 353) Horvath clock CpG sites (Supplementary Figure 14). We did not perform any pruning in these initial results for several reasons. First, we sought to demonstrate the great potential for meQTL to influence these CpGs and to compare the distributions of these common meQTL across populations (based on gnomAD data). Second, identifying the causal variant or variants is challenging. Given that many of these meQTLs likely reflect redundant signals, for the new analyses of African-differentiated meQTL, we restrict to a single variant per clock CpG site. We focus on the variant with the greatest absolute beta, as reported by the original meQTL study from which the variant originates.
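  
  The two steps described here, summarizing how many meQTL variants map to each clock CpG and keeping only the variant with the greatest absolute beta per CpG, can be sketched as below. This is a hypothetical illustration rather than the authors' code: the `MeQTL` record, the placeholder CpG and variant IDs, and the beta values are invented, and ties in |beta| keep the first variant seen.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List, NamedTuple


class MeQTL(NamedTuple):
    cpg: str       # clock CpG identifier (placeholder)
    variant: str   # variant identifier (placeholder)
    beta: float    # effect size reported by the source meQTL study


def variants_per_cpg(records: List[MeQTL]) -> Counter:
    """Number of (unpruned) meQTL variants mapped to each CpG."""
    return Counter(rec.cpg for rec in records)


def top_variant_per_cpg(records: List[MeQTL]) -> Dict[str, MeQTL]:
    """Keep one meQTL per CpG: the variant with the largest absolute beta."""
    best: Dict[str, MeQTL] = {}
    for rec in records:
        current = best.get(rec.cpg)
        if current is None or abs(rec.beta) > abs(current.beta):
            best[rec.cpg] = rec
    return best


records = [
    MeQTL("cpg_A", "var_1", 0.12),
    MeQTL("cpg_A", "var_2", -0.30),
    MeQTL("cpg_A", "var_3", 0.05),
    MeQTL("cpg_B", "var_4", 0.08),
    MeQTL("cpg_B", "var_5", 0.02),
    MeQTL("cpg_C", "var_6", -0.01),
]

counts = variants_per_cpg(records)
print("mean:", mean(counts.values()), "median:", median(counts.values()),
      "max:", max(counts.values()))
for cpg, rec in sorted(top_variant_per_cpg(records).items()):
    print(cpg, rec.variant, rec.beta)
```

  Comparing |beta| rather than the signed beta means a strong negative effect (var_2 for cpg_A) is retained over weaker positive ones.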
  
  (8) Figure 5C might benefit from a geom_density plot rather than overlapping bar plots; the trends are hard to see.
  
  We appreciate the reviewer’s suggestion and have now reworked the figure to show only the density curves so that readers may better appreciate the differences in allele frequencies.
  
  (9) Several figures would be more legible with larger font sizes.
  
  We appreciate this recommendation and have made the font sizes for all plots larger and more legible.
  
  Reviewer #3 (Public review):
  
  This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.
  
  The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention, which the authors nicely provide using an impressive and diverse collection of data.
  
  Thank you for this summary.
  
  The manuscript is clear and well-written; however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets), and, most importantly, several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.
  
  Thank you for this suggestion. As noted in our response to Reviewer 2’s Point 1, we have analyzed ancestry as a continuous variable and found that the proportion of African ancestry in the genomes of the MAGENTA individuals significantly associates with increased difference in chronological and predicted age, even after controlling for chronological age (1.46 years more error for 100% vs. 0% African ancestry; p = 0.039). As outlined below, we have also added details on the training of previous clocks and the important additional previous work highlighted by the Reviewer.
  
  Reviewer #3 (Recommendations for the authors):
  
  Major comments
  
  There is previous literature addressing who is in the training set for methylation clocks. To my knowledge, this work has been primarily led by Nancy Krieger. It would be a valuable addition to discuss her work (and any similar work by other investigators) in the introduction. In other words, what do we currently know about the degree of bias in the training sets for methylation-based clocks? The assumption of the introduction is that the training sets are overwhelmingly European ancestry (which I assume is true) but I think some quantitative information about this would be helpful for understanding the source and magnitude of the problem.
  
  We thank the reviewer for bringing the work of Dr. Nancy Krieger to our attention. It directly supports the rationale for this study: the sociodemographic characteristics of the individuals used to train these clocks are poorly reported, limited to outdated population descriptors (for example, the use of “Caucasians” to describe some of the individuals used to train the Horvath and the Hannum clocks) or race and ethnicity labels. Moreover, where labels are available for training individuals, they tend to underrepresent individuals of diverse backgrounds, as in the Horvath clock. We have incorporated Dr. Krieger’s work into the Introduction, including details of how this supports the rationale and purpose of our study.
  
  Related to the above comment, there has been pretty extensive previous work on the effects of race and ethnicity on epigenetic clock estimates (e.g., https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1030-0), and that seems like it could be more explicitly woven into the introduction and discussion.
  
  We thank the reviewer for highlighting this relevant article. We have added discussion of it into the Introduction. Several factors make direct comparison with our results challenging. First, the grouping of individuals based on race and ethnicity without consideration of genetic ancestry complicates comparisons. Race and ethnicity commonly do not match genetic ancestry components (see Gouveia et al., 2025 https://www.cell.com/ajhg/fulltext/S00029297(25)00173-9). Second, the study reports differences in epigenetic age accelerations (intrinsic and extrinsic) in individuals from various racial and ethnic groups. It does not directly evaluate the accuracy of the epigenetic age predictions in these groups. Thus, it is challenging to interpret whether the differences in acceleration are driven by biological factors or biases in the performance of the clocks themselves.
  
  The main analysis that felt like it was missing was asking whether the age deviations are larger for individuals with greater proportions of African ancestry. The authors have the ability to analyze ancestry as a continuous variable, but instead performed analyses in various a priori subsets of the data; the subsets do have average differences in ancestry, but also there is heterogeneity within groups. Given that the authors calculated admixture proportions already, it seems like a missed opportunity not to use these estimates. This would also sidestep the issue of the problematic labels applied to the subsets, which mix ancestry, nationality, and race terms (note that I thought the legacy reasons why these labels are used were well-explained, but they are nevertheless problematic for biological explanations that center on ancestry/genetic information as the driver of bias).
  
  We appreciate the reviewer’s suggestion to investigate clock accuracy in the context of African ancestry proportions. As noted in the response to Reviewer 2’s Point 1, we modeled the clock error as a function of the fraction of African ancestry of each individual, adjusting for an individual’s chronological age. The proportion of African ancestry is significantly associated with increased Horvath clock error (p = 0.039), with the clock estimated to give less accurate age predictions by 1.46 years for individuals with 100% African ancestry compared to no African ancestry. We now report this in the Results.
  
  Another missed analysis opportunity occurs in lines 259-261, where the authors state “Thus, the clock with the largest decrease in performance in admixed cohorts (in terms of predicting chronological age and identifying age acceleration in AD) has the most and largest fraction of meQTLs influencing its CpGs.” This is another place where the authors make generalizations about a given cohort based on average ancestry rather than testing the claim empirically on an individual basis (e.g., by examining the number of meQTL variants a given individual is heterozygous for or has the non-European allele for).
  
  We thank the reviewer for this comment. This feedback motivated us to evaluate the relationship between differences in meQTL frequencies and methylation clock error. We found differences in meQTL frequency among the MAGENTA individuals; specifically, many of the meQTL affecting clock CpGs are most common in the African American cohort, consistent with our hypothesis (Figure 6E,F). In addition, 84 Horvath clock CpGs (24%) are differentially methylated in AFR individuals, and 56 of these are affected by an meQTL, including 11 that are affected by an African ancestry-differentiated meQTL (Figure 6G). Finally, we find 42 Horvath clock CpG sites whose methylation levels in MAGENTA individuals are significantly associated with increased clock error and that are also affected by an meQTL (Figure 6B). However, at the individual level we do not find a clear relationship between the number of meQTL or ancestry-differentiated meQTL and methylation clock error. In light of these data, we have reframed our conclusions to state that meQTL likely contribute to clock error, while also being clear that they are not the sole cause.
  
  Can the authors explain or offer an investigation into why predicted age is often better in Cubans than Whites? They gave much attention to the opposite effect (of similar magnitude) in African Americans and Puerto Ricans but didn’t really discuss the surprisingly accurate prediction in Cubans.
  
  We did not focus on the results in the Cuban cohort for several reasons. First, as discussed in response to Reviewer 2’s comment, the Cuban cohort had the smallest sample size (22 cases and 21 controls). Thus, while the correlation between methylation age and chronological age is similar to that in Whites, and in a few cases higher, the differences were not statistically significant. Second, by other error metrics, such as mean absolute error, the clocks are comparatively less accurate in Cubans than in the White cohort (Supplementary Table 2). Finally, the clocks consistently find that Cubans with AD have lower predicted age than controls, though this is only significant for the ZhangEN clock. However, given these inconsistencies and the very small sample size, we caution against over-interpretation of these results. We clarify this in the manuscript and suggest that more work is needed on larger Cuban cohorts before any clear conclusions can be made.
  
  I was not a conceptual fan of the ensemble clock. The clocks are trained on very different things (e.g., chronological age versus clinical biomarkers) and are designed to capture different aspects of biology. Without more validation and motivation, I don’t think it makes sense to average values that are not designed to measure the same thing.
  
  We agree that combining the first- and second-generation clocks for the task of age prediction is not sensible. However, for AD risk stratification, combining values from multiple clocks that capture different aspects of biology and aging could be beneficial. As mentioned in the main text, we took inspiration from approaches in polygenic risk scores, as well as the broader machine learning field, where ensembling often makes for better predictors. Nonetheless, consistent with the Reviewer’s intuition, we do not see improvement here.
  
  Minor comments
  
  (1) Typo in line 91.
  
  Thank you for bringing this to our attention. Fixed.
  
  (2) Lines 111-115, sample sizes would be helpful.
  
  We have added the sample sizes of the non-demented controls that were used to calculate these correlations in each cohort.
  
  (3) Lines 137-138: the correlation statistics would be helpful here. This is a common issue throughout the paper; more in-text statistics would help readers evaluate the authors’ claims. Lines 249-251 are another example. The authors refer the reader to Figure 5C, which itself has no statistics; it also has two plots, so it’s unclear which the authors are putting forward as the primary evidence.
  
  We have added more statistical details in the text and figures to address this comment. In this instance, we have removed the referenced figure.
  
  (4) Lines 258 and 261, I believe the authors report the same result in both these lines.
  
  Thank you for pointing out this lack of clarity. These lines report different, but related, results about the frequency of clock-affecting meQTL in different ancestral contexts. The first reports the frequency of clock CpG-affecting meQTL in individuals of African ancestry across all of gnomAD. The second gives the frequency of those meQTL in different local ancestry backgrounds in admixed individuals. This distinction is relevant because admixed individuals’ genomes are mosaics of multiple genetic ancestries. As such, a genetic variant might be present in a haplotype whose ancestry is not in line with expectations based on global ancestry (e.g., an African American individual inherits a genetic variant within a European ancestry block). This local ancestry difference could modify the effect of the variant or obscure causal variants. Given the potential for confusion and the similar results obtained with global and local ancestry context in this case, we have focused on the first result in the Main Text.
  
  (5) Somewhere, it would be helpful to provide the distribution/range of ages broken by cohort. Similarly, I didn’t see the breakdown of AD versus control cases within each cohort. Both of these features will impact power within a given cohort for certain analyses.
  
  We have added the distribution of ages by cohort in Supplementary Figure 1. Table 1 provides a breakdown of cases versus controls for each of the cohorts in the MAGENTA study.
  
  (6) Figure 3 is pretty hard to read. It would also be helpful if the authors put the white cohort in Figure 3A as a ‘baseline’ comparison, as they use this as the baseline comparison in the text.
  
  We have made these changes to the figure and used larger text overall.
  
  (7) The various acronyms in the labels in Figure 5 are not explained. For Figure 5C - this is over-plotted and therefore hard to see.
  
  We have added the full population descriptors from gnomAD to the boxplots showing allele frequencies (Figure 6E). In addition, what used to be Figure 5C has been simplified and moved to Supplementary Figure 12.
  
  (8) The authors correct for cell type heterogeneity, which is known to vary across populations and can impact clock estimates. However, as far as I can tell, the cell type proportion estimates are coming from the DNA methylation data. The deconvolution algorithms for cell type proportions also have the same problem as the clocks of being trained on a very specific subset of human genetic and environmental diversity. Do the authors have any empirically derived estimates of cell type heterogeneity to sanity-check these deconvolution estimates? At the very least, it would be helpful to acknowledge this limitation.
  
  We thank the reviewer for commenting on this. There are no empirically derived estimates of cell type counts for the samples in the MAGENTA study. This is an inherent limitation of our study, and we have included text to make note of this.
  
  (9) There are very different sample sizes for each group, did the authors consider that their null results for the AD analyses in different cohorts are just a lack of power? This could be evaluated with power analyses or by comparing against sample sizes from similar studies in the literature.
  
  We agree that this is an important analysis and have added it to the manuscript. Given the small effect sizes reported in previous studies, accounting for statistical power is essential for interpretation of our results. We performed power calculations based on an effect of the size observed in previous studies (0.5-year acceleration). Considering the full study, we have 86% power to detect an effect of this size. Stratifying by cohort, we have 75% power for the African Americans, 72% for the Puerto Ricans, 72% for the Whites, 65% for the Peruvians, and 47% for the Cubans. Thus, we have enough power that the consistent lack of association observed outside of the White cohort in MAGENTA is likely meaningful. Based on these calculations, there is only a 1% chance that we would not observe an effect in any of the other cohorts if the effect were present across cohorts. Nonetheless, we have added caveats about power and the small sample size to our suggestion that the reduced accuracy of the clocks contributes to the lack of association outside of Whites.
  
  (10) There has been a fair amount of discussion recently that single CpG-based clocks are much more variable than clocks that combine information across CpG sites, either using PC-based or window-based approaches. For example, the PC clock R package from the Levine Lab (https://github.com/MorganLevineLab/PC-Clocks) is very easily implemented and generally gives much less variable age estimations than site-level clocks. It would be nice to consider integrating or discussing these later-generation clocks as ways to improve clock performance in diverse human groups.
  
  We thank the reviewer for their suggestion to apply the principal component clock to account for potential technical variation. As outlined in the new section “Principal component versions of the methylation clocks also have lower age prediction accuracy for genetically admixed individuals,” using the principal component version of the Horvath clock did not consistently improve age prediction accuracy or generalization across MAGENTA cohorts (Supplementary Figures 4 and 5). The PC clock also showed lower age prediction accuracy in individuals with substantial African ancestry, both in the replication cohorts and in the MAGENTA cohorts (Supplementary Figure 6).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.16.618588v3
www.biorxiv.org www.biorxiv.org

Comparing the outputs of intramural and extramural grants funded by National Institutes of Health

1. Public_Reviews 09 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Strengths:
  
  Great care was taken in selecting and cleaning the data, and in making sure that intramural vs. extramural projects were compared appropriately. The data has statistical validation. The trends are clear and convincing.
  
  We thank the reviewer for highlighting the strengths of the manuscript.
  
  Weaknesses:
  
  The Discussion is too short and descriptive, and it needs more perspective: why are the findings important, and what do they mean? Without recommending policy, the authors should at least discuss possible implications for policy.
  
  The Discussion has been substantially expanded. We added several new paragraphs discussing: the 2024 Senate HELP Committee proposal for NIH reform; implications for portfolio management (positioning extramural for basic research, intramural for clinical translation); generalizability to other agencies (DoD, NSF FFRDCs, DoE national labs); and the extramural program's role in workforce training as a societal benefit distinct from research outputs.
  
  The biggest problem I have with this submission is Figure 3, which shows a big decrease in clinical-related parameters between 2014 and 2019 in both intramural and extramural research (panels C, D and E). There is no obvious explanation for this and I did not see any discussion of this trend, but it cries out for investigation. This might, for example, reflect global changes in funding policies which might also influence the observed closing gaps between intramural and extramural research.
  
  We added an explicit explanation in the Results: because the dataset is truncated at 2020, clinical citations naturally approach zero near the window's end, consistent with the ~7-year lag for clinical citations to accrue documented in prior work (Hutchins et al., 2019). The APT metric declines less steeply because it uses the forward citation network for predictions.
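  A toy model can illustrate the truncation effect. The sketch below rests on two assumptions made purely for illustration: a paper's clinical citations accrue uniformly over the ~7 years after publication, and the data end at the close of 2020. Neither assumption is the authors' model.

```python
CUTOFF_YEAR = 2020  # assumed last year of citation data
LAG_YEARS = 7       # approximate time for clinical citations to accrue


def observed_fraction(pub_year, cutoff=CUTOFF_YEAR, lag=LAG_YEARS):
    """Share of a paper's eventual clinical citations visible by the cutoff,
    under a hypothetical uniform-accrual model."""
    elapsed = max(0, cutoff - pub_year)
    return min(1.0, elapsed / lag)


for year in range(2012, 2021):
    print(year, round(observed_fraction(year), 2))
```

  Under this model, papers from 2014 show about 86% of their eventual clinical citations, and papers from 2019 show about 14%. Counts therefore decline toward the end of the window even if the underlying clinical relevance stays constant.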
  
  Reviewer #2 (Public review):
  
  Strengths:
  
  The authors leveraged publicly available data (including RePORTER and the iCite repository) and used robust validated metrics (RCR, APT, clinical citations). They carefully considered a large number of confounders, including those related to the PI, and performed several well-described regression analyses.
  
  We thank the reviewer for highlighting these strengths of the manuscript.
  
  Figure 3A shows intramural projects producing about 2.75 papers per year in 2009, whereas extramural projects are producing just over 1 paper per year. Extramural projects appear to catch up over the next five years. While the authors attempt to explain the difference in their figure legend, another explanation is that the intramural projects started well before 2009 but, as the authors state, intramural data only became available in 2009.
  
  We added a methodological note acknowledging two points: some intramural projects may have had start dates before 2009 that are not captured in the data, and the ramp-up of new intramural projects is slower because these projects are more closely tied to new PI hiring. We also note that projects matched in 2008 were excluded as possible continuations. The slow ramp-up of intramural costs in Supplemental Figure 3 is consistent with hiring-associated lagged investment, which suggests that our filtering of continuing projects was very successful. Nevertheless, we cannot completely rule out that some continuing projects passed through our filters. We have therefore added the caveats mentioned above to the “Comparison of research topics” section of the Results and to the Data section of the Methods.
  
  As the authors note, funding information is often complex and difficult to characterize for an analysis like this. How did the authors handle: i) publications linked to multiple extramural grants; ii) publications linked to intramural and extramural grants; iii) publications linked to NIH grants and non-NIH grants?
  
  I would think it necessary to somehow apportion credit, as otherwise it would appear that extramural projects are more productive than they truly are.
  
  We have now explicitly stated that papers with both intramural and extramural funding links were excluded, while papers with multiple links within the same funding type were retained. A new Supplemental Figure 6 shows the distribution of papers by number of funding sources for both extramural and intramural grants. It demonstrates that the vast majority of papers acknowledged only one project. These changes are in the Data section of the Methods and in Supplemental Figure 6.
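  The exclusion rule described above can be expressed as a small filter. Only the rule comes from the response: drop papers with mixed intramural/extramural links, and keep papers with several links of one type. The paper-to-project links, project IDs, and dictionary layout are hypothetical.

```python
from collections import Counter

# Hypothetical paper -> [(project_id, funding_type), ...] links.
LINKS = {
    "paper_1": [("proj_A", "intramural")],
    "paper_2": [("proj_B", "extramural"), ("proj_C", "extramural")],
    "paper_3": [("proj_D", "intramural"), ("proj_B", "extramural")],
    "paper_4": [("proj_E", "extramural")],
}


def drop_mixed_funding(links):
    """Keep papers whose acknowledged projects share a single funding type."""
    return {
        paper: projects
        for paper, projects in links.items()
        if len({ftype for _, ftype in projects}) == 1
    }


kept = drop_mixed_funding(LINKS)
# Papers per (funding type, number of acknowledged projects), i.e. the kind of
# distribution summarized in a plot of funding sources per paper.
distribution = Counter((projects[0][1], len(projects)) for projects in kept.values())
print(sorted(kept))
print(distribution)
```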
  
  Apportioning credit across a many-to-many graph like the ones used here is indeed a high-value problem. However, it involves many researcher degrees of freedom in analytical design decisions that affect the results. We are working on a rigorous methodology for this, but doing it well is a research project in its own right and out of scope for these manuscript revisions.
  
  Also, it is not clear if the authors took account of the indirect costs paid by the NIH to universities that have received extramural grants.
  
  We added explicit language clarifying that all cost comparisons use inflation-adjusted total costs (direct + indirect) for extramural grants. We also added a new sensitivity analysis (Supplemental Figure 4) inflating extramural indirect costs by 30% to approximate unrecovered university expenditures, with the finding that the fundamental pattern holds even under this adjustment. These are found in the “Comparison of funding” and “Comparison of cost effectiveness” sections of the Results, as well as Supplemental Figure 4.
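  A minimal sketch of the cost side of this comparison follows. Total cost is direct plus indirect, converted to constant dollars, with an optional uplift on indirect costs for the sensitivity analysis. The price-index values, base year, and award amounts are hypothetical, because the response does not say which deflator was used.

```python
# Hypothetical price index (base year = 1.00); substitute the deflator actually used.
PRICE_INDEX = {2010: 0.90, 2015: 0.95, 2020: 1.00}
BASE_YEAR = 2020


def real_total_cost(direct, indirect, year, indirect_uplift=0.0):
    """Direct + indirect cost in base-year dollars.
    indirect_uplift=0.30 mimics inflating indirect costs by 30%."""
    nominal = direct + indirect * (1.0 + indirect_uplift)
    return nominal * PRICE_INDEX[BASE_YEAR] / PRICE_INDEX[year]


def pubs_per_million(n_pubs, cost):
    """Publications per $1M of real total cost."""
    return n_pubs / (cost / 1_000_000)


base = real_total_cost(250_000, 125_000, 2015)
sensitivity = real_total_cost(250_000, 125_000, 2015, indirect_uplift=0.30)
print(round(pubs_per_million(3, base), 2), round(pubs_per_million(3, sensitivity), 2))
```

  Raising extramural indirect costs lowers extramural outputs per dollar. This is the mechanism by which such an adjustment can narrow a cost-effectiveness gap.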
  
  Reviewer #3 (Public review):
  
  Strengths:
  
  The authors clearly presented their methods for processing the NIH project data and classifying projects into either intramural or extramural categories. The limitations of the study are also well-addressed.
  
  We thank the reviewer for highlighting these strengths of the manuscript.
  
  Weaknesses:
  
  The article would benefit from a more thorough discussion of the literature, a clearer presentation of the results (especially in the figure captions), and the inclusion of evidence to support some of the claims.
  
  The Introduction was updated with more specific framing of prior literature (e.g., explicit mention of risk management, funding disparities, and diminishing marginal returns as the focus of prior work). New references were added throughout, including Sampat (2012) on mission-oriented NIH research, Ioannidis et al. (2019) on grant competition inefficiencies, Drummond et al. (2005) on health economic evaluation methods, and the Cassidy (2024) Senate report, throughout the introduction and discussion.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  The article would benefit from a more detailed analysis/discussion about the recovery of indirect costs for extramural research.
  
  I note that the authors are from the University of Wisconsin, which is part of the IRIS network (https://iris.isr.umich.edu/iris-members-map/). They could work with IRIS (also called UMETRICS) to get a better sense as to the true costs of extramural research for each project (e.g., all labor costs, all equipment costs). The IRIS data are extraordinarily robust. Here's an example of an IRIS / UMETRICS paper: https://www.science.org/doi/10.1126/sciadv.abb7348.
  
  They could, for example, re-do the analyses assuming that the recorded indirect cost covers only 70% of the true indirect costs. Thus, if they get $700,000 indirect costs from RePORTER, they should assume that the true indirect costs were $1,000,000. Similarly, they can add the costs of the time the PI spent writing the grant proposal, using the Bergstrom paper as a guide.
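  The two framings of incomplete indirect-cost recovery imply different multipliers, which matters when reproducing this sensitivity analysis. Read literally, the reviewer's example means that recorded costs cover 70% of the true cost, so true = recorded / 0.7, an uplift of about 43%. A flat 30% inflation instead multiplies by 1.3. A minimal sketch of both:

```python
def true_indirect_from_coverage(recorded, coverage=0.70):
    """Recorded indirect costs cover only `coverage` of the true costs."""
    return recorded / coverage


def inflate_indirect(recorded, uplift=0.30):
    """Flat uplift applied to recorded indirect costs."""
    return recorded * (1.0 + uplift)


recorded = 700_000
print(round(true_indirect_from_coverage(recorded)))  # 1000000
print(round(inflate_indirect(recorded)))             # 910000
```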
  
  Another option would be to conduct sensitivity analyses taking into account ~30% incomplete indirect cost recovery (see https://docs.house.gov/meetings/AP/AP07/20171024/106525/HHRG-115-AP07-Wstate-DroegemeierK-20171024.pdf) and lost efficiency due to excess time writing grant proposals (see https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000065).
  
  As requested, we conducted a sensitivity analysis inflating extramural indirect costs by 30%, citing the Droegemeier (2017) Congressional testimony as the basis for this estimate. The cost of grant-writing time is now acknowledged in the Discussion as an unreimbursed hidden cost of the extramural system, citing Ioannidis et al. (2019). This adjustment narrowed the gap between extramural and intramural research but did not close it completely. In addition, our updated regression (Supplemental Figure 4) showed trends similar to those in our main Figure 4, but with the intramural advantage heightened and the extramural advantage diminished. Both remained significant. We have also added to the Discussion that there are additional costs and benefits that an analysis such as ours may not fully capture.
  
  The authors appear to have used an agency-perspective for their cost-effectiveness analyses. Generally, it is preferable to use a wider societal perspective. While that may be difficult, the article would benefit from some discussion from the perspective of the government and universities.
  
  We added a new paragraph explicitly acknowledging the agency-centered perspective and its limitations, noting that it does not capture the full economic cost borne by universities (startup costs, philanthropy, endowments, state contributions, graduate student training, faculty retention, infrastructure). The extramural program's contribution to the US workforce pipeline is specifically highlighted as a societal benefit not captured by the cost-effectiveness metrics.
  
  Reviewer #3 (Recommendations for the authors):
  
  Line 84-87: "The overrepresentation of viral research is likely because of the outsize investment toward the intramural Vaccine Research Center, and the cancer/genetics overrepresentation due in part because National Cancer Institute intramural investigators conduct research at that institute as well as at the NIH Clinical Center for their human genetics work." What evidence is there to support this claim?
  
  A citation to the NCI Center for Cancer Research website was added to support the claim about NCI intramural investigators working at the Clinical Center and Center for Cancer Research, where vaccine research is extensively discussed.
  
  Lines 107-109. "Given that NIH funding for intramural research has remained relatively constant as a percent of total funding over the years, this indicates larger single awards for intramural research while extramural investigators may increasingly require multiple concurrent grants to sustain their labs." Authors may consider adding a panel to Figure 2 showing the percentage of total funding of intramural vs. extramural funding.
  
  Rather than adding a panel to Figure 2, we added a new Supplemental Figure 3 showing the cost breakdown and intramural percentage of total funding by year.
  
  Discussion section: Are any of the findings of this study relevant to other funding agencies in the US (such as the National Science Foundation, the Department of Energy, and the Department of Defense)?
  
  A new paragraph was added to the Discussion addressing implications for the Department of Defense (including the Congressionally Directed Medical Research Programs), NSF FFRDCs, and the Department of Energy's national labs and FFRDCs. It argues that the incentive-alignment logic likely generalizes across agencies.
  
  Methods section: Please add an explanation of the technique used for propensity score matching.
  
  A detailed step-by-step description of the PSM procedure was added, covering propensity score estimation, within-year matching, matched cohort construction, outcome regression on matched data, and visualization of results.
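  The listed steps can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code. Several choices are assumptions:
  - the covariates used;
  - the gradient-ascent logistic fit;
  - greedy 1:1 nearest-neighbour matching without replacement;
  - the 0.1 caliper;
  - a simple mean difference standing in for the outcome regression.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Step 1: propensity model P(intramural | covariates), fitted by
    gradient ascent on the mean logistic log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # intercept followed by one weight per covariate
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def match_within_year(records, scores, caliper=0.1):
    """Step 2: greedy 1:1 nearest-neighbour matching without replacement,
    restricted to projects from the same year and within the caliper."""
    pairs, used = [], set()
    for i, treated in enumerate(records):
        if not treated["intramural"]:
            continue
        best, best_d = None, caliper
        for j, control in enumerate(records):
            if control["intramural"] or j in used or control["year"] != treated["year"]:
                continue
            d = abs(scores[i] - scores[j])
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs


def matched_difference(records, pairs, outcome):
    """Steps 3-4 (simplified): mean treated-minus-control outcome over the
    matched cohort; a regression on the matched data would replace this."""
    if not pairs:
        return float("nan")
    return sum(records[i][outcome] - records[j][outcome] for i, j in pairs) / len(pairs)


# Hypothetical projects: year, intramural flag, two covariates, and an outcome (RCR).
RECORDS = [
    {"year": 2012, "intramural": 1, "x": [0.20, 1.0], "rcr": 1.4},
    {"year": 2012, "intramural": 0, "x": [0.25, 1.0], "rcr": 1.1},
    {"year": 2012, "intramural": 0, "x": [0.90, 0.0], "rcr": 0.9},
    {"year": 2013, "intramural": 1, "x": [0.50, 0.0], "rcr": 1.2},
    {"year": 2013, "intramural": 0, "x": [0.55, 0.0], "rcr": 1.0},
    {"year": 2014, "intramural": 1, "x": [0.30, 1.0], "rcr": 1.3},
]

w = fit_logistic([r["x"] for r in RECORDS], [r["intramural"] for r in RECORDS])
scores = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], r["x"]))) for r in RECORDS]
pairs = match_within_year(RECORDS, scores)
print(pairs, matched_difference(RECORDS, pairs, "rcr"))
```

  In the toy data, the 2014 intramural project has no same-year control and therefore stays unmatched. This is the intended effect of matching within year.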
  
  Figure 1: Please clarify if the relative ratio of intramural projects is calculated from the numbers of grants (as suggested in lines 95-96 and 98-100) or the numbers of publications (as suggested in lines 82-83 and 97-98).
  
  Also, this figure would be more intuitive if, for each topic, it showed the relevant intramural number (as it currently does) and also the relevant extramural number.
  
  The caption and Methods were updated to clarify that clustering and ratio calculation are based on projects/grants, not publications. A formula was added to the Methods to make the ratio calculation explicit. The figure itself was not modified to add extramural bars, though the ratio calculation already implicitly encodes both.
  
  Figure 2: Please change "(red)" to "(blue)" in the caption, and remove the A, as there is only one panel in this figure.
  
  Figure 4: Please change "(red)" to "(blue)" in the caption.
  
  These changes have been made.
  
  Lines 19-21: I suggest rewriting this sentence as follows:
  
  "We find that extramural awards are more cost-effective for producing outputs commonly used for academic evaluation, such as publications and citations per dollar, while intramural awards are more cost-effective for generating research that influences future clinical work, more closely in line with agency's health goals."
  
  The sentence was rewritten substantially in line with the reviewer's suggestion, now reading more clearly with "per dollar" removed as a parenthetical and the structure of the comparison clarified.
  
  Lines 31-34: Please rewrite this sentence along the following lines to provide more context on previous research into the grant funding system:
  
  Certain aspects of the grant funding system have been the focus of research, such as AAAA (Azoulay et al., 2009), BBBB (Goldstein and Kearney, 2020), CCC (Hoppe et al., 2019), DDDD (Lauer et al., 2017), EEEE (Wahls, 2018a) and FFFF (Wahls, 2018b), but the relative merits of intramural and extramural funding have received little attention to date.
  
  The sentence was rewritten to name specific contributions of each cited paper (e.g., risk management, funding disparities, diminishing marginal returns), replacing the generic list of citations.
  
  Lines 41-44: Please explain "merit score" and please add a reference to an article or website that explains the review process at the NIH.
  
  "Merit score" was revised to "percentile ranking of overall impact merit score" and a citation to the NIH CSR website ("What happens to your application during and after review?," 2025) was added.
  
  Lines 53-54: Please change Intramural to intramural (two instances, and also in line 284), and Extramural to extramural.
  
  "Intramural" and "Extramural" were corrected to lowercase throughout.
  
  Line 65-67: This sentence ("Potential advantages of the intramural approach are that researchers in the NIH's own laboratories allow the NIH to hire researchers whose research agendas more closely align with its mission.") reads awkwardly. Please clarify.
  
  The sentence was rewritten to read more clearly: "An advantage of the intramural approach is that NIH has the direct ability to hire scientists whose research closely aligns with agency goals, and researchers do not need to devote time and effort to preparing and submitting grant applications."
  
  Line 95-97: Authors should consider including an equation to help explain the following sentence: "The relative ratio of intramural projects for each topic was calculated by taking a ratio of the proportions of total grants a topic represented in the intramural vs. extramural portfolios. A relative ratio >1 signifies a higher share of intramural project publications on that topic relative to their share across all topics."
  
  A formula was added to the Methods defining the topic-level ratio calculation explicitly.
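  One reading of the ratio in the reviewer's comment takes a topic's share of intramural projects and divides it by the topic's share of extramural projects, computed on project counts as the revised caption specifies. A minimal sketch follows. The topic names and counts are hypothetical, and the sketch does not reproduce the exact formula in the revised Methods.

```python
def relative_ratio(intra_counts, extra_counts, topic):
    """(share of intramural projects on `topic`) / (share of extramural projects on `topic`).
    Values > 1 mean the topic is over-represented in the intramural portfolio."""
    intra_share = intra_counts[topic] / sum(intra_counts.values())
    extra_share = extra_counts[topic] / sum(extra_counts.values())
    return intra_share / extra_share


INTRA = {"virology": 30, "cancer": 50, "neuroscience": 20}
EXTRA = {"virology": 100, "cancer": 300, "neuroscience": 600}
for topic in INTRA:
    print(topic, round(relative_ratio(INTRA, EXTRA, topic), 2))
```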
  
  Line 143: The phrase "may reflect the extra attention intramural investigators are afforded" reads awkwardly - please reword.
  
  Reworded to "may reflect the extra time intramural investigators save because they do not have teaching and grant writing responsibilities."
  
  Lines 303-304: This sentence ("First, as the renewal of project contracts may alter the topic and arrangement of the projects, we dropped 70,297 projects with renewal records in our data.") reads awkwardly. Please clarify.
  
  Reworded to "Since the scientific focus of a study may drift over time, we dropped 70,297 projects with renewal records in our data."
  
  Line 378-379: Please specify the model of ChatGPT used.
  
  Done.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.11.09.566298v5
www.biorxiv.org www.biorxiv.org

Brain-Cognitive Gaps in relation to Dopamine and Health-related Factors: Insights from AI-Driven Functional Connectome Predictions

1. Public_Reviews 09 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  In the revised version, our primary focus has been to more clearly demonstrate the unique contribution of the brain-cognitive gap (BCG) beyond what is captured by cognitive performance alone, and to show that the BCG is not trivially driven by the observed cognitive scores. Additional analyses now demonstrate that the BCG provides complementary and nuanced information regarding factors associated with cognitive resilience, above and beyond the cognitive measures themselves.
  
  In response to the comment regarding the inclusion of a baseline predictive model, we would like to clarify that the central aim of our study is to compare predictive utility across different cognitive states (resting state, movie watching, and n-back), rather than to establish a single universally optimal prediction model. Several previous studies have already systematically compared deep learning approaches with more traditional machine learning methods for functional connectome-based prediction. In contrast, the goal of the present study is to examine how brain state modulates the ability of AI-based functional connectome models to capture individual differences in working memory and episodic memory.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors attempted to identify whether a new deep-learning model could be applied to both resting and task state fMRI data to predict cognition and dopaminergic signaling. They found that the resting state and movie watching conditions best predict episodic memory, but only movie watching predicts both episodic and working memory. A negative 'brain gap' (where the model trained on brain connectivity predicts worse performance than what is actually observed) was associated with less physical activity, poorer cardiovascular function, and lower D1R availability.
  
  Strengths:
  
  The paper should be of broad interest to the journal's readership, with implications for cognitive neuroscience, psychiatry, and psychology fields. The paper is very well-written and clear. The authors use two independent datasets to validate their findings, including two of the largest databases of dopamine receptor availability to link brain functional connectivity/activity with neurochemical signaling.
  
  Weaknesses:
  
  The deep learning findings represent a relatively small extension/enhancement of knowledge in a very crowded field.
  
  It's unclear from these results how much utility the brain gaps provide above and beyond observed performance. It would be helpful to take a median split of the dataset on observed performance and plot it alongside the current Figure 3 results to see how the cardiovascular and physical activity measures differ based on actual performance. Could the authors perform additional analyses describing how much additional variance in these measures is explained by including brain gaps?
  
  We thank the reviewer for raising this important point. In response to this request, we first examined the relationship between the BCG and the cognitive measure itself. We did not find any significant relationship in either the DyNAMiC sample (r = 0.01, p = 0.939) or the COBRA sample (r = 0.01, p = 0.894) (see Author response image 1).
  
  Author response image 1.
  
  We then conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular disease (CVD) risk scores. We found no significant difference between high and low EM performers in physical activity (DyNAMiC: p = 0.56, 95% CI: –14.99 to 8.13; COBRA: p = 0.29, 95% CI: –3.54 to 1.05) or in Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: –1.08 to 10.72; COBRA: p = 0.41, 95% CI: –1.86 to 4.58). By contrast, physical activity and Framingham CVD risk score did differ significantly between the positive and negative BCG groups. Our results therefore support the view that the BCG provides unique information about factors that contribute to cognitive resilience, beyond what the observed cognitive measure (episodic memory score) captures. These results have been added to Section 2.4, and Figure 3 has been updated.
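  The logic of this comparison can be sketched in two steps. First, compute the gap for each participant. Second, compare an outcome (here physical activity) between groups defined either by the sign of the gap or by a median split on observed memory. The sign convention follows Reviewer #1's summary: gap = predicted minus observed, so a negative gap means the model predicts worse performance than observed. The toy data and the normal-approximation 95% CI are assumptions, not the authors' analysis.

```python
import math
from statistics import mean, median, variance


def brain_cognitive_gap(predicted, observed):
    """Gap = predicted - observed score."""
    return [p - o for p, o in zip(predicted, observed)]


def mean_diff_ci(a, b, z=1.96):
    """Difference in means (a - b) with a normal-approximation 95% CI
    using unpooled (Welch-style) standard errors."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, (diff - z * se, diff + z * se)


def split_and_compare(outcome, split_values, threshold):
    """Compare `outcome` between participants at/above vs. below `threshold`."""
    high = [y for y, s in zip(outcome, split_values) if s >= threshold]
    low = [y for y, s in zip(outcome, split_values) if s < threshold]
    return mean_diff_ci(high, low)


# Hypothetical participants: observed and model-predicted episodic memory, activity level.
observed = [10, 12, 14, 16, 18, 20]
predicted = [12, 11, 15, 14, 19, 18]
activity = [30, 20, 35, 15, 40, 25]

gap = brain_cognitive_gap(predicted, observed)
by_gap_sign = split_and_compare(activity, gap, 0)                    # positive vs. negative gap
by_median = split_and_compare(activity, observed, median(observed))  # high vs. low performers
print(by_gap_sign, by_median)
```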
  
  Some of the imaging findings require deeper analysis. For Figure 1f, which default mode regions have high salience? The DMN is a huge network, and its subregions have differing functions.
  
  Grad-CAM provides a coarse, gradient-based attribution that reflects how the learned feature maps contribute to the model output. It is not designed to produce specific input-level interpretations, such as symmetric edge-wise importance values. Therefore, the primary interpretation remains at the network level rather than at the level of individual FC edges.
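  Grad-CAM weights each feature map by the spatial mean of the output gradient with respect to that map, then keeps the positive part of the weighted sum. Network-level summaries can then be formed by averaging the map within blocks of parcels. The minimal pure-Python sketch below assumes the attribution map has already been brought to the size of the parcel-by-parcel connectivity matrix. The network labels and toy arrays are hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from K feature maps A_k and the gradients dY/dA_k
    (each an H x W nested list): alpha_k is the mean gradient over positions,
    and the map is ReLU(sum_k alpha_k * A_k)."""
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    return [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K))) for j in range(W)]
        for i in range(H)
    ]


def network_level(cam, labels):
    """Average a parcel-by-parcel map within each unordered (network, network) block."""
    sums, counts = {}, {}
    for i, li in enumerate(labels):
        for j, lj in enumerate(labels):
            key = tuple(sorted((li, lj)))
            sums[key] = sums.get(key, 0.0) + cam[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy example: two 2 x 2 feature maps and two parcels with hypothetical network labels.
A = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
G = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
cam = grad_cam(A, G)
print(cam)
print(network_level(cam, ["DMN", "FPN"]))
```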
  
  Along the same lines, were the striatal D1R findings regionally specific at all? It would be informative to test whether the three nuclei (Accumbens, Caudate, Putamen) and/or voxelwise models would show something above and beyond what is achieved from averaging D1R across the striatum. What about cortical D1R, which is highly abundant, strongly associated with cognitive (especially WM) performance, and has much unique variance beyond striatal D1R? https://www.science.org/doi/full/10.1126/sciadv.1501672. The PET findings are one of the unique strengths of this paper and are underexplored. It's also unclear if the measure of brain entropy should simply be averaged across all regions.
  
  In this study, we focused on D1DR/D2DR averaged across the caudate and putamen. Our previous work (Johansson et al., 2023; Nyberg et al., 2016) reported that these regions are more strongly associated with cognitive functions than the nucleus accumbens, which tends to show lower D1DR/D2DR levels and limited association with these cognitive domains. Following the reviewer’s suggestion, we examined regional variation. Both caudate and putamen D1DR showed significant associations with BCG, whereas D1DR in the nucleus accumbens and DLPFC did not. For D2DR, we observed significant associations between BCG and both caudate and putamen D2DR.
  
  D1DR:
  
  Partial correlation between:
  
  Caudate_Bilateral vs. NegGap, r = 0.37, p = 0.02
  
  Putamen_Bilateral vs. NegGap, r = 0.34, p = 0.03
  
  Accumbens_Bilateral vs. NegGap, r = 0.07, p = 0.69
  
  Mean (LRCaud, LRput, LRacc) vs. NegGap, r = 0.35, p = 0.03
  
  DLPFC_Bilateral vs. NegGap, r = 0.21, p = 0.21
  
  Striatum_Bilateral (Mean (LRCaud, LRput)) vs. NegGap, r = 0.40, p = 0.01
  
  Caudate_Bilateral vs. PosGap, r = –0.37, p = 0.02
  
  Putamen_Bilateral vs. PosGap, r = –0.53, p = 0.02
  
  Accumbens_Bilateral vs. PosGap, r = –0.25, p = 0.31
  
  Mean (LRCaud, LRput, LRacc) vs. PosGap, r = –0.41, p = 0.08
  
  DLPFC_Bilateral vs. PosGap, r = –0.30, p = 0.21
  
  Striatum_Bilateral (Mean (LRCaud, LRput)) vs. PosGap, r = –0.49, p = 0.03
  
  Author response image 2.
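  The D1DR values above are labelled partial correlations, and the response later states that age was controlled for in all analyses. Assuming age is the single covariate, a first-order partial correlation can be computed from the three pairwise Pearson correlations. The helper names in this minimal sketch are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z (e.g. age)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y share only their dependence on z, so the partial correlation is ~0.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]  # z + [1, -1, -1, 1]
y = [0, 5, 0, 5]  # z + [-1, 3, -3, 1]
print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```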
  
  D2DR:
  
  Correlation between:
  
  Caudate_Bilateral vs. NegGap, r = 0.36, p = 0.0003
  
  Putamen_Bilateral vs. NegGap, r = 0.22, p = 0.03
  
  Accumbens_Bilateral vs. NegGap, r = –0.01, p = 0.91
  
  Mean (LRCaud, LRput, LRacc) vs. PosGap, r = –0.24, p = 0.01
  
  Striatum_Bilateral vs. NegGap, r = 0.39, p = 0.0001
  
  Caudate_Bilateral vs. PosGap, r = –0.34, p = 0.004
  
  Putamen_Bilateral vs. PosGap, r = –0.37, p = 0.002
  
  Accumbens_Bilateral vs. PosGap, r = –0.21, p = 0.09
  
  Mean (LRCaud, LRput, LRacc) vs. PosGap, r = –0.38, p = 0.001
  
  Striatum_Bilateral vs. PosGap, r = –0.49, p = 0.0001
  
  We have added the following sentence to the Results section to highlight these regional differences in D1DR/D2DR in relation to BCG.
  
  “Both D1DR and D2DR availability in the striatum were associated with BCG, such that lower dopamine receptor availability was linked to a greater brain-cognitive gap. However, these associations varied by region. For D1DR, significant correlations with BCG were observed in the caudate (positive gap: r = –0.37, p = 0.02; negative gap: r = 0.37, p = 0.02) and putamen (positive gap: r = –0.53, p = 0.02; negative gap: r = 0.34, p = 0.03), but not in the nucleus accumbens (positive gap: r = –0.25, p = 0.31; negative gap: r = 0.07, p = 0.69) or the DLPFC (positive gap: r = –0.30, p = 0.21; negative gap: r = 0.21, p = 0.21). For D2DR, both caudate (positive gap: r = –0.34, p = 0.004; negative gap: r = 0.36, p = 0.0003) and putamen (positive gap: r = –0.37, p = 0.002; negative gap: r = 0.22, p = 0.03) showed significant associations with BCG.”
  
  Author response image 3.
  
  It is not clear from the text that the authors met the preconditions for mediation analysis (that is, demonstrating significant correlations between D1R and entropy, in addition to the correlation with brain gap). The authors should report this as well.
  
  This is a fair question. We recalculated entropy in the striatum, given that D1DR is more strongly expressed in this region and, therefore, reduced striatal D1DR may have a more pronounced impact on local entropy (as the reviewer suggested, it may not be appropriate to compute entropy across all brain regions). Our analyses showed that lower D1DR/D2DR levels were associated with higher entropy, which in turn was related to higher BCG.
  
  DyNAMiC; negative gap:
  
  Partial correlation between:
  
  Entropy and D1DR, r = –0.33, p = 0.04.
  
  Entropy and NegGap, r = –0.36, p = 0.03.
  
  DyNAMiC; positive gap:
  
  Partial correlation between:
  
  Entropy and D1DR, r = –0.56, p = 0.01.
  
  Entropy and PosGap, r = 0.47, p = 0.04.
  
  COBRA; negative gap:
  
  Correlation between:
  
  Entropy and D2DR, r = –0.22, p = 0.03.
  
  Entropy and NegGap, r = –0.27, p = 0.007.
  
  COBRA; positive gap:
  
  Correlation between:
  
  Entropy and D2DR, r = –0.26, p = 0.03.
  
  Entropy and PosGap, r = 0.25, p = 0.03.
  
  We have added these results to Results Section 2.6 and have updated Figure 4 in the revised manuscript to report these correlations.
  
  Was age controlled for in the mediation analysis? I would not consider this result valid unless that is the case.
  
  We used the mediation package in R and controlled for age by including it as a covariate in both the mediator model and the outcome model. The following information has been added to the Methods section of the revised manuscript.
  
  “To assess the statistical significance of this mediation effect, we employed the bootstrapping method as outlined by Preacher and Hayes (145), and age was controlled for in all statistical analyses.”
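  The covariate adjustment described above can be sketched as two regressions. The mediator model regresses entropy on receptor availability and age. The outcome model regresses the gap on entropy, receptor availability, and age. The indirect effect is a × b, with a percentile bootstrap CI. This Python sketch only illustrates the logic and is not the R mediation package's implementation. The variable names, simulated data, and number of resamples are assumptions.

```python
import random


def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal
    equations, solved by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def indirect_effect(x, m, y, age):
    a = ols(m, x, age)[1]     # mediator model: M ~ X + age
    b = ols(y, m, x, age)[1]  # outcome model:  Y ~ M + X + age
    return a * b


def bootstrap_ci(x, m, y, age, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resample = [[v[i] for i in idx] for v in (x, m, y, age)]
        estimates.append(indirect_effect(*resample))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Simulated data: receptor availability (x) lowers entropy (m), which raises the gap (y).
rng = random.Random(1)
age = [rng.uniform(20, 80) for _ in range(200)]
x = [-0.02 * a + rng.gauss(0, 1) for a in age]
m = [-0.5 * xi + 0.01 * a + rng.gauss(0, 0.5) for xi, a in zip(x, age)]
y = [0.8 * mi + 0.01 * a + rng.gauss(0, 0.5) for mi, a in zip(m, age)]
est = indirect_effect(x, m, y, age)
ci = bootstrap_ci(x, m, y, age, n_boot=500)
print(round(est, 3), ci)
```

  With these simulated effects (a = −0.5, b = 0.8), the true indirect effect is −0.4. The bootstrap interval should therefore lie below zero.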
  
  The discussion section is long, but the authors would do better to replace some less helpful sections (e.g., the paragraph on methodological tweaks to parcellations and model alignment) with a couple of other important points, including:
  
  (1) Discuss the 'sweet-spot' of movie watching for behavior prediction in the context of studies showing that task states 'quench' neural variability: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007983. This may not be mutually exclusive of the discussion on dopamine and signal-to-noise ratio, but it would be helpful for the authors to discuss their potential overlap vs. unique contributions to the observed findings.
  
  Thank you for the comment. We have now removed the section about methodological tweaks and extended the discussion of the task "sweet spot" for behavioral prediction, citing the paper that the reviewer suggested. The new paragraph reads:
  
  “Additionally, previous research showed that movie-watching alters the propagation of activity across cortical pathways (105), particularly within and between regions involved in audiovisual processing and attention. These alterations lead to a less segregated and more integrated network organization (106). Similarly, the n-back task has been associated with increased integration of task-positive cortico-cortical connectivity (104, 107) and striato-cortical connectivity (102). Our findings also suggest that certain task contexts strike an optimal balance between reducing neural variability and maintaining sufficient richness to capture individual differences. Prior work shows that task states quench neural variability, leading to a more reliable and predictable neural signal (108). In this context, movie watching may represent such a sweet spot, constraining neural dynamics through shared audiovisual stimulation while simultaneously engaging a broad range of cognitive processes that preserve individual differences.”
  
  (2) The argument that dopamine signaling increases signal-to-noise ratio is based on some preclinical data as well as correlational data using fMRI with pharmacological challenges. It is less clear how PET-derived estimates of D1R and D2R availability equate to 'dopamine signaling' as it is thought of in this context. Presumably, based on these data, higher D1R or D2R availability would be related to greater levels of tonic dopaminergic signaling. However, in the case of the COBRA dataset with D2R estimates, those are based on raclopride -- which competes with endogenous dopamine for the D2 receptor. Therefore, someone with higher levels of endogenous dopamine signaling should theoretically have lower raclopride binding and lower D2R estimates. I'm not arguing that the authors' logic is flawed or that D1R and D2R are not good measures of dopamine signaling, but I'd ask the authors to dig into the literature and describe more direct potential links for how greater receptor availability might be associated with greater dopamine signaling (and hence lower entropy). Adding this to the discussion would be very valuable for PET research.
  
  Thank you for raising this important point. We agree that D1R and D2R availability should not be taken as direct proxies of dopamine signaling. However, prior work has suggested meaningful associations between pre- and post-synaptic markers. For instance, a well-powered study demonstrated a significant correlation between D2R availability and dopamine synthesis capacity measured by FMT (Berry et al., 2018). This finding supports the idea that postsynaptic receptor markers may, under certain conditions, serve as an indirect proxy for dopaminergic signaling. Moreover, the number of dopamine-producing neurons innervating the striatum during development has been proposed to shape the structural maturation and arborization of dendrites (McAllister, 2000; Whitford et al., 2002), potentially providing a structural and functional basis for observed associations between pre- and post-synaptic measures.
  
  At the same time, smaller-scale studies have yielded mixed findings, reporting either non-significant associations (Heinz et al., 2005; Kienast et al., 2008) or negative correlations (Ito et al., 2011). Importantly, the latter studies employed [18F]FDOPA to index dopamine synthesis, which has been argued to provide a less reliable estimate of synthesis capacity compared to FMT, as used in Berry et al. (2018). These inconsistencies underscore that the relationship between pre- and post-synaptic markers is not straightforward and requires further examination in larger, well-powered samples. The following paragraph has been added to the discussion.
  
  “An important caveat is that D1R and D2R availability do not provide a direct measure of dopamine signaling. Instead, they reflect receptor availability, which interacts with endogenous dopamine in a complex manner. PET measures of D1R and D2R availability reflect the density of unoccupied dopamine receptors and the degree to which endogenous dopamine competes with radioligand binding. D2R binding potential is sensitive to competition from synaptic dopamine, such that higher ambient dopamine generally reduces tracer binding; D1R binding, however, is less affected by endogenous dopamine under physiological conditions and more directly reflects receptor expression levels. Previous studies demonstrated a significant association between D2R availability and dopamine synthesis capacity measured by FMT (117, 118), suggesting that postsynaptic receptor markers may, under certain conditions, serve as a proxy for dopaminergic signaling. Developmental factors, such as the number of dopamine-producing neurons innervating the striatum, may further influence the structural and functional relationship between pre- and post-synaptic markers. By contrast, smaller studies have reported non-significant (119, 120) or negative (121) associations, although these studies relied on [18F]FDOPA, which is considered a less precise index of dopamine synthesis than FMT. Taken together, these reports indicate that the relationship between pre- and post-synaptic markers is complex and not necessarily linear. Accordingly, our observation that lower receptor availability is associated with greater neural variability should not be interpreted as direct evidence of weaker dopaminergic signaling, but rather as reflecting the interplay between receptor density and endogenous dopamine occupancy, particularly in the case of D2R.”
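  The competition argument can be made concrete with a simplified equilibrium model written in the consensus PET nomenclature. This is an idealization assumed here for illustration, not a model taken from the manuscript: a single binding site, negligible tracer occupancy, and endogenous dopamine (concentration C_DA, affinity K_DA) acting as a competitive ligand at equilibrium.

  $$
  \mathrm{BP}_{\mathrm{ND}} = f_{\mathrm{ND}}\,\frac{B_{\mathrm{avail}}}{K_D},
  \qquad
  B_{\mathrm{avail}} = \frac{B_{\max}}{1 + C_{\mathrm{DA}}/K_{\mathrm{DA}}}
  $$

  With receptor density B_max held fixed, a larger C_DA shrinks the pool of unoccupied receptors B_avail and hence BP_ND, which is why higher endogenous dopamine is expected to lower raclopride-derived D2R estimates. For a tracer whose binding is largely insensitive to dopamine competition, as described above for D1R, the C_DA/K_DA term is effectively negligible and BP_ND tracks B_max more directly.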
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors developed a deep learning model based on a DenseNet CNN architecture to predict two cognitive functions: working memory and episodic memory, from functional connectivity matrices. These matrices were recorded under three conditions: during rest, a working memory task, and a movie, and were treated as images for the CNN algorithm. They tested their model's performance across different conditions and a separate dataset with a different age distribution (using the same MRI scanner, scanning configurations, and cognitive tests). They also calculated the "brain cognition gap" based on the model trained on resting functional connectivity to predict working memory. Extending from the commonly used index "brain age," the brain cognition gap was defined as the difference between the working memory score predicted by their model (predicted working memory) and the working memory score based on the working memory test itself (observed working memory). This brain cognition gap was found to be associated with physical activity, education, and cardiovascular risk. The authors also conducted additional mediation tests to examine whether regional functional variability mediated the relationship between PET-derived measures of dopamine and the brain cognition gap.
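  As a minimal sketch of the gap definition summarized above (predicted minus observed score), the snippet below computes the gap and splits participants by its sign. The function names and the four toy scores are hypothetical, and dropping participants with an exact zero gap from both groups is an assumption, not necessarily the manuscript's rule.

  ```python
  import numpy as np


  def brain_cognition_gap(predicted, observed):
      """Brain-cognition gap as defined in the review: predicted minus observed score."""
      predicted = np.asarray(predicted, dtype=float)
      observed = np.asarray(observed, dtype=float)
      if predicted.shape != observed.shape:
          raise ValueError("predicted and observed must have the same shape")
      return predicted - observed


  def split_by_gap_sign(gap):
      """Indices of participants with a positive and with a negative gap (exact zeros are dropped)."""
      gap = np.asarray(gap, dtype=float)
      return np.flatnonzero(gap > 0), np.flatnonzero(gap < 0)


  # Hypothetical standardized scores for four participants.
  predicted = np.array([0.4, -0.2, 1.1, 0.0])
  observed = np.array([0.1, 0.3, 0.9, 0.0])
  gap = brain_cognition_gap(predicted, observed)
  positive_idx, negative_idx = split_by_gap_sign(gap)
  print(gap.round(2), positive_idx, negative_idx)
  ```
  
  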
  
  Strengths:
  
  The major strength of this manuscript is the extensive effort the authors have put into creating a new 'biomarker' that links deep learning with fMRI, PET, physical activity, education, and cardiovascular risk across two studies. This effort is impressive.
  
  Weaknesses:
  
  There are several weaknesses in the current methods and results, making many of the claims unconvincing. These weaknesses include:
  
  (1) The lack of baseline models to benchmark the predictive performance of their DenseNet models.
  
  (2) The inappropriate calculation of the brain cognition gap due to the lack of control for regression-toward-the-mean and the influence of the working memory itself (a common practice in brain age studies).
  
  (3) The lack of benchmarking of the brain cognition gap against the 'corrected' brain age gap and the direct prediction of physical activity, education, and cardiovascular risk.
  
  (4) Minimal justification for their PET mediation analysis.
  
  We appreciate the reviewer’s constructive comments on the strengths and weaknesses of our study. In this revised version, we have addressed the concerns regarding the calculation of the brain-cognition gap, clarified the unique variance that the brain-cognition gap contributes beyond cognition itself, and provided additional justification for the PET mediation analysis. Regarding the lack of a baseline model, it is important to highlight that our aim has never been to compare the predictive power of different deep learning or machine learning approaches. Therefore, the text in the introduction and discussion has been amended to avoid miscommunication on this topic.
  
  Regarding the impact of the work on the field and the utility of the methods and data to the community, I see its potential. However, addressing all the weaknesses listed above is crucial and likely to change the conclusions of the results.
  
  It is important to note that many statements in the manuscript are overstated, making the contribution of the manuscript seem exaggerated.
  
  We have run additional analyses based on the reviewer’s suggestions. The effect sizes and statistical values were adjusted due to the corrections; the overall conclusions remain largely consistent. The relationships between the brain-cognition gap and key factors such as physical activity and cardiovascular risk persisted. We have updated the manuscript accordingly and revised the relevant sections to reflect these refinements and the resulting interpretations.
  
  For instance, the abstract claims "there is a lack of objective biomarkers to accurately predict cognitive function," and the discussion states, "across various studies, the correlation between predicted and actual fluid intelligence typically hovers around 0.25 (98-100)." However, a meta-analysis by Vieira and colleagues (2022 https://doi.org/10.1016/j.intell.2022.101654) found over 37 studies up to 2020 predicting cognitive abilities from fMRI with machine learning, with 24 studies published in 2019-20 alone. Since 2020, with the rise of machine learning and AI, even more studies have likely been published on this topic, all claiming to show objective biomarkers to accurately predict cognitive function. Vieira and colleagues also found an average performance of these objective biomarkers in predicting general cognition at r = .42, similar to what was found in this manuscript. Based on this alone, it is unclear how novel or superior their method is without a proper systematic benchmark.
  
  We appreciate the opportunity to clarify our study’s contribution relative to prior work. We have revised the introduction and discussion to highlight the contribution of other methods when it comes to biomarkers. As for the comment related to the work by Vieira and colleagues, Vieira et al. (2022) indeed present a comprehensive meta-analysis of studies predicting general and fluid intelligence using neuroimaging and machine learning. However, there are two critical differences between our work and previous work:
  
  Target Cognitive Domains:
  
  Our study does not focus on general or fluid intelligence, but rather on episodic memory (EM; three tests) and working memory (WM; three tests), two distinct cognitive domains that are critically important for aging research. Measured here with three independent tests each to improve reliability, these abilities are less frequently studied as predictive targets in the existing fMRI-ML literature, particularly with deep learning methods.
  
  Critically, our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.
  
  Our primary objective is to test how brain state influences the ability of functional connectivity to predict domain-specific cognitive performance, using a deep learning framework. As now stated explicitly in the revised manuscript, this objective is operationalized through three clearly defined aims:
  
  (1) To compare the predictive utility of functional connectomes derived from different brain states (resting state, movie watching, and n-back task) for EM and WM;
  
  (2) To introduce and evaluate a brain-cognition gap as a marker of individual differences beyond chronological age; and
  
  (3) To examine the contribution of dopaminergic integrity to variability in connectome uniqueness and brain-cognition gaps.
  
  We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework for some of the prediction models, we did not conduct a comprehensive benchmark across all available machine learning methods nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological/biomarker superiority beyond the scope of our analyses.
  
  Modeling Approach:
  
  While Vieira et al. show that the majority (76%) of prior studies used linear modeling approaches, including CPM and penalized regressions, these models are often vulnerable to overfitting, especially when applied to high-dimensional fMRI data. Our use of a DenseNet-based CNN architecture is motivated by the need to leverage inductive biases suited to functional connectivity data, and we evaluate this approach across multiple cognitive tasks and independent datasets.
  
  Vieira and colleagues report that studies predicting general intelligence from fMRI (particularly from the HCP dataset) average around r = 0.42, while those predicting fluid intelligence average around r = 0.15. Our original claim about the correlation hovering around 0.25 is therefore not incorrect and aligns with the Vieira meta-analysis. We have, however, nuanced this statement in the manuscript, now stating that correlations are higher for general intelligence than for fluid intelligence.
  
  Altogether, we considered the reviewer’s comments and therefore conducted a careful revision of the manuscript text to moderate and clarify statements that may have come across as overstated. We have refined the language throughout the Introduction and Discussion sections to better align with the strength of the evidence and the scope of our contributions. A few examples are:
  
  “Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”
  
  Also on page 14:
  
  “Our study introduces a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30).”
  
  Similarly, the authors claim superior performance of deep learning and mischaracterize machine learning algorithms: "In particular, deep neural networks (DNN) methods have been successfully applied to behavioral and disease prediction (24-26), and have been found to outperform other machine learning approaches (27-29)," and "Deep learning approaches overcome the limitation of predictive techniques that solely rely on linear associations between connectivity and behavioral phenotypes (17)." However, the superiority of deep learning is debatable. Studies show comparable performance between machine learning (such as kernel regression) and deep learning (such as fully-connected neural networks, BrainNetCNN, Graph CNN (GCNN), and temporal CNN), e.g., He and colleagues (2019; https://doi.org/10.1016/j.neuroimage.2019.116276) and Vieira and colleagues (2024; https://doi.org/10.1101/2024.03.07.583858).
  
  We agree that the performance gap between traditional machine learning models and deep learning (which is a subcategory of machine learning) in neuroimaging is debatable and task-dependent. Indeed, both He et al. (2019) and Vieira et al. (2024) offer evidence that kernel regression can achieve performance on par with deep learning models when applied to appropriate datasets.
  
  We have therefore nuanced the statements in the revised version of the manuscript as follows:
  
  Introduction:
  
  “In particular, deep neural networks (DNN) methods have been successfully applied to behavioral and disease prediction (24-26), and were initially expected to outperform other machine learning approaches (27-29). However, this superiority remains debatable, as recent studies have reported comparable performance between DNNs and traditional methods (He et al., 2019; Vieira et al., 2024). Accordingly, the present study does not aim to benchmark deep learning against traditional machine learning approaches, but instead uses a consistent predictive framework to examine how brain state influences the utility of FC for cognitive prediction.”
  
  “Deep learning approaches offer a flexible modeling framework capable of capturing complex non-linear associations in high-dimensional data with potentially less sensitivity to training on a smaller subsample (Vieira et al., 2024)”.
  
  Discussion:
  
  We agree that traditional methods, such as kernel-based models, tree ensembles, and non-linear SVRs, can also effectively capture such relationships. The relative performance of our model and other non-linear approaches depends on several factors, including data size, model architecture, and domain-specific considerations. We have included additional explanations in the discussion to address this.
  
  Moreover, many non-deep learning predictive techniques are non-linear, e.g., XGBoost, CatBoost, random forest, kernel ridge, and support vector regression with non-linear kernels (such as RBF and polynomial). Thus, stating that machine learning can only model linear relationships is incorrect. Moreover, for the small amount of data the authors had, some might argue that a linear algorithm might be more appropriate to balance the bias-variance trade-off in prediction. Again, without a proper systematic benchmark, it is unclear how well their DenseNet algorithm performs compared to other algorithms.
  
  Thank you for bringing this up. We have now removed statements implying that machine learning can only model linear relationships.
  
  Regarding the Brain Age literature, the authors also misinterpreted recent findings: "However, a recent study suggests that brain age predictions contribute minimally compared to chronological age for explaining cognitive decline (65), implying that cognitive predictions are more reliable." In this study, Tetereva and colleagues (2024) (https://doi.org/10.7554/eLife.87297.4) showed that non-deep-learning machine learning can make good predictions from MRI on both chronological age (with r up to .88) and fluid cognition (with r up to .627). Using the combination of functional connectivity matrices across rest and tasks to predict fluid cognition, they found performance at r = .565, comparable to what was found in the current manuscript with deep learning. Nonetheless, while brain age predicted chronological age well (and brain cognition predicted fluid cognition well), it was problematic to predict fluid cognition from brain age. They showed that, because brain age, by design, shared so much common variance with chronological age, brain age and chronological age captured the same variance of fluid cognition. When chronological age was controlled for in the prediction of fluid cognition, brain age no longer had high predictive ability. In the case of the current manuscript, the brain cognition gap is not appropriately controlled for cognition (to be more precise, a working memory score). I expect the performance in predicting physical activity, education, and cardiovascular risk will drop dramatically once cognition is controlled for. There are at least two ways to control cognition according to Tetereva and colleagues' study (see more in the recommendations).
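  The regression-toward-the-mean problem raised here can be illustrated with simulated data. The sketch below borrows one common brain-age-gap correction, residualizing the gap on the observed score, and applies it to a brain-cognition gap. It is an assumption for illustration, not necessarily either of the procedures used by Tetereva and colleagues or in the revised manuscript; the `residualize` helper and all data are hypothetical.

  ```python
  import numpy as np


  def residualize(y, covariates):
      """Residuals of y after OLS regression on an intercept plus the given covariates."""
      y = np.asarray(y, dtype=float)
      X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      return y - X @ beta


  # Simulated data: a predictor shrunk toward the mean, as regression-based models typically are.
  rng = np.random.default_rng(0)
  n = 300
  observed = rng.normal(size=n)
  predicted = 0.5 * observed + rng.normal(scale=0.5, size=n)

  raw_gap = predicted - observed
  corrected_gap = residualize(raw_gap, observed)  # extra covariates (e.g. age) could be added as columns

  r_raw = np.corrcoef(raw_gap, observed)[0, 1]              # strongly negative by construction
  r_corrected = np.corrcoef(corrected_gap, observed)[0, 1]  # ~0: residuals are orthogonal to observed
  print(round(r_raw, 2), round(r_corrected, 6))
  ```

  Because the gap is predicted minus observed, residualizing the gap on the observed score gives the same result as residualizing the predicted score on it; the corrected gap therefore keeps only the part of the prediction that the observed score does not explain.
  
  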
  
  We thank the reviewer for breaking down the findings in the study by Tetereva and colleagues (2024). It was not our intention to suggest that Tetereva et al. showed brain age has little predictive value in general. Our understanding of the findings reported in that study is consistent with the reviewer’s clarifications. We have now revised the introduction to avoid any misunderstanding:
  
  “A recent study demonstrated that while brain age can predict chronological age with high accuracy from MRI, its utility for predicting cognition is limited. Specifically, Tetereva and colleagues (2024) showed that brain age strongly tracks chronological age and that brain cognition (using functional connectivity) can predict fluid cognition. Yet, when used to predict cognition, brain age largely overlapped with chronological age, such that controlling for chronological age eliminated the predictive contribution of brain age. This finding suggests that brain-age models may provide little unique explanatory power for cognitive decline beyond what is already captured by chronological age. Building on this observation and extending the concept of a brain-age gap to a brain-cognition gap (BCG, defined as the discrepancy between predicted and observed cognitive performance), we propose that a BCG may serve as an informative marker of individual differences.”
  
  In addition, in response to the first comment from Reviewer 1, we have extended our results in the manuscript. We first showed that BCG is not significantly associated with cognition itself (see Author response image 1). Moreover, we conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, 95% CI: -14.99 to 8.13; COBRA: p = 0.29, 95% CI: -3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: -1.08 to 10.72; COBRA: p = 0.41, 95% CI: -1.86 to 4.58) between high and low EM performers. Given the significant difference in physical activity and Framingham CVD risk score between positive and negative BCG groups, our results indicate that BCG provides unique information, beyond cognitive measures, regarding factors that contribute to cognitive resilience. This text has been added to the Results section, and Figure 3 has been updated in the manuscript.
  
  The authors mentioned, "The third aim of the current study is to uncover the contribution of dopamine (DA) integrity to brain-cognition gaps." However, I fail to see how mediation analysis would test this. The authors also mentioned, "Insufficient DA modulation can affect neurocognitive functions detrimentally (69, 74, 76-78)." They should test if DA levels are related to working memory scores in their study, and if so, whether the relationship is mediated by the "corrected" brain-cognition gaps. Note see more on the recommendation for the calculation of the "corrected" brain-cognition gaps.
  
  Our mediation analysis was not designed to test whether DA predicts episodic memory performance directly, nor whether BCG mediates such a relationship. Instead, we specifically investigated whether the effect of DA on BCG operates through functional variability, in line with the theoretical framework emphasizing the role of DA in neuronal gain and signal-to-noise ratio (see our recent work in Korkki et al., 2025). We agree that future work could extend our approach by directly examining whether BCG mediates the link between DA and cognitive outcomes. However, in the present study, our primary focus was on testing the mechanistic pathway DA → entropy → BCG.
  
  In line with this aim, we found that lower DA receptor availability was associated with larger BCGs (Figure 4). We then asked whether this relationship is mediated by functional signal variability, such that lower DA is linked to reduced signal-to-noise ratio (i.e., greater entropy), which in turn contributes to less reliable prediction of cognition and, consequently, larger BCGs. Our mediation analysis supports this pathway (please see also our reply to Reviewer 1, Comment 6).
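  The DA → entropy → BCG pathway can be sketched with a product-of-coefficients mediation test and a percentile bootstrap CI, adjusting both paths for age. This is one common way to test mediation, assumed here for illustration; the manuscript's exact estimator, covariate set, and number of resamples may differ, and the variable names and simulated effect sizes are hypothetical.

  ```python
  import numpy as np


  def _ols_slopes(y, predictors):
      """OLS coefficients (intercept dropped) of y on the columns of `predictors`."""
      X = np.column_stack([np.ones(len(y)), predictors])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      return beta[1:]


  def indirect_effect(x, m, y, covariates):
      """Product-of-coefficients indirect effect a*b for the path x -> m -> y."""
      a = _ols_slopes(m, np.column_stack([x, covariates]))[0]     # x -> m, adjusted for covariates
      b = _ols_slopes(y, np.column_stack([m, x, covariates]))[0]  # m -> y, adjusted for x and covariates
      return a * b


  def bootstrap_indirect(x, m, y, covariates, n_boot=1000, seed=0):
      """Point estimate and percentile 95% bootstrap CI for the indirect effect."""
      x, m, y, covariates = (np.asarray(v, dtype=float) for v in (x, m, y, covariates))
      rng = np.random.default_rng(seed)
      n = len(x)
      boots = np.empty(n_boot)
      for i in range(n_boot):
          idx = rng.integers(0, n, size=n)  # resample participants with replacement
          boots[i] = indirect_effect(x[idx], m[idx], y[idx], covariates[idx])
      low, high = np.percentile(boots, [2.5, 97.5])
      return indirect_effect(x, m, y, covariates), (low, high)


  # Simulated data with hypothetical variable names; age affects all three variables.
  rng = np.random.default_rng(1)
  n = 400
  age = rng.normal(size=n)
  da = -0.3 * age + rng.normal(size=n)
  entropy = -0.5 * da + 0.2 * age + rng.normal(size=n)
  bcg = 0.5 * entropy + 0.1 * age + rng.normal(size=n)

  estimate, (ci_low, ci_high) = bootstrap_indirect(da, entropy, bcg, age, n_boot=1000, seed=2)
  print(round(estimate, 3), (round(ci_low, 3), round(ci_high, 3)))
  ```

  A CI that excludes zero supports an indirect path; with the simulated coefficients above the true indirect effect is -0.5 × 0.5 = -0.25.
  
  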
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This paper by Esmaeili and co-authors presents a connectome prediction study to predict episodic memory and relate prediction errors to other phenotypic variables.
  
  Strengths:
  
  (1) A primary and external validation dataset.
  
  (2) Novel use of prediction errors (i.e., brain-cognitive gap).
  
  (3) A wide range of data was investigated.
  
  Weaknesses:
  
  (1) Lack of comparisons to other methods for prediction.
  
  (2) Several different points are being investigated that don't allow any particular one to shine through.
  
  (3) Some choices of analysis are not well-motivated.
  
  (4) How do the n-back connectomes perform for prediction if the authors do not regress task activations from the n-back task?
  
  We thank the reviewer for raising these important points. Regarding the lack of comparisons to other methods, it is important to highlight that our aim has never been to compare the predictive power of different deep learning or machine learning approaches. Rather, our primary objective was to test how brain state influences the ability of functional connectivity to predict domain-specific cognitive performance, using a deep learning framework. Therefore, the text in the introduction and discussion has been amended to avoid miscommunication on this topic.
  
  We chose to regress out task-evoked activations based on prior work demonstrating that failing to do so can produce spurious but systematic inflation of task functional connectivity estimates (Cole et al., 2019). In that study, as well as subsequent reports (e.g., Gao et al., 2020; Gonzalez-Castillo & Bandettini, 2018), connectomes derived without activation regression tended to capture task-evoked coactivations rather than background task functional interactions, which can artificially boost predictive performance but limit interpretability (whether it is co-activation or intrinsic connectivity during an entire goal-oriented task) and generalizability. For this reason, our analyses focused on the more conservative approach of regressing out task activations. Accordingly, we compared predictive performance only under this preprocessing strategy.
  
  We have added the following sentence to clarify this in the method: “To avoid spurious inflation of task functional connectivity by task-evoked activations, we regressed out task activation patterns from the n-back data prior to estimating functional connectivity, following recommendations by Cole et al. (2019) and related work.”
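  The effect of regressing task activations before estimating connectivity can be shown with a toy example: two regions that share only a task-evoked response and otherwise independent noise. The sketch assumes a simple block design used directly as the regressor; a real pipeline would typically use HRF-convolved or finite impulse response (FIR) regressors, and `task_regressed_fc` is a hypothetical helper, not the authors' code.

  ```python
  import numpy as np


  def task_regressed_fc(ts, task_design):
      """FC matrix after removing task regressors.

      ts: (n_timepoints, n_regions) regional time series.
      task_design: (n_timepoints,) or (n_timepoints, n_regressors) task regressors.
      """
      ts = np.asarray(ts, dtype=float)
      design = np.asarray(task_design, dtype=float).reshape(ts.shape[0], -1)
      X = np.column_stack([np.ones(ts.shape[0]), design])
      beta, *_ = np.linalg.lstsq(X, ts, rcond=None)  # one regression per region
      residuals = ts - X @ beta
      return np.corrcoef(residuals, rowvar=False)


  # Two simulated regions driven by the same on/off blocks (20 volumes each) plus independent noise.
  rng = np.random.default_rng(0)
  n_timepoints = 400
  task = ((np.arange(n_timepoints) // 20) % 2).astype(float)
  ts = np.column_stack([2.0 * task + rng.normal(size=n_timepoints),
                        2.0 * task + rng.normal(size=n_timepoints)])

  raw_fc = np.corrcoef(ts, rowvar=False)[0, 1]      # inflated by shared task coactivation
  clean_fc = task_regressed_fc(ts, task)[0, 1]      # close to zero once the task response is removed
  print(round(raw_fc, 2), round(clean_fc, 2))
  ```
  
  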
  
  (5) I am a little concerned about overfitting with the convolutional neural net. For example, the drop-off in prediction performance in the external sample is stark. How does the deep learning approach used here compare to something simpler, like a connectome-based predictive model or ridge regression?
  
  (6) It may be nice to try the other models in the validation dataset. This would also provide a sense of how much overfitting may be occurring.
  
  We thank the reviewer for raising this point. The prediction performance indeed dropped for episodic memory when models trained on the DyNAMiC sample were applied to the COBRA sample, whereas performance for working memory remained nearly identical across datasets. Moreover, our predictive performance is on par with previous studies reporting reliable prediction of intelligence using deep learning approaches (Vieira et al., 2021; Fan et al., 2020). While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study.
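  For readers less familiar with the CPM baseline mentioned above, a simplified leave-one-out version is sketched below. It is an illustration, not the authors' pipeline: edges are selected with a correlation-magnitude threshold (`r_thresh`, an assumed value standing in for the p-value threshold usually used), positive and negative network strengths are combined into one summary score, and the connectomes are simulated.

  ```python
  import numpy as np


  def edge_behavior_corr(X, y):
      """Pearson r between each column (edge) of X and y."""
      Xc = X - X.mean(axis=0)
      yc = y - y.mean()
      denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
      return (Xc.T @ yc) / np.where(denom == 0, np.inf, denom)  # constant edges get r = 0


  def cpm_loocv(X, y, r_thresh=0.25):
      """Simplified leave-one-out CPM: X is (n_subjects, n_edges), y is (n_subjects,)."""
      X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
      n = len(y)
      preds = np.empty(n)
      for i in range(n):
          train = np.arange(n) != i
          r = edge_behavior_corr(X[train], y[train])  # edge selection on training data only
          pos, neg = r > r_thresh, r < -r_thresh
          strength = X[:, pos].sum(axis=1) - X[:, neg].sum(axis=1)  # combined network strength
          slope, intercept = np.polyfit(strength[train], y[train], 1)
          preds[i] = slope * strength[i] + intercept
      return preds


  # Simulated vectorized connectomes: 10 of 200 edges carry signal about the score.
  rng = np.random.default_rng(0)
  n_subjects, n_edges = 80, 200
  score = rng.normal(size=n_subjects)
  edges = rng.normal(size=(n_subjects, n_edges))
  edges[:, :10] += 0.5 * score[:, None]

  predicted = cpm_loocv(edges, score)
  r_cv = np.corrcoef(predicted, score)[0, 1]
  print(round(r_cv, 2))
  ```

  Because edge selection and the linear fit are repeated inside each fold, the held-out prediction never sees its own participant, which is what makes the cross-validated r a fair comparison point.
  
  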
  
  We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. Finally, we have added the following paragraph to the discussion:
  
  “Our study used a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30). Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back) to identify the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”
  
  (7) While predictive models increase the power over association studies, they still require large samples to prevent overfitting. Do the authors have a sense of the power their main and external validation sample sizes provide?
  
  We thank the reviewer for this important point. Our main sample size, together with the external validation in COBRA, is moderate for deep learning applications. To reduce the risk of overfitting, we employed several strategies, including external validation, early stopping, dropout, and regularization. As noted, performance for episodic memory decreased in the external sample, which we acknowledge, but key associations, such as the link between BCG and resilience factors, remained significant. Importantly, prediction of working memory was maintained across datasets, reducing the likelihood that the observed findings are driven by overfitting. We have added a statement in the Discussion to reflect on the limitations of sample size and the implications for generalizability.
  
  We added the following sentence to the discussion:
  
  “We acknowledge that our main and validation samples are moderate in size for deep learning, which constrains statistical power and generalizability. Although external validation, early stopping, dropout, and regularization help mitigate overfitting, larger samples will be needed in future work to fully establish the robustness of these predictive models.”
  
  (8) I am not sure that the Mann-Whitney is the correct test for comparing the distributions of prediction performances. The distributions are dependent on each other as they are each predicting the same outcomes. Using the typical degrees of freedom formula would overestimate the degrees of freedom.
  
  We appreciate the reviewer’s comment and agree that applying statistical tests directly to bootstrapped samples can lead to inflated or misleading p-values, as the degrees of freedom are determined by the number of bootstrap iterations rather than the actual number of independent observations.
  
  In our analysis, the Mann-Whitney U test was applied to 1000 bootstrapped correlation coefficients (r) for each model. While this number is relatively low and was chosen to limit overestimation of significance, we recognize that these bootstrapped samples are not independent, and thus the use of a Mann-Whitney U test can still be problematic. To address this concern, we have revised our statistical analysis. Rather than applying the Mann-Whitney U test to the bootstrapped r distributions, we now compute the difference in correlation coefficients (Δr = r_actual − r_rest) for each bootstrap iteration. We then calculate a 95% confidence interval for Δr. If this interval does not include zero, we consider the difference statistically significant. This approach avoids artificially inflating the sample size and adheres more closely to proper statistical inference.
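  A minimal sketch of the revised paired-bootstrap comparison described above. Assumptions: participants (not predictions) are resampled with replacement, the same resampled participants are used for both models so that the dependence between the two correlations is preserved, and the 95% interval is the percentile interval. `bootstrap_delta_r`, `pred_task`, and `pred_rest` are hypothetical names, and the data are simulated.

  ```python
  import numpy as np


  def bootstrap_delta_r(y, pred_a, pred_b, n_boot=1000, seed=0):
      """Paired bootstrap of r(y, pred_a) - r(y, pred_b); returns mean delta and percentile 95% CI."""
      y, pred_a, pred_b = (np.asarray(v, dtype=float) for v in (y, pred_a, pred_b))
      rng = np.random.default_rng(seed)
      n = len(y)
      deltas = np.empty(n_boot)
      for i in range(n_boot):
          idx = rng.integers(0, n, size=n)  # the same participants are used for both models
          r_a = np.corrcoef(y[idx], pred_a[idx])[0, 1]
          r_b = np.corrcoef(y[idx], pred_b[idx])[0, 1]
          deltas[i] = r_a - r_b
      low, high = np.percentile(deltas, [2.5, 97.5])
      return deltas.mean(), (low, high)


  # Simulated observed scores and two hypothetical models' predictions.
  rng = np.random.default_rng(0)
  observed = rng.normal(size=200)
  pred_task = observed + rng.normal(scale=0.7, size=200)
  pred_rest = observed + rng.normal(scale=2.0, size=200)

  mean_delta, (ci_low, ci_high) = bootstrap_delta_r(observed, pred_task, pred_rest)
  significant = not (ci_low <= 0.0 <= ci_high)  # the CI-excludes-zero rule described above
  print(round(mean_delta, 3), (round(ci_low, 3), round(ci_high, 3)), significant)
  ```

  Resampling participants jointly, rather than comparing two separately bootstrapped r distributions, is what keeps the inference tied to the actual number of participants instead of the number of bootstrap iterations.
  
  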
  
  We have updated the Methods (the following text) and Results sections accordingly and clearly stated the limitations regarding the degrees of freedom for all tests.
  
  “For the bootstrap-based comparison of model performance (bootstrap resampling with 1000 iterations), no test statistic with an associated degree of freedom is reported. Instead, statistical inference is based on the bootstrap distribution of the difference in correlation coefficients (Δr) and its 95% confidence interval. As bootstrap confidence-interval–based inference does not rely on an analytic sampling distribution, degrees of freedom are not defined for this procedure.” This has now been explicitly stated in the Methods section to avoid ambiguity.
  
  In the Results section, we now report Δr with its corresponding 95% CI.
  
  (9) The brain cognition gap is interesting. It is very similar conceptually to the brain age gap. When associating the brain age gap with other phenotypes, typically age is regressed from the brain age gap and the other phenotype. In other words, age is typically associated with a brain age gap as individuals at the tail ages often show the largest gaps. Is the brain cognition gap correlated with episodic memory and do the group differences hold if episodic memory is controlled for?
  
  We thank the reviewer for the comment regarding the relationship between the brain cognition gap and episodic memory.

  Since this question was raised by all reviewers, we have conducted additional analyses. We found that BCG is independent of the cognitive measure and provides additional information, beyond cognition alone, about factors contributing to resilience. Please see our response to the first comment of Reviewer 1.
  
  (10) I have the same question for the dopamine results, particularly for the correlations that are split by the sign of the brain cognition gap. I could see these types of patterns arising from a correlation with a third variable.
  
  For the dopamine results, we explored whether age or cognition alone might confound the dopamine–brain cognition gap relationships. However, neither was significantly correlated with the brain cognition gap groups. The associations remained significant after controlling for age, suggesting that the observed patterns are unlikely to be due to these potential third-variable confounders. This is also in line with our observation of significant associations between DA and BCG in the age-homogeneous COBRA sample. That said, we found that entropy does mediate the link between DA and BCG: individuals with lower DA exhibit greater regional variability and, in turn, a larger BCG.

  These results have now been embedded into the manuscript. We also highlighted that age has been controlled for in the reported correlation and mediation analyses.
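For readers who want to reproduce the logic, an age-adjusted mediation (DA → entropy → BCG) can be sketched with the product-of-coefficients approach and a percentile bootstrap. This is a hypothetical, standard-library illustration; the function names, the single linear age covariate and the bootstrap settings are assumptions, and the authors' mediation software may differ.

```python
import random
from statistics import fmean


def _residualize(v, z):
    """Residuals of v after an OLS fit on z with an intercept (mean zero)."""
    mv, mz = fmean(v), fmean(z)
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - mv - slope * (a - mz) for a, b in zip(z, v)]


def _slope(y, x):
    """OLS slope of y on x, assuming x is already mean-centred."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def indirect_effect(x, m, y, age):
    """Indirect effect a*b of x -> m -> y, adjusting every path for age.

    By the Frisch-Waugh-Lovell theorem, sequential residualisation gives the
    same coefficients as the regressions m ~ x + age (path a) and
    y ~ m + x + age (path b).
    """
    x_r = _residualize(x, age)
    m_r = _residualize(m, age)
    a = _slope(m_r, x_r)
    b = _slope(_residualize(_residualize(y, age), x_r), _residualize(m_r, x_r))
    return a * b


def bootstrap_indirect(x, m, y, age, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect (subjects resampled)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb, mb, yb, ab = ([v[i] for i in idx] for v in (x, m, y, age))
        estimates.append(indirect_effect(xb, mb, yb, ab))
    estimates.sort()
    k = int(round(alpha / 2 * n_boot))
    return estimates[k], estimates[n_boot - 1 - k]
```

Here `x` would be striatal D1DR, `m` resting-state entropy, `y` the BCG and `age` the covariate; an interval excluding zero indicates mediation.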
  
  Recommendations for the authors:
  
  Reviewing Editor Comment:
  
  We particularly recommend that the authors: (a) compare the performance of their deep learning model with other baseline models, and (b) adjust for cognitive performance within the brain-cognition gap. These steps would strengthen the evidence base.
  
  We thank the editor for their comments. As for the first comment, our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach. We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological superiority beyond the scope of our analyses. Finally, we have added the following paragraph to the discussion:
  
  “Our study used a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30).
  
  Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back) to identify the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”
  
  As for the second comment, we followed the suggestion of Reviewer 1. In response to their request, we first examined the relationship between the Brain-Cognitive Gap (BCG) and the cognitive measure itself. Surprisingly, we did not find any significant relationship in either the DyNAMiC sample (r = 0.01, p = 0.939) or the COBRA sample (r = 0.01, p = 0.89) (see Author response image 1).

  We then conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular disease (CVD) risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, 95% CI: −14.99 to 8.13; COBRA: p = 0.29, 95% CI: −3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: −1.08 to 10.72; COBRA: p = 0.41, 95% CI: −1.86 to 4.58) between high and low EM performers. Given the significant difference in physical activity and Framingham CVD risk score between the positive and negative BCG groups, our results support that BCG provides unique information, beyond the observed cognitive measure (episodic memory score), about factors that contribute to cognitive resilience. These results have been added to Section 2.4, and Figure 3 has been updated.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The top and bottom triangles of the saliency maps, particularly in Figure 2, do not look symmetrical (this is most notable in the hotspot representing the between-network correlation of DMN and FPN). What is going on here? Was the image compressed or altered in some way, or is this a visual artifact of the interpolation method?
  
  We appreciate the reviewer’s insightful comment. Minor differences in the saliency maps between the upper and lower triangles of the FC matrix can arise due to several factors. For instance, Grad-CAM generates saliency maps at the resolution of the convolutional feature maps, which are then upsampled to match the input matrix dimensions. We initially used the default bilinear interpolation, which may have introduced slight asymmetries or blurring, resulting in interpolation artifacts. In response, we have reprocessed the saliency maps using spline interpolation in MATLAB. The updated saliency figures have been included in the revised version of the manuscript.
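As a complementary check, offered here as a hypothetical addition and not as the authors' procedure (which re-interpolated the maps with splines in MATLAB), the remaining asymmetry of a saliency map can be quantified, and an undirected edge-level saliency obtained by averaging each entry with its mirror. A pure-Python sketch; function names are illustrative:

```python
def max_asymmetry(saliency):
    """Largest absolute difference between mirrored entries s[i][j] and s[j][i]."""
    n = len(saliency)
    return max(abs(saliency[i][j] - saliency[j][i]) for i in range(n) for j in range(n))


def symmetrize(saliency):
    """Average each entry with its mirror so edge (i, j) gets a single value."""
    n = len(saliency)
    return [[(saliency[i][j] + saliency[j][i]) / 2 for j in range(n)] for i in range(n)]
```

A `max_asymmetry` close to zero after re-interpolation would indicate that the earlier differences were interpolation artifacts.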
  
  (2) Pages 11-12. Please make it explicit in the text that the brain gap-education association was not significant in the COBRA dataset.
  
  Thanks for pointing this out. We added the following sentence to the discussion.
  
  “Note that the association with education was significant only in the DyNAMiC sample and did not reach significance in the COBRA dataset.”
  
  (3) Please overlay individual data points onto the boxplots in Figure 3 so that we can appropriately evaluate the data distributions.
  
  Figure 3 has now been updated.
  
  (4) Section 2.6: Was entropy calculated on movie-watching data, resting data, or all fMRI data? Please specify.
  
  We thank the reviewer for pointing this out. We have updated the text (Section 2.6) to clarify that entropy was calculated from the resting-state data. We intended to examine the mediating role of regional variability in the relationship between dopamine and the BCG of the winning model for episodic memory. Because resting state and movie-watching were the winning conditions for EM prediction, but movie-watching was not available in COBRA, we focused on entropy during rest, which exists in both datasets.
  
  (5) Was entropy during the resting state correlated with entropy during the task state, across individuals?
  
  We agree this is an interesting question. However, investigating the correlation of entropy between rest and task states goes beyond the scope of the present study. Our aim here was to test whether regional variability mediates the effect of dopamine on the BCG. Specifically, we examined whether individuals with lower striatal D1DR show higher local variability, which in turn relates to less accurate prediction and a larger gap. We assessed both the relationship between D1DR and entropy and the association between entropy and the gap, and these results have now been added to the manuscript (see also our response to Reviewer 1’s public comment).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The lack of baseline models to benchmark the predictive performance of their DenseNet models makes their results hard to interpret. This problem is quite common across ML literature. For instance, many DL-based algorithms were developed for tabular data without proper benchmarking against other ML algorithms. When they were properly tested, most weren't better than many tree-based ML algorithms (e.g., https://proceedings.neurips.cc/paper_files/paper/2022/file/0378c7692da36807bdec87ab043cdadc-Paper-Datasets_and_Benchmarks.pdf). I can see that a similar problem might happen here.
  
  For this particular manuscript, the authors made strong statements without doing a proper benchmark, e.g., from the discussion, "Indeed, the predictive power in the current study is stronger than for CPM-based predictions reported before." And "Unlike the BrainNet convolutional neural network, which focuses on staged transformations, our densely connected model promotes extensive feature reuse, possibly leading to more robust feature extraction." I hope to see the performance of the proposed algorithm against 1) other DL algorithms (e.g., fully-connected neural networks, BrainNetCNN, Graph CNN (GCNN), temporal CNN, GRU, and LSTM, see https://doi.org/10.1016/j.neuroimage.2019.116276 and https://doi.org/10.1002/hbm.26415), 2) ML algorithms (e.g., SVR with linear, RBF and polynomial kernels, Elastic Net, XGBoost, random forest, CPM), 3) data reduction algorithms (e.g., PCA regression, Partial Least Square). The results of this benchmark will substantiate the claims made by the authors.
  
  Our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach. We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological superiority beyond the scope of our analyses. Finally, we have added the following paragraph to the discussion:
  
  “Our study used a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30). Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back) to identify the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”
  
  (2) From Figure 6b, it looks like the functional connectivity matrices were converted to different images, and each of the four images (in grey, blue, yellow, and red) was treated as a separate channel. What are these grey, blue, yellow, and red images?
  
  In our study, the inputs to the deep learning models were subject-specific FC matrices of size 273×273. To augment the data, we created different versions of each FC matrix by reordering specific brain networks within the matrix. To visualize that the inputs were augmented, we used different color codings (grey, blue, yellow, and red) in Figure 6b. These colors were intended solely to represent different augmented versions of the same subject’s FC matrix. They were not treated as separate channels in the model. To avoid any confusion or misinterpretation, we have revised this part of the figure and now use only grey coloring to represent the augmented FC matrices.
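One way such network-reordering augmentation could be implemented is sketched below, assuming each of the 273 nodes carries a network label. The labels, the function name and the choice to keep the original node order within each network are illustrative assumptions, not the authors' code. Because the same permutation is applied to rows and columns, each augmented copy is still a valid, symmetric FC matrix containing exactly the same edge values.

```python
def reorder_networks(fc, labels, network_order):
    """Regroup the nodes of an FC matrix so networks appear in `network_order`.

    labels[i] is the network of node i; nodes keep their original relative
    order within each network. The same permutation is applied to rows and
    columns, so symmetry and all edge values are preserved; only positions
    change. Returns (reordered_matrix, permutation).
    """
    perm = [i for net in network_order for i, lab in enumerate(labels) if lab == net]
    if sorted(perm) != list(range(len(labels))):
        raise ValueError("network_order must name every network exactly once")
    return [[fc[i][j] for j in perm] for i in perm], perm
```

Calling this with several different `network_order` lists yields several single-channel versions of the same subject's matrix.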
  
  (3) The differences in performance between within vs. outside studies might simply be due to the fact that the models trained from DyNAMiC captured the brain variation due to age, which is also related to cognitive abilities. I was wondering if age is controlled for, would performance be more similar across the studies? The authors should provide the performance of models that are controlled for age.
  
  We initially conducted partial correlations between FC features and cognitive measures while controlling for age. That age alone does not drive model performance is further supported by the fact that the model trained on the age-heterogeneous DyNAMiC sample provided a fairly reasonable prediction in the age-homogeneous COBRA dataset, particularly for working memory (see Figure 2d). Moreover, in our post hoc analyses, we additionally controlled for age when examining associations, for example, between the gap (BCG) and dopamine measures.
  
  (4) Related to point (3), from the discussion, "Validation outcomes thus affirm that the models, particularly those constructed from rest data, are robust to the particulars of the dataset." The performance dropped around half, so I am not sure if this conclusion is warranted.
  
  We thank the reviewer for raising this point. The prediction performance indeed dropped for episodic memory when models trained on the DyNAMiC sample were applied to the COBRA sample, whereas performance for working memory remained nearly identical across datasets. Although both EM and WM are sensitive to age, the divergence in cross-dataset performance suggests that factors beyond age alone may contribute to these differences. To address this, we have revised the discussion as follows:
  
  “Differences between the DyNAMiC and COBRA datasets make cross-dataset prediction a harder problem, as the age ranges of the two samples differ substantially, and prior studies highlight the importance of individual characteristics like age in predicting behavior from FC (33). In line with this, model performance decreased when predicting EM in the COBRA sample, whereas prediction of WM remained largely unchanged. Thus, validation outcomes suggest that the models, particularly those predicting WM, show robustness across datasets, whereas the reduced EM performance highlights potential data-specific influences that limit generalizability.”
  
  (5) Please report the degrees of freedom in all of the statistical analyses. Was the Mann-Whitney U test done on the bootstrapped r? If so, the degrees of freedom were arbitrarily set by the number of bootstrap iterations, and hence the p-value can be higher or lower depending on that number. This could lead to misleading conclusions.
  
  We appreciate the reviewer’s comment and agree that applying statistical tests directly to bootstrapped samples can lead to inflated or misleading p-values, as the degrees of freedom are determined by the number of bootstrap iterations rather than the actual number of independent observations.
  
  In our analysis, the Mann-Whitney U test was applied to 1000 bootstrapped correlation coefficients (r) for each model. While this number is relatively low and was chosen to limit overestimation of significance, we recognize that these bootstrapped samples are not independent, and thus the use of a Mann-Whitney U test can still be problematic. To address this concern, we have revised our statistical analysis. Rather than applying the Mann-Whitney U test to the bootstrapped r distributions, we now compute the difference in correlation coefficients (Δr = r_actual − r_rest) for each bootstrap iteration. We then calculate a 95% confidence interval for Δr. If this interval does not include zero, we consider the difference statistically significant. This approach avoids artificially inflating the sample size and adheres more closely to proper statistical inference.
  
  We have updated the Methods (the following text) and Results sections accordingly and clearly stated the limitations regarding the degrees of freedom for all tests.
  
  “For the bootstrap-based comparison of model performance (bootstrap resampling with 1000 iterations), no test statistic with an associated degree of freedom is reported. Instead, statistical inference is based on the bootstrap distribution of the difference in correlation coefficients (Δr) and its 95% confidence interval. As bootstrap confidence-interval–based inference does not rely on an analytic sampling distribution, degrees of freedom are not defined for this procedure.” This has now been explicitly stated in the Methods section to avoid ambiguity.
  
  In the Results section, we now report Δr with its corresponding 95% CI.
  
  (6) For predictive performance, the correlation was reported in the table, while R² is reported in the text. This is confusing. Also, could you clarify if the R² is calculated using the sum-of-squares definition, not Pearson r squared? If Pearson r squared was used, then R² of a negative Pearson r would be positive, which is misleading (see 10.1001/jamapsychiatry.2019.3671). Also, other performance indices apart from Pearson r and R² should be reported (e.g., MSE and MAE, again see 10.1001/jamapsychiatry.2019.3671). This will allow a better understanding of the models' performance.
  
  We thank the reviewer for this helpful comment. We acknowledge the inconsistency in reporting predictive performance metrics and have revised the manuscript for clarity. In the text, we had reported r values, whereas in the tables we had reported R² using the sum-of-squares definition. We now consistently report Pearson correlation (r), mean squared error (MSE), and mean absolute error (MAE) across both the text and Tables 1 and 2.
  
  Regarding R², we confirm that it was calculated using the sum-of-squares definition, i.e., R² = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)², where yᵢ is the observed score, ŷᵢ the predicted score, and ȳ the mean observed score, rather than as the square of the Pearson correlation coefficient. This ensures that negative correlations do not result in misleading positive R² values, as pointed out by the reviewer and discussed in Poldrack et al. (2020). All performance metrics (r, R², MSE, and MAE) are now reported in Tables 1 and 2 to allow a more comprehensive and interpretable comparison of model performance.
  
  We have included a description of the method in Section 4.9 (Statistical significance analysis).
  
  (7) Could you clarify how data are standardized across training, validation, and tests (including Z-standardization for the cognitive tests)? This is to prevent data leakage.
  
  Thank you for the comment. We standardized the cognitive test scores separately within the training and test samples.
  
  We have added the following paragraph to the method section:
  
  “A composite score of performances across the three tests was calculated and used as the measure of the cognitive domain in question (i.e., episodic memory, working memory). For each of the three tests, scores were summarized across the total number of trials. The three resulting sum scores were z-standardized and averaged to form one composite score for each domain. The standardization has been carried out independently for the training (DyNAMiC) and test (COBRA) samples.”
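A minimal sketch of this composite, assuming the sample standard deviation is used for z-scoring (the quoted text does not specify which). Function names are illustrative.

```python
from statistics import mean, stdev


def zscore(values):
    """z-standardize using this sample's own mean and (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_score(test_sums):
    """Average the z-scored sum scores of several tests into one score per subject.

    `test_sums` holds one list of sum scores per test, all in the same
    subject order.
    """
    z_by_test = [zscore(scores) for scores in test_sums]
    return [mean(subject) for subject in zip(*z_by_test)]
```

The function would be called once on the DyNAMiC (training) scores and once on the COBRA (test) scores, so each sample is standardized only with its own statistics and no information flows from test to training data.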
  
  (8) There is really no ground truth to confirm that Grad-CAM provides actual feature importance used by the models. Perhaps the authors should compare that with Haufe transformation, which is commonly used in the predictive model for cognition (e.g., https://doi.org/10.1016/j.neuroimage.2021.118648 and https://doi.org/10.1016/j.neuroimage.2023.120115).
  
  We appreciate the reviewer’s comment and the suggested references. The Haufe transformation is primarily applied in traditional machine learning models, particularly in cognitive neuroscience, to interpret linear predictive models by mapping classifier weights back to the input space. However, its direct applicability to deep learning models, especially convolutional neural networks, remains an open research area with no widely established methodologies. Furthermore, the Haufe transformation does not provide feature importance in the same manner as Grad-CAM. Grad-CAM highlights spatial regions within an image that contribute to a model’s decision, making it particularly useful for interpreting convolutional networks in vision tasks. In contrast, the Haufe method offers a weight transformation that is more suited for understanding linear models and may not be as intuitive for feature attribution in complex hierarchical representations such as those learned by deep neural networks.
  
  While we acknowledge that Grad-CAM, like other interpretability methods, does not provide absolute ground truth validation for feature importance, it remains one of the most widely used and validated techniques for deep learning interpretability, particularly in medical imaging applications. Given its integration with frameworks such as Keras and TensorFlow and its ability to provide spatial attributions aligned with domain knowledge, we believe it is a suitable choice for our study. Future work may explore additional interpretability techniques, including adaptations of the Haufe transformation if applicable to deep learning architectures.
  
  We have added more details on Grad-CAM implementations in the Method.
  
  (9) Related to Grad-CAM, "These edges, indicated by a salience intensity of ≥ .5, exert a significant influence on the model (Figure 1f)." What does 'significant' in this context mean? And how did the authors come up with the .5 threshold? Is it based on permutation or bootstrapping tests?
  
  We appreciate the reviewer’s comment and the opportunity to clarify our approach. In this context, the term "significant" refers to the regions' relative contribution to the model’s decision, as shown by the Grad-CAM saliency map. However, to avoid implying statistical testing, we have revised the term to "highly contributing."

  Regarding the 0.5 threshold, this value was selected empirically based on the normalized Grad-CAM activation values, where saliency scores range between 0 and 1. A threshold of 0.5 was used as a heuristic to highlight regions with relatively strong activation. However, this was not determined through statistical methods such as permutation or bootstrapping tests. We recognize the importance of rigorous threshold selection and have clarified this in the text. Future work could incorporate statistical methods to define thresholds more objectively.
  
  We have included the following text in the Method section:
  
  “Grad-CAM saliency maps were interpreted qualitatively, with a heuristic threshold (≥ 0.5) applied to highlight regions with relatively higher contribution to the model’s predictions. These values do not reflect statistical significance and should therefore be interpreted descriptively.”
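The descriptive thresholding can be sketched as follows, assuming min-max normalization to the 0 to 1 range (the normalization method is not specified above) and a symmetric or symmetrized map, so that only the upper triangle is scanned. Function names are illustrative.

```python
def normalize_saliency(saliency):
    """Min-max scale a saliency map to [0, 1] (assumed normalization)."""
    flat = [v for row in saliency for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("saliency map is constant; cannot normalize")
    return [[(v - lo) / (hi - lo) for v in row] for row in saliency]


def high_contributing_edges(saliency, threshold=0.5):
    """Upper-triangle edges (i, j) whose normalized saliency is >= threshold.

    The cut-off is a descriptive heuristic, not a significance test.
    """
    s = normalize_saliency(saliency)
    n = len(s)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if s[i][j] >= threshold]
```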
  
  (10) Still related to the saliency map, I believe the upper and lower triangles of the functional connectivity matrix are the same. If so, why are there some differences in saliency? While the difference is not prominent, this might affect the accuracy of Grad-CAM.
  
  Minor differences in the saliency maps between the upper and lower triangles of the FC matrix can arise due to several factors. For instance, Grad-CAM generates saliency maps at the resolution of the convolutional feature maps, which are then upsampled to match the input matrix dimensions. We initially used the default bilinear interpolation, which may have introduced slight asymmetries or blurring, resulting in interpolation artifacts. In response, we have reprocessed the saliency maps using spline interpolation in MATLAB. The updated saliency figures have been included in the revised version of the manuscript.
  
  (11) Why did the authors only report the cross-study for EM on rest, and for WM on n-back? This is a bit unexpected since COBRA has both rest and n-back. If there is no good justification, please report both.
  
  We focused on reporting cross-study results for EM using rest because rest was the winning condition for predicting EM in the DyNAMiC sample. Importantly, n-back did not significantly predict EM in DyNAMiC, and rest did not significantly predict WM. For this reason, we highlighted only the conditions that showed meaningful predictive power in the original analyses.
  
  (12) Are codes, trained models, and data available? To ensure transparency and reproducibility, I hope to see the code from preprocessing to modeling and statistical analyses.
  
  The analysis code is openly available on our GitHub page https://github.com/MorEsm/AI-based-Prediction-of-Cognitive-Function. Due to ethical considerations and GDPR restrictions in the European Union, we are not permitted to publicly share the raw data. However, we can provide detailed information about preprocessing steps and analysis pipelines to facilitate reproducibility.
  
  (13 & 14) The authors did not appropriately control for regression-toward-the-mean and the influence of the working memory itself when calculating the brain cognition gap. This is commonly done for brain age (see https://doi.org/10.7554/eLife.87297.4, https://doi.org/10.1002/hbm.25533, https://doi.org/10.1016/j.nicl.2020.102229, https://doi.org/10.3389/fnagi.2018.00317). Otherwise, the brain cognition gap still depends on the cognition/working memory score itself. Based on Tetereva et al., "If, for instance, Brain Age was based on prediction models with poor performance and made a prediction that everyone was 50 years old, individual differences in Brain Age Gap would then depend solely on chronological age (i.e., 50 minus chronological age)." Because of this, Tetereva and colleagues found that the 'uncorrected' brain age gap that predicted chronological age the worst became the best index to predict fluid cognitive abilities. This shows the pitfall of the 'uncorrected' brain age gap. You can apply the same logic to the brain cognition gap.
  
  (14) Additionally, another way to show the unique contribution of brain cognition, over and above cognition per se, is to add both brain cognition and cognition together to predict physical activity, education, and cardiovascular risk.
  
  We thank the Reviewer for raising this important point. In response to this request, and to the corresponding request from Reviewer 1, we first examined the relationship between the Brain-Cognitive Gap (BCG) and the cognitive measure itself. Surprisingly, we did not find any significant relationship in either the DyNAMiC sample (r = 0.01, p = 0.939) or the COBRA sample (r = 0.01, p = 0.894) (see Author response image 1).

  We then conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, 95% CI: −14.99 to 8.13; COBRA: p = 0.29, 95% CI: −3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: −1.08 to 10.72; COBRA: p = 0.41, 95% CI: −1.86 to 4.58) between high and low EM performers. Given the significant difference in physical activity and Framingham CVD risk score between the positive and negative BCG groups, our results support that BCG provides unique information, beyond the cognitive measure, about factors that contribute to cognitive resilience. These results have been added to Section 2.4, and Figure 3 has been updated.
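The alternative suggested in comment (14), entering BCG and the cognitive score together as predictors of an outcome such as physical activity, amounts to ordinary least squares with two predictors. A standard-library sketch; the function and variable names are hypothetical, and this is not an analysis reported by the authors.

```python
from statistics import fmean


def two_predictor_ols(y, x1, x2):
    """OLS coefficients (b1, b2) of y on x1 and x2, with an intercept.

    Solves the 2x2 normal equations on mean-centred data; b1 is the
    contribution of x1 holding x2 constant, and vice versa.
    """
    my, m1, m2 = fmean(y), fmean(x1), fmean(x2)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero if x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det
```

With `two_predictor_ols(physical_activity, bcg, em_score)`, a non-zero BCG coefficient would indicate a contribution over and above the episodic memory score.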
  
  (15) Related to the brain age gap, the brain cognition gap is actually just another way to quantify how generalizable models are to another sample, similar to MAE or MSE. If the models built from DyNAMiC don't fit well with samples from COBRA, you will get a higher (i.e., wider) brain cognition gap, which means a poor fit. The authors should discuss this interpretation - should your biomarker's performance be due to a fit of the model?
  
  We appreciate this insightful comment. We agree that BCG can be interpreted not only as a marker of individual differences and resilience factors but also as a measure of model fit, analogous to error metrics, such as MAE or MSE. A higher gap may, in part, reflect poorer generalizability of models across samples. We have now revised the Discussion to explicitly acknowledge this alternative interpretation and to emphasize that BCG should be viewed both as a candidate biomarker and as a reflection of model performance.
  
  We added the following paragraph in the discussion:
  
  “An important caveat is that BCG can also be conceptualized as an error metric, similar to mean absolute error or mean squared error, reflecting the extent to which models trained in one sample generalize to another. From this perspective, a larger gap may not only indicate individual differences related to resilience factors and dopaminergic function, but also reduced model fit or generalizability across datasets. Thus, BCG likely reflects a combination of meaningful biological variability and methodological variance.”
  
  (16) It is unclear why the authors binarized the brain cognition gap when predicting physical activity, education, and cardiovascular risk, and not doing so with the striatal D1DR. It is rarely a good idea to binarize a continuous variable (see 10.1136/bmj.332.7549.1080). In this case, people who had a bigger negative brain cognition gap were treated equally to people who had a smaller negative brain cognition gap. I also do not think it is necessary to separately analyze positive and negative gaps. Perhaps the authors should correlate the corrected brain cognition gap with physical activity, education, and cardiovascular risk and provide scatter plots and effect sizes.
  
  Following the reviewer's suggestion, we directly correlated BCG with physical activity and cardiovascular risk. Our results confirmed our initial analysis that individuals with a negative gap exhibited lower physical activity and higher Framingham CVD risk across both the COBRA and DyNAMiC datasets. We have reported these results on page 10.
  
  Author response image 5.
  
  (17) Given that the motivation is to move away from brain age, the authors should benchmark the corrected brain cognition gap against the corrected brain age gap, as well as against the performance when directly predicting physical activity, education, and cardiovascular risk from the functional connectivity metrics.
  
  Author response image 6.
  
  We agree that benchmarking BCG against BAG in predicting lifestyle and vascular risk factors would be valuable. We have calculated adjusted BAG and related it to lifestyle and vascular risk factors. Interestingly, we did not find any significant association, suggesting that BCG might be more sensitive to cognitive resilience. However, this investigation was beyond the scope of the present study. Our aim was not to compare BCG with BAG, but rather to examine whether BCG provides information beyond cognition itself. We also note that introducing BAG would open a separate line of investigation, namely, which cognitive state (rest, movie-watching, n-back) best estimates biological age. While this is an interesting question in its own right, addressing it here would considerably broaden the scope and complexity of an already dense manuscript. To prevent misunderstanding, we have clarified this point in the Discussion and added a caveat noting that future work should explicitly benchmark these approaches. That said, if the Reviewer and/or the Editor are inclined to include these additional findings in the manuscript, we are open to doing so in a revision.
  
  We have added the following sentence to the Discussion.
  
  “While our focus was to investigate whether the brain–cognition gap provides information about factors contributing to cognitive resilience, we acknowledge that benchmarking BCG against the brain-age gap in predicting lifestyle and vascular risk factors would be valuable. However, addressing this question lies beyond the scope of the present study, and future work should systematically compare these approaches.”
  
  (18) Why was only the working memory score used to create brain cognition, and not episodic memory as well? Including both could provide a more comprehensive measure.
  
  We initially attempted to predict both episodic memory (EM) and working memory (WM). However, EM prediction was only reliable within and across samples for the resting state, whereas WM prediction generalized most strongly from the movie-watching condition. Because COBRA does not include a movie-watching paradigm, we could not evaluate WM prediction across datasets. For this reason, we focused on EM when examining the brain–cognition gap.
  
  (19) The PET mediation analysis seemed to come out of the blue. Is there existing literature showing the relationship between striatal D1DR and cognition? If so, did the authors find a similar relationship in the current data? I also suggest rewriting this section to strengthen the justification for the PET mediation analysis.
  
  We have previously conducted studies in which DA was found to be associated with memory (Johansson et al., 2023; Nyberg et al., 2016).
  
  The third aim of our study was to examine whether DA integrity is implicated in brain–cognition gaps (BCG), which we propose as a marker of cognitive resilience. In line with this aim, we found that lower DA receptor availability was associated with larger BCGs (Figure 4). We then asked whether this relationship is mediated by functional signal variability, such that lower DA is linked to reduced signal-to-noise ratio (i.e., greater entropy in functional connectivity), which in turn contributes to less reliable prediction of cognition and, consequently, larger BCGs. Our mediation analysis supports this pathway (see also our reply to Reviewer 1, Comment 6).
  
  Thus, our mediation was not designed to test whether DA predicts episodic memory performance directly, nor whether BCG mediates such a relationship. Instead, we specifically investigated whether the effect of DA on BCG operates through functional variability. We agree that future work could extend our approach by directly examining whether BCG mediates the link between DA and cognitive outcomes. However, in the present study, our primary focus was on testing the mechanistic pathway of DA → entropy → BCG.
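The DA → entropy → BCG pathway corresponds to a single-mediator model. The sketch below estimates it by the product-of-coefficients method with ordinary least squares on simulated data. It assumes one mediator, no covariates, and no bootstrap confidence intervals, so it is an illustration of the model structure rather than the analysis pipeline used in the manuscript; all variable names and effect sizes are hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """Return OLS coefficients (intercept first) for y regressed on predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simple_mediation(x, m, y):
    """Product-of-coefficients mediation for the path x -> m -> y."""
    a = ols(m, x)[1]              # path a: x -> m
    _, c_prime, b = ols(y, x, m)  # direct effect c' and path b (m -> y, given x)
    c = ols(y, x)[1]              # total effect of x on y
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime, "total": c}


# Simulated data (hypothetical effect sizes): lower DA -> higher entropy -> larger gap.
rng = np.random.default_rng(0)
n = 2000
dopamine = rng.normal(size=n)
entropy = -0.8 * dopamine + rng.normal(scale=0.5, size=n)
bcg = 0.6 * entropy + 0.1 * dopamine + rng.normal(scale=0.5, size=n)

result = simple_mediation(dopamine, entropy, bcg)
print({k: round(float(v), 3) for k, v in result.items()})
```

For linear OLS models fitted to the same sample, the total effect equals the direct effect plus the indirect effect (c = c' + ab), so the indirect term isolates the part of the DA effect on BCG that is carried by entropy. In practice the indirect effect is usually tested with bootstrap confidence intervals.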
  
  Minor recommendations:
  
  (1) Task-based connections are not truly task-based, as they are around 70-80% related to the resting state, capturing non-task-specific functional connectivity. Task-based connections should refer to techniques that derive task-related connectivity, such as psychophysiological interaction and beta-series correlation. Perhaps use terms like "functional connectivity during tasks."
  
  Thank you. This has been corrected throughout the manuscript.
  
  (2) Are there really two studies? The same MRI was used with the same configurations, and participants were from the same city. The only difference is the age range. It may be more appropriate to refer to this as "across age groups" rather than "cross-datasets."
  
  Thank you for this comment. While the two samples share some similarities, there are also several marked differences beyond age range. For example, movie-watching was administered in DyNAMiC but not in COBRA. The resting-state fMRI sequence was 12 minutes in DyNAMiC but only 6 minutes in COBRA. Moreover, DyNAMiC included dopamine D1-receptor PET, whereas COBRA assessed dopamine D2-receptor availability. Even the questionnaires used to measure physical activity differed between the two studies. Given these methodological and measurement differences, we believe that referring to them as “cross-datasets” rather than “across age groups” more accurately captures the distinction.
  
  (3) What kind of movie is "Cockpit"? Can you explain? Different movies may elicit different patterns of connectivity.
  
  We apologize for not providing information about the movie, which has been presented in our recent work (Johansson et al., 2023).
  
  The participants’ reactions to the content of the movie were not monitored, but the clips were selected to be as neutral in their content as possible. The content of the movie: Following his termination as a pilot and the end of his marriage, Valle embarks on a quest to secure new employment. Faced with desperation in the job market, he resorts to disguising himself as a woman with the intention of obtaining a position at a company specially seeking a female pilot.
  
  This information is added to the method section.
  
  “During the fMRI session, participants viewed a 12-minute segment from the Swedish comedy film Cockpit (2012). We did not monitor participants’ responses to the movie, and the chosen clips were selected to be relatively neutral in emotional content. The storyline follows Valle, a recently fired pilot whose marriage has ended, as he struggles to find new employment. In a desperate attempt to secure a job at an airline specifically recruiting a female pilot, he presents himself as a woman.”
  
  (4) There is a typo in the equation numbering (i.e., two equations are designated as #1).
  
  We have now corrected the typo.
  
  (5) From the discussion: "Importantly, this prediction generalizes across conditions." This is not surprising given the similarity between conditions, with around 70-80% variance.
  
  We agree with the reviewer that the high similarity of FC across states likely increases the chance of cross-condition generalizability. However, this generalization is not guaranteed for all models. For example, the model trained on FC during movie-watching successfully predicted episodic memory during rest, but it did not generalize to episodic memory during the n-back condition, although movie-watching and n-back FC patterns are themselves highly correlated. Thus, the observed generalization is meaningful in demonstrating that not all models transfer equally well across states.
  
  That said, we have added the following sentence to the Discussion:
  
  “Importantly, this prediction generalizes across conditions and datasets, suggesting that features derived from resting state FC serve as a relatively stable marker of individual differences in EM, though with reduced strength in COBRA. While such generalization is partly facilitated by the similarity of functional connectivity across states, it is not a trivial outcome. For instance, the model trained on movie-watching data generalized to EM prediction during rest but failed to do so for the n-back condition, even though movie-watching and n-back connectivity patterns are themselves highly correlated. This indicates that successful generalization depends not only on shared variance across states but also on the cognitive processes most relevant to the target behavior.”
  
  (6) It might be helpful to include some figures for the cognitive tasks used. The description is a bit hard to follow without visual aids.
  
  Thanks for the comment. A figure describing these tasks was included in the initial DyNAMiC paper (Nordin et al., 2022). We have now added it to the manuscript as Supplementary Figure S3.
  
  Fig S3. Overview of the cognitive tests included in the DyNAMiC study. Adapted from Nordin et al. with permission.
  
  (7) It may not be appropriate to use the term "cross-validation" here, as one dataset was used for testing and the other for training, but not vice versa (so no "cross" per se).
  
  We thank the reviewer for pointing this out. We agree that the term “cross-validation” is not precise in this context, since we trained the model in one dataset and tested it in another without performing the reverse. We have revised the manuscript to use the term “external validation” instead of “cross-validation” to more accurately describe our cross-dataset approach.
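The difference between cross-validation and external validation can be sketched as follows: the model is fitted once on the training cohort and then applied, without refitting, to the other cohort. Ridge regression, the penalty value, and the simulated "connectivity" features are illustrative assumptions only; the manuscript's model is a deep network.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def external_validation(X_train, y_train, X_test, y_test, alpha=1.0):
    """Fit on one dataset only; report the prediction-observation r on the other."""
    w, b = fit_ridge(X_train, y_train, alpha)
    predicted = X_test @ w + b
    return np.corrcoef(predicted, y_test)[0, 1]


# Simulated cohorts sharing the same feature-outcome relationship (hypothetical).
rng = np.random.default_rng(1)
true_w = rng.normal(size=20)
X_a = rng.normal(size=(300, 20))  # training cohort
X_b = rng.normal(size=(150, 20))  # held-out cohort, never used for fitting
y_a = X_a @ true_w + rng.normal(scale=2.0, size=300)
y_b = X_b @ true_w + rng.normal(scale=2.0, size=150)

r = external_validation(X_a, y_a, X_b, y_b)
print(round(float(r), 3))
```

Because the direction is fixed (train on A, test on B), there is no rotation of folds, which is why "external validation" is the more accurate term.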
  
  (8) I don't have access to the supplementary materials or code/data, so all of the comments here are based on the main text.
  
  We have added the supplementary materials and inserted the GitHub link to the code.
  
  Reviewer #3 (Recommendations for the authors):
  
  I suggest benchmarking against other simpler algorithms and controlling for memory in the brain cognition gap analyses.
  
  The authors might also want to simplify some aspects of the paper. There is a lot going on, which leaves less space to go into enough details for some analyses to warrant claims in the discussion. For example, the authors only compare the deep net to CPM and kernel ridge based on the literature. Direct comparisons would be needed.
  
  Thanks for the comment. We have sought to address the concerns raised in the public review. Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach. We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological superiority beyond the scope of our analyses. Furthermore, we have controlled for memory as suggested by the reviewer, as outlined in our response to Reviewer 1.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.11.19.624334v2
www.biorxiv.org

Most Beefalo cattle have no detectable bison genetic ancestry

1
1. Public_Reviews 09 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study used whole genome data to investigate Beefalo ancestry for the first time, filling a gap in the study of Beefalo ancestry. The authors used preserved semen samples to generate genomic data on 47 registered Beefalo and 3 bison hybrids, further questioning the ABA's stated goal of ⅜ bison ancestry. In addition, the authors show that ancestry profiles of Beefalo and bison hybrid genomes are consistent with repeated backcrossing to either parental species, demonstrating the value of genomic information in examining gene flow between species in the genus Bison. This is an interesting study that still has some major weaknesses, but overall, the work demonstrates the utility of genomic information in validating specific breeding claims and in developing a more complete understanding of gene flow and genetic variation among bovine species.
  
  We thank the reviewer for their thoughtful assessment of our work.
  
  Strengths:
  
  Numerous genetic analysis methods, such as PCA, ADMIXTURE, F4 ratios, and local ancestry inference, revealed that no single Beefalo set meets the ancestry requirements set by the American Beefalo Association (ABA) and that some Beefalo had detectable indicine cattle ancestry.
  
  Weaknesses:
  
  While this study contributes to our knowledge of Beefalo ancestry, some key issues need to be addressed, both in the analysis of specific results and in the writing of the article.
  
  We have followed the reviewer’s suggestions for improving our study in detail (specified below), and appreciate their close reading of the manuscript.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Shapiro et al. set out to verify the American Beefalo Association's claim that Beefalo cattle possess 37.5% bison ancestry. They employ a comprehensive range of well-established population genomics methods to estimate ancestry in these hybrid populations, including PCA, ADMIXTURE, D and F statistics, and local ancestry inference. Their findings conclusively demonstrate that most Beefalo lack the claimed bison ancestry, with only 8 out of 47 samples showing any detectable bison ancestry, ranging from 2 - 18%.
  
  We thank the reviewer for their thoughtful assessment of our work.
  
  Strengths:
  
  The primary strength of this analysis lies in the comprehensive dataset available to the authors, which includes important foundational Beefalo individuals and various reference populations. The rigorous and multi-faceted methodological approach employs several well-established techniques in population genomics for detecting and measuring admixture. Each method used has a firm basis in the field, providing consistent and robust results. The authors' approach of using PCA to initially assess the data within a global context, followed by more specific analyses using ADMIXTURE and D-statistics, provides a clear and logical progression of evidence. The presentation of these results in figures is particularly effective, clearly illustrating the key findings of the study. Additionally, the examination of both autosomal and sex chromosome ancestry offers a more complete understanding of Beefalo genetic composition and the mechanics of bison-cattle hybridisation.
  
  Weaknesses:
  
  One limitation of this analysis is the relatively low coverage (~2x) of many Beefalo samples. However, the authors have taken steps to mitigate biases that may arise from this. Another weakness is the limited sampling of contemporary Beefalo populations, as the study focuses primarily on historical samples. This may limit our understanding of how Beefalo genetics have changed over time.
  
  The reviewer is correct that the low coverage obtained for many Beefalo is one potential limitation, although we believe that the downsampling experiment we performed (Fig. S4) shows that this level of coverage is appropriate for summarizing species-level ancestry across Bos, as the reviewer notes.
  
  Sampling contemporary Beefalo individuals would be valuable, though as the focus of our study was to understand the origins of bison ancestry in Beefalo, we prioritized sampling individuals which played an important role in establishing the breed. We also note that contemporary Beefalo breeding involves crossing between Beefalo individuals or backcrossing to cattle, with no additional bison ancestry input since the formation of the Beefalo. As such, sampling individuals that existed close to the breed’s founding should provide the most insight into bison ancestry in Beefalo.
  
  Appraisal:
  
  The authors have clearly achieved their primary aim using a rigorous and comprehensive methodology. Their extensive dataset and multi-faceted analytical approach provide strong support for their conclusions. The study not only addresses its main research question but also reveals unexpected insights into Beefalo genetics, particularly the presence of zebu ancestry.
  
  Discussion:
  
  This study is valuable for several reasons beyond its primary findings. First, it definitively addresses and refutes the claim of 37.5% bison ancestry in Beefalo, providing crucial information for those studying these interspecies hybrids and the viability of their offspring. Second, it reveals the unexpected presence of zebu ancestry in many Beefalo, raising intriguing questions about the breed's development and the potential role of zebu cattle in achieving desired traits. This finding suggests that the distinctive appearance of Beefalo may be due in part to zebu admixture rather than bison ancestry. Third, the study highlights the significant barriers to admixture between bison and cattle, both in controlled breeding programs and potentially in wild populations. This has important implications for conservation genetics and our understanding of gene flow between these species. Lastly, the study demonstrates the power of genomic analysis in verifying breed claims and understanding the complex history of domestic animal breeds. These findings open new avenues for research in bovine genomics, breed development, and the dynamics of interspecies hybridisation.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  I really like this topic and study. But I think much can be more focused and tightened up. All the components are here - just some more refining to really make the storyline clear, the journey of discovery, and the impact of such knowledge.
  
  We thank the reviewer for their thoughtful assessment of our work.
  
  Strengths:
  
  The authors dive directly into the question of genomic ancestry as compared to the breed club's reported ancestry, using extensive quantitative data and critical analytical methods. The line of questioning is direct and does not meander. The reader learns about the challenges facing breeding associations and the value of known ancestry, and the study presents a clear need to re-evaluate the breed standards and expectations of Beefalo (if ancestry is indeed the primary goal rather than a phenotype-driven breed mission).
  
  Weaknesses:
  
  Much of the quantitative results are only referred to in the main text with qualitative language. Please incorporate more written quantitative results to highlight evidence that underlines the study narrative because it is quite an interesting study!
  
  The reviewer highlights an important point, and we agree that the results were described in largely qualitative terms that lacked quantitative detail. We have now described the results quantitatively throughout the manuscript where possible.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) This study is not the first to question claims surrounding bison ancestry in the breed. Is the sample size too small to be representative of the entire genetic structure of Beefalo?
  
  The reviewer correctly points out that this study is not the first to address uncertainty in the amount of bison ancestry present across beefalo. All earlier studies, to our knowledge, have been highlighted in the introduction and discussion (Lenoir and Lichtenberger, 1978 and Stormont et al, 1986). However, these studies examined a narrow range of Beefalo sources and used older methods (karyotyping and blood typing), such that comprehensive statements about the proportion of bison ancestry in Beefalo could not be made.
  
  We also agree that an appropriate sampling scheme is crucial for making definitive statements about Beefalo ancestry across the breed. As Beefalo breeding typically involves breeding select “full-blood” individuals with cattle, the ancestry across contemporary Beefalo is likely complex, with the cattle component coming from a wide range of breeds. Therefore, our sampling emphasized “full-blood” representatives, especially those that were involved in the founding of the breed and from which later Beefalo descend. This involved an exhaustive survey of the Beefalo individuals contained within the USDA’s National Animal Germplasm Program. Although we did not extensively evaluate current Beefalo diversity, we believe this approach is most suited for characterizing bison ancestry within Beefalo, as bison ancestry is maintained primarily through the continued use of genetic material from these “full-blood” individuals rather than repeated hybridization between bison and cattle.
  
  (2) Although genomic information is important for breeding research, it requires high-quality data. The coverage of the data used in this study was mainly ~2X, and although multiple methods of analysis gave similar results, the ability to identify rare variants (e.g. insertions or deletions of long segments of the genome) may be limited at low coverage, affecting the confidence of the results.
  
  This is an important consideration, and we agree with the reviewer that the sequencing depth obtained for most individuals in our study precludes accurate genotype calling. Therefore, we did not attempt to perform traditional genotype calling. Rather, we used a pseudohaploid calling approach in which a random base was selected to represent the genotype at each position for each individual, using a pre-ascertained set of variants discovered in gaur, a closely related outgroup to bison and cattle. This pseudohaploid approach is common in other situations where coverage is low, for example in analyzing ancient DNA.
  
  Furthermore, our ancestry analyses focused on biallelic SNPs discovered in gaur, and we did not attempt to call structural variants, given the limitations in coverage. Because this outgroup ascertainment approach targets SNPs that were polymorphic in the common ancestor of bison and cattle, it should yield unbiased results in population genetic analyses; we were therefore less interested in discovering rare variation within the species and populations we examined here.
  
  Finally, we performed downsampling experiments comparing low coverage read data to genotypes called from high coverage data, and obtained consistent results between low and high coverage analyses using read-level data and called genotypes (Fig. S7).
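The pseudohaploid approach described above can be sketched as follows: at each pre-ascertained site, one read base is drawn at random and kept only if it matches one of the two ascertained alleles. The pileup dictionary format, the allele-matching filter, and the function names are assumptions for illustration; dedicated tools also apply base- and mapping-quality filters that are omitted here.

```python
import random


def pseudohaploid_call(read_bases, ref_allele, alt_allele, rng):
    """Draw one read base at random; return it if it matches a known allele.

    Returns None when the site has no coverage or when the drawn base matches
    neither ascertained allele (for example, a sequencing error).
    """
    if not read_bases:
        return None
    base = rng.choice(read_bases)
    return base if base in (ref_allele, alt_allele) else None


def call_sites(pileup, sites, seed=42):
    """pileup: {position: list of observed bases}; sites: {position: (ref, alt)}."""
    rng = random.Random(seed)  # seeded so that calls are reproducible
    return {
        pos: pseudohaploid_call(pileup.get(pos, []), ref, alt, rng)
        for pos, (ref, alt) in sites.items()
    }


# Hypothetical sites ascertained in an outgroup, and reads from one low-coverage sample.
sites = {100: ("A", "G"), 200: ("C", "T"), 300: ("G", "A")}
pileup = {100: ["A", "A", "A"], 200: ["C", "T"]}  # no reads cover position 300
calls = call_sites(pileup, sites)
print(calls)
```

Sampling a single read avoids the heterozygote miscalls that arise when diploid genotypes are called from ~2x data, at the cost of treating each individual as haploid.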
  
  (3) Missing from the conclusions is the very important presentation of the results of genomic calling, the basics of what these data look like, coverage histograms, number of SNPs, categorization, annotations, and so on. These are necessary prerequisites for subsequent population analysis.
  
  The reference to “5.29M” on page 14 has been replaced with the exact number of SNPs used in analyses (5,291,534). The average sequencing depth for each sample is also included in Table S1.
  
  (4) The manuscript mentions "most" in a number of places, but can the authors give an accurate number based on the current data? "Most" is not a rigorous description. Based on the simulations of genomic data, how many Beefalo cattle were not detected as hybridized? This may be related to both sample size and where the authors sampled.
  
  We thank the reviewer for this important suggestion. We have now replaced vague summaries of results with precise numbers. However, we are unsure what “simulations” means in this context, as all results were obtained by analyzing empirical data from Beefalo, bison, cattle, and other bovines, rather than simulations.
  
  (5) The information in the third and fourth paragraphs of the Introduction is not sufficiently coherent and could be further consolidated into a more logical presentation.
  
  We have now condensed these paragraphs and edited them for clarity.
  
  (6) "For some analyses we also incorporated published genomes from outgroups". The description here is unclear as to what criteria were used to select these data, and it is possible that the choice of outgroups could lead to different conclusions from the analyses. In addition, ancient DNA data from cattle may be useful for this study and the authors are encouraged to consider it.
  
  Outgroup choice can certainly have a large impact on population genetic analyses. For the species examined in our study, we considered other Bos species, including yak, gaur, and banteng, as suitable outgroups, along with water buffalo, which is the closest outgroup outside of Bos. We have added comparisons of D-statistics using yak as an outgroup as a supplementary figure (Fig. S4), in addition to those using water buffalo as the outgroup which were presented in Figure 2.
  
  As we were examining species-level ancestry, and given the high level of divergence between bison and cattle, relative to that between published ancient and modern cattle genomes, we believed that it was most appropriate to use high quality modern cattle data, rather than poorer quality ancient cattle genomes, for analyses. Additionally, as any hybridization which took place between bison and cattle in the formation of Beefalo would have occurred within the past ~50 years, modern cattle are likely to be the most appropriate proxy for the cattle ancestry in Beefalo, especially given the lack of published historical North American cattle genomes.
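To show how the outgroup enters a D-statistic, the sketch below computes D(P1, P2; P3, Outgroup) from aligned pseudohaploid allele calls by counting ABBA and BABA site patterns, with the outgroup allele taken as ancestral. Positive D means P2 shares more derived alleles with P3 than P1 does. The population configuration and toy alleles are hypothetical, and the block-jackknife standard errors used in published analyses are omitted.

```python
def d_statistic(p1, p2, p3, outgroup):
    """ABBA-BABA D for aligned lists of single alleles (None = missing call)."""
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if None in (a1, a2, a3, ao):
            continue  # skip sites missing in any population
        if a3 == ao:
            continue  # P3 must carry the derived (non-outgroup) allele
        if a1 == ao and a2 == a3:
            abba += 1  # P2 and P3 share the derived allele
        elif a2 == ao and a1 == a3:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical configuration: D(taurine cattle, Beefalo; bison, water buffalo).
cattle = ["A", "A", "A", "T", "C", None]
beefalo = ["G", "G", "A", "A", "C", "G"]
bison = ["G", "G", "G", "T", "T", "G"]
buffalo = ["A", "A", "A", "A", "C", "A"]

print(d_statistic(cattle, beefalo, bison, buffalo))  # 2 ABBA, 1 BABA
```

Swapping the outgroup (for example, yak instead of water buffalo) changes which allele is treated as ancestral at each site, which is why comparing results across outgroups is a useful robustness check.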
  
  (7) The coordinates of the PCA plot need to be further supported by providing values.
  
  We have now updated axis labels for the PCA in Fig. 1A to include the proportion of variance explained for the first two components.
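The proportion of variance explained shown on PCA axes is each component's eigenvalue divided by the sum of all eigenvalues. A minimal sketch using a centred genotype matrix and NumPy's SVD is shown below; the toy matrix is invented, and genotype PCA tools typically also standardise each SNP, which is omitted here.

```python
import numpy as np


def pca_variance_explained(genotypes):
    """genotypes: individuals x SNPs array. Returns (PC scores, explained ratio)."""
    centred = genotypes - genotypes.mean(axis=0)  # centre each SNP column
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    eigenvalues = s**2                             # proportional to PC variances
    ratio = eigenvalues / eigenvalues.sum()
    scores = u * s                                 # coordinates of individuals on each PC
    return scores, ratio


# Hypothetical 0/1/2 genotype matrix: two "cattle-like" and two "bison-like" rows.
genotypes = np.array([
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
    [2, 2, 0, 0],
], dtype=float)

scores, ratio = pca_variance_explained(genotypes)
labels = [f"PC{i + 1} ({100 * r:.1f}%)" for i, r in enumerate(ratio[:2])]
print(labels)
```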
  
  (8) In Figure 1, Beefalo has one individual, NAGP9109, which belongs exclusively to the indicine group. For this individual, wouldn't it be nicer to label it separately in the PCA and ADMIXTURE plots, like Joe's Pride (JP), to make the presentation of the results clearer?
  
  This individual was one which was determined to be mislabeled as Beefalo within the NAGP and is actually a Brahman cattle. Therefore, we have relabeled it as zebu, rather than Beefalo, throughout the figures.
  
  (9) As the sex chromosome data do not fully support the authors' claims, some caution may be needed in describing the results.
  
  We interpret the sex chromosomal results as being fully consistent with patterns seen in the autosomes. However, they do shed some light on the dynamics of bison-cattle hybridization, and suggest male-mediated gene flow in which bison ancestry in Beefalo was introduced primarily through bison bulls.
  
  (10) Would it be appropriate to analyse the results at K = 3 only? The admixture analysis of all bison, cattle, bison hybrids, and buffalo individuals at different K values should further refine the results.
  
  We now also show ADMIXTURE results at K=2 and K=4 (Fig. S2) and present the cross-validation results from ADMIXTURE (Fig. S3).
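ADMIXTURE reports a cross-validation error for each K when run with its `--cv` option, and the K with the lowest error is the conventional choice. The parser below assumes log lines of the form `CV error (K=3): 0.498`, as printed by ADMIXTURE; the log text and error values are invented for illustration and are not results from this study.

```python
import re

# Matches lines such as "CV error (K=3): 0.498" in ADMIXTURE output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Map K -> cross-validation error parsed from ADMIXTURE log output."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Invented log excerpt (illustrative values only).
log_text = """
CV error (K=2): 0.512
CV error (K=3): 0.498
CV error (K=4): 0.503
"""
errors = parse_cv_errors(log_text)
print(errors, best_k(errors))
```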
  
  (11) The conclusions of this article about bison ancestry in Beefalo individuals are completely inconsistent with the American Beefalo Association, and should a description of possible reasons for this discrepancy be added to the discussion?
  
  Our analyses make it clear that there was much less hybridization between bison and cattle in the formation of the Beefalo than was previously believed. As the genetic data do not reveal exactly why this might be the case, we could only speculate on the precise reasons bison-cattle hybridization did not take place, and we have refrained from doing so here.
  
  Reviewer #2 (Recommendations for the authors):
  
  The manuscript is well written, the figures are easily understandable, and the claims made are justified by the results obtained.
  
  It is necessary to clarify cattle breeding terminology, particularly concerning breeds like the Brahman. While often described as zebu-taurine hybrids, Brahman cattle typically show over 90% zebu ancestry when analysed using ADMIXTURE against panels including European Bos taurus, African Bos taurus, and Bos indicus animals. This context would help explain why "NAGP9109" clusters with the Zebu group.
  
  We thank the reviewer for this useful context, and agree that most Brahman cattle have a high proportion of zebu ancestry. In fact, the zebu group we included consists primarily of Brahman individuals. We have now clarified this in the text, which reads:
  
  “The reported pedigree in the NAGP for this animal lists its composition as 1/2 Brahman, 1/4 Charolais, 1/8 bison, 1/16th Hereford, and 1/16th Shorthorn, but the American Brahman Breeders Association records this animal (#309519) as purebred Brahman, which is a zebu breed (5 of the other 6 zebu individuals analyzed here are Brahman cattle).”
  
  I suggest three other improvements:
  
  (1) Standardise terminology: The manuscript alternates between "zebu" and "indicine" when referring to these cattle. While both terms are correctly defined in the introduction as "indicine (zebu; Bos indicus)" using one term consistently throughout would improve readability. I prefer "zebu" but leave this choice to the authors.
  
  We agree that this mixed terminology was confusing and have replaced all instances of “indicine” with “zebu.”
  
  (2) Add PCA metrics: including the percentage of variance explained by each principal component would demonstrate the genetic distinctiveness between bison and cattle, and between taurine and zebu cattle. This would also support the selection of K=3 for the ADMIXTURE analysis.
  
  The axis labels for the PCA have been updated to include the proportion of variance explained for each component. We now also show ADMIXTURE results at K=2 and K=4 (Fig. S2) and present the cross-validation results from ADMIXTURE (Fig. S3).
  
  (3) Improve quantitative precision: The authors could improve precision by replacing qualitative statements with exact counts. For example "39 of 47 Beefalo showed no detectable bison ancestry." The same suggestion applies when describing how many Beefalo had zebu ancestry.
  
  We thank the reviewer for this useful suggestion, and agree that the manuscript used imprecise language in describing the results of certain analyses. We have now added quantitative detail throughout the Results section.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Introduction
  
  The introduction sets a tone that is heavily focused on the genetic revelation that the economics of beefalo are somewhat of a facade. Beefalo are indeed not part-buffalo (bison). It is unclear to me if the introduction also could benefit from motivating this with more of a theoretical framework based on evolution, inheritance, or trait transmission. If this is really meant to be an economics-focused article, then lean more heavily into that. As it stands, it straddles a bit of economics, a bit of legacies that appear false (beefalo are not part bison at all!), and a bit of admixture genetics theory.
  
  We intended the focus of this study to be on documenting the species-level ancestry of Beefalo, and concentrated the information presented in the Introduction on this topic. Given that less hybridization between bison and cattle appears to have taken place to form the Beefalo breed than was previously described, we believe that broader theoretical statements about admixture are less relevant here, beyond highlighting examples of successful and failed interspecies hybridization in Bos. We also avoided speculating on the history of the establishment of the breed beyond what could be understood from the genetic data.
  
  Can the authors give a bit more details about beefalo breeding? Did the breeders select for any quantitative traits and is there a targeted phenotype for beefalo they used as a standard?
  
  Limited information exists about the precise origins of Beefalo, which were never publicly shared, possibly in part for reasons this manuscript addresses. The only criterion defining Beefalo is the proportion of bison ancestry, and so no quantitative traits or specific phenotypes are related to the breed standard.
  
  Can the authors provide a few examples of what is known about the incompatibilities and reproductive challenges? What is known from past research or from the Beefalo Association documenting the breeding history?
  
  We provided a general summary of hybridization and incompatibility across Bos, but unfortunately cannot provide details about incompatibilities in Beefalo specifically. Though there is a long history of challenges interbreeding bison and cattle (referenced in the third paragraph of the Introduction), to our knowledge no examination has been carried out of Beefalo specifically and little is known about Beefalo pedigrees (again, perhaps for reasons related to information presented in this study).
  
  (2) Results Section Sequencing Beefalo genomes
  
  Please report the number of polymorphic sites to accompany the genomic read depth averages. It seems the authors could include a larger summary of the genomic data that was used for downstream analyses (like the PCA in the next section). Also, does this dataset include the sex chromosomes? How many variants that are retained for analyses are autosomal, sex-linked, or haploid? Please provide more characteristics of the data that was generated after QC and filtering.
  
  We have now replaced “5.29M” on page 14 with the exact number of SNPs (5,291,534) and added a description of genotype calling to the Results section. We have also included the number of SNPs used for sex chromosomal analyses.
  
  (3) Results section Estimating bison ancestry in beefalo
  
  What is a "foundational" individual? Is this a beefalo pedigree founder, a common sire, or an individual with remarkably high bison content? I see in the introduction Joe's Pride was the "most expensive cattle" but there are surely other aspects of "foundational" that the reader should understand as the results are presented.
  
  We agree that this terminology was imprecise, and have now clarified that we use foundational to mean an early individual that was important in the founding of the Beefalo breed, such as those that were first bred by Bud Basolo.
  
  For the sentence "The reported pedigree in the NAGP for this animal [NAGP9109] lists its composition as 1⁄2 Brahman, 1⁄4 Charolais, 1⁄8 bison, 1/16th Hereford, and 1/16th Shorthorn, but the American Brahman Breeders Association records this animal (#309519) as purebred Brahman.", this is difficult for a reader with limited cattle breed knowledge to infer significance of this. What is the origination of Brahman breed cattle? Does Brahman ancestry come from another mixed origin that could explain this discrepancy? Does the PCA have references to resolve the origin of Brahman? I realize this may sound extraneous but if membership to a breed that is recently formed from several other lineages or breeds, could you be seeing the deeper parts that compose Brahman cattle? How could one validate that the contributors erroneously labeled this individual as a beefalo?
  
  We have now noted that the Brahman breed has primarily zebu ancestry. The placement of this individual in the PCA supports the American Brahman Breeders Association metadata, and suggests that the NAGP labeling is incorrect:
  
  “The reported pedigree in the NAGP for this animal lists its composition as 1/2 Brahman, 1/4 Charolais, 1/8 bison, 1/16th Hereford, and 1/16th Shorthorn, but the American Brahman Breeders Association records this animal (#309519) as purebred Brahman, which is a zebu breed (5 of the other 6 zebu individuals analyzed here are Brahman cattle). We believe NAGP9109 was erroneously labeled as Beefalo by the contributors.”
  
  Figure 1A: Please add % explained by each PC.
  
  We have now updated axis labels for the PCA to include the proportion of variance explained for each component.
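  The proportion of variance explained by a principal component is its eigenvalue divided by the sum of all eigenvalues. The Python sketch below is a generic illustration of how such axis labels can be derived, not the authors' pipeline (which this response does not describe). It assumes a complete genotype matrix coded as 0/1/2 allele counts with no missing data, and the helper names `pca_variance_explained` and `axis_label` are hypothetical.

```python
import numpy as np


def pca_variance_explained(genotypes, n_components=2):
    """Return PC scores and the fraction of variance explained by each PC.

    genotypes: individuals x SNPs matrix of allele counts (0, 1, 2),
    assumed here to contain no missing values.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    # Center each SNP so the decomposition captures variance around the mean.
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values are proportional to covariance eigenvalues.
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s ** 2
    fraction = eigenvalues / eigenvalues.sum()
    scores = u[:, :n_components] * s[:n_components]
    return scores, fraction[:n_components]


def axis_label(pc_index, fraction):
    """Format an axis label such as 'PC1 (42.0%)'."""
    return f"PC{pc_index} ({fraction * 100:.1f}%)"
```

  Dedicated genotype PCA tools often also scale each SNP by its expected binomial variance before decomposition. That step is omitted here for brevity, so percentages from this sketch need not match those of such a tool.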
  
  Figures 1B and 1C are identical except for the Y axis. Please combine them into a graph with 2 Y-axes (one for PC1 and one for ADMIXTURE). Also, please include the bison in this panel as well.
  
  We have now updated these panels to include bison, although we have kept the labeling so that they may be referenced separately in the text.
  
  I see that the authors ran both unsupervised and supervised ADMIXTURE analyses. Can the main text show the supervised graphical result instead of the unsupervised one? That is more relevant for ancestry proportions via an assignment probability to ancestry groups. Or, if possible, could the authors consider STRUCTURE to also obtain the probability of assignment to a priori defined parental populations up to two generations back? This is by far the best way to leverage the ancestry information of the cattle and bison parental references in addition to the known F1/bison hybrids. Swap Supplementary Figure 1 with Figure 1D!
  
  The supervised and unsupervised ADMIXTURE results are highly consistent, as could be expected given the high levels of divergence between species. We prefer to show the unsupervised results in the main text, as this makes the fewest assumptions about the ancestry of the examined individuals, and so also shows that the panels used to represent each species (taurine cattle, zebu cattle, and bison) do not contain individuals which were themselves highly admixed, which could have influenced the supervised ADMIXTURE analyses.
  
  For the unsupervised ADMIXTURE analyses, what were the cross-validation values per K value tested? How did the authors decide that K=3 was the best one to show?
  
  We now also show ADMIXTURE results at K=2 and K=4 (Fig. S2) and present the cross-validation results from ADMIXTURE (Fig. S3).
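  The reviewer's question about choosing K can be illustrated with ADMIXTURE's built-in cross-validation: when run with the `--cv` flag, ADMIXTURE prints a line of the form `CV error (K=3): ...` for each run, and a K with low cross-validation error is commonly preferred. The minimal Python sketch below, with hypothetical helper names, parses such lines and returns the K with the lowest error. Any values used to exercise it are made up and are not the study's results.

```python
import re

# Matches lines ADMIXTURE prints when run with --cv, e.g. "CV error (K=3): 0.51234".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Map each K to its cross-validation error found in ADMIXTURE log text."""
    errors = {}
    for match in CV_LINE.finditer(log_text):
        errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get)
```

  The lowest CV error is a heuristic; a K motivated by the study design (here, three candidate source groups) may reasonably be shown even if it is not the strict CV minimum.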
  
  Regarding "D-statistics ..... are consistent with 0 for most individual Beefalos....", I have two comments. First, by "consistent with", do you mean "are not significantly different from 0", indicating that (explain what this means in your words). Next, "most individual beefalos" means how many? Please provide numbers and values to highlight points or specific findings.
  
  The interpretation of the D-statistics has been clarified and Z-scores and numbers of individuals to quantitatively describe these results have been added. The text now reads:
  
  “D-statistics of the form D (taurus, Beefalo; bison, water buffalo), which test whether Beefalo share more alleles with bison than taurine cattle, again show 39 Beefalo have no excess affinity with bison compared to taurine cattle (-13.04 < Z < 3.14), although the same eight Beefalo identified in PCA and ADMIXTURE as having bison ancestry also have an excess of bison alleles (6.16 < Z < 34.86), confirming their bison ancestry (Fig. 2A).”
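  The D-statistic quoted above follows ABBA-BABA logic: for D(taurus, Beefalo; bison, water buffalo), sites where Beefalo and bison share the derived allele (ABBA) are weighed against sites where taurine cattle and bison share it (BABA), so a positive D with a large Z-score indicates an excess of bison-derived alleles in Beefalo. The response does not say which software or settings were used, so the Python sketch below is a generic illustration under stated assumptions: derived-allele frequencies are polarized so that the outgroup carries the ancestral allele, sites are in genomic order, and the standard error comes from an unweighted delete-one-block jackknife over equal-sized contiguous blocks. Function names are hypothetical.

```python
import numpy as np


def abba_baba_weights(p1, p2, p3):
    """Per-site ABBA and BABA weights from derived-allele frequencies.

    p1, p2, p3: derived-allele frequencies (0-1) in P1 (e.g. taurine cattle),
    P2 (e.g. a Beefalo) and P3 (e.g. bison). Sites are assumed polarized so
    that the outgroup (e.g. water buffalo) carries the ancestral allele.
    """
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    abba = (1 - p1) * p2 * p3  # P2 and P3 share the derived allele
    baba = p1 * (1 - p2) * p3  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(abba, baba):
    """Positive D: P2 shares more derived alleles with P3 than P1 does."""
    return (abba.sum() - baba.sum()) / (abba.sum() + baba.sum())


def d_with_jackknife_z(p1, p2, p3, n_blocks=20):
    """Return (D, Z) using a delete-one-block jackknife standard error.

    Blocks are equal-sized runs of consecutive sites, an illustrative
    simplification of distance-based block definitions.
    """
    abba, baba = abba_baba_weights(p1, p2, p3)
    d = d_statistic(abba, baba)
    blocks = np.array_split(np.arange(abba.size), n_blocks)
    loo = np.empty(len(blocks))
    for i, block in enumerate(blocks):
        keep = np.ones(abba.size, dtype=bool)
        keep[block] = False  # drop one block and recompute D on the rest
        loo[i] = d_statistic(abba[keep], baba[keep])
    n = len(loo)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return d, d / se
```

  Tools such as ADMIXTOOLS typically define jackknife blocks by genomic distance and weight them by the number of sites, so Z-scores from this sketch would differ somewhat from published values.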
  
  "In Beefalo with bison ancestry, that ancestry tends to be present in large contiguous blocks, often tens of megabases in size, indicative of recent admixture (Figure 3A, B)". Please display the quantitative results (mean, max, range, standard deviation, etc.) in the main text and point the reader to the table that contains the values for each individual. The rest of this paragraph also uses the words "most' or "always" - please provide numbers. Is most 30/46 beefalo? Is it always exactly all 47 beefalo? Readers want to see numbers!
  
  The reviewer is correct that this section lacked specificity. We have now provided the exact number of individuals identified with bison and zebu ancestry.
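  The summary statistics the reviewer asks for (number, mean, range, and standard deviation of bison ancestry tracts, plus the genome-wide bison fraction) can be computed directly from local ancestry output. The Python sketch below assumes, hypothetically, that each individual's bison tracts are available as non-overlapping (chromosome, start, end) intervals in base pairs with exclusive ends, and that the total analysed genome length is known. It is not tied to any particular local ancestry tool, and `tract_summary` is a hypothetical name.

```python
from statistics import mean, pstdev


def tract_summary(tracts, genome_length_bp):
    """Summarize one individual's ancestry tracts.

    tracts: list of (chromosome, start_bp, end_bp), end exclusive, assumed
    non-overlapping. genome_length_bp: total length of the analysed genome.
    """
    lengths_mb = [(end - start) / 1e6 for _chrom, start, end in tracts]
    if not lengths_mb:
        # No tracts, e.g. an individual without detectable bison ancestry.
        return {"n_tracts": 0, "mean_mb": None, "min_mb": None,
                "max_mb": None, "sd_mb": None, "fraction": 0.0}
    total_bp = sum(end - start for _chrom, start, end in tracts)
    return {
        "n_tracts": len(lengths_mb),
        "mean_mb": mean(lengths_mb),
        "min_mb": min(lengths_mb),
        "max_mb": max(lengths_mb),
        "sd_mb": pstdev(lengths_mb),  # population SD; 0 for a single tract
        "fraction": total_bp / genome_length_bp,
    }
```

  The population standard deviation is used so that an individual with a single tract gets 0 rather than an error; `statistics.stdev` would give the sample standard deviation and requires at least two tracts.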
  
  The section starting "Several lines of evidence attest to the efficacy of using these source panels..." could realistically come first in the Results section and before beefalo results are presented. This would build confidence for the reader that this panel of samples passes a QC and will indeed be able to resolve ancestry-based questions.
  
  This section specifically refers to the local ancestry analyses, which we have now clarified in the text.
  
  Figure 3A-C: Please include on each of these figure panels the documented (breeder association) ancestry percentage and the percentage of bison ancestry you obtained from your genomic analyses. Moving it from the legend to the figure is more immediately powerful for the reader. If the authors dated the admixture events as well, please include the meta-data of the association pedigree reporting when bison entered the target individual's genome versus the genome-estimated number of generations since admixture.
  
  Figure 3 has now been updated to include the reported bison ancestry. No attempt was made to date the admixture event or compare with reported pedigrees, as documented Beefalo pedigrees are typically very sparse (and may be unreliable, as our results suggest).
  
  Figure 3 legend: Move the following text from the figure legend to the Results section: "Three bison hybrids are inferred to have ~75% bison ancestry, while eight Beefalo have detectable bison ancestry, ranging from 2-18%. Indicine ancestry is detected in most Beefalo at variable levels, ranging from 2-38%, with most Beefalo having between 2-18%.".
  
  This sentence has been removed from the legend and is now worked into the main text. The corresponding paragraph in the results now reads:
  
  “Local ancestry inference across individual Beefalo and bison-cattle hybrid genomes provides similar estimates of overall Beefalo ancestry, inferring an absence of bison ancestry across the 37 Beefalo that lacked evidence for such ancestry in previous analyses (Fig. 3). Three bison hybrids are inferred to have ~75% bison ancestry, while eight Beefalo have detectable bison ancestry, ranging from 2-18%. Zebu ancestry is detected in 38 Beefalo at variable levels, ranging from 2-38%, with all but two Beefalo having between 2-18%.”
  
  (4) Results section Beefalo sex chromosome ancestry
  
  Check that the authors do not reference Figure 4B before Figure 4A.
  
  Thank you to the reviewer for noticing this, it has now been corrected.
  
  Figure 4A: Could this panel be considered to merge with the autosomal admixture plot? It helps with comparison. Not a firm request - but it is nice to see what is consistent versus what is discordant.
  
  To avoid cluttering the figure with two highly similar plots, we preferred to separate the autosomal and sex chromosomal results.
  
  Figure 4C: Could this panel be merged with the autosomal ancestry bar graph to help the reader with visual comparisons?
  
  We thank the reviewer for this suggestion, but do not understand exactly which figures they are suggesting to be merged.
  
  (5) Materials and Methods: Modeling Beefalo ancestry:
  
  The language used in this sentence "This approach allows for directly understanding the ancestry of Beefalo individuals relative to these three groups while mitigating the effects of the low sequencing depth obtained for many Beefalo." conflicts with a sentence later in this paragraph which called PCA a model-free analysis. Please correct.
  
  Unfortunately, we are unsure what the reviewer refers to here and believe that this sentence does not conflict with the characterization of PCA as a model-free analytical approach.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.09.16.613218v2
www.biorxiv.org

Pupil size reveals the perceptual quality and effortless nature of synesthesia

1. Public_Reviews 08 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  The pupil traces in Figure 3 (main results) are heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest. As I argued in my first review, I worry that this format gives unrealistic expectations about the effect (the perception of dark/bright colors does not generate a net dilation/constriction of the pupil; perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size; these include a pupil dilation that is more prominent in the controls and that is analyzed later in the manuscript; I do not think that eliminating one of the effects of interest from a main results figure helps the reader understand the results). In the revised manuscript, the authors addressed this concern by adding Supplementary Figure 4, where a more complete representation of the results is shown (traces from individual trials are baseline corrected and averaged, resulting in more informative time courses). I would strongly recommend that Supplementary Figure 4 be brought into the main text (Figure 3 could be presented in the Supplementary Material).
  
  We agree that it is important to counter unrealistic interpretations of the effect. However, the figures in the main article are the ones that depict the effects; what seems to be needed instead is additional clarification of these effects. First and foremost, Figure 3 in the main manuscript visualizes the core effect: pupil size reveals that synesthesia is a sensory process and that the phenomenology of the synesthetic experience can be measured physiologically. Second, this establishes a new and powerful method for advancing synesthesia (and phenomenology) research.
  
  No doubt, our effect is relative in nature (as is almost any pupillometry, fMRI, or EEG effect). Including variation that is unrelated to the effect would increase rather than decrease confusion, as individual differences (i.e., how the pupil of an individual responds irrespective of the synesthetic experience) are not meaningful for the question we set out to answer. Individual variations in pupil response shape irrespective of synesthetic color brightness are removed in Figure 3 but still present in Supplementary Figure 4. Thus, Figure 3 is better suited to illustrate our core effect than Supplementary Figure 4, in which the individual average responses (illustrated on the right) can no longer be meaningfully related to the core effect; only their difference can.
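  The contrast between Figure 3 and Supplementary Figure 4 described in this response can be illustrated with a short numpy sketch. It baseline corrects each trial (roughly what Supplementary Figure 4 shows) and then optionally subtracts the participant's average response shape irrespective of reported lightness (the demeaning applied in Figure 3). Because the same shape is removed from both conditions, the dark-minus-bright difference is unchanged. Subtractive baseline correction, the baseline window length, and weighting the two condition means equally are assumptions of this illustration, not details taken from the manuscript; the function names are hypothetical.

```python
import numpy as np


def baseline_correct(trials, n_baseline):
    """Subtract each trial's mean over its first n_baseline samples.

    trials: array of shape (n_trials, n_samples) for one participant.
    """
    trials = np.asarray(trials, dtype=float)
    baseline = trials[:, :n_baseline].mean(axis=1, keepdims=True)
    return trials - baseline


def condition_traces(trials, is_dark, n_baseline, demean=True):
    """Mean baseline-corrected trace for dark and bright trials of one participant.

    With demean=True, the participant's average response shape (the mean of
    the two condition traces) is subtracted from both, leaving only the
    dark-vs-bright contrast; the difference between the traces is unchanged.
    """
    corrected = baseline_correct(trials, n_baseline)
    is_dark = np.asarray(is_dark, dtype=bool)
    dark = corrected[is_dark].mean(axis=0)
    bright = corrected[~is_dark].mean(axis=0)
    if demean:
        shape = (dark + bright) / 2
        dark, bright = dark - shape, bright - shape
    return dark, bright
```

  With demeaning, the two traces are mirror images around zero, which is why a demeaned figure can suggest a net dilation or constriction that the baseline-corrected traces do not show.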
  
  At the same time, the reviewer is correct that this may, not so much among researchers as among a general audience, create the expectation that the pupil will always show a net dilation when a dark synesthetic percept is experienced. This is clearly not the case: the pupil dilates only relative to its counterfactual (i.e., not seeing that dark synesthetic percept). We now counter such an unrealistic expectation:
  
  “Note that the effects here are visualized as counterfactuals. So while the pupil dilated for dark relative to bright experienced colors in synesthetes, this does not mean that the pupil net dilates and constricts to dark and bright experienced colors relative to baseline, but only relative to the counterfactual (see Supplementary Figure 4 for net pupil size changes).”
  
  We updated the caption of Supplementary Figure 4 as follows:
  
  "Supplementary Figure 4: Pupil size change to graphemes, split by 0.5 reported color lightness (dark gray = low lightness; light gray = high lightness) without demeaning (i.e., removing the average pupil response shape in the 4s stimulus interval per individual irrespective of brightness perception). (…)"
  
  Responses to physical brightness modulations were only measured in the synesthete group, not in controls. The authors point out that pupillary light responses have been thoroughly characterized in previous studies, and conclude that synesthetes' responses were in line with expectations in terms of both amplitude and latency. However, as we are not dealing with standardized measurements, subtle differences in pupil reactivity across the two populations remain a possibility. I recommend that this possibility be mentioned in the discussion.
  
  We agree with the reviewer: if there were any differences in the PLR between the two groups, they must be minor, given that the responses follow those reported in the literature so closely. Yet subtle differences cannot be fully ruled out unless tested, and it does no harm to mention this in the discussion, which we now do as follows:
  
  Finally, pupil light responses in Block 2 were only assessed in synesthetes. While these closely match those of control populations [50,51], subtle between-group differences cannot be excluded and could ideally be assessed in future and replication work.
  
  Reviewer #2 (Public review):
  
  Synesthesia is a neurological condition where stimulation of one sensory channel leads to an involuntary, automatic, and consistent experience of another, unrelated percept. For example, Sir Francis Galton (1880, Nature) famously described the robust tendency of some individuals (synesthetes) to associate numerals with distinct colors. Ever since, synesthesia has continued to attract broad interest in the cognitive neurosciences in light of its implications for the study of domains such as perception, consciousness, and brain connectivity, among others.
  
  Strauch, Leenaars, and Rouw measured pupil size in a group of 16 grapheme-color synesthetes and two matched control groups. The participants were presented with gray digits, that is, visual stimuli having identical physical properties in terms of brightness. Each participant subsequently rated the corresponding evoked color and brightness: unlike controls, synesthetes did so in a very consistent and reliable fashion. Accordingly, this was also shown in their pupils: despite the same objective luminance, digits associated with brighter percepts caused their pupils to constrict and digits associated with darker percepts caused their pupils to dilate more than in controls. These results highlight how deeply crossmodal correspondences are rooted in synesthetes, and put forward pupillometry as a particularly appealing biomarker for some phenomenological experiences (at least those grounded in "brightness").
  
  Further strengths of the technique are its temporal resolution and its responsiveness to several constructs. Across several tasks, the authors show for example that responses to synesthetic light are somewhat slower than responses to real light (i.e., they are likely mediated), but at the same time faster than responses to mental imagery. The role of mental imagery can also be reasonably dismissed when considering the second feature of pupil size: its responsiveness to mental effort and cognitive load. The pupils tend to dilate with demanding, challenging tasks, and this was the case when control participants were asked to report the color of a digit for which they did not consistently experience a synesthetic association. The same task was, instead, seemingly effortless for synesthetes, again speaking in favor of the automaticity of number-color correspondences in their case.
  
  Overall, the findings by Strauch, Leenaars, and Rouw are highly significant for the field and likely to be impactful. The strength of their evidence, when accounting for the relatively small sample size and the inherent variability of both phenomenology (color perception and subjective reporting) and physiology (pupil size), is adequate and sufficiently convincing.
  
  Comments on revisions:
  
  I thank the authors for addressing all my comments in a satisfactory way. I think that the paper has improved, especially in terms of transparency of the reporting and clarity of the results.
  
  We thank R1, R2, and R3 for their very useful input to improve our manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.24.690102v3
docdrop.org

models_for_writers_eleventh_edition_-_alfred_rosa__paul__eschholz

1. LiliShepoka 08 Jun 2026
  
  in Public
  
  Not many of us, however, have been trained to read actively, to engage a writer and his or her writing, to ask why we like one piece of writing and not another. Similarly, most of us do not ask ourselves why one piece of writing is more convincing than another.
  
  I agree schooling is often the reason many of us read mainly for ideas rather than reading actively. However, it is important for our reading and growth as writers to question the text, consider the author's purpose, and think about why one piece may be more effective than another.

Annotators

LiliShepoka

URL

docdrop.org/download_annotation_doc/Reading-to-Writing-models_for_writers_eleventh_edition_-_alf-qqrlo.pdf
www.biorxiv.org

Disruption of small RNAs and mechanistic variation in Segregation Distorter-a sperm-killing drive system in Drosophila melanogaster

1. EMBOpress 06 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  Reviewer #1
  
  Evidence, reproducibility, and clarity
  
  Summary: Edvalson and colleagues use transcriptomics, cell biology and genetics to study variation between segregation distorter (meiotic drive) strains and find several important results. These include apparent suppression of small RNAs mapping to responder (the drive target) in one of the lines, a general pattern of differential expression consistent with the drive mechanism being upstream of sperm individualization (where defects have been seen previously), and genetic confirmation that perturbing Rsp expression can influence the strength of drive.
  
  Major comments: I found the presentation of the total RNA sequencing experiment a bit odd. This is partly because it was in the middle of the results (it might fit better first), partly because few specific genes were discussed (this might be appropriate given the question, but maybe the question should be more clearly stated), and partly because of the complexity of the approach (WGCNA + PANGEA) and how it all fits together. I suggest working to clarify the main points of this section (which are a bit different from the main focus of the Rsp work).
  
  We thank the reviewer for these important points. We liked the suggestion to swap the order of our results. We attempted the change, but we found that we weren't able to make the flow of the results much better. Instead, we primed the transition from smRNA to totRNA in the last paragraph of the smRNA results (lines 190-196). This paragraph now reads:
  
  The dearth of Rsp smRNAs in SD-Mad heterozygotes could be due to a disruption in transcription of the locus or in subsequent processing steps. Many factors can influence piRNA production. For example, the piRNA pathway can amplify piRNAs independently of transcription through the ping-pong cycle (Czech and Hannon 2016). Notably, Rsp piRNAs do not have a strong ping-pong signature in testes (Wei et al. 2021; Chen et al. 2021a). To distinguish between a disruption in transcription and some downstream process, we examined total RNA.
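  A ping-pong signature is commonly quantified as an excess of sense/antisense piRNA pairs whose 5' ends overlap by exactly 10 nt, often summarized as a Z-score for the 10-nt overlap relative to overlaps of 1-20 nt. The Python sketch below illustrates that calculation under simplifying assumptions: reads are hypothetical (strand, start, end) tuples with 0-based, end-exclusive coordinates on one reference sequence, pair counts are weighted by the product of 5' end counts, and all reads are assumed to be at least 20 nt long. It is not necessarily the procedure used in the cited studies.

```python
from collections import Counter
from statistics import mean, pstdev


def five_prime_ends(reads):
    """Count 5' end positions per strand.

    reads: iterable of (strand, start, end) tuples, 0-based with exclusive
    end, all mapped to one reference (e.g. a satellite consensus).
    """
    plus, minus = Counter(), Counter()
    for strand, start, end in reads:
        if strand == "+":
            plus[start] += 1
        else:
            minus[end - 1] += 1  # a minus-strand read's 5' end is its last base
    return plus, minus


def overlap_histogram(reads, max_overlap=20):
    """Weighted counts of sense/antisense pairs by 5'-5' overlap length (1..max)."""
    plus, minus = five_prime_ends(reads)
    hist = {k: 0 for k in range(1, max_overlap + 1)}
    for a, n_plus in plus.items():
        for k in hist:
            # A minus-strand 5' end at a + k - 1 overlaps this plus read by k nt
            # (assuming both reads are at least k nt long).
            hist[k] += n_plus * minus.get(a + k - 1, 0)
    return hist


def z10(hist):
    """Z-score of the 10-nt overlap relative to all other overlap lengths."""
    others = [v for k, v in hist.items() if k != 10]
    sd = pstdev(others)
    if sd == 0:
        return float("nan")  # undefined when all other overlaps are equal
    return (hist[10] - mean(others)) / sd
```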
  
  The main reason we elected to describe patterns rather than specific genes is that the 2nd chromosomes we tested (R-16, SD-Mad, SD-5) have all diverged from each other and any single differentially expressed gene could be due to differences in genetic background. Therefore, we elected to point out more broad systematic changes in pathways and correlated gene networks rather than specific genes. We have made it more obvious throughout the total RNA section in the text what our question is regarding the transcriptome and the reasoning for using WGCNA and gene set analysis.
  
  We also appreciate the reviewer's point that the complex approach we used to extract changes in pathways and networks is difficult to follow. We have modified our wording to better describe the flow of analyses.
  
  We also note that we have extended our analysis for the comparison of SD-Mad and SD-MadRev, which only differ by the Sd-RanGAP locus. Here we do discuss individual genes that are differentially expressed. See below for details about this new analysis.
  
  Minor comments:
  
  Abstract - Probably worth mentioning Sd-RanGAP here, even if you are using it as a straw man. I agree that the specific mechanism is not known, but some of the genetics are established.
  
  This is a good point. While our study doesn't address RanGAP, it is important to point out that, although its role in drive is unclear, Sd-RanGAP is a necessary component of the system. We added the following language to the abstract:
  
  SD is a multigene complex, frequently associated with chromosomal inversions, in which the main driver locus, a truncated duplication of the gene RanGAP, kills wild-type sperm containing a satellite DNA called Responder (Rsp).
  
  Line 80 and elsewhere - it would be helpful to be specific here - you are looking at both small and total RNA
  
  We've modified our wording throughout the manuscript to specify when we are referring to total RNA and small RNA.
  
  Fig 1B - is there a reason not to show the values of the replicates here? It would be more transparent.
  
  We thank the reviewer for this comment. We replaced Fig 1B with a chart that is computed from the DESeq2 normalized counts for each comparison and added replicates to all related graphs.
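  DESeq2's normalized counts are raw counts divided by a per-sample size factor estimated with the median-of-ratios method. As a rough illustration of what the plotted values represent, the simplified Python sketch below reimplements that estimator with numpy: genes with a zero count in any sample are excluded from the geometric means, as in DESeq2's default estimator, but other options of the R package (such as control genes or alternative estimators) are not reproduced. The function names are hypothetical; for real analyses, DESeq2's own `estimateSizeFactors()` and `counts(dds, normalized = TRUE)` should be used.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample, so that the
    # log geometric mean across samples is finite.
    expressed = np.all(counts > 0, axis=1)
    if not expressed.any():
        raise ValueError("no gene has nonzero counts in every sample")
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Each sample's size factor is the median ratio of its counts to the
    # gene-wise geometric means (computed on the log scale).
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def normalized_counts(counts):
    """Raw counts divided by each sample's size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors(counts)
```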
  
  Line 139 - does the experimental design control for 1.688 genomic copy number? Where is it located?
  
  We indeed control for the 1.688 copy number here. Most 1.688 repeats are found on the X chromosome, and all flies in our experiments have identical X chromosomes. We changed the text to specify that the copy number of 1.688 is the same between conditions.
  
  144-146 - this could be written clearer, and I think it should only refer to 1C, not 1B. Part of the issue is that there are several repeats not discussed, and it isn't clear what is happening with them. I suggest expanding this description so it is more clear.
  
  Thank you for this feedback. We have expanded the description to make this section clearer.
  
  Line 161 - what do you mean (specifically) by "repetitive loci"?
  
  Repetitive loci in this case refers to transposons, satellite DNAs (except simple satellites), and piRNA clusters. We have added text explaining what is included in the grouping of "repetitive loci". We have added the following sentence to the text:
  
  Our results demonstrate that SD-Mad and SD-5 haplotypes, despite sharing the same main drive locus, have different effects on smRNAs derived from repetitive loci such as complex satellites (including Rsp), transposable elements, and piRNA clusters.
  
  193-203 - This is an important finding that is somewhat lost in trying to keep track of WGCNA and PANGEA and the different modules. I suggest clarifying it to drive home the point that differential expression appears to start prior to individualization, which suggests an earlier mechanism of drive.
  
  We thank the reviewer for this feedback. We have added wording to our discussion that points out this finding in lines 501-505, which reads:
  
  We suspect that the timing of the proximal cause of SD-mediated drive may align with early spermatogenetic processes; perhaps where cell cycle-related genes are active and appear to be broadly differentially expressed (Figure 2B, Module H). This earlier timing is consistent with temperature shift experiments that place the critical period for SD at or before meiosis (Mange 1968).
  
  Fig 3B & 3C, Fig 4 - same as 1B, is there a reason not to show the actual data points?
  
  A similar issue was brought up earlier, in response we modified all our figures to show replicate points where applicable.
  
  Line ~245 - was the same experiment done with SD-5? (as you do below for Rsp overexpression)
  
  We originally did not include SD5 in this experiment, but we have since measured drive strength of SD5 in a kipfKO background. We found a small but statistically significant difference in drive strength. We added the new SD5 results to the figure and moved the kipfKD data to the supplement along with some added data on a Rsp deletion line generated from Iso1 that bolsters our confidence in the SDMad results.
  
  Significance
  
  This is a strong paper that moves the field forward, even if it leaves questions still to be answered (why the difference between drivers? what is the mechanism? how is Rsp interacting with drive?).
  
  Several findings move the field forward: the Rsp small RNA results, the differential expression hinting at a molecular mechanism that is upstream of sperm individualization.
  
  The audience is moderately broad. Genetic conflict is gaining in general interest, but aspects of this will be mostly interesting to the hardcore drive crowd.
  
  Reviewer #2
  
  Evidence, reproducibility and clarity
  
  I have only one request: I found it unclear whether the authors were referring to small RNAs or their precursor (long RNA). By reading the text carefully, I could deduce that Fig1A/Table S2 represent the small RNA sequencing, while FigS3A represents total RNA seq (detecting precursor). However, the labeling in the Fig1A and Table S2 only says 'piRNA cluster' or 'Rsp' (without clarifying 'piRNA from piRNA cluster' or 'piRNA from Rsp'), and it took quite some time for me to understand which Fig/data is smallRNA vs. longRNA.
  
  This is helpful feedback. We have added more clarity to which type of RNA is being represented in our figures throughout.
  
  Significance
  
  This manuscript by Edvalson et al. describes their study on SD (segregation distorter) meiotic drive system, examining the role of piRNA derived from Rsp satellite. Although the exact mechanism of drive is still unknown, this study represents a significant step forward in understanding SD-mediated drive.
  
  By using two SD alleles (SD-5 and SD-Mad), they show that Rsp-derived piRNA is depleted in SD-Mad. The authors used total RNA sequencing, small RNA sequencing, mutants, and carefully designed controls (such as deletion of Sd-RanGAP) to arrive at the model that Rsp-derived piRNA is involved in SD-Mad-mediated drive. The result that kipferl depletion (which leads to satellite DNA expression) rescues SD-Mad's drive phenotype is very interesting. This supports the idea that the decreased Rsp piRNA indeed corresponds to SD-Mad-mediated drive. They further back up this idea by overexpressing Rsp.
  
  Interestingly, SD-5 was not affected by changes in Rsp expression. Based on this result, the authors state that there is mechanistic variation within the same (SD) drive system. This statement is certainly justified by the data, but I cannot help wondering whether there might be a unifying mechanism that explains both SD-5 and SD-Mad. I am not suggesting that the authors edit the manuscript or add to the discussion, but do they have any speculations on this? For example, is SD-5 simply epistatic to Rsp piRNA production? That is, SD-RanGAP > SD-Mad (some gene on the SD-Mad inversion) > Rsp piRNA production > SD-5 > sperm killing?
  
  We thank the reviewer for this insight. We indeed think that the proximal cause of sperm dysfunction could be the same, but there are components of SD5 that act downstream of Rsp piRNAs. The small difference in drive strength in the SD5 KipfKO experiments might support this hypothesis, although it is also possible instead that drive is influenced by changes in some other piRNAs (from the piRNA clusters or satellites).
  
  We modified our wording in the first paragraph of the discussion to point out this possibility. Lines 367-370 now read:
  
  These results suggest that, while SD chromosomes share a target and main drive locus (Sd-RanGAP), the modifiers accumulated on each haplotype may influence the drive mechanisms, either by creating new pathways to drive or acting as tuning knobs on drive strength.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  Summary
  
  In the presented manuscript, Edvalson and Wei et al. use Drosophila genetics and NGS experiments to investigate the mechanism of meiotic drive through the Segregation Distorter (SD) system. They reveal that two driving haplotypes seem to function via different mechanisms, with drive through SD-Mad but not SD-5 involving small RNAs produced from the Responder (Rsp) satellite, the target of SD drive. SD-Mad testes displaying drive are characterized by lower levels of Rsp sRNAs compared to non-drive controls as well as SD-5, and the ectopic overexpression of Rsp sRNAs through two distinct mechanisms decreases drive in the SD-Mad genetic background specifically. With this work, the authors are adding an important piece of information to the highly complex SD system, indicating that sperm killing is likely achieved by different mechanisms in different SD haplotypes, despite sharing a common driver.
  
  Major comments
  
  Fig1C: It might be interesting to show the fold change between SD-Mad and SD-MadRev in addition to what is displayed. Moreover, can the authors comment on what might be causing the increased smRNA counts for 38C2? Is this because R16 has particularly low 38C2 values?
  
  We appreciate the reviewer's comment concerning the fold change between SD-Mad and SD-MadRev. We have made a figure showing the difference between the two and added it to Figure S1.
  
  We suspect that the expression difference in 38C2 between the R16 heterozygotes and SD heterozygotes may be due to genetic divergence, since these are different 2nd chromosomes. We have added language pointing this out to the manuscript in line 182. The paper now reads:
  
  There is no evidence that either 38C2 or Flamenco are involved in SD-mediated drive.
  
  Fig1/S1: Could the authors also display the Rsp smRNA counts for all Gla crosses similar to panel 1B? What is the interpretation for the increase in Rsp smRNAs in SD-5/Gla relative to R16/Gla but the lack of such an increase in the SD-5/iso1 vs R16/iso1 comparison? Do SD-Mad and SD-5 induce the same strength of drive against each of the two wildtype chromosomes? Experiments: smRNAseq for SD-MadRev/Gla.
  
  We have added a plot to Fig S1 to show the abundance of Rsp small RNAs in the Gla background, similar to Figure 1.
  
  It is difficult to interpret the apparent overabundance of Rsp small RNAs in the SD-5/Gla background. Because differences in Rsp smRNA abundance for SD-5 are inconsistent between the Iso1 and Gla backgrounds, our interpretation is that SD-5 is not manipulating Rsp levels. The apparent overabundance of Rsp in the Gla background could be due to an epistatic interaction between Rsp and other components of that particular background. Consistent with this interpretation, the SD-Mad-induced reduction of Rsp smRNAs in the Gla background is less dramatic than in the Iso1 background, suggesting that something about the Gla background slightly increases Rsp expression when paired with an SD chromosome.
  
  Fig1: The authors note changes in smRNA levels for other satellites as well as piRNA clusters but do not give any interpretation to this observation. Are they meaningful? Should they be attributed to genetic background?
  
  Our interpretation of the observation that some satellites or piRNA clusters are differentially expressed is that these differences are likely due to epistatic effects from the different 2nd chromosomes used in the study or are incidental to the mechanism of SD.
  
  FigS2: Same question also for the deregulated TEs: do they share sequence features with Rsp or are they overrepresented in the clusters that change? Are these explained by differences in insertions between genotypes? Do their total RNAseq values change in any way? What do the percentages in line 162 correspond to? Number of TEs that are deregulated? At which cutoff? It might be informative to compare the data to a cross between driver and R16, or even better the SD-MadRev control. Experiments: totRNAseq for SD-MadRev crosses and optionally crosses to R16.
  
  The Rsp repeat unit does not share significant homology to portions of the genome outside of the pericentromere of 2R with the exception of ~6-12 copies in the intron of Ago3.
  
  As far as TEs are concerned, we surprisingly do not see a strong correlation between piRNA cluster content, dysregulation, and TE transcript abundance. For example, in the SD/Gla backgrounds the total RNA for R1, R2, IGS, and Tc1-Mariner family TEs is downregulated. However, the only major piRNA cluster that is upregulated in both SD/Gla backgrounds (80F) is not enriched for TE fragments matching any of those 4 families. We note that the major piRNA clusters are defined relative to the Iso1 genome, which may differ from our experimental backgrounds. Without long-read-resolved genomes for our specific experimental lines, generated at the same time as the RNA samples, it is difficult to determine how expression at the major piRNA clusters relates to expression of the corresponding TEs. We have described this lack of correlation in lines 210-217 of the text, along with our interpretation of why it could arise. The paper now reads:
  
  On the other hand, we did find some differences in repetitive elements related to rDNA (R1, R2, and IGS) and Tc1-Mariner family TEs (all backgrounds; Figure S6). Interestingly, there was no correlation between the expression of TEs and the expression of piRNA clusters that contain fragments of these TEs in the total RNA, nor was there any correlation between the small RNAs from piRNA clusters and the total RNAs for those TEs. PiRNA clusters are usually defined in one isolate of Iso1: rapid turnover of TEs and piRNA sources could explain why we do not see a correlation between piRNA cluster expression and TE expression in our backgrounds.
  
  We investigated differences in TE and piRNA cluster expression in our SD-Mad/Iso1 vs SD-MadRev/Iso1 comparison, but a lack of power due to inter-sample variation prevents us from confidently making any assessments about TEs or piRNA clusters in that comparison. We did, however, generate additional gene-level transcriptomic data using 3' Digital Gene Expression to bolster our confidence in the totRNA data, and found some interesting genes among the most differentially expressed. We have noted those genes in lines 276-287, which read:
  
  To identify genes that might interact to cause drive, we compared the gene expression of SD-Mad/Iso1 to SD-MadRev/Iso1. These genotypes only differ by the presence of the main drive locus, Sd-RanGAP. We performed both totRNA and 3' Digital Gene Expression (DGE) RNA sequencing and examined the overlap in differential expression between the totRNA and DGE sequencing. There are 69 differentially expressed genes where the DGE comparison is significant (PDGE ≤ 0.01), and the sign of the Log2FC of the totRNA matches that of the DGE. Among this set of differentially expressed genes, 57 show at least a 50% difference in gene expression (absolute Log2FC value of at least 0.58 in DGE). These genes are not enriched in any Reactome gene sets. The top 20 most differentially expressed genes consists of 9 lncRNAs (3 antisense RNAs) and 11 protein coding genes: 8 of which are uncharacterized. The 3 characterized genes are Artemis (Arts), Gr61a, and Tono (Figure S98, Supplemental File 1).
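  The two-step filter in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Gene` record, the gene names, and all values are hypothetical. Only the cutoffs come from the text: P_DGE ≤ 0.01, matching signs of the totRNA and DGE Log2FC, and |Log2FC_DGE| ≥ 0.58, which corresponds to roughly a 1.5-fold (50%) change because log2(1.5) ≈ 0.585. The first filter stands in for the "69 genes" step and the second for the "57 genes" step.

```python
import math
from dataclasses import dataclass

P_CUTOFF = 0.01    # significance cutoff on the DGE comparison (from the text)
LFC_CUTOFF = 0.58  # |log2FC| >= 0.58 ~ a 1.5-fold (50%) change, since log2(1.5) ~ 0.585


@dataclass
class Gene:
    """Hypothetical per-gene summary; field names are illustrative only."""
    name: str
    p_dge: float       # P value, SD-Mad/Iso1 vs SD-MadRev/Iso1 (3' DGE)
    log2fc_dge: float  # log2 fold change from 3' DGE
    log2fc_tot: float  # log2 fold change from total RNA-seq


def consistent_de(genes):
    """Significant in DGE, with a totRNA fold change of the same sign."""
    # A product > 0 means both fold changes are nonzero and share a sign.
    return [
        g for g in genes
        if g.p_dge <= P_CUTOFF and g.log2fc_dge * g.log2fc_tot > 0
    ]


def strong_de(genes):
    """Consistent genes that also change by at least ~50% in DGE."""
    return [g for g in consistent_de(genes) if abs(g.log2fc_dge) >= LFC_CUTOFF]


# Made-up example values, not data from the study.
genes = [
    Gene("geneA", 0.001, 1.70, 0.90),    # passes both filters
    Gene("geneB", 0.005, -0.40, -0.20),  # consistent, but below the 50% threshold
    Gene("geneC", 0.002, 0.80, -0.30),   # sign disagrees between assays
    Gene("geneD", 0.200, 2.00, 1.50),    # not significant in DGE
]

print([g.name for g in consistent_de(genes)])
print([g.name for g in strong_de(genes)])
```

  Requiring a strictly positive product is one design choice among several: it treats a fold change of exactly zero in either assay as a mismatch.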
  
  We discuss two of these genes in further detail in the discussion in lines 476-486 which read:
  
  First, Tono, a BTB zinc finger-containing transcription factor is upregulated (Log2FCDGE = 1.7) in all SD-Mad comparisons. Tono plays a role in regulating transcription in muscle cells in response to mechanical pressure (Zhang et al. 2024) but also shows enrichment in male germ cells (Li et al. 2022). The putative DNA-binding capacity and ability to form nuclear condensates (Zhang et al. 2024) makes this an interesting candidate gene for interacting with the Rsp satellite. Second, the importin-4 ortholog, Artemis (Arts), which facilitates Ran-mediated import of H3 and H4 is overexpressed in SD-Mad (Log2FCDGE = 2.5). Interestingly, Arts expression is antagonistic to male fertility (VanKuren and Long 2018). Also of note, Apollo, a duplicate of Arts which supports male fertility (VanKuren and Long 2018) is downregulated (Log2FCDGE = -0.6) though it is not in the top-most differentially expressed genes.
  
  Figure S3: Am I reading the PCA plots right in that there are very few gene expression changes when the drivers are in iso1 background but much more in the Gla background? Comment on possible explanations for that. Please indicate the number of significantly changed genes in each comparison. Again, are these changes correlated between the two drivers or can they be attributed to genetic background of Gla vs R16? Would it be interesting to see how SD-Mad/Gla and SD-5/Gla gene expression profiles compare? Experiment: totRNAseq for SD-MadRev crosses.
  
  There did tend to be more differences in the Gla background than in Iso1. This difference is best explained by inter-sample variation in the SD-Mad/Iso1 background, which is visible in the PCA plot in Fig S4A. Another reason for the difference could be that the Gla and Iso1 chromosomes are very different from each other, which prevents us from making any 1-to-1 comparisons between the SD/Iso1 and SD/Gla backgrounds. For this reason, we generally avoid comparing between genetic backgrounds, except for differences shared across backgrounds, as these are more likely to be related to drive.
  
  In Figure S5A it seems that totalRNA levels of Rsp are strongly increased in SD-Mad/Gla but not in SD-Mad/iso1. The iso comparison (less piRNAs but same transcript) could indicate that it is actually transcription of the Rsp that is affected here. This is even pointed out in line 205 without discussion of the fact that the Gla comparison (less piRNAs but more transcript) would rather indicate that transcription is intact, but processing into piRNAs is defective. Could this be clarified using FISH as in Figure S8? If true, SD-Mad/Gla should have much more FISH signal than SD-Mad/iso1. Either way, this discrepancy should be further discussed. Experiments: comprehensive smFISH panel for all crosses (including SD-MadRev).
  
  The reviewer makes an excellent point. Why would Rsp long RNAs be overexpressed in the SD-Mad/Gla background? Earlier we noted that in the Gla background specifically the genotypes that contain an SD chromosome seem to have a higher level of Rsp small RNAs than we might expect given our Iso1 results. We conclude that this is likely due to an epistatic interaction between the 2nd chromosomes used in the study and the rest of the chromosomes. This interpretation could extend to the long noncoding precursors as well.
  
  Further, although the difference for SD-Mad/Gla is significant and the difference for SD-5/Gla is not, they move in the same direction. This is also true in the Iso1 backgrounds, but in the opposite direction. Given our interpretation that Rsp expression is higher than expected in the SD/Gla backgrounds due to epistatic effects, it becomes clearer that changes in long RNA abundance are related to changes in small RNA abundance, though they are not perfectly predictive of them. However, due to lower Rsp counts in the totRNA, we do not have the power to draw that conclusion confidently.
  
  In general, the totRNA profiles of repeats don't seem to correlate well between the genotypes (iso vs Gla crosses, neither for SD-5 nor for SD-Mad). Is this because values are in general small and/or replicates don't correlate? Should these data even be considered? Also panels 2A and S5C are very different from each other. The additional comparison with the SD-MadRev allele crossed into both Iso1 and Gla should give additional insight. Experiment: totRNAseq for SD-MadRev crosses.
  
  The reviewer brings up a good point. While some repetitive elements (like Rsp) had relatively small counts in the totRNA, most had adequately high counts. But these differences are to some degree expected. Although the other chromosomes are controlled for, the second chromosomes differ by design, including the two SD haplotypes. In this context, similarities between the two haplotypes may help identify unifying aspects of the SD mechanism, but differences could be incidental to the genotype and not necessarily related to SD.
  
  It may be generally informative to set the sRNA and RNA comparisons into perspective, for example by including the comparison of SD-Mad crosses versus SD-MadRev crosses to exclude unrelated genetic background components as much as possible.
  
  The reviewer is correct here. Differences in the transcriptomes of SD-Mad and our revertant are much more likely to be due to the drive phenotype. Due to variation between SD-Mad total RNA-seq replicates, we have substantially less power when comparing SD-Mad/Iso1 to SD-MadRev/Iso1. We therefore generated new data to address this point: we performed 3' Digital Gene Expression (DGE) sequencing on three biological replicates of SD-Mad/Iso1 and SD-MadRev/Iso1. We described the results of this new analysis above.
  
  FigS6: I assume this is given, but as it is not specified: is the directionality of differential expression taken into account here? Or could it be significantly up in one and down in the other? Please specify / adjust color scale to allow this distinction.
  
  This is a good point. We have modified the figure to not only indicate significance but also direction and magnitude.
  
  FigS8: Please add a scale bar for all images. 1.688 is labeled as 359 in the legend, please unify or/and explain nomenclature. Consider adding a nuclear outline based on DAPI. It looks like 1.688 is actually more different between control and SD-Mad/Iso than Rsp. Could the authors comment on this? In the text the authors mention that these experiments were done for both SD-Mad and SD-5 heterozygotes, but only the SD-Mad data are shown.
  
  The most abundant component of 1.688 repeats is the 359-bp repeat, which is used as a proxy for 1.688; our 359-bp probe cross-hybridizes with other abundant variants of 1.688 on chromosome 3. We agree that there do seem to be some differences in the 1.688 RNA FISH; however, we do not yet have evidence that 1.688 is related to the drive phenotype. We have expanded that figure (now Supplemental Figure 7) with multiple images for each genotype to demonstrate the lack of change in Rsp and 1.688 localization. We have also added an explanation of the nomenclature.
  
  The reference to SD-5 in the text was made in error. We do not have RNA FISH images of SD-5/Iso1 heterozygotes. We've modified the text to reflect this.
  
  FigS9B: What does the y-axis label mean? Fold change relative to what? Is this not displaying counts?
  
  This is a good catch by the reviewer. The y-axis is mislabeled and should read "TPM". We have made this change.
  
  To set the KipfKD/KO data in context, please also give the k value for SD-MadRev and compare the smRNA values in this context to the data displayed in F1B. Experiment: drive analysis for SD-MadRev.
  
  Our basis for concluding that Rsp smRNA overexpression may reduce drive strength is our demonstration that KipfKO is sufficient to rescue wild-type sperm in driving backgrounds. We did not introduce KipfKD (or KO) into the SD-MadRev background because this chromosome does not drive.
  
  The note that the 3XP3-dsRed cassette needs to be flipped out for Rsp overexpression to influence drive is interesting. It would be great if the authors could show a more detailed scheme of the structure of this insertion including the directionality of the promoter relative to the Rsp fragment and the rest of cluster 38C (including dm6 coordinates perhaps). Small RNA sequencing compared to totRNA sequencing should reveal if the transcription or the processing into piRNAs of the inserted piece is affected, and if more of the 38C piRNAs are affected. Genic transcription has been previously observed to limit Rhino-dependent piRNA production from piRNA clusters (Andersen et al 2017). It might be of interest to the general piRNA community to see how cluster output is influenced through the integration of an internal genic promoter.
  
  We agree that this is an interesting result. We have added more detail to Fig 4A to indicate directionality and genomic location of the insert in terms of dm6.
  
  Figure panel 4A should be adjusted to include annotations of the black boxes and to give genomic locations. It is unclear what the blue brackets mean, and where exactly the insertion took place. Are the attP sites relevant for the experiments? It might be nice to see a piRNA profile over the locus, to put the levels of additional Rsp piRNAs into perspective.
  
  We have removed the black boxes from the schematic as they were only there as an aesthetic choice. We have indicated where exactly the insertion was made. The attP sites are there for future experimental flexibility.
  
  Minor comments
  
  Figure 3B: fold change of satellite RNA is shown. It might be obvious that the fold change relates to KipfKO / WT but this should be stated explicitly. What is the genetic background here?
  
  Thank you for the comment. We added information on the genetic background in the figure.
  
  Figure legends should be extended for clarity throughout the manuscript in main and supplementary figures. All color codes and abbreviations as well as samples / genotypes and assay used should be clearly explained. A few examples include: F1B: smRNA or totalRNA? F3B: fold change relative to what? F4B: what are these data relative to? F4C: smRNA or totalRNA? S2: Is this smRNAseq? Further description of the color code in the volcano panels would be desirable. FS3: typo, A-B should be A-D. Fold changes relative to what? Etc.
  
  Thank you for these helpful suggestions. We have edited the figure legends as suggested to improve the clarity. We appreciate the feedback.
  
  The abbreviation for Kipferl is kipf, not kip.
  
  Thank you for pointing this out, we have made the corrections.
  
  I don't understand the sentence on lines 310-312.
  
  We agree that sentence was confusing. We replaced it with:
  
  "Identifying potential proteins that interact with Rsp may therefore provide important clues about why satellites like Rsp are targets of drive."
  
  Referee cross-commenting
  
  I agree with the other reviewer's assessments
  
  Reviewer #3 (Significance (Required)):
  
  General assessment
  
  This study of a highly complex and poorly understood drive system adds a very interesting piece to the puzzle of understanding the interplay between a RanGAP duplication and a large satellite array. Its strengths lie in the use of genetic tricks to modify drive (SD-MadRev allele, KipfKO, Rsp cluster insertion). The main weakness of the study is the relatively low correlation of several observations between drive crosses to the Iso1 and Gla lines, and the lack of explanations thereof. Neither gene nor repeat expression seems to give a convincing overlap in any direction.
  
  Furthermore, it is interesting that SD-Mad and SD-5 have such different dependencies on Rsp sRNA. While outside the scope of this work, it would be very interesting to see how other drive haplotypes behave: is SD-5 the exception or is it SD-Mad (as the authors have also wondered in the discussion). Such additional comparisons may clarify also the discrepancies in RNAseq.
  
  Advance
  
  While it has been previously shown by the same group that Rsp satellites give rise to smRNAs through the piRNA pathway, it is, to my knowledge, unclear whether and how these smRNAs influence drive. This study thus presents a conceptual advance in that it demonstrates that the role of Rsp smRNAs is not shared among driving haplotypes.
  
  Audience
  
  This study is relevant for a highly specialized audience interested in meiotic drive. It contributes to the understanding of the SD system and may serve as a basis for future research in this area. In addition, results reported in Figure 4 may be of peripheral interest for the Drosophila piRNA community for technical interests.
  
  This reviewer's expertise: Drosophila, piRNA pathway, heterochromatin, sRNA
  
  This reviewer's limitations: nuclear-cytoplasmic trafficking, cytoskeleton
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2025.12.01.691737
www.biorxiv.org

Experimental evolution to thermal stress indicates climate resilience in a cosmopolitan arthropod

1. Public_Reviews 05 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This important study deepens our understanding of how populations of a given species may diverge in their molecular and physiological patterns as a result of adaptation to different thermal regimes. By approaching this question from multiple directions, the authors provide solid evidence for adaptive changes in three strains of the diamondback moth after only three years of experimental evolution, and support the causal involvement of the PxSODC gene in thermal adaptation to both cold and hot temperatures. This work would benefit from more sophisticated phylogenetic analyses, better statistical support, and a more detailed discussion of the differences in the three strains at the pathway level.
  
  We sincerely thank the editors for this positive and constructive assessment. In the revised manuscript, we have addressed the highlighted points by: (1) re-inferring the phylogenetic tree of the PxSODC gene using a model-based Maximum Likelihood method (IQ-TREE) to ensure a robust evolutionary analysis; (2) substantially expanding the description of our statistical methods across all data types to ensure reproducibility and clarify multiple-testing corrections; and (3) adding a more detailed discussion of the pathway-level differences between the hot and cold strains, particularly integrating how their distinct transcriptomic responses align with their shared metabolic adjustments and phenotypic traits.
  
  Reviewer #1 (Public review):
  
  (1) The authors identify pathways that are enriched in different strain comparisons (Figure 3E), but do not provide a detailed interpretation of these results. It would be great if the authors could explain in more detail how the physiological processes of a cold-adapted strain of this species may differ from those of a warmer-adapted strain.
  
  We agree. We have addressed this by directly integrating our pathway enrichment results (Figure 3E) with the observed life-history phenotypes (concurrently addressing Reviewer 2's Comment 36a). We expanded the Discussion to explain that while both strains share convergent adjustments in core pathways (e.g., lipid metabolism for energy reallocation), their specific physiological strategies differ. The cold-adapted strain relies on broader transcriptional reprogramming to maintain homeostasis and support extended longevity/cold hardiness, whereas the hot-adapted strain utilizes broader metabolic rewiring to actively fuel its accelerated development and higher fecundity.
  
  (2) The authors reconstruct a phylogenetic tree of the PxSODC gene using the neighbor-joining algorithm. The limitations of this algorithm have been known for many years now, especially for sequences separated by long evolutionary distances. According to Wang et al. (2016), the last common ancestor of the species shown in Figure S4C occurred 392-350 million years ago. Given this, I would strongly recommend that the authors infer a phylogenetic tree using model-based methods, such as those implemented in RAxML-NG or IQ-TREE. Also, in the absence of a valid outgroup sequence, I would show the gene tree as unrooted or rooted based on the corresponding species tree.
  
  We agree. We have re-inferred the phylogenetic tree of the PxSODC gene using the model-based Maximum Likelihood (ML) method implemented in IQ-TREE. As recommended, in the absence of a valid outgroup sequence, the revised tree is now presented as unrooted. Supplemental Figure S4C (Figure 5-figure supplement 1C) and the corresponding text in the manuscript have been updated.
  
  (3) There is a key piece of the puzzle that is currently missing: the structural mechanism behind the mutational effects described in this study (e.g., Figure 5). The authors could leverage AlphaFold to generate structural models of different mutants and conduct molecular dynamics simulations to examine their conformational dynamics.
  
  We thank the reviewer for this excellent suggestion. We generated AlphaFold structural models of the wild-type (WT) and mutant (MU) PxSODC proteins and conducted 100 ns molecular dynamics (MD) simulations using GROMACS 2022.3 at three physiologically relevant temperatures: 15°C (cold stress), 26°C (favorable baseline), and 32°C (heat stress). Using 26°C as the physiological baseline, three key structural parameters support enhanced thermostability of the mutant protein (Figure 5–figure supplement 3). First, RMSD analysis revealed that under heat stress (32°C), the WT underwent severe conformational drift (RMSD increased from the 26°C baseline of 1.62 to 2.49, an increase of 0.87), while MU remained remarkably stable (from 1.59 to 1.66, an increase of only 0.07). Second, MU possessed a significantly more compact structure, with lower SASA values at 15°C (118.39 vs. 127.29 nm²) and 26°C (113.82 vs. 125.61 nm²), indicating optimized hydrophobic core packing. Third, the intramolecular hydrogen bond network of MU demonstrated dual stress resistance: under cold stress, MU actively increased hydrogen bonds from its baseline (113→119), whereas WT lost bonds (117→112); under heat stress, MU fully maintained its bond count (113→113). These results provide a direct structural mechanism for the enhanced catalytic efficiency of the mutant SOD at lower expression levels.
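  The response above compares each structural metric at 15°C or 32°C with its value at the 26°C baseline. The short sketch below makes that subtraction explicit. It only re-derives the differences from the numbers quoted in the response and does not run or reproduce the GROMACS simulations. The dictionary layout and the helper name `delta_from_baseline` are illustrative choices, and RMSD units are left unstated because the response does not give them.

```python
# Values as quoted in the author response; RMSD units are not stated there.
rmsd = {
    "WT": {26: 1.62, 32: 2.49},
    "MU": {26: 1.59, 32: 1.66},
}
hbonds = {
    "WT": {15: 112, 26: 117},           # 32 C value not reported in the response
    "MU": {15: 119, 26: 113, 32: 113},
}

BASELINE_C = 26  # favorable temperature used as the reference point


def delta_from_baseline(series, temp_c, baseline_c=BASELINE_C):
    """Change in a metric at temp_c relative to its value at the baseline temperature."""
    # Rounding to 2 decimals hides floating-point noise (e.g. 2.49 - 1.62).
    return round(series[temp_c] - series[baseline_c], 2)


for protein, series in rmsd.items():
    print(f"RMSD {protein}, 32 vs 26 C: {delta_from_baseline(series, 32):+.2f}")

for protein, series in hbonds.items():
    for temp in sorted(t for t in series if t != BASELINE_C):
        print(f"H-bonds {protein}, {temp} vs 26 C: {delta_from_baseline(series, temp):+d}")
```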
  
  Reviewer #1 (Recommendations for the authors):
  
  (4) The experimental evolution component of this study is described in the text as lasting for three years. It would help if the number of generations per strain were also reported.
  
  We have added the number of generations per strain. Over the three-year period, the hot strain completed ~75 generations and the cold strain ~15 generations. The ancestral strain was continuously maintained at 26°C throughout this period. The revised text has been updated in both the Introduction and Materials and Methods.
  
  (5) In Figure 3B: There is a typo in the word “Statistics”.
  
  Corrected. The typo in “Statistics” in Figure 3B has been fixed.
  
  (6) In Figure 3D: “CS” appears twice.
  
  Corrected. The duplicated “CS” label in Figure 3D has been replaced with the correct label.
  
  (7) Figure 4: This is not accessible to colorblind readers, who will clearly not be able to tell each color apart. As a non-colorblind person, I, too, have trouble figuring out which color label in panel B corresponds to which color in panel A. For example, I do not know off the top of my head how 'blue' differs from 'midnightblue', 'royalblue', or 'skyblue'. I recommend that the authors replace colors with identifiers, such as 'g1' for group 1 and so on.
  
  We appreciate this suggestion. We have replaced all color-based module labels with alphanumeric identifiers (M1, M2, M3, etc.) and added a corresponding legend. The main text and supplementary materials have been updated accordingly.
  
  (8) Lines 246-247: "Its secondary structure mainly consisted of strands, helices and coils." This sentence is redundant. These three are the only possible secondary structural elements, according to most bioinformatics tools such as PSIPRED, which the authors used. This sentence would be more useful if the authors could report the percentage breakdown of each secondary structural element.
  
  We have removed the redundant sentence and updated the text to report the specific percentage breakdown of the secondary structural elements based on our PSIPRED predictions (approximately 55.24% random coils, 16.19% alpha helices, and 28.57% extended strands). The revised text has been updated in the Results section.
  
  (9) Lines 260-261: "This suggests that the PxSODC gene can alter its expression pattern and function in response to environmental change...". I find this sentence a bit imprecise. Would it not be more precise to mention that the expression of this gene is regulated by temperature triggers?
  
  We agree that the original phrasing was imprecise. We have revised the sentence in the manuscript to state: “This suggests that the expression of the PxSODC gene is regulated by temperature triggers, and its altered function contributes to temperature-adaptive evolution in P. xylostella.”
  
  (10) The data points in Figures S1 and S7 are very small and hard to tell apart without zooming in a lot. Perhaps the authors could change the orientation of those pages to landscape and increase the size of the figures.
  
  Done. We have changed the orientation of Supplemental Figures S1 (Figure 1-figure supplement 1) and S7 (Figure 5-figure supplement 4) to landscape and increased the size of the figures and individual data points to improve visibility.
  
  (11) In Figure S2, the panel labeled as 'C' should be 'B' (based on the caption) and vice versa.
  
  Corrected. The panel labels ‘B’ and ‘C’ in Supplemental Figure S2 (Figure 2-figure supplement 1) have been swapped. The Supplementary Materials have been updated accordingly.
  
  Reviewer #2 (Public review):
  
  (1) The paper in its current form is hard to digest and would benefit from improved clarification of the storyline, as well as a tighter integration between the phenotypic, omics, and functional validation data. Currently, it is not always clear what the relevance of all the reported results is, why certain decisions were made, or how all the different methods the authors used fit together. For example, the authors functionally validated a second gene, PxDnmt1, but it is unclear why this particular gene was chosen, or how it relates to their selection regimes when looking at the results obtained with the phenotyping and omics data collection. Given how much work the authors did, this lack of integration makes the paper overwhelming and difficult to read.
  
  We sincerely appreciate this constructive feedback. In the revised manuscript, we have made significant structural revisions to improve the storyline and logical flow. We have streamlined the Results section (moving extensive descriptive data like life table curves and detailed metabolomics of mutant strains to Appendices 1-3) to focus on the key findings. Furthermore, we have clarified the logical transitions between experiments. For instance, regarding the choice to validate PxDnmt1, we now explicitly explain in the Results that our untargeted metabolomic analysis of the PxSODC mutant strains revealed consistent alterations in 5-hydroxymethyluracil (involved in DNA demethylation) and 5'-deoxyadenosine (a metabolite of S-adenosylmethionine, the primary methyl donor) across all developmental stages. This specific metabolic signature provided a strong, data-driven hypothesis linking PxSODC function to epigenetic regulation via DNA methylation, prompting us to functionally validate PxDnmt1. By explicitly stating these rationales, we have made the narrative clearer and more cohesive.
  
  (2) The authors at times stretch their results too far, as the ecological relevance of their study design and results is not clear, limiting the generalizability and value of the results for understanding species' adaptive potential under climate change. For example, the selection regimes used represent the minimum and maximum known temperatures at which the species can survive and develop, but it is unclear how the temperatures relate to the natural environment of the source population, to what extent wild populations might experience these temperatures, and whether they would experience them for the extended duration used (12h at max/min temperature). Moreover, I wonder whether the comparisons made would identify the genes that matter under natural conditions, as unevolved populations were kept under constant conditions compared to 12h:12h temperature regimes for the evolved populations. In addition, the metabolic and transcriptomic profiling was done at a constant, favorable 26°C rather than under thermal stress, and in what appears, as far as I can tell, to be a randomly chosen life stage (the larval stage).
  
  We appreciate the reviewer raising these important points regarding ecological relevance and experimental design. In the revised manuscript, we have added context and acknowledged these limitations in the Methods and Discussion sections. First, regarding ecological relevance: The source population is from Fuzhou, a subtropical region where summer high temperatures frequently exceed 32°C and winter lows can drop below 10°C, making our selection temperatures ecologically relevant extremes for this population. The 12h:12h cycling temperatures were designed to simulate severe but natural diurnal fluctuations.
  
  Second, regarding constant control vs. cycling regimes: The constant 26°C represents the established optimal developmental temperature and standard laboratory condition for P. xylostella. We acknowledge that comparing cycling selection regimes against a constant control might conflate adaptation to absolute temperature extremes with adaptation to thermal fluctuation itself. We have added this as a caveat in the Discussion. Third, regarding omics profiling conditions: The transcriptomic and metabolomic profiling was conducted under common garden conditions (26°C) specifically to identify constitutive, genetically fixed adaptations resulting from evolutionary selection, rather than immediate physiological plasticity under stress. We have clarified these rationales in the text.
  
  (3) The paper in its current form does not adequately describe the statistical analyses underlying the results, nor do the authors share their code, making it very hard to judge whether the analyses used are appropriate and the results trustworthy. I have concerns about the inappropriate use of t-tests, the lack of correcting for confounding variables, and the need for multiple testing corrections.
  
  We sincerely appreciate this concern. In the revised manuscript, we have made substantial improvements to the description of statistical analyses throughout the Methods section:
  
  (1) Statistical methods for each data type are now described separately and in detail, specifying the tests used, the number and type of comparisons, and sample sizes.
  
  (2) For metabolomic data, we have clarified that FDR correction was applied alongside multi-criteria thresholds (|log₂ fold change| ≥ 1, VIP ≥ 1, FDR < 0.05). For transcriptomic data, FDR correction (Benjamini and Hochberg, 1995) was applied via DESeq2.
  
  (3) For WGCNA, we have specified the total number of correlation tests (29 modules × 30 metabolites = 870) and the stringent dual threshold (|r| > 0.8, P < 0.05) used to control for false positives, following standard practice.
  
  (4) For life table parameters, the paired bootstrap method with 100,000 replications was used for all pairwise comparisons among strains.
  
  (5) For all other experimental data (qRT-PCR, SOD activity, O₂⁻ levels, survival rates, supercooling/freezing points, etc.), we have specified that t-tests were used only for two-group comparisons, while one-way ANOVA with Tukey's or Tamhane's T2 test was used for three or more groups, with non-parametric alternatives applied when normality assumptions were not met.
  
  (6) The raw data have been deposited in public repositories (see Data availability), and all statistical procedures are now described in sufficient detail to enable independent reproduction of the results.
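  Item (5) can be restated as a small function that selects a test. This is a hypothetical sketch, not the authors' SPSS procedure, and the function name is illustrative. The response does not spell out three details, so they are assumed conventions here: Mann-Whitney U and Kruskal-Wallis as the non-parametric alternatives, Tukey's test when variances are equal, and Tamhane's T2 when they are not.

```python
def choose_test(n_groups: int, normal: bool, equal_variances: bool = True) -> str:
    """Name the test implied by the rule in item (5).

    Hypothetical sketch: the non-parametric choices (Mann-Whitney U,
    Kruskal-Wallis) and the Tukey/Tamhane split on variance homogeneity
    are assumed conventions, not details taken from the response.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        # Two-group comparison: t-test, or its rank-based alternative
        return "t-test" if normal else "Mann-Whitney U"
    if not normal:
        # Three or more groups without normality: rank-based one-way test
        return "Kruskal-Wallis"
    post_hoc = "Tukey" if equal_variances else "Tamhane's T2"
    return f"one-way ANOVA + {post_hoc}"
```

  For example, `choose_test(3, normal=True, equal_variances=False)` returns `"one-way ANOVA + Tamhane's T2"`.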
  
  Reviewer #2 (Recommendations for the authors):
  
  Title
  
  (4) I don't feel the title adequately captures the work, I would instead of 'adaptive evolution' use 'experimental evolution' and I would not use the word 'underpins' but instead 'indicates', as it is not clear from your work whether the adaptations to the lab conditions you used would be ecologically relevant nor whether they are involved in thermal adaptation in wild populations.
  
  Accepted. The title has been revised to: “Experimental evolution to thermal stress indicates climate resilience in a cosmopolitan arthropod.”
  
  Abstract
  
  (5a) Please add the phenotype results to the abstract.
  
  We have added key phenotype results to the abstract. The revised text now reads: “The hot strain showed accelerated development, higher fecundity, and increased survival under extreme heat, while the cold strain exhibited lower supercooling and freezing points, indicating enhanced cold hardiness.”
  
  (6b) The Abstract doesn't really detail the answer to your research question yet: so what insights into the genetic mechanisms underlying thermal adaptation did you gain that are novel?
  
  We agree. We have revised the Abstract to explicitly highlight the novel genetic and molecular mechanisms we discovered. Specifically, we now detail that thermal adaptation is driven by a coordinated mutational, metabolic, and epigenetic response: (1) an energy-efficient genetic mechanism in which non-synonymous mutations in PxSODC enhance superoxide scavenging efficiency, enabling effective oxidative stress management at lower gene expression levels; (2) convergent metabolic adjustments, notably a reduction in lipid metabolism to conserve energy; and (3) epigenetic regulation of thermal tolerance via DNA methylation. The Abstract has been updated accordingly.
  
  (7c) Line 3: replace 'ectotherms' with 'arthropods' to match the title?
  
  Done. “Terrestrial ectotherms” has been replaced with “terrestrial arthropods” in the abstract.
  
  (8d) Line 9: replace 'demographic' with 'life history'?
  
  Done. “Demographic” has been replaced with “life history” in the abstract.
  
  Introduction
  
  (9a) The storyline is a bit unclear. Do you want to focus on the increased threat from insect pests under climate change or on the threat of climate change on insect persistence? Please pick one and adapt your storyline accordingly. I would suggest focusing on the first and talking more about the range extension of pest species under climate change (which would also require adaptation to cold extremes).
  
  We agree and have refocused the Introduction on the increased threat from insect pests under climate change, emphasizing that range expansion into new regions requires adaptation to both heat and cold extremes. Both the first and second paragraphs have been revised accordingly.
  
  (10b) Line 31-33: What do you mean by 'shows a positive relationship between the thermal tolerance range and the level of climatic variability'? Are they able to tolerate a larger range of temperatures?
  
  This sentence has been revised as part of the restructured Introduction, which now focuses on the range expansion of pest species under climate change. The revised text reads: “Such range expansion requires adaptation not only to warmer conditions in existing habitats but also to cold extremes encountered during colonization of higher latitudes or elevations (Harvey et al., 2020).”
  
  (11c) Line 33-35: Is this information relevant here?
  
  Agreed. This sentence has been removed as part of the restructured Introduction, which now focuses on the threat of pest range expansion under climate change.
  
  (12d) Line 55-56: What exactly do we not know yet about the mechanisms that enable thermal adaptation that you aim to fill in this paper? Please rephrase your knowledge gap to be more concrete (e.g., "but we do not yet know how...").
  
  We have rephrased the knowledge gap to be more concrete and aligned with the revised storyline. The revised text now reads: “...we do not yet know how long-term thermal selection drives coordinated changes across gene function, metabolic networks, and life history traits to enable thermal adaptation and range expansion in pest species.”
  
  (13e) Line 57: Also, here, the storyline is unclear. Why did you use the diamondback moth as your model species? You provide many different reasons, but it would help if you emphasized one reason that is in line with whichever storyline you want to focus on: is it because it is an insect pest that can tolerate a wide range of temperatures?
  
  We have streamlined this paragraph to focus on the primary rationale: P. xylostella is a globally distributed pest that thrives across a wide range of thermal environments, making it an ideal model for studying the genetic mechanisms of thermal adaptation. Supporting details on genomic resources are retained briefly as they enable the multi-omics approach used in this study.
  
  (14f) Line 65: Demonstrated how? Please give a short summary of the evidence for their genetic capacity to tolerate future climates.
  
  We have added a brief summary of the evidence. Specifically, genome-wide SNP analysis of field populations from 114 locations across diverse biogeographical zones revealed climate-adaptive genetic variability, indicating that P. xylostella can tolerate projected future climates in most regions (Chen et al., 2021).
  
  (15g) Line 72: What does 'Age-stage' mean? Should it read 'Aged-staged'?
  
  “Age-stage, two-sex life table” is an established demographic method developed by Chi (1988) that simultaneously accounts for both age and developmental stage in both sexes. This is a standard term in the field (Chi et al., 2020), so we have retained the original wording but added a brief clarification upon first use.
  
  (16h) Line 78-80: This needs a bit more explanation. Why does an increased ability to scavenge superoxide anions affect adaptability under extreme temperature environments?
  
  We have added a brief explanation. Extreme temperatures induce oxidative stress by elevating intracellular reactive oxygen species (ROS), including superoxide anions, which can damage cellular structures. Enhanced scavenging capacity thus helps maintain cellular homeostasis under thermal stress.
  
  (17i) Line 82-86: Please be more precise. What novel insights did you gain about the genetic mechanisms underlying thermal adaptation?
  
  We have revised this sentence to more precisely summarize the novel insights, encompassing both the multi-omics findings and the functional validation of PxSODC.
  
  Results
  
  (18a) The results section is very long and presents an overload of information at the moment, overwhelming the reader. Consider moving some sections to the Supplements (for example, a large part of the phenotypic data that cannot be linked to the omics data and the metabolic profiling of the mutant strains) or leave them out of the paper altogether.
  
  We agree that the Results section was too dense. We have streamlined it by moving the following content to the Supplementary Materials:
  
  (1) Detailed age-stage survival and fecundity curve data for the ancestral, hot and cold strains (Supplementary Text S1).
  
  (2) Detailed life table analysis of the PxSODC mutant strains (Supplementary Text S2).
  
  (3) Detailed untargeted metabolomic profiling of the SODC-MU mutant strains across developmental stages (Supplementary Text S3).
  
  The main text now retains only the key life history comparisons, extreme temperature tolerance results, omics-based evidence linking transcriptomics and metabolomics, functional validation of PxSODC, and the DNA methylation findings, with brief summaries and cross-references to the Supplements for supporting details.
  
  (19b) Please also provide the effect sizes for the different effects you report, for example, how many degrees difference was there between ancestral and cold strains in the supercooling/freezing points, and what was the variation?
  
  We have added specific effect sizes (mean ± SEM and between-group differences) for all key comparisons throughout the Results section, including preadult duration, stage-specific survival rates under extreme heat, supercooling/freezing points, and SODC-MU mutant strain comparisons. For example, the supercooling points of CS pupae (-23.99 ± 0.18°C) were 0.90°C lower than AS (-23.09 ± 0.26°C), and the freezing points were 2.66°C lower (-14.24 ± 0.61°C vs. -11.58 ± 0.52°C). Please refer to the revised manuscript for all updated values.
  
  (20c) Line 93-94: "Intrinsic and finite rate of increase" of what?
  
  Clarified. These are population growth parameters. The revised text now specifies “intrinsic rate of increase (r) and finite rate of increase (λ) of the population.”
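  In the age-stage, two-sex life table, these two parameters are linked. r is commonly obtained by solving the Euler-Lotka equation with age indexed from 0, Σₓ e^(−r(x+1)) lₓmₓ = 1, where lₓ is age-specific survival and mₓ is age-specific fecundity. λ then follows as e^r. The sketch below solves for r by bisection. It is an illustration under these assumptions with made-up schedules, not the TWOSEX-MSChart implementation, and the function names are hypothetical.

```python
import math


def euler_lotka_residual(r, lx, mx):
    """sum_x exp(-r*(x+1)) * lx[x] * mx[x], minus 1 (zero at the true r)."""
    return sum(math.exp(-r * (x + 1)) * l * m
               for x, (l, m) in enumerate(zip(lx, mx))) - 1.0


def intrinsic_rate(lx, mx, lo=-1.0, hi=1.0, tol=1e-12):
    """Intrinsic rate of increase r, found by bisection.

    The residual decreases monotonically in r (all lx*mx >= 0), so a
    single sign change inside [lo, hi] pins down the root.
    """
    if euler_lotka_residual(lo, lx, mx) < 0 or euler_lotka_residual(hi, lx, mx) > 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if euler_lotka_residual(mid, lx, mx) > 0:
            lo = mid  # residual still positive: root lies at larger r
        else:
            hi = mid
    return (lo + hi) / 2


def finite_rate(r):
    """Finite rate of increase, lambda = e^r."""
    return math.exp(r)


# Hypothetical toy schedule: all offspring (4 per individual) produced at age index 1
r = intrinsic_rate(lx=[1.0, 1.0], mx=[0.0, 4.0])
lam = finite_rate(r)
```

  With these toy numbers, the equation reduces to 4e^(−2r) = 1. That gives r = ln 2 and λ = 2, so the population doubles each time unit.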
  
  (21d) Line 98-99: Please start the paragraph with this summary of the results and then further detail them.
  
  We have restructured this paragraph by moving the summary sentence to the beginning, followed by the supporting details.
  
  (22e) Line 100-109: Why did you look at daily survival and fecundity rates? Please add why this is relevant.
  
  As part of the overall streamlining of the Results section, this paragraph on detailed age-stage survival and fecundity curves has been moved to Supplementary Text S1. A brief justification for their relevance has been added there, noting that these curves capture stage-specific variation in survival and fecundity that summary life table parameters alone may obscure.
  
  (23f) Line 106: What do HS, AS, and CS stand for? And please provide the statistics for comparison of daily survival rates between the strains.
  
  We have defined the abbreviations (HS = hot strain, AS = ancestral strain, CS = cold strain) at their first appearance in the Results section. This paragraph on daily survival and fecundity has been moved to Supplementary Text S1, where the abbreviations are also defined. The survival rates reported are the maximum daily survival rates derived from the age-stage-specific survival rate curves (sₓⱼ), and the statistical comparisons among strains are presented in Supplemental Table S1.
  
  (24g) Line 144-146: Why are these differential metabolites likely to play a crucial role?
  
  We agree this statement was speculative. It has been removed from the revised manuscript.
  
  (25h) Line 159-161: Why is a reduction of lipid metabolites evidence for adaptive evolution?
  
  We have revised this sentence to clarify the reasoning. The reduction in lipid metabolites in both independently evolved hot and cold strains suggests a convergent metabolic response, indicating that lipid metabolism adjustment is a shared adaptive strategy rather than a random change.
  
  (26i) Line 184-185: It is difficult to judge from Figure 3E the extent of overlap in KEGG pathways between the hot and cold strains. Can you adjust the figure to emphasize that overlap more?
  
  Agreed. To emphasize the extent of overlap in KEGG pathways between the hot and cold strains more clearly, we have completely redesigned Figure 3E. Instead of presenting two separate panels with unaligned vertical axes, we have consolidated the data into a single back-to-back (mirrored) bar chart with a shared central y-axis.
  
  (27j) Line 211: Not only the red module, but also the blue and green module correlates with many of the shared differential metabolites.
  
  We agree. We have revised the text to acknowledge that the blue and green modules also showed strong correlations with shared differential metabolites, while noting that the red module had the highest number of significantly correlated metabolites and was therefore selected for further analysis.
  
  (28k) Line 215: I would rephrase this as genes being interesting candidates for being involved in thermal adaptation or 'seem to be important for the adaptation of...', as you don't know from these results whether these genes play a critical regulatory role.
  
  Agreed. We have toned down the language to reflect the correlative nature of these results.
  
  (29l) Line 233: Do you mean that you further analyzed 15 genes of the 79 identified candidate genes in the previous paragraph?
  
  Yes, exactly. From the 79 candidate genes, we selected 15 that were both annotated in the genome and had high expression levels (FPKM > 10) for further analysis. We have clarified this in the revised manuscript.
  
  (30m) Line 238: What does SOD stand for?
  
  We have spelled out the abbreviation upon first use in this section.
  
  (31n) Line 254-255: Please provide the stats for this result.
  
  We have added the specific allele frequencies for each strain. The Leu194-Met194 mutation frequency was determined by direct sequencing of 10 individuals per strain, and the frequencies are now reported in the revised text.
  
  (32o) Line 303-304: How did you test for enhanced stability to temperature fluctuations? And enhanced compared to what?
  
  This observation was based on the survival rate data in Figure 5C, where mutant pupae at 43°C showed no significant difference from the ancestral strain, whereas other life stages (eggs, larvae, adults) at 42°C showed significantly reduced survival in the mutant strains. We have revised the text to clarify the comparison.
  
  (33p) Line 324-326: Why do decreased expression levels demonstrate increased O₂⁻ scavenging capacity? And why is that beneficial for adaptation to thermal stress? Please explain.
  
  We have revised this sentence to clarify the logic. The non-synonymous mutations in the hot and cold strains likely alter the protein conformation of SOD enzymes, increasing their catalytic efficiency per molecule. This allows effective O₂⁻ scavenging at lower expression levels, which is energetically favorable under thermal stress where energy conservation is critical for survival.
  
  (34q) Line 404-406: I'm confused. Is there a direct link between the gene you knocked out here and the results you presented up until now? How do the reduced levels of 5-methylcytosine relate to the metabolite results you present at the beginning of the paragraph, other than that both could be involved in DNA methylation?
  
  We have revised this paragraph to clarify the logical chain. Among the three metabolites consistently altered across all developmental stages in the SODC-MU strains, 5-hydroxymethyluracil is involved in dynamic DNA demethylation and 5'-deoxyadenosine is a precursor to S-adenosylmethionine (the methyl donor for DNA methylation). This suggested a link between PxSODC deletion and DNA methylation. To test this, we examined PxDnmt1 expression and activity in the thermally adapted strains and found both were significantly reduced. We then used RNAi to silence PxDnmt1 and confirmed that reduced DNA methylation (lower 5-mC levels) directly impaired thermal tolerance. Thus the connection is: PxSODC deletion → altered methylation-related metabolites → reduced DNA methyltransferase activity → decreased thermal tolerance.
  
  (35r) Line 410: Saying that your knockdown of a gene that did not directly pop up in any of your other analyses confirms that DNA methyltransferase is associated with the response to thermal selection is a stretch. Please rephrase.
  
  We agree this was overstated. We have toned down the language to reflect that the RNAi results provide preliminary evidence for a potential role of DNA methylation in thermal tolerance, rather than confirmation.
  
  Discussion
  
  (36a) The phenotype data are currently not discussed at all. Please add it to the discussion and try to integrate it more with the omics data you collected.
  
  We agree. To provide a cohesive narrative and avoid redundancy, we have addressed this comment in conjunction with our pathway interpretation (please see our response to Reviewer 1, Comment 1). In the revised Discussion, we explicitly integrated our specific phenotypic findings (e.g., accelerated development, increased fecundity, and heat survival in the hot strain; prolonged lifespan and lowered supercooling points in the cold strain) with the distinct transcriptomic and metabolomic profiles. This integration demonstrates how molecular and metabolic rewiring directly underpins the divergent life-history traits without engaging in unwarranted speculation.
  
  (37b) Line 433-434: I don't think this adequately represents the relevance of your particular study. I would suggest changing it to be more in line with the storyline of understanding the capacity for global dispersal in insect pests under climate change.
  
  We agree. We have revised this sentence to align with the storyline of pest range expansion under climate change.
  
  (38c) Line 476: This is a very odd statement; don't all species' genomes have genes encoding proteins involved in thermal adaptation? The reference also doesn't seem to be appropriate. I would suggest deleting this sentence.
  
  Agreed. This sentence has been removed.
  
  (39d) Line 483: Please write out SOD the first time you use it in a new section.
  
  Done. SOD has been spelled out at its first use in the Discussion.
  
  (40e) Line 544-548: This is a bit too specific to be the last sentence of the discussion. Try to formulate it more broadly in terms of what future research should focus on in general, not just your specific research.
  
  We agree. We have broadened the final sentence to address future research directions more generally.
  
  Figures
  
  (41a) Figure 1A: I don't think t-tests are appropriate here since you are not simply comparing two treatments, but testing for the effects of 5-6 different temperatures. And how did you correct for replicate populations in your analysis?
  
  Clarified. In Figure 1A, our comparisons are independent pairwise tests between exactly two strains (HS vs. AS) at each specific temperature and time point, making t-tests statistically appropriate. We were not testing for a continuous effect across temperatures. Regarding replicate populations, the individuals used in these assays were drawn from across the six replicate populations per treatment, with each biological replicate (n = 6, with 20 individuals per replicate) comprising individuals pooled from across the replicate populations to account for inter-population variation. We have clarified this in the revised figure legend.
  
  (42b) Figure 1B, Figure 5D, Figure 7: bar graphs are used for count data, so do the data represent the number of individuals with a certain trait value? If they are instead showing the mean of the population/treatment group, please use mean points ± standard errors instead.
  
  Accepted. The data in these figures represent continuous physiological traits (e.g., supercooling/freezing points) showing the mean of the populations, rather than count data. To align with current data visualization standards for continuous variables and to provide full transparency of the underlying data distribution, we have replaced the bar graphs in Figures 1B, 5D, and 7 with scatter plots. These revised figures now display the mean ± SEM overlaid with all individual biological replicate data points.
  
  (43c) Figure 3B: There is a typo in the graph, it reads 'Stattistics' instead of 'Statistics'.
  
  Corrected. The typo ‘Stattistics’ in Figure 3B has been fixed.
  
  (44d) Figure 3C: I don't understand what the colors of the graph mean here. Is it the average differential expression of each replicate compared to the ancestral?
  
  Clarified. We have updated the figure legend to explain that the colors represent the Pearson correlation coefficient (r) between pairs of biological replicates, indicating the degree of transcriptomic similarity among samples.
  
  Methods
  
  (45a) Please start each new methods paragraph with the purpose of the method/analysis, for example, "To investigate XX, we used method X to measure X". It is at the moment hard to understand why certain things were done.
  
  We agree. We have revised each Methods paragraph to begin with a clear statement of purpose, so that the rationale for each analysis is immediately apparent. All changes are shown in the revised manuscript.
  
  (46b) Line 575-578: Why were the selection regimes with cycling temperatures and the control with constant?
  
  The cycling temperatures in the hot (32°C/27°C) and cold (15°C/10°C) regimes were designed to simulate diurnal temperature fluctuations (12h light/12h dark) that more closely reflect natural thermal environments. The control was maintained at a constant 26°C, which is the established optimal developmental temperature for P. xylostella (Liu et al., 2002) and represents the standard laboratory rearing condition. We acknowledge this asymmetry and have added a justification in the revised manuscript.
  
  (47c) Line 581: How many generations was the ancestral population kept in the lab before the start of the selection experiment? And for how many generations were the populations selected?
  
  The ancestral population was maintained in the laboratory for approximately 170 generations (from July 2012 until the start of the selection experiment) before thermal selection began. Over the three-year experiment, the hot strain was selected for approximately 75 generations and the cold strain for approximately 15 generations. We have added this information to the revised manuscript.
  
  (48d) Line 585-586: I don't understand what you mean by randomly selecting six replicate populations per treatment for downstream experiments when you only had six replicate populations per treatment to begin with (as detailed in Line 574)?
  
  We apologize for the confusion. All six replicate populations per treatment were used for downstream experiments. We have corrected this sentence to remove the misleading “randomly selected” wording.
  
  (49e) Line 590: Were these 90 eggs also randomly selected, like for the individual life tables? And were these kept at the baseline temperature conditions?
  
  Yes, the 90 eggs were randomly selected and maintained under the baseline favorable temperature (26°C). We have clarified this in the revised manuscript.
  
  (50f) Line 606: Which life history and population fitness parameters were calculated?
  
  We have specified all parameters calculated in the revised manuscript.
  
  (51g) Line 609: Link to software doesn't work.
  
  We have updated the software link to the current working URL.
  
  (52h) Line 611: Please spell out what 'BT' stands for.
  
  Done. “BT” has been spelled out as “bootstrap” upon first use.
  
  (53i) Line 612-613: How many tests did you do? Did you correct for multiple testing? Using what method?
  
  The paired bootstrap method implemented in TWOSEX-MSChart inherently accounts for multiple pairwise comparisons through 100,000 bootstrap replications. We have clarified the scope of comparisons in the revised manuscript.
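  A generic sketch can illustrate the logic of a paired bootstrap comparison. In every replicate, each strain's individuals are resampled with replacement and the statistic is recomputed for both strains. The difference is then stored, and the spread of those differences gives a percentile confidence interval and a two-sided P value. This is an assumed, simplified version for a single statistic (here a mean, e.g. preadult duration) on hypothetical data. It is not the TWOSEX-MSChart code, and the function name is illustrative.

```python
import random
from statistics import mean


def paired_bootstrap_test(a, b, stat=mean, n_boot=100_000, seed=0):
    """Percentile CI and two-sided P for stat(a) - stat(b) by paired bootstrap."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each strain's individuals with replacement, then pair the estimates
        sa = rng.choices(a, k=len(a))
        sb = rng.choices(b, k=len(b))
        diffs.append(stat(sa) - stat(sb))
    diffs.sort()
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    # Two-sided P: twice the smaller tail share of differences on either side of 0
    p_le = sum(d <= 0 for d in diffs) / n_boot
    p_ge = sum(d >= 0 for d in diffs) / n_boot
    return ci, min(1.0, 2 * min(p_le, p_ge))


# Hypothetical preadult durations (days) for two strains
hot = [10, 11, 10, 9, 10, 11, 10, 10]
ancestral = [14, 15, 13, 14, 15, 14, 13, 14]
ci, p = paired_bootstrap_test(hot, ancestral, n_boot=2_000)
```

  Every hypothetical hot-strain value is smaller than every ancestral value, so all bootstrap differences are negative. The interval therefore lies entirely below zero and P is 0.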
  
  (54j) Line 620-621: What does biological replicate mean here? Individual eggs / larvae / pupae / adults, or were all or some life stages pooled? Also, you now only detailed which samples were collected for metabolomic profiling, were the same samples used for transcriptomic profiling, or a subset?
  
  Each biological replicate consisted of pooled individuals at the same developmental stage. The same sample collection strategy was used for both metabolomic and transcriptomic profiling, but from independent biological replicates (six for metabolomics, three for transcriptomics). We have clarified this in the revised manuscript.
  
  (55k) Line 637: Also here, how many tests did you do? Were p-values corrected for multiple testing? Using what method?
  
  Differential metabolites were identified through pairwise comparisons using Student's t-test with FDR correction for multiple testing. A multi-criteria threshold of |log₂ fold change| ≥ 1, VIP ≥ 1, and FDR < 0.05 was applied. This approach was used for all metabolomic comparisons, including HS vs. AS, CS vs. AS, and SODC-MU vs. AS. We have clarified this in the revised manuscript.
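  The filtering described above can be sketched as follows. The sketch assumes that the FDR values are Benjamini-Hochberg adjusted P values, the method cited for the transcriptomic data; the response does not name the procedure used for the metabolomic data. The function names and example P values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P down, keeping adjusted values monotone in rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(log2_fc, vip, fdr):
    """Multi-criteria threshold: |log2 fold change| >= 1, VIP >= 1, FDR < 0.05."""
    return abs(log2_fc) >= 1 and vip >= 1 and fdr < 0.05


# Hypothetical raw P values for four metabolites
fdr = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

  A metabolite counts as differential only if it passes all three criteria at once. A large fold change with VIP < 1 is therefore excluded.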
  
  (56l) Line 662: And here: how many tests did you do? Did you correct for multiple testing? Using what method?
  
  In the WGCNA analysis, Pearson correlations were calculated between each module eigengene and each of the 30 common differential metabolites, resulting in a total of 29 × 30 = 870 correlation tests. Following standard WGCNA practice, rather than applying FDR correction, we used a stringent dual threshold of |correlation coefficient| > 0.8 and P < 0.05 to identify significant module-metabolite associations, which effectively controls for false positives (Langfelder and Horvath, 2008). We have clarified this in the revised manuscript.
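  A short sketch makes the test count and the dual threshold concrete. Module and metabolite names are hypothetical placeholders. The correlation coefficients and P values are assumed to have been computed beforehand (for example, by WGCNA in R); the sketch only counts the tests and applies the |r| > 0.8, P < 0.05 filter.

```python
from itertools import product

MODULES = [f"M{i}" for i in range(1, 30)]        # 29 hypothetical module eigengenes
METABOLITES = [f"met{j}" for j in range(1, 31)]  # 30 hypothetical shared metabolites
PAIRS = list(product(MODULES, METABOLITES))      # one correlation test per pair


def significant_pairs(r, p, r_min=0.8, alpha=0.05):
    """Module-metabolite pairs passing both |r| > r_min and P < alpha."""
    return sorted(k for k in PAIRS if abs(r[k]) > r_min and p[k] < alpha)


# Hypothetical inputs: no association except for a few hand-set pairs
r = {k: 0.0 for k in PAIRS}
p = {k: 1.0 for k in PAIRS}
r[("M1", "met1")], p[("M1", "met1")] = 0.85, 0.01    # passes both thresholds
r[("M2", "met3")], p[("M2", "met3")] = -0.90, 0.001  # passes (negative r counts)
r[("M3", "met4")], p[("M3", "met4")] = 0.85, 0.20    # fails the P threshold
r[("M4", "met5")], p[("M4", "met5")] = 0.50, 0.001   # fails the |r| threshold
hits = significant_pairs(r, p)
```

  Here `len(PAIRS)` is 29 × 30 = 870, and only the two pairs that pass both criteria are retained.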
  
  (57m) Line 663: How did you select these modules? The ones that significantly correlated with differential metabolites? Why did you not use the phenotype data here?
  
  Modules were selected based on significant correlations (|correlation coefficient| > 0.8, P < 0.05) with differential metabolites shared between the hot and cold strains. We chose metabolites rather than phenotype data as the trait input for WGCNA because metabolites serve as intermediate molecular phenotypes that bridge gene expression and organismal phenotypes, providing a more direct link to the underlying regulatory mechanisms. This approach allowed us to identify gene modules most closely associated with the metabolic changes driven by thermal adaptation, which could then be connected to the observed life history and fitness divergence.
  
  (58n) Line 666: move RNA extraction details to before RNAseq methods description.
  
  Done. The “RNA extraction and cDNA synthesis” section has been relocated to before the “Transcriptomic profiling” section for better logical flow.
  
  (59o) Line 836: This paragraph describing the statistics is very short, and it is unclear to what data the described analyses apply. As the different types of data are very different, I expect the analyses to differ as well. Please describe the statistical analyses for each data type in more detail, specifying which tests you used and which and how many comparisons were performed.
  
  We agree. The statistical methods for life table analysis, metabolomics, and transcriptomics have been detailed in their respective method sections. We have expanded the Data analysis section to specify the statistical tests for the remaining experimental data.
  
  (60p) Line 837: Please include your SPSS scripts to ensure the reproducibility of your results.
  
  The statistical analyses in SPSS were performed using the graphical user interface. As all statistical tests, parameters, and comparison groups have been described in detail in the revised Methods section, and the raw data have been deposited in public repositories (see Data availability), we believe the analyses are fully reproducible. We are happy to provide additional details if needed.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.16.699875v3
www.biorxiv.org

Conserved assembly architecture of the essential herpesvirus packaging accessory factor

1. EMBOpress 04 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Reviewer #1
  
  Minor comments
  
  1) The authors suggest that the weak 4th protomer in the HCMV UL52 3-mer map is a consequence of flexibility. This may be the case, but it may also be the case that the class is polluted with 4-mer particles leading to reduced occupancy. Erasing the weak density and running a multi-model 3D classification providing the erased 3-mer and a 4-mer starting map may separate these.
  
  We performed additional analysis (combining the 3-mer and 4-mer particles in a multi-class ab initio reconstruction followed by multi-class heterogeneous refinement) and found that the original 3-mer map was a mixture of 3-mer and 4-mer states.
  
  We have updated Fig. 2a, Supplementary Fig. 2, Supplementary Fig. 3, Supplementary Table 1, Supplementary Movie 1, and removed the discussion of the weak protomer in the 3-mer map from the results section. We have updated our EMDB and PDB depositions accordingly.
  
  2) I found the supplemental figure showing the DNA in the tripentamer map too small; this is an interesting finding and should be shown more clearly.
  
  We have increased the size of Supplementary Fig. 6 and moved the figure caption to another page to accommodate this enlargement.
  
  Reviewer #2
  
  *Major issues 1) There is a high probability that the tripentamer is an artifact of the cross-linking. Because of this, it'd be great to know more about the cross-linking reaction, ideally mass spec identification and quantification of cross-links. This would also address the authors' speculation of contacts that stabilize the tripentamer. *
  
  Crosslinking is a commonly used technique to stabilize complexes that are observed through other means but do not survive the cryo-EM vitrification process. In an EMSA experiment (Supplementary Fig. 4a), UL32 binds 30 bp DNA and migrates slower than when bound to a 10 bp probe, consistent with formation of a supra-pentameric complex. The samples in the EMSA gels are not crosslinked. Additionally, an SDS-PAGE gel of the crosslinked product used for cryo-EM showed tight bands at molecular weights expected for oligomers, supporting specific crosslinking (Supplementary Fig. 4b). These results suggest that crosslinking stabilizes a species that can form but is relatively unstable in solution.
  
  Moreover, the authors' claim "However, mutation of K532A/C535A reduced infectious virion production by half (Fig. 4b), suggesting that the tripentamer interface may play a role in the viral life cycle." seems to be an overreach. Perhaps this is semantics but the data just show that these residues play a role in viral replication (albeit not a huge role based on the modest effect).
  
  We have modified the title of the results section (Lines 216-217) to state that "Residues at the tripentamer interfaces contribute to infectious virion production in HSV-1" and have revised Lines 234 and 241 to indicate that the residues play a role in the viral life cycle.
  
  2) The density for the potential DNA does not look very convincing, although it still remains the strongest hypothesis. The authors should try to strengthen their argument. Does this putative DNA contact residues that they show are necessary for viral replication? Showing seq conservation on the structure could help their argument for the shared function of DNA-binding.
  
  The DNA likely contacts conserved residues at the base and midsection of the central channel (residues R302, R301, R293, K289, R580, R579, R572; see Fig. 6a). We have shown that these residues are important for the production of infectious virions (Fig. 6c): even a single point mutation (R572A) decreased production of infectious virus particles by more than 90%, and double and triple point mutants (R579A/R580A, K289A/R293A/R301A) eliminated production of infectious virus. Sequence conservation of these charged residues in the central channel regions is shown in Supplementary Fig. 1d, f.
  
  3) My last major issue is stylistic and concerns the descriptions of cryoEM structures. I found that the paper was a bit of a challenge to read when the authors would introduce each structure. It was a bit of a slog to get through. Descriptions of the structures veered off into overly detailed comparisons that required constant comparison with the figure and didn't really advance my understanding past "the outer surfaces of the three orthologs are different." This masked the more interesting aspects of the authors' findings. Perhaps this could be summarized in supplementary figures or a table. Because this is a stylistic suggestion, the authors should feel free to ignore this request.
  
  We appreciate the reviewer's concerns about accessibility, but we are excited that these structures allowed us to thoroughly describe the convergent and divergent structural features across the Herpesviridae and hope that our in-depth analysis will allow for detailed mechanistic follow-up.
  
  *Minor comments 1) The descriptions of structure determination in the text were often unclear. For example, "In the 3-mer map, a poorly-resolved fourth protomer is visible at low contour levels, suggesting that an additional protomer is present but highly flexible in this class (Supplementary Fig. 3a)." Alternatively, it could be that the classification algorithm wasn't able to fully separate particles that were 3-mers from the 4-mers. *
  
  The reviewer is correct. As described above (Reviewer #1 comment 1), we performed additional analysis and found that the original 3-mer map was a mixture of 3-mer and 4-mer states. We have updated Fig. 2a, Supplementary Fig. 2, Supplementary Fig. 3, Supplementary Table 1, Supplementary Movie 1, the EMDB and PDB depositions, and removed the discussion of the weak protomer in the 3-mer map from the results section.
  
  *When describing the structure determination of the HSV1 accessory factor, the authors describe no other particles other than the tripentamer. Were there other particles observed? It'd be a bit surprising that all of the protein adopted the tripentamer state. *
  
  We agree that this result is striking. We picked particles using a 'blob picker' to avoid introducing template bias and found that the tripentamer is the predominant species. Below we show the results of 2D classification of blob picked particles (classes sorted by particle number; obvious junk classes excluded for clarity). There is one class that suggests a pentamer, but template picking with a pentamer template (based on ORF68) did not yield a pentamer class.
  
  Additionally, as we describe in the results section and show in Supplementary Fig. 6a, further processing of the consensus UL32 map showed that 60% of particles formed a complete tripentamer (i.e., 15-mer) while the remaining 40% formed incomplete tripentamers, missing one or more protomers (e.g., 17% of particles formed a 14-mer).
  
  Was symmetry applied, particularly for the tripentamer that appears to have C-3 symmetry? This is in materials and methods but not clear why it isn't mentioned when describing the structure determination and results.
  
  No symmetry was applied in the reconstruction for either UL32 or UL52. While we previously noted this in the methods section and in Supplementary Table 1, we have added this information to the results section (Lines 169-170), the Fig. 3 legend, and cryo-EM processing figures (Supplementary Figures 2, 5, 6) for clarity.
  
  2) Throughout the paper, the authors use the word "remodel" to describe structural differences between orthologs. However, this word usually carries the implication of conformational rearrangement within a protein, and not across orthologs. Please consider a different description.
  
  We agree with the reviewer and have removed the term "remodel" throughout the manuscript text (i.e., Lines 116, 118, 120, 122, 302, 306) and from Supplementary Figures 1, 3, and 5.
  
  3) Figure 2F is confusing and difficult to interpret. It seems that the main point is that these interfaces are conserved, which might be more easily displayed as a standard sequence conservation score mapped onto the structure. I'm also not sure that this figure is necessary as a main figure and could be supplemental.
  
  We agree that the conservation could also be shown this way and have added labels to universally conserved residues of the protomer interface to Supplementary Fig. 1b, c. We have also moved Fig. 2f to the supplement (now Supplementary Fig. 2g).
  
  4) The authors write "UL32 bound to the shortest probe tested (10 bp, Supplementary Fig. 4a)." This implies that ONLY the shortest probe is bound and that others are not bound. Consider rephrasing.
  
  We have rephrased to clarify that all probes tested, including the shortest, bound DNA (Line 153).
  
  5) Frustum is misspellt. ;)*
  
  Thank you. Spelling has been corrected (Line 185).
  
  6) In the discussion, the authors speculate that the variability of the outer surface is due to "virus- or host-specific interactions". I'm confused by "host-specific interactions", because the host is the same for all three viruses. Perhaps the authors mean that the different accessory factors could interact with different host factors? If so, are the authors making a Red Queen argument? If so, it'd be pretty cool to do dN/dS analysis to test that hypothesis.
  
  The reviewer is correct in that all three viruses (HSV-1, HCMV, KSHV) infect the same host; however, they replicate in different cell types, which could potentially express different host factors. We have no evidence to support this hypothesis and intended to propose that UL32 and UL52 may be interacting/co-evolving with other viral factors required for genome packaging. We have clarified Line 308 to generalize that "these regions are involved in virus-specific interactions".
  
  To me, this window into evolution of this factor is the biggest advance of the work, and tbh I felt that the authors could lean into this a bit more in the discussion section. Are there any differences in the packaging mechanisms of the different herpes families that can be related to their different behavior? Any other molecular evolution analyses (e.g. dN/dS ratio analysis) that could inform their study?
  
  We agree that understanding the evolution of the packaging accessory factor is an interesting future area of research. There are differences in capsid structure and occupancy of capsid-associated factors across the herpesvirus family (PMID: 34696343). However, we lack a mechanistic (or structural) understanding of viral genome packaging components across the herpesviruses, raising the possibility that there are differences in packaging mechanisms.
  
  Interestingly, the further diverged alloherpesviruses and malacoherpesviruses (other families in the order Herpesvirales) do not appear to encode a factor with similar predicted structure to the Herpesviridae packaging accessory factor (PMID: 41902279). It is unclear how the mechanism of packaging differs in the Orthoherpesviridae and whether replication in mammalian/avian/reptilian cells places additional evolutionary pressure on the viral genome packaging mechanism.
  
  Reviewer #3
  
  Major comments
  
  *1) [I]t is not clear whether the structures presented in the manuscript reflect those produced during HCMV or HSV-1 infection. *
  
  We agree with the reviewer that it is important to consider to what extent purified biomolecules resemble their in vivo counterparts. This criticism can be applied to any ex situ structural analysis. However, our experimental structures allowed us to make testable observations, including the correct assignment of structurally important zinc fingers and the identification of functionally important residues in the central channel.
  
  2) HCMV UL52 was presented to form two distinct structures, a 3-mer and a 4-mer (Fig. 2a). However, the authors acknowledge that the 3-mer is actually a 4-mer when the threshold for the cryo-EM map is lowered. The density is also visible in the PDB validation report for the 3-mer; EMD-74418.
  
  Reviewers #1 and #2 were also curious about the 3-mer. As described above, we performed additional analysis that showed that the original 3-mer map was a mixture of 3-mer and 4-mer states. We have updated Fig. 2a, Supplementary Fig. 2, Supplementary Fig. 3, Supplementary Table 1, Supplementary Movie 1, EMDB and PDB depositions, and removed the discussion of the weak protomer in the 3-mer map from the results section.
  
  *Given that ORF68, BFLF1, and UL32 (Didychuk et al., 2021) form complete pentamer rings, with BFLF1 forming stacked rings, it would seem odd for a protein with conserved function to deviate from a pentamer configuration, suggesting that the structures reported do not reflect the natively produced and functional protein. *
  
  We agree that this is a surprising finding; we initially anticipated that UL32 and UL52 would also form stable pentameric rings. While this study does not resolve a complete mechanism for this factor, it does provide the first structural evidence for the implications of their poor sequence conservation and lack of complementarity.
  
  Furthermore, this is not the first example of a conserved herpesvirus factor that possesses different oligomeric states across different subfamily homologs. As mentioned in the discussion, herpesviruses encode a sliding clamp processivity factor (HSV-1 UL42/HCMV UL44/KSHV ORF59) that shares a common PCNA-like fold, but which has varied oligomeric state across these herpesviruses.
  
  *3) Unlike ORF68 (Didychuk et al., 2021) and UL32 (Suppl. Fig. 4), dsDNA binding experiments were not performed with UL52. Could the partial pentamers simply be poorly formed due to expression in insect cells (mammalian cells were used for protein purification in Didychuk et al., 2021), absence of dsDNA, or inappropriate buffer conditions? Moreover, were the EM grid and vitrification parameters optimized? Grid geometries and chemistries can have profound effects on protein stability especially in the context of the air-water interface, leading to degradation of protein complexes (Glaeser, 2018; D'Imprima et al., 2019). Does UL52 form complexes with dsDNA? Data are shown for the HSV-1 packaging accessory factor. Perhaps dsDNA would stabilize the UL52 pentamer. *
  
  We have purified ORF68 and homologs from both human and insect cell expression systems, and do not observe changes in oligomeric behavior. We find that ORF68 purified as a stable pentamer from human cells (Didychuk eLife 2021) and from insect cells (this work). We have also recombinantly expressed and purified UL32 from human cells. UL32 was largely monomeric after strep affinity purification (chromatogram below, unpublished), as we report from insect cells (this work, Fig. 1c). We switched to insect cell expression systems because of the easier scalability.
  
  Our SEC-MALS data (Fig. 1d) shows that purified UL52 does not oligomerize into a pentamer in solution, so the observed sub-pentameric (3-mer/4-mer) assemblies are unlikely to be an artifact of cryo-EM freezing conditions or the air-water interface. We have not tested if UL52 forms complexes with dsDNA, although it likely does; it is possible that this interaction would stabilize a pentamer.
  
  4) In Didychuk et al., 2021, HSV UL32 is shown to form pentameric rings; negative stained 2D class averages were generated from tagged protein (twin strep tag), produced in mammalian cells (HEK293T), and not purified using size exclusion chromatography. In the present study HSV UL32 was not observed to form pentameric complexes "We first attempted to visualize the pentameric species by negative stain electron microscopy but were unable to identify particles of the expected dimensions." However, it is not clear why this was the case. If the pentameric structures were readily produced in previous experiments, why was cross-linking needed in the current study? As such, the tripentamer complexes seem artifactual in nature.
  
  While a sufficient number of particles were observed in a pentameric state to do 2D class averages in the eLife paper, this was not the dominant state. The results we report in this work are consistent with those reported in the eLife paper. Reviewer #2 (comment #1) was also concerned about the possibility of a crosslinking artifact: we reproduce our response below:
  
  "Crosslinking is a commonly used technique to stabilize complexes that are observed through other means but do not survive the cryo-EM vitrification process. In an EMSA experiment (Supplementary Fig. 4a), UL32 binds 30 bp DNA and migrates slower than when bound to a 10 bp probe, consistent with formation of a supra-pentameric complex. The samples in the EMSA gels are not crosslinked. Additionally, an SDS-PAGE gel of the crosslinked product used for EM showed tight bands, supporting specific crosslinking (Supplementary Fig. 4b). These results suggest that crosslinking stabilizes a species that can form but is relatively unstable in solution."
  
  We have updated Line 148 to clarify this. We have also included a negative stain micrograph, below, in which UL32 pentamers (purified from insect cells) are visible in the absence of crosslinking.
  
  5) Although the data presented in Fig. 4b suggest that interface residues, K532 and C535, might play a role in the formation of the tripentamer and have a minor role in HSV-1 replication, these experiments are incomplete. Single mutations are needed for each residue to assess their individual contribution to tripentamer formation, evidence for a loss of tripentamer formation is needed, and evidence for protein expression is needed.
  
  We agree that we have not unambiguously defined the role of the tripentamer, the precise contributions of residues K532 and C535, or defined the contribution of the tripentamer to HSV-1 viral replication. We seek to report this novel structure to lay the basis for future mechanistic work. Reviewer #2 (comment 1) also questioned the role of these residues in HSV-1 replication, and we addressed this by modifying the title of the results section (Line 216) to state that "Residues at the tripentamer interfaces contribute to infectious virion production in HSV-1" and by revising Lines 246 and 253 to indicate that the residues play a role in the viral life cycle.
  
  Please refer to Supplementary Fig. 7e for a western blot showing that these mutants do not impact UL32 expression. We included explicit references to UL32 expression on Lines 239 and 288.
  
  *6) In the previous negative stain electron micrographs reported by Didychuk et al., 2021, were the higher order tripentamer complexes seen? *
  
  We did not observe tripentamers in the Didychuk et al. 2021 negative stain dataset. Tripentamer formation may be concentration dependent. Negative stain EM carried out at nanomolar concentrations would likely cause dissociation of tripentamers, but cryo-EM and EMSA in our work were carried out at micromolar concentrations and were able to capture the higher order tripentamer.
  
  7) Formation of disulphide bonds between cysteine residues in vitro is not indicative of complexes forming in vivo during replication. What evidence is there for disulphide bond formation between packaging accessory factor pentamers for KSHV, EBV, and LCMV? In the present study, the disulphide bond could form due to proximity as a result of the cross-linking and the presence of molecular oxygen rather than a bona fide enzyme-catalysed reaction during herpesvirus replication to generate packaging accessory factor tripentamers.
  
  We agree that it is unlikely that disulfide bonds form during infection and have removed this speculation from the manuscript (Lines 343-346).
  
  8) The DNA densities in Suppl. Fig. 6e to 6g are curious. As noted by the authors, the 30mer dsDNAs do not traverse through the central cavity of the pentamer. They appear to make contact with neighboring pentamers, again suggesting that these complexes are artefacts from cross-linking. This should be discussed more thoroughly.
  
  Please refer to above discussion of crosslinking and Supplementary Fig. 4.
  
  9) Previously proposed functional roles for ORF68 include a scaffold for terminase assembly, association of the terminase with the portal, generation of initial free ends, or coordination with other replication machinery (Didychuk et al., 2021). Presuming that the new structures for HCMV UL52 and HSV-1 UL32 occur naturally, how do they fit with the previously proposed functional roles of the herpesvirus packaging accessory factor? A more in-depth discussion of this would be valuable.
  
  The common core fold, the pentamer/pentamer-like assemblies, and the conserved, positively charged central channel are shared features. We have added additional discussion of this.
  
  *Minor comments A lack of page numbers and line numbers made reviewing this manuscript more challenging than necessary. *
  
  We have included page numbers and line numbers in the revised manuscript.
  
  *As noted in the 'General comments' section above, ORF68 (3.37Å) and BFLF1 (3.60Å) both form pentamers (Didychuk et al., 2021) and were produced in mammalian systems (HEK293T cells). Protein purification in the present study was performed in insect (SF9 or High Five) cells. Does this affect complex stability? Also, the tag was retained for UL32 in Didychuk et al., 2021; could this provide stability of the pentamer in the original studies? *
  
  As discussed above, we have no evidence to suggest that expression in human vs. insect cell expression systems dramatically changes oligomerization behavior (Reviewer #3, comment 3). N-terminal purification tags were also retained in this study for structural work but were removed for SEC-MALS, which shows that UL32 is likely in a concentration-dependent equilibrium between (unstable) pentamers and monomers.
  
  Suppl. Fig. 3 is missing.
  
  We apologize for this oversight and have included Supplementary Fig. 3.
  
  *"UL52 has two regions remodeled" The use of the word 'remodeled' is not appropriate in this context as it implies a single protein can form two shapes under different conditions rather than distinct structures between two disparate proteins; UL52 compared to ORF68. This should be rephrased. *
  
  This was also noted by reviewer 2, and we have removed the term "remodel" throughout the manuscript text (i.e., Lines 134, 138, 140, 337, 341) and from Supplementary Figures 1, 3, and 5.
  
  *What is the density in the central core of UL52 (Fig. 2a; Suppl. Fig. 2e)? Was any form of focused classification performed to establish the identity of the density within the central pseudocavity? *
  
  As noted in the manuscript, this density could be attributed to co-purified protein or nucleic acid, or to part of the unresolved, negatively charged loop (residues 82-181) interacting with the positively charged central channel. We have done additional analysis of the central channel density (3D classification with a focus mask) and do not resolve any distinct densities, suggesting that the density is very dynamic.
  
  *Does UL52 bind to dsDNA? To support the hypothesis that the herpesvirus packaging accessory factor has conserved functions across the three subfamilies dsDNA binding experiments should be performed. *
  
  We have not done this experiment. We think that demonstrating this finding for two of the three herpesvirus subfamilies is sufficient.
  
  There is no discussion about how these data relate to the previous functional model for ORF68 presented in Didychuk et al., 2021. Do the new data alter the previous functional models?
  
  The precise mechanistic contribution of the packaging accessory factor remains unknown, and our data do not delineate between the proposed potential roles described in Didychuk et al. 2021. Importantly, our structural information, demonstration of pentameric ring formation, and significance of the positively charged central channel show that the core function of this factor is likely conserved across the virus family. This was not known before our work.
  
  *There are some interesting grammatical phrases; please address throughout the manuscript. One example - "...a notable shared aspiration..." Proteins do not have aspirations. Please use a more formal scientific statement. *
  
  We have updated the language on Line 327.
  
  *Fig. 4b - Statistical analyses missing. Please provide. *
  
  Fig. 6c - Statistical analyses are missing. Please provide. Protein folding/expression data missing; see Fig. 5C showing mutations that result in poor protein expression.
  
  Suppl. Fig. 7f - Statistical analyses absent.
  
  Statistical analysis of the viral complementation in Figs. 4b and 6c has been included. Note that the viral yields reported in Supplementary Fig. 7f were used to calculate complementation efficiency in Figs. 4b and 6c. Protein expression of mutants shown in Fig. 6c was previously included in Supplementary Fig. 7e and is referenced on Lines 288 and 293.
  
  *Suppl. Fig. 2 and 5 - FSC curves have oddities, especially in the corrected curves. The cryo-EM resolution estimates calculated by CryoSPARC for the UL52 '3-mer' and 4-mer, and UL32 tripentamer are likely overestimated. In the PDB validation files each of the deposited structures has a warning for the resolution estimate "The value from deposited half-maps intersecting FSC 0.143 CUT-OFF 4.31 differs from the reported value 3.32 by more than 10 %", suggesting that the resolution estimates are inaccurate. The authors should provide a resolution estimate using loose masks and generate FSC curves using another software program such as RELION's postprocess to provide resolution estimates. *
  
  Thank you for bringing this to our attention. The differences in the resolution estimates are a known issue and are highly influenced by the tightness of the mask. In the revised manuscript we have updated the FSC curves to not include auto-tightened masks and revised our resolution estimates. This slightly changed the resolution to 3.29 Å for both UL52 3-mer and 4-mer and to 3.09 Å for the UL32 consensus map. Please also see the local resolution estimation maps in Supplementary Figures 2e and 5e for an illustration of the range of resolutions in each map.
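  To illustrate the 0.143 criterion named in the validation warning, here is a minimal sketch: the gold-standard resolution is read off where the half-map FSC curve first falls below 0.143, using linear interpolation between samples. The function names and the three-point curve are hypothetical and are not data from this study; real packages (cryoSPARC, RELION postprocessing) also apply mask corrections such as phase randomization, which this sketch omits. The last line computes the relative difference between the two values quoted in the warning (4.31 Å vs 3.32 Å), taking the reported value as the reference.

```python
"""Sketch: read a half-map FSC resolution at the 0.143 threshold.

The FSC samples below are hypothetical, not data from this study.
"""


def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Return the resolution (Å) where the FSC first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, increasing.
    fsc:   FSC values at those frequencies.
    Linear interpolation is used between the two bracketing samples.
    """
    for i in range(1, len(freqs)):
        f0, f1 = freqs[i - 1], freqs[i]
        c0, c1 = fsc[i - 1], fsc[i]
        if c0 >= threshold > c1:
            t = (c0 - threshold) / (c0 - c1)
            crossing = f0 + t * (f1 - f0)  # frequency (1/Å) at the crossing
            return 1.0 / crossing          # convert to resolution in Å
    raise ValueError("FSC never drops below the threshold")


def relative_difference(a, b):
    """Fractional difference of `a` from the reference value `b`."""
    return abs(a - b) / b


# Hypothetical FSC samples: frequencies (1/Å) and correlation values.
freqs = [0.10, 0.20, 0.30]
fsc = [0.90, 0.50, 0.10]
print(f"{resolution_at_threshold(freqs, fsc):.2f} Å")

# Values quoted in the validation warning: half-map estimate vs reported.
print(f"{relative_difference(4.31, 3.32):.1%}")
```

  With these numbers the two values differ by roughly 30%, well above the 10% flag, which is why the warning appears.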
  
  Suppl. Fig. 6f and 6g - Is there any visible density that might resemble the EGS crosslinking reagent?
  
  We do not expect to observe density for EGS due to the long flexible linker (~16 Å) between the two reactive groups.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.01.22.701024
www.biorxiv.org

Experimental verification of the error minimization theory using non-standard genetic codes constructed in vitro

1. Public_Reviews 04 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  In this manuscript, the authors investigate the relationship between genetic codes and their robustness to single-point mutations. They construct ten alternative genetic codes by reassigning nine codons to Leu, Ser, or Ala, and assess mutational robustness using three reporter proteins subjected to error-prone PCR. This represents an interesting experimental approach to addressing the hypothesis that the standard genetic code is optimized for mutational robustness.
  
  We sincerely thank the reviewer for the positive evaluation of our experimental approach. We are encouraged that the reviewer recognizes the value of constructing multiple non-standard genetic codes in vitro and using them to experimentally examine the relationship between genetic code arrangement and mutational robustness. In the revised manuscript, we have further clarified the scope of our experimental system and the interpretation of the results, particularly emphasizing that our conclusions concern the mutational robustness of individual reporter protein activity measured in an in vitro translation system.
  
  Major comment:
  
  While I find the experimental design valuable, I am not fully convinced by the authors' conclusion that "alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness". The current analysis is based on the functional output of three individual reporter proteins. Given that cellular systems involve far more complex interactions, it would be more appropriate to limit this conclusion to mutational robustness at the level of individual protein activity, rather than making broader generalizations.
  
  We thank the reviewer for this important comment. We agree that our original wording was broader than what can be directly supported by the present experiments. Because our analysis is based on the functional outputs of three individual reporter proteins translated in a reconstituted in vitro system, the results do not directly address mutational robustness at the level of the cellular system, protein interaction networks, or organismal fitness.
  
  Accordingly, we have revised the manuscript to limit our conclusion to the mutational robustness of individual reporter protein activity. In the revised Abstract, Results, and Discussion, we now state that within the experimentally tested range of non-standard genetic codes, we did not detect a dependence of the mutation-induced decrease in reporter protein activity on mutational cost. We have also added a statement in the Discussion noting that cellular systems involve many additional layers, including protein–protein interactions, metabolic networks, quality-control systems, and growth selection, and that whether genetic code arrangement affects robustness at these higher biological levels remains an important question for future work.
  
  Specifically, we have added this explanation and the new experiment to the revised manuscript as follows.
  
  Abstract
  
  “This result provides direct experimental evidence that mutational robustness does not significantly change in individual reporter protein activity when the genetic code is altered within the range of mutational cost tested in this study…”
  
  Introduction
  
  “Random mutations decreased reporter protein function at similar levels across all genetic codes examined, implying that alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness of individual protein activity.”
  
  Results
  
  “Taken together, these results indicate that mutational robustness of individual reporter protein function did not substantially differ among the genetic codes…”
  
  Discussion
  
  “…suggesting that mutational robustness of protein activity remained largely unchanged within at least the ranges of mutational cost tested in this study. It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question.”
  
  Specific comments
  
  (1) tRNA modification and expression efficiency (Page 5, line 131)
  
  The authors attribute the observed inefficiency to the lack of chemical modifications in the tRNAs used. However, gene expression efficiency can also be strongly influenced by DNA sequence design. To better support this claim, it would be helpful to compare luciferase activity when expressed using native E. coli tRNAs. This comparison could clarify whether the observed effects are due to tRNA modification status or other sequence-dependent factors.
  
  We thank the reviewer for this important suggestion. We agree that the translation efficiency of NanoLuc templates with 21-, 32-, and 46-codons may be affected not only by the chemical modification of tRNAs but also by sequence-dependent factors, such as codon context and mRNA structure.
  
  To examine this possibility, we performed an additional comparison using native E. coli tRNAs in the tfPURE system. When the NanoLuc templates encoded with 21, 32, or 46 codons were translated using native E. coli tRNAs, the observed luminescence values were 1.2 × 10¹⁰, 0.78 × 10¹⁰, and 0.60 × 10¹⁰, respectively. Thus, the 46-codon NanoLuc template showed lower activity than the 21- and 32-codon templates even with native tRNAs, indicating that sequence-dependent effects indeed contribute to translation efficiency.
  
  However, the difference among these templates with native E. coli tRNAs was within approximately two-fold. This effect was much smaller than the marked decrease observed when the 46-codon template was translated using the in vitro prepared 46 tRNAs SGC system. Therefore, while sequence-dependent effects cannot be excluded, the inefficient translation in the reconstructed 46 tRNAs SGC is likely to be mainly attributable to the limited functionality of unmodified tRNAs decoding NNA codons.
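  The "approximately two-fold" statement can be checked directly from the three luminescence values reported above. The short sketch below is only arithmetic on those numbers, not an additional analysis from the study; the dictionary layout and variable names are illustrative.

```python
# Luminescence with native E. coli tRNAs, as reported above
# (keys: number of codons in the NanoLuc template).
luminescence = {21: 1.2e10, 32: 0.78e10, 46: 0.60e10}

highest = max(luminescence.values())
for codons, value in sorted(luminescence.items()):
    # Fold decrease relative to the most active template.
    print(f"{codons}-codon template: {highest / value:.2f}-fold below maximum")

# Spread between the most and least active templates.
fold_range = highest / min(luminescence.values())
print(f"max/min ratio: {fold_range:.2f}")
```

  The 32-codon template is about 1.5-fold and the 46-codon template 2.0-fold below the 21-codon template, so the spread among templates with native tRNAs is at most two-fold.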
  
  We have revised the manuscript to clarify this interpretation and have added the new comparison using native E. coli tRNAs.
  
  “We also examined whether the lower translation efficiency of the 46-codon NanoLuc template could be explained by sequence-dependent effects, such as codon context or mRNA structure. When the 21-, 32-, and 46-codon NanoLuc templates were translated using native E. coli tRNAs in the tfPURE system (Figure 1–figure supplement 2), the 46-codon template showed lower activity than the 21- and 32-codon templates; however, this difference was within approximately two-fold. Accordingly, we decided to use only the 32 codons used in near-SGC (i.e., excluding NNA codons) in the subsequent construction of non-standard genetic codes.”
  
  (2) Discrepancy between expression level and activity (Figure S7 vs Figure S8).
  
  Although GAL expression levels appear similar across different genetic codes (Figure S7), their activities differ substantially (Figure S8), even in the low-mutation library. This discrepancy warrants further investigation. Possible explanations include differences in protein folding efficiency or translational error rates, as mentioned by the authors in the main text.
  
  To address this, the authors could analyze the protein products using mass spectrometry. If this is not feasible due to low expression levels, alternative approaches such as SDS-PAGE (e.g., with radiolabeling or Western blotting) would still provide valuable information. Additionally, comparing activity after in vitro refolding could help distinguish between folding defects and sequence-level errors. While I understand that the primary aim of this study is to compare mutational robustness across genetic codes, discussing these observations would significantly enhance the mechanistic insight of the work.
  
  We agree that the discrepancy between similar GAL expression levels and different GAL activities across genetic codes is important for interpreting the results.
  
  In our experiment, GAL protein amounts were quantified using a C-terminal HiBiT tag. Because the HiBiT tag was fused to the C-terminus of GAL, this assay indicates that the amount of C-terminally completed GAL products did not differ substantially among genetic codes. However, we agree that this assay does not evaluate the sequence fidelity, amino acid misincorporation patterns, or folding state of the translated products. Therefore, the observed differences in GAL activity despite similar HiBiT signals may reflect genetic code-dependent differences in translational error rates, amino acid misincorporation, protein folding efficiency, or other effects on the fraction of catalytically active protein.
  
  We have revised the Discussion to explicitly describe this interpretation and to clarify that detailed mechanistic dissection of these baseline activity differences, for example by mass spectrometry, SDS-PAGE/Western blotting, or refolding analysis, is an important future direction but beyond the scope of the present study. We also clarified that the main analysis in this study uses the ratio of activity from the high-mutation library to that from the corresponding low-mutation library within each genetic code.
  
  We have added this explanation to the revised manuscript as follows.
  
  “Although protein amounts quantified by the HiBiT tag were comparable among genetic codes, GAL activities differed substantially. This indicates that the activity differences among genetic codes were not primarily attributable to differences in the amount of C-terminally completed translation products. The HiBiT assay does not provide information on the fraction of catalytically active protein, including sequence fidelity or folding state, and therefore cannot distinguish among these possibilities. Detailed characterization of translated products by mass spectrometry would provide further mechanistic insight into how individual non-SGCs affect protein quality. However, the primary objective of the present study was to compare mutation-dependent activity loss across genetic codes. Therefore, we evaluated this effect by normalizing the activity of the high-mutation library to that of the corresponding low-mutation library within each genetic code.”
  
  (3) Protein expression analysis for additional reporters.
  
  Since protein expression levels are critical for interpreting reporter activity, similar analyses should also be performed for luciferase (Luc) and mSG in both high- and low-mutation libraries. This would ensure that differences in activity are not confounded by variations in protein abundance.
  
  We agree that protein abundance is an important factor for interpreting reporter activity. In this study, we performed HiBiT-based protein quantification for GAL because GAL showed the largest variation in absolute activity among genetic codes, even in the low-mutation library. This analysis showed that the amount of C-terminally completed GAL products was broadly comparable among genetic codes and between low- and high-mutation libraries, indicating that the observed GAL activity differences were not primarily attributable to differences in total protein abundance.
  
  For all three reporters, our main analysis was based on the ratio of activity from the high-mutation library to that from the corresponding low-mutation library within each genetic code. This normalization was intended to evaluate mutation-dependent activity loss while reducing the influence of code-specific baseline differences in expression level or protein quality. We believe that the data are sufficient to evaluate the effect of mutations on protein activities. Nevertheless, we agree that protein quantification for Luc and mSG would provide useful information regarding variation in the baseline levels of reporter activity, and this is an important direction for future work.
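  The within-code normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name activity_ratios and the activity values are hypothetical. It shows why code-specific baseline differences (here 100, 40, and 10 arbitrary units) cancel when each high-mutation activity is divided by the low-mutation activity from the same genetic code.

  ```python
  def activity_ratios(low, high):
      """Return the high-/low-mutation activity ratio for each genetic code.

      low, high: dicts mapping a genetic-code label to the mean reporter
      activity of the low- or high-mutation library translated under that code.
      """
      ratios = {}
      for code, low_activity in low.items():
          if low_activity <= 0:
              raise ValueError(f"non-positive low-mutation activity for {code}")
          # Dividing by the same code's low-mutation activity removes its baseline.
          ratios[code] = high[code] / low_activity
      return ratios


  # Hypothetical activities (arbitrary units), chosen only to illustrate the idea.
  low = {"near-SGC": 100.0, "code_A": 40.0, "code_B": 10.0}
  high = {"near-SGC": 50.0, "code_A": 20.0, "code_B": 5.0}
  print(activity_ratios(low, high))
  # {'near-SGC': 0.5, 'code_A': 0.5, 'code_B': 0.5}
  ```

  Although the three hypothetical codes differ ten-fold in baseline activity, all show the same relative, mutation-dependent loss.
  
  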
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The study addresses the long-standing question in molecular biology and genetics: why has nature selected the current genetic code (SGC, or standard genetic code)? The authors have tested 'error minimization theory', one of the prevailing hypotheses to explain this. Their approach is to create a minimal genetic code (MGC) and its variants (3^9 theoretically possible codes). Using three parameters to quantify the effect of mutations (polarity, volume, and hydropathy), they computationally test the cost of these genetic codes (3^9) by simulations. Finally, they test this cost experimentally using an in vitro translation system with 10 selected genetic code variants with a range of costs (low to high). They use three randomly mutated reporter genes for this purpose: beta-galactosidase, luciferase, and mSG. They find no correlation between the cost of the genetic code and the reporters' output. Based on these observations, they suggest that error-minimization theory may not explain the current genetic code.
  
  The question they are asking is very exciting, and their approach is solid. The authors are very careful in their analyses and conclusions.
  
  We sincerely thank the reviewer for the positive assessment of our study and for the helpful suggestions. We are encouraged that the reviewer found the question exciting and the approach solid. In the revised manuscript, we have clarified the rationale for using the MGC/near-SGC framework, added further analyses and explanations of the mutational cost calculations, and revised the wording of our conclusions to more explicitly define the scope and limitations of the present experimental system.
  
  (1) The rationale for using MGC instead of SGC: It is unclear why the authors rely on the MGC for this analysis when the central question concerns the SGC. If the goal is to evaluate whether the SGC minimizes mutational cost, a more direct approach would be to generate alternative variants of the SGC itself and compare their mutational cost distributions. At present, it is difficult to assess whether conclusions drawn from this comparison are fully relevant to the stated biological question.
  
  We thank the reviewer for this important comment. We agree that directly constructing alternative variants of the SGC by changing amino acid assignments in the SGC would be the most straightforward approach to testing whether the SGC minimizes mutational cost. However, this approach is currently not feasible in our reconstituted translation system for two reasons.
  
  First, our attempt to construct a 46-tRNA SGC-like system revealed that translation using the 46-codon NanoLuc template was approximately 100-fold less efficient than translation using the MGC or near-SGC (Fig. 1). This low activity likely reflects inefficient decoding of NNA codons by in vitro-prepared tRNAs, which lack native post-transcriptional modifications. Because this system did not provide sufficient translational activity for systematic reporter assays, we restricted subsequent experiments to the 32-codon near-SGC framework, excluding NNA codons. We now describe this technical limitation more explicitly in the revised manuscript.
  
  Second, the MGC framework provides vacant codons that can be reassigned by adding anticodon-variant tRNAs. This feature is essential for constructing multiple genetic code variants in parallel under controlled in vitro conditions. We therefore constructed near-SGC-based non-SGCs by adding each anticodon-variant tRNA to the MGC, which provides an experimentally tractable model system for testing whether differences in genetic code arrangement affect mutation-induced decreases in reporter protein activity.
  
  We have added this explanation to the revised manuscript as follows.
  
  “We first established a minimal genetic code, composed of 21 tRNAs with vacant codons, which allows multiple alternative codon assignments to be introduced under otherwise comparable translation conditions.”
  
  Despite this technical limitation, we believe that the central conclusion of this study—that mutational robustness in individual reporter protein activity does not change significantly when the genetic code is altered within the range of mutational costs tested here—remains well-supported by the present results.
  
  (2) The mutational cost analysis appears biologically oversimplified because all amino acid substitutions are treated equivalently. The analysis assumes that all mutations contribute equally to fitness consequences, which does not reflect biological reality. In natural proteins, the impact of an amino acid substitution depends strongly on its structural and functional context. For example, substitutions affecting catalytic residues, ligand-binding interfaces, phosphorylation sites, or other regulatory motifs can severely impair protein function even when associated changes in polarity, hydropathy, or volume are minimal. Conversely, substitutions in structurally permissive or functionally dispensable regions may have little or no measurable effect despite larger physicochemical differences. Therefore, changes in polarity, hydropathy, and volume alone do not necessarily predict functional consequences.
  
  We agree that the mutational cost used in this study is a simplified measure and does not capture the full biological complexity of amino acid substitutions. As the reviewer pointed out, the functional consequence of a substitution depends strongly on its structural and functional context, including whether the affected residue is involved in catalysis, ligand binding, protein–protein interactions, regulatory motifs, folding, or structurally permissive regions.
  
  In this study, we used physicochemical-property-based mutational costs because this type of definition has been widely used in classical formulations of the error minimization theory. Our aim was therefore not to construct a comprehensive predictor of protein fitness effects, but to experimentally test whether the conventional theoretical cost metrics used to discuss genetic code optimality are reflected in the average mutation-induced decrease in reporter protein activity. We have now clarified this rationale in the revised manuscript.
  
  “It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question.”
  
  (3) It is not clear why they increased the concentration of the two tRNAs in near-SGC. Have they maintained the same tRNA concentrations in experiments explained in Fig 5 for all 10 genetic codes tested?
  
  We apologize that the rationale for increasing the concentrations of tRNA^Val_CAC and tRNA^Arg_CCU was not sufficiently clear in the original manuscript. As stated in the original manuscript (“To improve translation efficiency with near-SGC, we focused on two tRNA concentrations (tRNA^Val_CAC and tRNA^Arg_CCU), which were suggested to have low activities in a previous study (Iwane et al., 2016)”), we tested whether increasing the concentrations of these two tRNAs would improve translation efficiency. As shown in Figure 1–figure supplement 1, NanoLuc activity increased as their concentrations were raised. We therefore used tRNA^Val_CAC and tRNA^Arg_CCU at 100 ng/µL each in the optimized near-SGC, referred to as near-SGC (RV), and in all subsequent experiments. The additional anticodon-variant tRNAs required for each non-SGC were used at the optimized concentrations determined in Figure 2–figure supplement 1. For each genetic code, the same tRNA composition and concentrations were used for the low- and high-mutation libraries (see Supplementary Table S7). To clarify this point, we added the sentence “The increased concentrations of these two tRNAs were used in all the subsequent experiments” to the corresponding part of the manuscript.
  
  Reviewer #3 (Public review):
  
  In this manuscript, Miyachi and Ichihashi investigate whether the arrangement of the genetic code affects mutational robustness. Using an in vitro minimal genetic code with vacant codons, they constructed 10 non-standard genetic codes by reassigning Ala, Ser, and Leu, generating codes with replacement costs that were generally higher than those of the standard genetic code across several amino acid property measures. They then tested how random mutations affected the activity of reporter proteins translated under these altered codes. Although error minimization theory predicts that higher-cost codes should make mutations more harmful, the authors report that protein function declined to a similar extent across all codes examined, suggesting that mutational robustness remains largely unchanged within the range of genetic code alterations tested here.
  
  Strengths:
  
  This is an interesting study that investigates one of the most fundamental and intriguing questions in molecular evolution: the emergence of the genetic code, which is nearly universal across nature. The in vitro approach is a powerful aspect of the work and provides an opportunity to examine this phenomenon experimentally at a depth that has previously been inaccessible.
  
  Weaknesses:
  
  However, the authors' use of random mutation libraries has certain limitations that prevent the study from realizing its full potential to uncover the mechanisms governing the molecular evolution of the genetic code.
  
  We sincerely thank the reviewer for the positive evaluation of our study and for recognizing the strength of the in vitro approach. We are encouraged that the reviewer considers this system a powerful way to experimentally address the emergence of the genetic code.
  
  We also appreciate the reviewer’s constructive comments regarding the limitations of random mutation libraries. We agree that pooled random libraries do not allow us to assign functional effects to individual mutations or to fully uncover the molecular mechanisms underlying mutational robustness. In the revised manuscript, we therefore clarify that our conclusions concern the library-averaged effects of random mutations on individual reporter protein activity, rather than the effects of specific mutations or cellular-level fitness. To address this limitation, we have added explanations of the scope and limitations of the present approach.
  
  (1) Statistical analyses are missing for several of the manuscript's main claims. This issue applies throughout the paper, including, but not limited to, Figures 1D, 2B, 4B-D, and 5B.
  
  We thank the reviewer for this important comment. We agree that statistical analyses are necessary to support the major claims of the manuscript. We have therefore added statistical analyses appropriate for the purpose and experimental design of each figure.
  
  For Fig. 1D, we performed one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity to compare translation efficiencies among the MGC, near-SGC, near-SGC (RV), and SGC conditions. This analysis showed a significant overall difference among conditions (one-way ANOVA, p < 0.0001). Tukey’s post hoc test showed that near-SGC was significantly lower than MGC, that near-SGC (RV) significantly improved near-SGC translation, and that near-SGC (RV) was not significantly different from MGC. In contrast, the 46-tRNA SGC remained significantly less efficient than near-SGC (RV). We have summarized the major comparisons in Supplementary Table S8.
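  For readers who want to see what the one-way ANOVA compares, the sketch below computes the F statistic from replicate measurements. It is an illustration only, not the authors' analysis code: the function name one_way_anova_f and the toy values are hypothetical, and it omits p-values and the Tukey post hoc step, which would normally come from a statistics library (for example, scipy.stats.f_oneway and scipy.stats.tukey_hsd in SciPy).

  ```python
  from statistics import fmean


  def one_way_anova_f(groups):
      """Return (F, df_between, df_within) for a one-way ANOVA.

      groups: list of lists of replicate measurements, one list per condition
      (e.g., MGC, near-SGC, near-SGC (RV), SGC).
      """
      k = len(groups)
      n_total = sum(len(g) for g in groups)
      grand_mean = fmean(x for g in groups for x in g)
      # Between-condition sum of squares: spread of condition means around the grand mean.
      ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
      # Within-condition sum of squares: spread of replicates around their own condition mean.
      ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
      df_between = k - 1
      df_within = n_total - k
      f_stat = (ss_between / df_between) / (ss_within / df_within)
      return f_stat, df_between, df_within


  # Toy replicate values (hypothetical), three conditions with three replicates each.
  print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
  ```
  
  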
  
  For Fig. 2B, we compared NanoLuc activity between the 21-code control and the corresponding 21+1-code condition for each codon reassignment using Welch’s t-test on luminescence. This analysis was added to statistically support whether each anticodon-variant tRNA increased NanoLuc translation from the corresponding reassigned template. The statistical results are summarized in Supplementary Table S9.
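  Welch's t-test does not assume equal variances between the 21-code and 21+1-code conditions. A minimal sketch of the statistic and its Welch–Satterthwaite degrees of freedom is shown below; the function name welch_t and the replicate values are hypothetical, and a p-value would in practice be obtained from a library routine such as scipy.stats.ttest_ind(a, b, equal_var=False).

  ```python
  from math import sqrt
  from statistics import fmean, variance


  def welch_t(a, b):
      """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
      # Squared standard errors of each mean, using the sample (n - 1) variance.
      va = variance(a) / len(a)
      vb = variance(b) / len(b)
      t = (fmean(a) - fmean(b)) / sqrt(va + vb)
      df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
      return t, df


  # Hypothetical luminescence replicates: 21-code control vs. a 21+1-code condition.
  control_21 = [1.0, 2.0, 3.0]
  reassigned_21_plus_1 = [4.0, 5.0, 6.0]
  print(welch_t(control_21, reassigned_21_plus_1))
  ```
  
  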
  
  For Fig. 4B–D, we converted mutation rates per base to estimated numbers of mutations per gene and performed Spearman’s rank correlation analysis to evaluate whether reporter activity decreased monotonically with increasing mutational load. This analysis showed strong negative monotonic trends between mutation rate (estimated mutation number) and reporter activity for all three reporters (ρ = −0.90 to −1.00), supporting that the random mutation libraries reduced protein activity in a mutation-load-dependent manner.
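  Spearman's ρ is the Pearson correlation computed on ranks, so it detects any monotonic trend rather than only a linear one. The sketch below, with tied values assigned the mean of their rank positions, is an illustration under stated assumptions rather than the authors' code; the function names and the mutation/activity values are hypothetical, and p-values would normally come from a routine such as scipy.stats.spearmanr. The same procedure applies to the cost-versus-ratio comparisons in Fig. 5C, 5E, and 5G.

  ```python
  from math import sqrt
  from statistics import fmean


  def average_ranks(values):
      """1-based ranks; tied values share the mean of their rank positions."""
      order = sorted(range(len(values)), key=lambda i: values[i])
      ranks = [0.0] * len(values)
      i = 0
      while i < len(order):
          j = i
          # Extend j over the run of values tied with values[order[i]].
          while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
              j += 1
          avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
          for k in range(i, j + 1):
              ranks[order[k]] = avg
          i = j + 1
      return ranks


  def spearman_rho(x, y):
      """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
      rx, ry = average_ranks(x), average_ranks(y)
      mx, my = fmean(rx), fmean(ry)
      cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
      sx = sqrt(sum((a - mx) ** 2 for a in rx))
      sy = sqrt(sum((b - my) ** 2 for b in ry))
      return cov / (sx * sy)


  # Hypothetical data: estimated mutations per gene vs. reporter activity.
  mutations = [0.5, 1.0, 2.0, 4.0, 8.0]
  activity = [100.0, 80.0, 60.0, 30.0, 5.0]
  print(spearman_rho(mutations, activity))  # approximately -1 (perfectly monotonic decrease)
  ```
  
  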
  
  For Fig. 5B, replicate-level data were available for GAL, and we therefore performed two-way ANOVA using genetic code and mutation level as factors. This analysis detected significant main effects of genetic code and mutation level, indicating that GAL activity differed among genetic codes and decreased in the high-mutation library. However, no significant interaction between genetic code and mutation level was detected, indicating that the magnitude of mutation-induced activity reduction was not strongly code-dependent under the conditions examined.
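  The key quantity in this two-way design is the interaction term: a significant main effect of mutation level with a non-significant genetic code × mutation level interaction means that the high-mutation library lowers activity by a similar amount under each code. The sketch below computes the three F statistics for a balanced design (equal replicates per cell). It is a hypothetical illustration, not the authors' analysis code; the function name, cell labels, and values are invented, and p-values would require the F distribution (for example, scipy.stats.f.sf).

  ```python
  from statistics import fmean


  def two_way_anova(cells):
      """F statistics for a balanced two-way ANOVA with interaction.

      cells: dict mapping (genetic_code, mutation_level) -> list of replicates.
      Every combination must be present with the same number of replicates.
      """
      codes = sorted({c for c, _ in cells})
      levels = sorted({lv for _, lv in cells})
      a, b = len(codes), len(levels)
      n = len(next(iter(cells.values())))
      if len(cells) != a * b or any(len(v) != n for v in cells.values()):
          raise ValueError("this sketch assumes a balanced, fully crossed design")

      grand = fmean(x for v in cells.values() for x in v)
      cell_mean = {k: fmean(v) for k, v in cells.items()}
      code_mean = {c: fmean(cell_mean[(c, lv)] for lv in levels) for c in codes}
      level_mean = {lv: fmean(cell_mean[(c, lv)] for c in codes) for lv in levels}

      ss_code = b * n * sum((code_mean[c] - grand) ** 2 for c in codes)
      ss_level = a * n * sum((level_mean[lv] - grand) ** 2 for lv in levels)
      # Interaction: how far each cell mean departs from a purely additive model.
      ss_inter = n * sum(
          (cell_mean[(c, lv)] - code_mean[c] - level_mean[lv] + grand) ** 2
          for c in codes for lv in levels
      )
      ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
      ms_error = ss_error / (a * b * (n - 1))
      return {
          "code": (ss_code / (a - 1)) / ms_error,
          "level": (ss_level / (b - 1)) / ms_error,
          "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
      }


  # Hypothetical GAL activities: codes differ in baseline, but the high-mutation
  # library lowers activity by the same amount under both, so interaction F is 0.
  cells = {
      ("X", "low"): [10.0, 11.0, 12.0], ("X", "high"): [5.0, 6.0, 7.0],
      ("Y", "low"): [20.0, 21.0, 22.0], ("Y", "high"): [15.0, 16.0, 17.0],
  }
  print(two_way_anova(cells))
  ```
  
  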
  
  Finally, because the central claim of Fig. 5C, 5E, and 5G is that mutational cost does not systematically predict mutation-induced activity loss, we performed Spearman’s rank correlation analysis between each mutational cost metric and the high-/low-mutation activity ratio. No significant correlations were detected for any reporter or cost metric (Spearman’s ρ = −0.23 to 0.25), supporting the conclusion that mutational cost did not show a detectable monotonic relationship with mutation-induced activity loss within the tested range.
  
  We have added these statistical analyses to the revised manuscript. The following sentences were added to the figure legends:
  
  Fig. 1
  
  “Statistical comparisons in (D) were performed using one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity; major comparisons are summarized in Table S8.”
  
  Fig. 2
  
  “For each template, NanoLuc activity in the 21-code and corresponding 21+1-code conditions was compared using Welch’s t-test on luminescence. Statistical results are summarized in Table S9.”
  
  Fig. 4
  
  “Spearman’s rank correlation coefficients were ρ = −0.90 for GAL, ρ = −1.00 for Luc, and ρ = −1.00 for mSG.”
  
  Fig. 5
  
  “For GAL activity in (B), two-way ANOVA was performed using genetic code and mutation level as factors. Significant main effects of genetic code and mutation level were detected (both p < 0.0001), whereas their interaction was not significant. For (C), (E), and (G), Spearman’s rank correlation analysis was performed between each mutational cost metric and the high-/low-mutation activity ratio. Statistical details are summarized in Table S10.”
  
  (2) In Figure 2A, the authors modify the NanoLuc gene by reassigning Ala, Leu, or Ser to new codons and elegantly show that the in vitro availability of the corresponding tRNAs is important for protein function. However, the functional importance of the specific modified positions within NanoLuc is not clear. As a result, it is difficult to determine what the expected consequences of these codon changes should be, which in turn limits the interpretation of the observed changes in protein activity. To improve the interpretability of this experiment, the authors should report exactly how many codons were modified in each variant and, ideally, examine the effect of progressively increasing the number of reassigned codons.
  
  We agree that the exact positions and numbers of codon replacements should be clearly reported. In the revised manuscript, we have added a list of the modified amino acid positions. In brief, two Ala codons, three Ser codons, or four Leu codons were replaced with the target vacant codon; the modified positions were Ala16 and Ala120, Ser31, Ser49, and Ser150, and Leu32, Leu67, Leu144, and Leu170, respectively.
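  The codon replacement described above amounts to overwriting selected in-frame triplets. The sketch below is a hypothetical illustration, not the authors' cloning design: it assumes amino-acid positions are counted from the first codon of the coding sequence (position 1), uses DNA notation (TTG for the UUG codon), and operates on a toy sequence rather than NanoLuc.

  ```python
  def replace_codons(cds, aa_positions, new_codon):
      """Replace the codons at the given 1-based amino-acid positions.

      Assumes cds is an in-frame DNA coding sequence whose first codon encodes
      amino acid 1 (no offset for tags or leader sequences).
      """
      if len(new_codon) != 3:
          raise ValueError("new_codon must be a single triplet")
      codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
      for pos in aa_positions:
          codons[pos - 1] = new_codon
      return "".join(codons)


  # Toy sequence (not NanoLuc): Met-Ala-Lys-Ala-stop.
  toy = "ATGGCTAAAGCGTAA"
  # Replace both Ala codons (positions 2 and 4) with the vacant codon TTG (UUG in mRNA).
  print(replace_codons(toy, [2, 4], "TTG"))  # ATGTTGAAATTGTAA
  ```
  
  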
  
  We also agree that progressively increasing the number of reassigned codons would provide additional mechanistic insight. However, the purpose of Fig. 2 was to test whether each vacant codon could be decoded by the corresponding anticodon-variant tRNA to produce functional NanoLuc, rather than to analyze the positional contribution of each replacement. We previously performed such progressive codon replacement analysis for one reassigned codon, ACG, in a related study (Miyachi et al., 2025), and the results supported the same qualitative interpretation. Although we did not repeat this progressive analysis for all codons in the present study, we expect that the qualitative interpretation of Fig. 2 would not be substantially changed.
  
  We have revised the figure legend to clarify the scope of the experiment and added the detailed codon replacement information.
  
  “(A) Schematic illustration of reassignment experiments. Translation with the original MGC and NanoLuc template is shown at the top for comparison. An example of Ala reassignment to the UUG codon is shown at the bottom. In this example, two Ala codons in the NanoLuc sequence were replaced with one type of vacant codon (e.g., UUG), generating a 21 + 1 (UUG-Ala) codon set. Similar reassignment experiments were performed for three amino acids (Ala, Ser, and Leu) and nine vacant codons. Specifically, two Ala codons (Ala16 and Ala120), three Ser codons (Ser31, Ser49, and Ser150), or four Leu codons (Leu32, Leu67, Leu144, and Leu170) were replaced.”
  
  (3) The calculations presented in Figure 3 raise an interesting conceptual question: why does the near-standard genetic code not exhibit the lowest cost? One possible explanation is that the standard genetic code evolved under multiple competing constraints and is therefore not expected to be optimal for any single cost metric, while still achieving strong overall performance. In this context, it would be informative if the authors combined the three cost measures into a single integrated index and examined whether the near-SGC performs more favorably when all three dimensions are considered together. Such an analysis could add important depth to the study.
  
  We agree that the near-SGC is not necessarily expected to minimize each individual cost metric, because the standard genetic code may reflect multiple competing physicochemical, translational, biosynthetic, and evolutionary constraints rather than optimization of a single property.
  
  To address this point, we added an integrated cost analysis combining the three physicochemical cost metrics, Cost_PR, Cost_MV, and Cost_HI. Because these three metrics have different numerical scales, we normalized each metric before integration. We used two types of integrated indices.
  
  First, for each metric m ∈ {PR, MV, HI}, we calculated a min–max normalized cost,
  
  where G denotes the set of 19,683 candidate non-SGCs generated by assigning Ala, Ser, or Leu to the nine vacant codon boxes. We then defined the integrated min–max cost as
  
  Second, we calculated a z-score-normalized cost for each metric,
  
  where µ_{m,G} and σ_{m,G} are the mean and standard deviation of Cost_m^norm across the candidate non-SGCs. The integrated z-score cost was then defined as
  
  Using both integrated indices, the near-SGC ranked first when compared with all 19,683 candidate non-SGCs; in other words, no candidate non-SGC showed a lower integrated cost than the near-SGC. The integrated min–max cost of the near-SGC was 0.01525, whereas the lowest value among candidate non-SGCs was 0.12301. Similarly, the integrated z-score cost of the near-SGC was −2.47947, whereas the lowest candidate value was −1.90838.
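  The ranking procedure can be sketched as below. This is a minimal illustration under several assumptions that are not stated in the text: the per-metric costs are taken as already computed (the codon-adjacency cost calculation itself is not shown), the integrated index is taken as the mean of the normalized metrics (a sum would give the same ranking), and the z-score uses the population standard deviation (the sample standard deviation rescales every metric by the same factor and likewise leaves the ranking unchanged). All function names and cost values are hypothetical.

  ```python
  from itertools import product
  from statistics import fmean, pstdev

  # Candidate set G: every assignment of Ala, Ser, or Leu to the nine vacant codon boxes.
  candidates = list(product(("Ala", "Ser", "Leu"), repeat=9))
  assert len(candidates) == 3 ** 9 == 19683


  def min_max(value, ref):
      """Min-max normalize value against the metric's values over G."""
      lo, hi = min(ref), max(ref)
      return (value - lo) / (hi - lo)


  def z_score(value, ref):
      """z-score value against the mean and population SD of the metric over G."""
      return (value - fmean(ref)) / pstdev(ref)


  def integrated(costs, refs, normalize):
      """Assumed combination rule: mean of the per-metric normalized costs."""
      return fmean(normalize(costs[m], refs[m]) for m in refs)


  def n_lower(near, candidate_costs, refs, normalize):
      """Number of candidate codes with a lower integrated cost than the near-SGC."""
      target = integrated(near, refs, normalize)
      return sum(integrated(c, refs, normalize) < target for c in candidate_costs)


  # Toy, hypothetical cost values for four candidate codes (not the study's data).
  refs = {"PR": [2.0, 1.0, 3.0, 4.0], "MV": [10.0, 30.0, 20.0, 40.0], "HI": [7.0, 5.0, 8.0, 6.0]}
  candidate_costs = [dict(zip(refs, vals)) for vals in zip(*refs.values())]
  near_sgc = {"PR": 1.5, "MV": 12.0, "HI": 5.5}
  for normalize in (min_max, z_score):
      print(normalize.__name__, n_lower(near_sgc, candidate_costs, refs, normalize))
  ```

  A count of 0 corresponds to "ranked first". Note that the near-SGC is normalized against statistics of G only, so its normalized values may fall outside the candidates' range.
  
  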
  
  We have added this integrated cost analysis as Figure 5–figure supplement 7. We have also revised the Discussion to note that the near-SGC does not necessarily minimize every individual physicochemical cost, but performs most favorably when PR, MV, and HI are considered together. This result is consistent with the idea that the standard genetic code may represent a compromise among multiple constraints rather than optimization of a single physicochemical property.
  
  “We consider that the cost ranges examined in this study represent substantial fractions, especially for MV and HI. Although the near-SGC did not necessarily exhibit the lowest cost for each individual physicochemical metric, this does not mean that it is unfavorable in the multidimensional cost space. Because the SGC may reflect a balance among multiple physicochemical constraints rather than optimization of a single property, we also calculated integrated cost indices by combining Cost_PR, Cost_MV, and Cost_HI after min–max normalization or z-score normalization. In both integrated indices, the near-SGC showed the lowest overall cost when compared with all 19,683 candidate non-SGCs (Figure 5–figure supplement 7), indicating that no candidate non-SGC exhibited a lower combined cost than the near-SGC when the three physicochemical properties were considered comprehensively.”
  
  (4) It is difficult to assess the consequences of the random mutations presented in Figure 4 on reporter gene function based solely on the reported "error rate/base" parameter. In particular, the x-axis in Figure 4B should be converted into the estimated number of mutations per gene. This would make the results more intuitive and would allow the reader to better evaluate the expected degree of disruption to protein function.
  
  We agree that the mutation rate per base alone does not provide an intuitive sense of the expected mutational burden for each reporter gene. We therefore added a second x-axis to Fig. 4B–D showing the estimated number of mutations per gene. This value was calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene.
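  The conversion used for the added x-axis is a single multiplication. A minimal sketch follows; the function name and the example rate and coding-sequence length are hypothetical placeholders, not the values for GAL, Luc, or mSG.

  ```python
  def mutations_per_gene(rate_per_base, cds_length_bp):
      """Expected mutations per gene = per-base mutation rate x CDS length (bp)."""
      return rate_per_base * cds_length_bp


  # Hypothetical example: a 1,500-bp coding sequence at 0.002 mutations per base.
  print(mutations_per_gene(0.002, 1500))
  ```
  
  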
  
  We retained the original mutation-rate-per-base axis to preserve the direct link to the sequencing-based mutation rate measurement, while adding the estimated mutations-per-gene axis to improve interpretability. We have revised Figure 4 and its legend accordingly.
  
  “The lower x-axis indicates the estimated number of mutations per gene, calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene.”
  
  (5) A central limitation of the random mutagenesis libraries used in Figure 5, which also underlie one of the manuscript's main claims, is that the exact mutations and their distribution across the reporter genes are not reported. In addition, protein activity is measured only at the level of the entire library, without directly linking individual mutations to their functional consequences. This substantially limits mechanistic interpretation. In my view, this issue can only be addressed convincingly if the authors test a set of defined variants carrying specific mutations and directly evaluate their functional effects.
  
  (6) Related to the previous point, in Figures 5C, 5E, and 5G, the authors present the ratio between low-mutation-rate and high-mutation-rate libraries. However, because each library contains a different collection of mutations, it is unclear what can be inferred from these comparisons. To overcome this limitation, the authors should assess the effects of altered genetic codes on specific, defined mutations rather than on heterogeneous mutation pools alone.
  
  (7) Along the same lines, in Figures 5C, 5E, and 5G, it is unclear why the effects of random mutations would be expected to correlate with the three calculated cost metrics, given that the positions, identities, and functional relevance of the mutations within the genes are not known. Without this information, the biological meaning of these correlations remains difficult to evaluate.
  
  We agree that using pooled random mutation libraries does not allow us to directly link individual mutations to their functional consequences. We also agree that testing defined variants carrying specific mutations would provide a more direct and mechanistic understanding of how each genetic code affects the functional impact of particular amino acid substitutions. However, the purpose of the present study was different from such a defined-variant analysis. Our aim was to experimentally test whether the conventional mutational cost metrics used in error minimization theory predict the average effect of random mutational loads on protein activity. Because these theoretical costs are themselves defined as average expected physicochemical effects over many possible single-nucleotide substitutions, we reasoned that pooled random mutation libraries provide an appropriate first experimental framework to evaluate whether such average-cost metrics are reflected in the average functional output of translated proteins.
  
  We agree that low- and high-mutation libraries do not contain identical sets of mutations. Therefore, the high-/low-mutation activity ratio should not be interpreted as the effect of the same individual variants before and after additional mutations. Rather, it represents the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool under each genetic code. We have revised the text to clarify this interpretation.
  
  We also agree that the positions, identities, and functional relevance of individual mutations are not resolved in this pooled assay. This limitation prevents us from assigning mechanistic effects to specific substitutions. At the same time, using a small set of defined variants would introduce its own selection bias, because the conclusions could strongly depend on which mutations and which protein positions were chosen. Therefore, we consider the random-library approach to be a useful first step for testing library-averaged effects, whereas systematically defined variant analysis or genotype-resolved activity assays will be necessary to reveal mutation-specific mechanisms in future studies.
  
  In response to the reviewer’s concern, we have revised the Discussion to explicitly limit our conclusion to library-averaged effects on individual reporter protein activity. We now state that this approach does not identify the functional effects of individual mutations and that future studies using defined variants or high-throughput genotype–phenotype mapping will be required to determine how specific substitutions contribute to genetic code-dependent mutational robustness.
  
  Results
  
  “To estimate the average activity reduction associated with increased mutational burden under each genetic code, we calculated the ratio of activity obtained from the high-mutation library to that from the corresponding low-mutation library and plotted this ratio against each of the three mutational costs (Fig. 5C).”
  
  Discussion
  
  “A further limitation of this study is that the reporter activities were measured at the level of pooled random mutation libraries. Therefore, the high-/low-mutation activity ratio used in this study should be interpreted as the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool, rather than as the effect of identical variants before and after additional mutations. This library-averaged approach was chosen because the mutational costs considered here are also defined as average expected physicochemical effects over many possible single-nucleotide substitutions. In addition, because the non-SGCs constructed in this study were generated by reassigning only Ala, Ser, and Leu, the detectable effects may depend on how frequently mutations involving these amino acids occur in each reporter gene and whether the affected positions are functionally important. If genetic code-dependent effects are restricted to a small subset of deleterious variants, such effects may be masked in pooled activity measurements. Future studies using defined variants or high-throughput genotype–phenotype mapping assays will be required to determine the mutation-specific and position-specific mechanisms underlying genetic code-dependent effects on protein function (Rozhoňová et al., 2024).”
  
  (8) For each mutagenesis library, the number of variants, the average number of mutations per variant, and the distribution of mutation positions should be reported clearly and transparently. These details are important for evaluating the strength of the conclusions.
  
  We agree that a more transparent characterization of the random mutagenesis libraries is necessary for evaluating the strength and limitations of our conclusions.
  
  In the revised manuscript, we have added the estimated number of mutations per gene to the Results section. This value was calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene. For the high-mutation libraries used in Fig. 5, the estimated numbers of mutations per gene were approximately 8.0 for GAL, 4.5 for Luc, and 3.3 for mSG. In addition to the heatmap shown in the original manuscript, we added position-wise mutation profiles along each reporter gene (Figure 4–figure supplement 2). These analyses clarify the mutational burden of each library and show that mutations were broadly distributed across the analyzed regions of the reporter genes (approximately 300 nt in the middle of each gene).
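The per-gene estimate described above is a single multiplication. The sketch below reproduces it and, as an added assumption not made in the response, treats the number of mutations per gene as Poisson-distributed (independent, uniformly placed errors). The 3,000-nt CDS length in the usage lines is a hypothetical placeholder, not a reporter length from the study, and real error-prone PCR libraries can deviate from a Poisson distribution because mutations accumulate over amplification cycles.

```python
import math


def expected_mutations(rate_per_base: float, cds_length_nt: int) -> float:
    """Mean mutations per gene: per-base mutation rate x coding sequence length."""
    return rate_per_base * cds_length_nt


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k mutations per gene) if counts follow Poisson(lam).
    The Poisson model is an illustrative assumption, not the authors' analysis."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical 3,000-nt CDS at a per-base rate of 0.0026
lam = expected_mutations(0.0026, 3000)
print(f"expected mutations per gene: {lam:.1f}")
for k in range(4):
    print(f"P({k} mutations) = {poisson_pmf(k, lam):.4f}")
```

Under this assumption, the fraction of unmutated templates in a library is exp(-lam), which shrinks quickly as the mutational burden rises.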
  
  Regarding the number of variants, the translation reactions were performed using 5 nM DNA template in a 5 µL reaction, corresponding to approximately 1.5 × 10<sup>10</sup> DNA molecules. However, this value represents the total number of DNA molecules introduced into the reaction and does not directly indicate the number of unique full-length sequence variants, because multiple molecules can share the same genotype. Moreover, our sequencing analysis was designed to quantify mutation frequencies and positional distributions rather than to reconstruct the full-length genotypes of individual library members. Therefore, we do not infer the exact number of unique variants in each library. Instead, we report the average mutation burden and position-wise non-reference rate distributions.
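The molecule count quoted above follows from concentration × volume × Avogadro's constant. A quick check, using only the values stated in the response (the result counts template molecules, not unique genotypes):

```python
AVOGADRO = 6.02214076e23  # mol^-1, exact SI value


def molecule_count(concentration_molar: float, volume_liters: float) -> float:
    """Number of molecules in a solution of the given molar concentration and volume."""
    return concentration_molar * volume_liters * AVOGADRO


# 5 nM DNA template in a 5 µL reaction
n_templates = molecule_count(5e-9, 5e-6)
print(f"{n_templates:.2e}")  # 1.51e+10
```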
  
  We have revised the Results and added Figure 4–figure supplement 2 accordingly.
  
  “For this experiment, two random mutation libraries were used: a low-mutation library prepared using the high-fidelity polymerase and a high-mutation library prepared using Taq DNA polymerase at a Mn<sup>2+</sup> concentration that yields mutation rates of 0.002 – 0.005 per base (0.0026 for GAL, 0.0027 for Luc, and 0.0048 for mSG, corresponding to approximately 8.0, 4.5, and 3.3 mutations per gene). We also plotted position-wise non-reference rates along the analyzed regions of each reporter gene, confirming that mutations were broadly distributed across the amplicons (Figure 4–figure supplement 2).”
  
  (9) Because only three amino acids were manipulated in the non-standard genetic codes, it remains unclear whether these particular amino acids occupy positions in the reporter proteins that are especially important for function and therefore likely to generate strong phenotypic effects. More broadly, it is not clear whether the assay is sufficiently sensitive to detect the effects of only a subset of deleterious variants within a pooled library. This point should be addressed more explicitly.
  
  We agree that this is an important limitation of the present study. Because our non-SGCs were constructed by reassigning only Ala, Ser, and Leu, the mutation-dependent effects that can differ among genetic codes are limited to mutations involving these reassigned codons or amino acid substitutions affected by these assignments. Therefore, the sensitivity of the assay depends on how frequently such substitutions occur in the reporter genes and whether the affected Ala, Ser, and Leu-related positions are functionally important.
  
  We have revised the Discussion to address this point more explicitly. In the revised manuscript, we now state that the absence of a detectable cost-dependent effect may reflect not only the limited cost range examined, but also the limited set of reassigned amino acids, the position-dependent importance of Ala/Ser/Leu residues in the reporter proteins, and the sensitivity limit of pooled activity measurements. We further note that future studies using genotype-resolved activity assays (defined variants) will be required to determine whether specific amino acid substitutions or specific protein positions exhibit stronger genetic code-dependent effects.
  
  “A further limitation of this study is that the reporter activities were measured at the level of pooled random mutation libraries. Therefore, the high-/low-mutation activity ratio used in this study should be interpreted as the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool, rather than as the effect of identical variants before and after additional mutations. This library-averaged approach was chosen because the mutational costs considered here are also defined as average expected physicochemical effects over many possible single-nucleotide substitutions. In addition, because the non-SGCs constructed in this study were generated by reassigning only Ala, Ser, and Leu, the detectable effects may depend on how frequently mutations involving these amino acids occur in each reporter gene and whether the affected positions are functionally important. If genetic code-dependent effects are restricted to a small subset of deleterious variants, such effects may be masked in pooled activity measurements. Future studies using defined variants or high-throughput genotype–phenotype mapping assays will be required to determine the mutation-specific and position-specific mechanisms underlying genetic code-dependent effects on protein function (Rozhoňová et al., 2024).”
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  While we suggest that you address all the technical points raised by the reviewers, you may specifically want to limit the conclusion of the study to mutational robustness at the level of individual protein activity, rather than making broader generalizations. Also, the statistical analysis needs to be strengthened, as indicated in the reviews.
  
  We thank the Reviewing Editor for these important suggestions. We agree that the conclusion of the original manuscript was broader than what can be directly supported by the present experiments. In the revised manuscript, we have therefore limited our conclusion to mutational robustness at the level of individual reporter protein activity measured in a reconstituted in vitro translation system. We now explicitly state that our results do not directly address robustness at the level of cellular fitness, protein interaction networks, or long-term evolution.
  
  We have also strengthened the statistical analyses throughout the manuscript. Specifically, we added one-way ANOVA followed by Tukey’s post hoc test for Fig. 1D, Welch’s t-tests for Fig. 2B, Spearman’s rank correlation analyses for Fig. 4B–D and Fig. 5C/E/G, and two-way ANOVA for GAL activity in Fig. 5B. These analyses have been incorporated into the revised Results, figure legends, and supplementary information.
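Of the tests listed above, Welch's t-test is the one most easily written out by hand. The sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom for two independent samples with possibly unequal variances. The sample values are hypothetical and are not data from Fig. 2B; for a p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a); sample variance uses n - 1
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical replicate activities for two conditions (illustrative numbers only)
t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom are generally non-integer, which is expected for Welch's test.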
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Discuss other alternative hypotheses if the error minimization theory is unlikely.
  
  We thank the reviewer for this helpful suggestion. We think that the absence of a detectable relationship between mutational cost and reporter protein activity in our assay should not be interpreted as excluding all possible roles of error minimization in the evolution of the genetic code. Our results specifically address one aspect of the error minimization theory: whether physicochemical-property-based mutational cost predicts the average effect of random point mutations on individual reporter protein activity within the experimentally accessible range of non-SGCs tested here.
  
  In the revised Discussion, we have clarified that the organization of the SGC may have been shaped by multiple factors, including robustness to translational errors, historical constraints associated with genetic code expansion, biosynthetic or coevolutionary processes, stereochemical interactions, and the evolvability of proteins. Our results suggest that the contribution of mutational robustness at the level of individual protein activity may be limited within the range examined here, but they do not exclude the possibility that the SGC provides advantages under other forms of error, at the level of translation fidelity, cellular fitness, or long-term evolution.
  
  We have added a short discussion to clarify this point without expanding the scope of the manuscript beyond the present experimental results.
  
  “It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question. Moreover, our results do not exclude other possible roles of SGC organization. The SGC may have been shaped by multiple factors, including robustness to translational errors, historical constraints during genetic code expansion, biosynthetic or coevolutionary relationships among amino acids, stereochemical interactions, and effects on protein evolvability (Katoh and Suga, 2023; Koonin and Novozhilov, 2017, 2009; Novozhilov et al., 2007; Wong, 2005).”
  
  (2) A brief description of the PURE translation system can be provided for people from outside the field.
  
  We have added a brief description of the PURE system in the Introduction to make the experimental platform more accessible to readers outside the field. Specifically, we now explain that the PURE system is a reconstituted cell-free translation system composed of purified translation factors, ribosomes, aminoacyl-tRNA synthetases, tRNAs, amino acids, and energy-regeneration components. We also clarify that, in this study, we used a tRNA-free version of the PURE system, in which defined synthetic tRNA sets were supplied externally to reconstruct each genetic code.
  
  Introduction
  
  “A representative platform for such reconstitution is the PURE system (Shimizu et al., 2001), a reconstituted cell-free translation system composed of purified translation components, including ribosomes, translation factors, aaRSs, amino acids, and energy-regeneration components. In particular, a tRNA-free PURE system (Miyachi et al., 2022), in which endogenous tRNA activity is minimized and defined tRNA sets are supplied externally, enables genetic codes to be reconstructed by controlling the supplied tRNAs.”
  
  (3) Figure 5D and F - Technical replicates are provided only for GAL. A similar approach should be taken for LUC and mSG.
  
  We agree that replicate-level measurements for Luc and mSG would further improve reliability. However, repeating the full translation experiments for these reporters was not feasible in the current revision, as each experiment requires large amounts of freshly prepared tRNA-free PURE system and multiple defined tRNA mixtures for every genetic code variant tested. Given these material and technical constraints, we were unable to perform additional biological replicates within the scope of this revision. We would like to emphasize, however, that the GAL replicates shown in Fig. 5D and F are fully consistent across independent experiments, providing direct evidence for the reproducibility of the assay itself. Furthermore, the key metric in our analysis, the activity ratio between high- and low-mutation groups within each genetic code, is an internally normalized measure that is inherently less sensitive to between-experiment variability than absolute activity values. The correlation analyses further showed no significant relationship between mutational cost and this ratio for any of the three reporters. Taken together, we believe these results provide a robust basis for the conclusions drawn, even in the absence of full replication for Luc and mSG.
  
  (4) Provide statistical analysis wherever it is relevant (e.g, to support a lack of correlation).
  
  We have strengthened the statistical analyses throughout the revised manuscript. In particular, to support the lack of detectable correlation between mutational cost and mutation-induced activity loss, we performed Spearman’s rank correlation analyses between each mutational cost metric and the high-/low-mutation activity ratio for all three reporters. No significant correlations were detected for any reporter or cost metric. In addition, we added statistical analyses for other relevant figures, including one-way ANOVA followed by Tukey’s post hoc test for Fig. 1D, Welch’s t-tests for Fig. 2B, Spearman’s rank correlation analyses for Fig. 4B–D, and two-way ANOVA for GAL activity in Fig. 5B.
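The correlation test described above can be sketched as follows: compute the high-/low-mutation activity ratio for each genetic code, then take Spearman's rho between a cost metric and those ratios (the Pearson correlation of the ranks, with tied values sharing their average rank). All numbers in the usage lines are hypothetical placeholders, not data from the study, and taking the ratio of replicate means is an assumption about how replicates are combined. `scipy.stats.spearmanr` returns the same rho together with a p-value.

```python
from statistics import mean


def activity_ratio(high: list[float], low: list[float]) -> float:
    """Mean high-mutation activity divided by mean low-mutation activity."""
    return mean(high) / mean(low)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical per-code costs and replicate activities (not data from the study)
costs = [3.1, 4.2, 5.0, 6.3]
ratios = [
    activity_ratio([0.40, 0.44], [1.00, 0.96]),
    activity_ratio([0.35, 0.37], [0.90, 0.94]),
    activity_ratio([0.52, 0.48], [1.10, 1.06]),
    activity_ratio([0.30, 0.34], [0.80, 0.84]),
]
print(f"Spearman rho = {spearman_rho(costs, ratios):.2f}")
```

With only a handful of genetic codes per reporter, the attainable p-values are coarse, so a non-significant rho is limited evidence of no relationship.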
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) In line 122, the phrase "as evenly as possible" is ambiguous and should be explained more precisely.
  
  We thank the reviewer for pointing this out. We have revised the phrase “as evenly as possible” to describe the codon design more precisely. Specifically, we now state that the NanoLuc coding sequences were designed so that the codons available in each genetic code were used with minimal differences in codon counts, while preserving the amino acid sequence of NanoLuc.
  
  “For near-SGC and SGC, the NanoLuc coding sequences were designed so that the codons available in each genetic code were used with minimal differences in codon counts, while preserving the amino acid sequence (Fig. 1B, 32 codons and 46 codons).”
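The response does not say how codons were ordered along the sequence. One simple way to satisfy "minimal differences in codon counts", assumed here for illustration, is a round-robin over each amino acid's available codons, so the counts for synonymous codons differ by at most one. The codon table below is a hypothetical toy, not the near-SGC or SGC table, and `balanced_codon_plan` is an illustrative helper name.

```python
from collections import Counter


def balanced_codon_plan(protein: str, synonymous_codons: dict[str, list[str]]) -> list[str]:
    """Pick a codon for each residue by cycling through that residue's
    available codons, so their usage counts differ by at most one."""
    next_index = Counter()
    plan = []
    for residue in protein:
        codons = synonymous_codons[residue]  # KeyError if a residue has no codon
        plan.append(codons[next_index[residue] % len(codons)])
        next_index[residue] += 1
    return plan


# Toy example: one-letter residues and a hypothetical reduced codon table
table = {"A": ["GCU", "GCC", "GCA"], "L": ["CUG"]}
plan = balanced_codon_plan("AAAAAL", table)
print(plan)  # ['GCU', 'GCC', 'GCA', 'GCU', 'GCC', 'CUG']
```

The protein sequence is preserved by construction, because every chosen codon comes from that residue's own synonymous set.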
  
  (2) For Figure 1D, a Western blot or another protein gel-based assay would be helpful to exclude the possibility that the observed differences arise from variation in translation efficiency rather than differences in protein activity.
  
  We agree that a protein gel-based assay such as Western blotting would in principle allow us to distinguish differences in translated protein amount from differences in specific activity, and we understand why such data would be informative. However, we would like to clarify that the primary purpose of Fig. 1D was to evaluate the overall functional translation output of each reconstructed genetic code, rather than to determine the mechanistic basis of any observed differences. In this context, NanoLuc luminescence serves as an integrated readout of the entire translation process, encompassing both translational efficiency and protein folding/activity. Crucially, regardless of whether the observed differences in NanoLuc luminescence reflect lower protein yield, reduced specific activity, or a combination of both, the conclusion of Fig. 1D remains the same. Although we did not perform Western blotting in this study, we believe that such an analysis would not change this interpretation and that the current data are sufficient to support this conclusion.
  
  (3) The number 3^9 is not immediately intuitive. It would be helpful if the authors also stated that this corresponds to approximately 20,000 possible non-standard genetic codes.
  
  We have revised the text to state both the exact number and the approximate value: 3<sup>9</sup> = 19,683, approximately 20,000 possible non-standard genetic codes.
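The count 3<sup>9</sup> is consistent with nine codon boxes, each independently assigned one of the three reassigned amino acids (Ala, Ser, Leu). Treating that reading as an assumption, a short enumeration confirms the number; `enumerate_codes` is an illustrative helper, not code from the study.

```python
from itertools import product

# Assumption for illustration: 9 vacant codon boxes, each independently
# assigned one of the three amino acids reassigned in the study.
REASSIGNABLE = ("Ala", "Ser", "Leu")
N_VACANT_BOXES = 9


def enumerate_codes(n_boxes: int = N_VACANT_BOXES, choices=REASSIGNABLE):
    """Yield every assignment of `choices` to `n_boxes` codon boxes as a tuple."""
    return product(choices, repeat=n_boxes)


n_codes = sum(1 for _ in enumerate_codes())
print(n_codes, len(REASSIGNABLE) ** N_VACANT_BOXES)  # 19683 19683
```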
  
  (4) The rationale for using the three cost parameters (PR, MV, and HI) should be explained in greater detail. Because these parameters are central to the manuscript, a citation alone is not sufficient. A concise explanation of their biological relevance would improve the clarity and accessibility of the study.
  
  We agree that the biological relevance of the three cost parameters should be explained more clearly. In the revised manuscript, we have added a concise explanation of why polar requirement (PR), molecular volume (MV), and hydropathy index (HI) were used.
  
  These parameters were selected because they have been widely used in theoretical studies of genetic code optimality and represent distinct physicochemical aspects of amino acid substitutions. PR reflects polarity-related interactions and has been a classical metric in error minimization analyses of the genetic code. MV represents side-chain size and steric volume, which could influence packing and structural stability in proteins. HI reflects hydrophobicity, which is closely related to protein folding and hydrophobic core formation. We have also clarified that these metrics are simplified descriptors and do not capture residue-specific structural or functional context, which we now discuss as a limitation of the study.
  
  “PR reflects polarity-related interactions of amino acids and has been used as a classical measure of amino acid similarity in error minimization analyses. MV represents side-chain size and steric volume, which could affect protein packing and structural stability, whereas HI reflects hydrophobicity, which could be closely related to protein folding or hydrophobic core formation.”
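To make the cost metrics concrete, the sketch below computes the unweighted form commonly used in error minimization analyses: the mean squared change in an amino acid property over all single-nucleotide substitutions that connect two assigned codons. The exact weighting used in the manuscript (for example, transition/transversion bias, codon-position weights, or the treatment of stop codons) is not stated here, so those choices are assumptions. The four-codon "code" and its property values are illustrative only.

```python
BASES = "UCAG"


def point_mutants(codon: str):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:i] + base + codon[i + 1:]


def mutational_cost(code: dict[str, str], prop: dict[str, float]) -> float:
    """Unweighted mean squared change in `prop` over all single-nucleotide
    substitutions connecting two codons assigned in `code`.
    Substitutions into unassigned or stop codons are skipped (an assumption)."""
    total, n_pairs = 0.0, 0
    for codon, aa in code.items():
        for mutant in point_mutants(codon):
            mutant_aa = code.get(mutant)
            if mutant_aa is None:
                continue
            total += (prop[aa] - prop[mutant_aa]) ** 2
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy four-codon "code" with illustrative property values (not the study's)
toy_code = {"GCU": "Ala", "GCC": "Ala", "UCU": "Ser", "CUU": "Leu"}
toy_prop = {"Ala": 7.0, "Ser": 7.5, "Leu": 4.9}
print(mutational_cost(toy_code, toy_prop))  # 0.125
```

Passing a PR, MV, or HI table as `prop` yields the corresponding cost for the same codon assignment, which is how one reassignment can raise one metric while leaving the others nearly unchanged.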
  
  (5) In Figure 3, the experimental framework would be easier to follow if the authors included a schematic and data for one representative non-SGC, explicitly illustrating how it differs from the near-SGC with respect to each of the three cost measures.
  
  We agree that showing one representative non-SGC would make the experimental framework and cost calculation more intuitive.
  
  In the revised manuscript, we added a new panel to Fig. 3 comparing the near-SGC with a representative non-SGC. We selected the PR<sub>max</sub> code as the representative example because it clearly illustrates how reassignment of vacant codon boxes can increase one mutational cost metric relative to the near-SGC. In this panel, we first show the codon assignment schemes of the near-SGC and PR<sub>max</sub> code in the same genetic-code format used in Fig. 1. We then show the corresponding heatmap representations for the three physicochemical properties used in the cost calculation: polar requirement, molecular volume, and hydropathy index. The Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub> values are shown for each code.
  
  This new panel illustrates how changes in codon assignment are translated into different physicochemical cost landscapes and clarifies how the representative non-SGC differs from the near-SGC with respect to each of the three cost measures.
  
  “To make the design of non-SGCs more explicit, we show one representative non-SGC together with the near-SGC in Fig. 3B. This comparison illustrates how assignment of Ala, Ser, or Leu to the vacant codon boxes changes the three mutational cost metrics, Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub>.”
  
  (6) In line 329, the phrase "similar pattern" is ambiguous and should be explained more explicitly.
  
  We have revised the ambiguous phrase “similar pattern” to describe the observation more explicitly. Specifically, we now state that the relative differences in GAL activity among genetic codes observed in the low-mutation library were broadly retained in the high-mutation library, although overall activity decreased.
  
  “For the high-mutation library, GAL activity decreased overall, while the relative differences in activity among genetic codes observed in the low-mutation library were broadly retained.”
  
  (7) Figure S7 appears to be an important control for the experiments shown in Figure 5, and I recommend moving it to the main figures.
  
  We thank the reviewer for this helpful suggestion. We agree that the HiBiT-based quantification of GAL protein amount is an important control for interpreting the GAL activity measurements in Fig. 5, and we appreciate the recommendation to increase its visibility. This analysis shows that the amount of C-terminally completed GAL products was broadly comparable among genetic codes, indicating that the large differences in GAL activity were not primarily attributable to differences in total translated protein amount.
  
  After careful consideration, we have opted to retain this analysis in the supplementary figures because the main focus of Fig. 5 is the relationship between mutational cost and mutation-induced activity loss, quantified by the high-/low-mutation activity ratio. The HiBiT experiment addresses a related but distinct question: whether differences in absolute GAL activity among genetic codes can be explained by differences in protein abundance, and we felt that including it in the main figures might shift the emphasis away from the central message of Fig. 5. Nevertheless, we have added a clear reference to Figure 4–figure supplement 1 in the main text and the figure legend to ensure that readers are directed to this control when interpreting Fig. 5.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.02.24.707864v2
Local file

Untitled document

1
1. skones 03 Jun 2026
  
  in Public
  
  our learned paper signal as the observable relaxation p(Z | Xreb) of the ideal p(Z | X)
  
  I think we want more setup. I think we want to state: "As mentioned in the problem formulation, papers are transformed into reviews before reaching the decision. Given a paper (X), we consider the joint distribution over decisions (D) and reviews (R), i.e., p(D,R|X). While the flow indicates that reviews precede the decision, now that we have built a shortcut that predicts the decision without the review, we ask whether we can generate more \textit{decision-calibrated} reviews. This is justified by simply rearranging the joint distribution like so:"
  
  then the formula.
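The note leaves the formula as a placeholder. Assuming it refers to the chain rule applied in both orders to the joint distribution named above, it would read as follows; both right-hand sides equal p(D,R|X), which is the sense in which both factorizations are valid.

```latex
p(D, R \mid X) \;=\; p(R \mid X)\, p(D \mid R, X) \;=\; p(D \mid X)\, p(R \mid D, X)
```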
  
  Then state that both factorizations are valid. Then we can give a more conceptual explanation: if the conjecture is correct that much of the decision can be determined by overall presentation quality, then perhaps reviews are simply a means to an end that justifies the reviewer's initial reaction.
  
  Here is more context about sources we can cite for this behavior. The strongest terms to search are:
  
  motivated reasoning, post hoc rationalization, biased assimilation, affect heuristic, halo effect, and in peer review specifically prestige bias, reviewer bias, or cognitive bias in peer review.
  
  Here are the most relevant scientific articles.
  
  Kunda — “The Case for Motivated Reasoning”
  
  Ziva Kunda, 1990, Psychological Bulletin
  
  This is probably the canonical article. Kunda argues that motivation can bias the cognitive processes people use to access, construct, and evaluate beliefs. In your case: a reviewer’s initial evaluation of a paper can shape which criticisms feel salient or persuasive.
  
  Best use: cite this for the general mechanism: people reason toward a conclusion they are already inclined to reach.
  
  Nisbett & Wilson — “Telling More Than We Can Know”
  
  Richard Nisbett & Timothy Wilson, 1977, Psychological Review
  
  This is highly relevant to the “reverse reasoning” part. They reviewed evidence that people often lack direct introspective access to the mental processes that caused their judgments, but still produce explanations for those judgments.
  
  Best use: cite this for the idea that reviewers may sincerely report reasons for a judgment without accurately knowing what actually caused the judgment.
  
  Haidt — “The Emotional Dog and Its Rational Tail”
  
  Jonathan Haidt, 2001, Psychological Review
  
  Haidt’s social intuitionist model is about moral judgment, not peer review, but the mechanism maps well: quick intuitive judgment is followed by slower post hoc reasoning. The article explicitly frames reasoning as often coming after intuition rather than causing judgment.
  
  Best use: cite this when you want the phrase “intuition first, reasoning second.”
  
  Lord, Ross & Lepper — “Biased Assimilation and Attitude Polarization”
  
  Charles Lord, Lee Ross & Mark Lepper, 1979, Journal of Personality and Social Psychology
  
  This is a classic study on how people evaluate evidence differently depending on their prior beliefs. People tend to accept congenial evidence more readily and scrutinize uncongenial evidence more harshly.
  
  Best use: cite this for the reviewer behavior where evidence supporting the reviewer’s initial take gets treated as decisive, while contrary evidence gets nitpicked.
  
  Slovic et al. — “The Affect Heuristic”
  
  Paul Slovic, Melissa Finucane, Ellen Peters & Donald MacGregor, 2007, European Journal of Operational Research
  
  This article argues that feelings of “goodness” or “badness” can rapidly guide judgments and decisions. That is very close to what people informally mean by a “paper gestalt”: an overall positive or negative feeling that then colors assessment of specific features.
  
  Best use: cite this for the gut reaction / affective global impression component.
  
  Tomkins, Zhang & Heavlin — “Reviewer Bias in Single- versus Double-Blind Peer Review”
  
  Andrew Tomkins, Min Zhang & William D. Heavlin, 2017, PNAS
  
  This is peer-review-specific. In a large conference-review setting, submissions were reviewed under single-blind and double-blind conditions. The study found that single-blind reviewing advantaged papers by famous authors and high-prestige institutions.
  
  Best use: cite this to show that manuscript judgments are not purely about the paper’s intrinsic content; contextual cues can bias reviews.
  
  Blank — “The Effects of Double-Blind versus Single-Blind Reviewing”
  
  Rebecca Blank, 1991, American Economic Review
  
  This was a randomized experiment at the American Economic Review. It found that reviewers were more critical when author identity was blinded, and acceptance rates were lower under double-blind review.
  
  Best use: cite this as older experimental evidence that review outcomes shift when identity/prestige cues are removed.
  
  Peters & Ceci — “Peer-Review Practices of Psychological Journals”
  
  Douglas Peters & Stephen Ceci, 1982, Behavioral and Brain Sciences
  
  Classic and brutal study: previously published psychology articles were resubmitted to the same journals with altered author/institution information. Most were not recognized, and many were rejected.
  
  Best use: cite this for the unreliability/context-sensitivity of peer review.
  
  Teplitskiy et al. — “The Social Structure of Consensus in Scientific Review”
  
  Misha Teplitskiy et al., 2018
  
  This study analyzed reviews of 7,981 neuroscience manuscripts submitted to PLOS ONE and found that reviewers favored authors closer to them in the co-authorship network. The authors interpret this not just as simple nepotism but as partly reflecting “schools of thought” and substantive evaluative differences.
  
  Best use: cite this if your point is that reviewers’ intellectual/social position can shape what they see as valid or flawed.
  
  Sordi et al. — “Halo Effect in Peer Review”
  
  José Osvaldo De Sordi et al., 2020
  
  This one is directly about the halo effect in peer review. It explores whether belonging to the same professional field or group can bias article evaluation during review.
  
  Best use: cite this for the closest peer-review-specific phrase to “paper gestalt”: halo effect in peer review.
  
  A good synthesis sentence would be:
  
  The phenomenon can be described as an interaction of affective first impressions, motivated reasoning, and post hoc rationalization: reviewers may form an early global evaluation of a manuscript and then selectively construct or emphasize criticisms that make that evaluation appear analytically grounded.
  
  For a paper or lit review, I’d probably cite Kunda 1990 + Nisbett & Wilson 1977 + Slovic et al. 2007 for the psychology mechanism, then Tomkins et al. 2017 + Peters & Ceci 1982 + Teplitskiy et al. 2018 for peer-review-specific evidence.
  
  The key point is that, based on this realization, we sought to test whether conditioning frontier coding agents for reviews can (1) improve their decision prediction and (2) improve the alignment of their generated reviews with human reviews.
Annotators

skones
www.biorxiv.org

Orco regulates the circadian activity of pheromone-sensitive olfactory receptor neurons in hawkmoths

1
1. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Joint Public Review
  
  This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.
  
  We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript is much stronger now after the revision which incorporates the requested changes. We added results of new experiments and additional analyses. Although these new insights did not change the previous conclusions, we significantly reworked the Discussion and added further references to clarify the conclusions we want to make.
  
  Please note that we use ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs associate with the olfactory receptor co-receptor (Orco) to be trafficked to the membrane of the ORN cilium, where they can be contacted by pheromones and odorants. In Manduca sexta, evidence is accumulating for G-protein-coupled metabotropic pheromone transduction and not for the OR-Orco-dependent ionotropic transduction shown for Drosophila melanogaster. In both insect species, besides its chaperone function, Orco can form leaky cation channels, which can regulate the spontaneous spiking activity of ORNs. In this study, we explored this role of Orco.
  
  Strengths:
  
  The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.
  
  Major weaknesses:
  
  At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.
  
  Please see our responses to the detailed comments.
  
  Detailed comments are provided below:
  
  (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of a single dose of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series, in addition to the administration of blockers for the other channels that are embedded in their model but were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.
  
  The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel's open probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2013). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco ligand candidates (OLCs) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). There, we also demonstrated that OLC15 dose-dependently antagonizes the VUAA1-dependent activation of Orco.
  
  Furthermore, we tested other published Orco antagonists, which had been characterized in heterologous assays, in primary cell cultures of hawkmoth ORNs as well as in in vivo assays in intact hawkmoths. We focused on amiloride-derived antagonists because we had previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific antagonists but instead affected different ion channel targets depending on the time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best to target Orco pharmacologically in hawkmoth ORNs. Given these results and comparative tests with other published Orco antagonists, we have since used 50 µM OLC15 in all further experiments as the most adequate dose to antagonize Orco function in Manduca. In the current study, we focus on Orco without excluding the possibility that other ion channels in the ORNs contribute to the control of membrane potential rhythms.
  
  We have clarified the Methods section accordingly.
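  A dose-response curve like the one referred to above is commonly summarized with the Hill equation, which links an antagonist concentration to the fraction of channel activity it blocks. The Python sketch below is illustrative only: the IC50 and Hill coefficient are hypothetical placeholders, not values fitted in Nolte et al. (2016), and the helper names are invented for this example.

```python
import numpy as np


def fractional_inhibition(conc_um, ic50_um, hill_n=1.0):
    """Fraction of Orco activity blocked at a given antagonist concentration.

    Standard Hill equation for inhibition; ic50_um and hill_n are
    hypothetical parameters that would normally come from a curve fit.
    """
    conc = np.asarray(conc_um, dtype=float)
    return conc**hill_n / (ic50_um**hill_n + conc**hill_n)


def dose_for_inhibition(target_fraction, ic50_um, hill_n=1.0):
    """Invert the Hill equation: concentration needed for a target block."""
    p = float(target_fraction)
    return ic50_um * (p / (1.0 - p)) ** (1.0 / hill_n)


# Hypothetical parameters for illustration only (not fitted to hawkmoth data).
IC50_UM = 10.0
HILL_N = 1.0

doses_um = np.array([1.0, 10.0, 50.0, 100.0])  # spans the 1-100 µM range tested
for dose, frac in zip(doses_um, fractional_inhibition(doses_um, IC50_UM, HILL_N)):
    print(f"{dose:6.1f} µM -> {frac:.2f} of activity blocked")

print(f"dose for 80% block: {dose_for_inhibition(0.8, IC50_UM, HILL_N):.1f} µM")
```

  Under these assumed parameters, 50 µM would block about 83% of activity; the real position of 50 µM on the OLC15 curve depends entirely on the published fit.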
  
  (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR measures transcript abundance, not transcription, although the authors repeatedly describe it as measuring transcription. Transcriptional studies would require nuclear run-off assays or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' attempt to rule out TTFL-based control, which directly led them to implicate a PTFL-based model.
  
  We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We included these additional qPCR experiments and edited the manuscript to make clear that we measure transcript abundance. However, we will not perform snRNAseq analysis due to time and financial constraints.
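  The sampling-density argument in point (2) can be made concrete with a single-component cosinor model, y = M + A·cos(2πt/24) + B·sin(2πt/24), which has three free parameters. In the Python sketch below (helper names are hypothetical, and the ZT values are placeholders rather than the timepoints actually sampled), three distinct timepoints leave no lack-of-fit degrees of freedom, so any three values are fit exactly, whereas six evenly spaced timepoints leave three.

```python
import numpy as np

PERIOD_H = 24.0


def cosinor_design(zt_hours):
    """Design matrix for y = M + A*cos(w*t) + B*sin(w*t), with w = 2*pi/24 h."""
    t = np.asarray(zt_hours, dtype=float)
    w = 2 * np.pi / PERIOD_H
    return np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])


def lack_of_fit_dof(zt_hours):
    """Distinct sampling times minus the number of estimable cosinor parameters."""
    X = cosinor_design(np.unique(zt_hours))
    return X.shape[0] - np.linalg.matrix_rank(X)


three_points = [1, 9, 17]              # hypothetical ZTs, 8 h apart
six_points = [0, 4, 8, 12, 16, 20]     # six evenly spaced ZTs

print(lack_of_fit_dof(three_points))   # 0
print(lack_of_fit_dof(six_points))     # 3

# With three timepoints, even arbitrary values are reproduced exactly,
# so the fit cannot distinguish a 24 h rhythm from any other pattern.
X3 = cosinor_design(three_points)
y3 = np.array([2.0, 5.0, 3.0])
beta, *_ = np.linalg.lstsq(X3, y3, rcond=None)
print(np.allclose(X3 @ beta, y3))      # True
```

  Adding replicate animals per timepoint increases error degrees of freedom but not lack-of-fit degrees of freedom, which is why more distinct timepoints, not only more replicates, are needed to test the waveform.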
  
  (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.
  
  Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). For clarification, we added the 2015 citation to the Modeling chapter in the Methods section.
  
  We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs, but we know that both are expressed in the pheromone-sensitive ORNs. Thus, as the referees suggest, we added text regarding the presence and localization of OR-Orco heteromers. Consistent data collected across different experiments (heterologous expression systems, primary cell cultures of hawkmoth ORNs, in vivo/in situ studies) support the presence of Orco homomers in hawkmoth ORNs. In addition to co-expression of MsexOrco and MsexSNMP-1 with either MsexOr-1 or MsexOr-4 in a heterologous expression system, MsexOrco expression alone was already sufficient to increase intracellular Ca²⁺ levels, both spontaneously, owing to its property as a leaky, non-specific cation channel, and in response to VUAA1 application (Nolte et al., 2013). Both in developing hawkmoth pupae and in differentiating primary cell cultures of hawkmoth ORNs, Orco expression started during a developmental time window in which ORNs did not yet express pheromone receptors but in which Orco affected spontaneous activity and intracellular Ca²⁺ levels in a VUAA1-dependent manner (Nolte et al., 2016). In vitro patch clamp studies of differentiating cultured hawkmoth ORNs during this time window of pupal development characterized ion channels/currents with properties of Orco as a leaky, non-specific cation channel/current that depends on protein kinase C and cyclic nucleotides (Dolzer et al., 2021, 2008; Krannich and Stengl, 2008; Stengl, 1993). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window in which ORNs already express spontaneous activity but Orco does not yet heteromerize with pheromone receptors.
  However, we do not know whether, and in what ratio, homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths, because none of the OR-specific antibodies tested worked in immunocytochemical studies of hawkmoth antennae (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990). Our hypothesis of a differential distribution, with Orco homomers in the soma and dendrite compartments and OR-Orco heteromers in the cilia, is based on the differential immunocytochemical localization of Drosophila ORs mainly in the cilia compartment (Benton et al., 2006).
  
  We clarified our manuscript accordingly.
  
  (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.
  
  There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms and correspond with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during an experimentally very challenging long-term recording over several days. In addition, we have observed over the years in our animal rearing facility that, in 17:7 light-dark cycles, the originally nocturnal hawkmoth M. sexta distributes its activity over the course of the day, so that we find nocturnal as well as diurnal hawkmoths. Thus, light-dark cycles were not sufficient to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths, in addition to stress signals, rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Because we focus on spontaneous activity and not on pheromone-dependent physiology in this study, we used isolated males that were never exposed to female pheromones, taking phase dispersal into account. Therefore, in free-running conditions it became necessary to first determine the behavioral rhythm of each animal and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a phase-dispersed free-running population. As requested by the referees in point (7), we added RAIN to test for rhythmicity in each of our recordings and revised the manuscript accordingly.
  
  Furthermore, in preliminary experiments we briefly exposed hawkmoths to pheromone the night before the start of the experiment. However, this failed to produce phase-synchronized spiking rhythms. Most likely, a circadian pattern of pheromone exposure would have been necessary as a zeitgeber, which could not be used here because of long-term pheromone-dependent effects on spiking activity. We added these results as a supplementary figure to Fig 3.
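  The argument in the response to (4), that rhythms average out in a phase-dispersed population unless recordings are phase-aligned first, can be illustrated with synthetic data. The Python sketch below assumes idealized cosine "activity" traces with random phases (hypothetical, not recorded data) and estimates each trace's phase from its 24 h Fourier component; this is one simple alignment method chosen for illustration, not necessarily the procedure used in the manuscript.

```python
import numpy as np

PERIOD_H = 24.0
rng = np.random.default_rng(0)

t = np.arange(0.0, 72.0, 0.5)                  # 3 days sampled every 30 min
true_phases = rng.uniform(0.0, PERIOD_H, 10)   # hypothetical phase-dispersed animals
traces = np.array([np.cos(2 * np.pi * (t - p) / PERIOD_H) for p in true_phases])


def estimate_peak_phase(trace, t, period=PERIOD_H):
    """Peak time (h, modulo period) from the projection onto cos/sin at the period."""
    w = 2 * np.pi / period
    c = np.sum(trace * np.cos(w * t))
    s = np.sum(trace * np.sin(w * t))
    return (np.arctan2(s, c) / w) % period


def amplitude(profile):
    """Half the peak-to-trough range of an averaged profile."""
    return (profile.max() - profile.min()) / 2


# Naive population average in clock time: rhythms partly cancel.
naive_mean = traces.mean(axis=0)

# Aligned average: resample each trace on a common 24 h window starting at its own peak.
window = np.arange(0.0, PERIOD_H, 0.5)
aligned = np.array([
    np.interp(window + estimate_peak_phase(tr, t), t, tr) for tr in traces
])
aligned_mean = aligned.mean(axis=0)

print(f"naive amplitude:   {amplitude(naive_mean):.2f}")
print(f"aligned amplitude: {amplitude(aligned_mean):.2f}")
```

  Aligning on a phase estimated from the same data that are then averaged can inflate the apparent rhythm, which is the bias raised in point (4); an independent phase reference, such as a separately determined behavioral rhythm per animal, avoids this.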
  
  (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.
  
  We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording sites are located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We clarified this in the Methods section.
  
  In addition, our lab showed with antibody staining against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Lee and Strausfeld (1990) mapped all types of antennal sensilla, and together with the pheromone-dependent tip recordings of Kaissling et al. (1989), it was shown that most of the male antennal sensilla are pheromone-sensitive long trichoid sensilla, with one of the two innervating ORNs always responding to bombykal, ensuring highly sensitive pheromone detection. Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs (reviewed in Stengl, 2010). This would indicate that all ORNs, whether they express ORs sensitive to pheromone or to general odorants, could potentially share the same Orco-dependent spontaneous activity rhythms. Moreover, in our lab, different experimenters in different years who recorded from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum and that is very likely shared by all types of OR-Orco-expressing ORNs.
  
  (6.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…
  
  There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after prior phosphorylation via protein kinase C (PKC; reviewed in Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that PKC- and cGMP/cAMP-dependent regulation of Orco is also present in other insect species. To test this, we are currently characterizing the second messenger dependence of spontaneous spiking activity, which is the focus of a follow-up manuscript. Nevertheless, to provide more evidence for the hypothesis of the current manuscript, we added a new set of tip-recording experiments that demonstrate cAMP-dependent gating of Orco. Because of the addition of this figure, we merged Figures 8-10 into Figure 8 and added the cAMP data as Figure 9.
  
  (6.2) … and the PTFL model proposed is somewhat disappointing.
  
  For a detailed introduction to our PTFL membrane clock hypothesis, please see our opinion paper referenced in the manuscript (Stengl and Schneider, 2024). We added a clarification of how Orco activation can influence cAMP levels. A more elaborate PTFL clock model, including many more of the ion channels identified in hawkmoth ORNs, is the focus of a forthcoming manuscript.
  
  (6.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.
  
  Indeed, we propose a metabotropic pheromone transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by heterologous expression studies of Orco in HEK cells; it is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on primary cell cultures of hawkmoth olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (Stengl, 2010; Stengl and Funk, 2013; Takagi et al., 2025; Wicher and Miazzi, 2021). In contrast, the hypothesis of a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species rests on very sparse experimental evidence, primarily heterologous expression studies in systems such as HEK cells that lack the insect's wild-type molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis relies heavily on the argument that a 7TM receptor with inverted membrane topology cannot couple to G-proteins, an argument that lacks careful support from biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that analyze the distinct kinetic components of ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication, Schneider et al., 2025). We added references on the possible odor transduction mechanisms to the Introduction.
  
  (6.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.
  
  While structural studies did not find evidence for known conserved cyclic nucleotide binding sites on Orco, this does not exclude indirect cAMP effects via, e.g., Orco subunits complexing with other molecules under direct cAMP control, such as other ion channel subunits. Nor does it exclude as yet unknown binding sites, or sites that become accessible only after a specific sequence of prior phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after prior PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a zeitgeber time-dependent, PKC- and cyclic nucleotide-dependent modulation of Orco. These detailed studies will be published in a follow-up publication. In the revised version of this manuscript, we added tip-recording experiments that indicate cAMP involvement in Orco gating (new Figure 9).
  
  (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not yet reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK_CYCLE, MetaCycle, or ECHO. The use of FFT and wavelet analysis is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.
  
  The long-term tip recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N = 40). We are grateful for the reviewers' suggestion to use RAIN, since this analysis revealed circadian rhythms in 7 of 11 LD recordings, 8 of 12 DD recordings, and 2 of 12 OLC15 recordings. Please see also our response to (4) above, commenting on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.
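  RAIN itself is an R package (distributed via Bioconductor) and is not reproduced here. As a rough, numpy-only stand-in that conveys the idea of a per-recording significance test for a 24 h rhythm, the Python sketch below fits a cosinor to binned firing rates and compares its amplitude with a permutation null. This is a swapped-in technique, not RAIN, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def cosinor_amplitude(t_h, y, period_h=24.0):
    """Least-squares amplitude of y = M + A*cos(w*t) + B*sin(w*t)."""
    w = 2 * np.pi / period_h
    X = np.column_stack([np.ones_like(t_h), np.cos(w * t_h), np.sin(w * t_h)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.hypot(beta[1], beta[2]))


def permutation_p_value(t_h, y, period_h=24.0, n_perm=999, seed=0):
    """Share of shuffled series whose cosinor amplitude reaches the observed one.

    Shuffling destroys serial correlation, so for autocorrelated firing rates
    this p-value can be too optimistic; treat it as a rough screen only.
    """
    rng = np.random.default_rng(seed)
    observed = cosinor_amplitude(t_h, y, period_h)
    null = np.array([cosinor_amplitude(t_h, rng.permutation(y), period_h)
                     for _ in range(n_perm)])
    # Add-one correction so the p-value is never exactly zero.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


# Hypothetical 3-day recording binned hourly: 24 h rhythm plus noise.
rng = np.random.default_rng(1)
t = np.arange(72.0)
rate = 5.0 + 2.0 * np.cos(2 * np.pi * (t - 6.0) / 24.0) + rng.normal(0.0, 1.0, t.size)
print(f"p = {permutation_p_value(t, rate):.4f}")
```

  RAIN differs in that it tests for umbrella-shaped (rise-then-fall) patterns rather than a sinusoid, so it tolerates asymmetric waveforms; the cosinor version above only illustrates the general logic of testing each recording separately.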
  
  (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 uM OLC15).
  
  According to the valuable suggestions of the referees, we used RAIN to detect circadian rhythms in the spiking attributes of each individual animal. Since only 2 of 12 animals displayed a circadian rhythm in OLC15, a statistical comparison of circadian amplitudes is not possible. We revised the Results section accordingly and added to the figure legend to make it clearer that the heat maps in Fig 5 are representative examples from single animals and not averages across animals.
  
  As the reviewer correctly states in (7), wavelet results on circadian rhythmicity must be interpreted carefully because of the low number of circadian cycles in ~3-4 day recordings. Since the heat maps in Figure 5 visually revealed the presence of ultradian rhythms, the main focus of the wavelet analysis in Figure 6 is the detection and quantification of ultradian periods up to 20 h.
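  To make the ultradian-period analysis concrete, the Python sketch below computes the time-averaged power of a complex Morlet wavelet transform over candidate periods of 2 to 20 h. It is a hypothetical, simplified implementation (the wavelet parameter w0 = 6, the normalization, and the absence of any cone-of-influence correction are assumptions), not the toolbox used for Figure 6. Because edge effects grow with the wavelet width, power at long periods in a 3-4 day record deserves the same caution noted above.

```python
import numpy as np


def morlet_period_power(y, dt_h, periods_h, w0=6.0):
    """Time-averaged power of a complex Morlet transform at each period (hours).

    Each wavelet is a Gaussian-windowed complex exponential whose width scales
    with the period; dividing by the envelope sum makes a unit-amplitude
    sinusoid give similar peak power at every period. Edge effects are ignored.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    power = np.empty(len(periods_h))
    for i, period in enumerate(periods_h):
        f = 1.0 / period                        # cycles per hour
        sigma = w0 / (2 * np.pi * f)            # Gaussian width in hours
        half = int(np.ceil(4 * sigma / dt_h))
        tw = np.arange(-half, half + 1) * dt_h  # symmetric, odd-length support
        envelope = np.exp(-tw**2 / (2 * sigma**2))
        wavelet = envelope * np.exp(2j * np.pi * f * tw) / envelope.sum()
        full = np.convolve(y, wavelet, mode="full")
        coeffs = full[half:half + y.size]       # centre-aligned with y
        power[i] = np.mean(np.abs(coeffs) ** 2)
    return power


# Hypothetical 4-day trace with a 6 h ultradian rhythm, sampled every 6 min.
dt = 0.1
t = np.arange(0.0, 96.0, dt)
trace = np.cos(2 * np.pi * t / 6.0)
periods = np.arange(2.0, 20.5, 0.5)
power = morlet_period_power(trace, dt, periods)
print(f"dominant period: {periods[np.argmax(power)]:.1f} h")
```

  A full analysis would also report power as a function of time (the scalogram) and mask the region where the wavelet overlaps the recording edges.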
  
  We revised the Methods section to include references to previous experiments that characterized the effect of different doses of OLC15 and other Orco antagonists and agonists in M. sexta antennae (Nolte et al., 2016). Please see also our response to (1).
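  Point (8) refers to quantifying the gradual OLC15 effect with the slope of a linear fit. A minimal Python sketch of that idea follows, assuming spike times in seconds, 1 h bins, and an ordinary least-squares line; the bin width, the helper names, and the synthetic spike train are hypothetical choices, not the manuscript's exact procedure.

```python
import numpy as np


def hourly_rates(spike_times_s, duration_h):
    """Mean firing rate (Hz) in consecutive 1 h bins."""
    edges = np.arange(duration_h + 1) * 3600.0
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts / 3600.0


def rate_trend(rates_hz):
    """Slope (Hz per hour) of a least-squares line through the binned rates.

    A negative slope indicates firing that declines over the recording.
    """
    hours = np.arange(len(rates_hz)) + 0.5  # bin centres
    slope, _intercept = np.polyfit(hours, rates_hz, deg=1)
    return slope


# Hypothetical 72 h spike train whose rate falls from 2 Hz by 0.02 Hz per hour.
spikes = []
for h in range(72):
    n_spikes = int(round(3600 * (2.0 - 0.02 * h)))
    spikes.append(np.linspace(h * 3600.0, (h + 1) * 3600.0, n_spikes, endpoint=False))
spikes = np.concatenate(spikes)

rates = hourly_rates(spikes, 72)
print(f"trend: {rate_trend(rates):.4f} Hz/h")  # trend: -0.0200 Hz/h
```

  With real recordings that contain ultradian bursts, a single straight line summarizes only the overall drift, so the slope is best read alongside the binned rates themselves.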
  
  (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.
  
  We revised the manuscript accordingly and clarified which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).
  
  (10.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).
  
  We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, involving also all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript. However, we added a set of experiments to this manuscript in which we demonstrate that the effect of increased cAMP on the spontaneous spiking activity is mediated by Orco (new Figure 9).
  
  (10.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.
  
  Our experiments testing circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and are part of another publication. In section 6.2 we already stated that our experiments do not exclude that Orco is under indirect control of the TTFL. We revised our Discussion accordingly.
  
  The experiments published on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrated that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells, such as trichogen and tormogen cells, which build up the transepithelial potential via V-ATPases that generate the high K⁺ concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.
  
  (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:
  
  (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).
  
  We revised the discussion accordingly.
  
  (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).
  
  We added those experiments to the revised version of the manuscript (see our response to (2)).
  
  (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.
  
  We added possible negative feedback elements to the Discussion to explain how our proposed PTFL could, in principle, work independently of the TTFL clock.
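  The criterion raised in point (11), that a circadian oscillator needs negative feedback acting with a delay, can be illustrated with the classic Goodwin model, in which a product represses its own synthesis through a chain of intermediate steps. The Python sketch below is generic: x, y, and z are abstract variables, not Orco, cAMP, or any specific PTFL component, and the Hill coefficient n = 12 and decay rate k = 0.1 are arbitrary illustrative values. For three stages with equal decay rates, linear stability analysis shows that the steady state can only lose stability when n > 8.

```python
import numpy as np


def goodwin(t_end=1000.0, dt=0.01, n=12, k=0.1):
    """Forward-Euler integration of a three-stage Goodwin negative-feedback loop.

    x drives y, y drives z, and z represses the synthesis of x via 1 / (1 + z**n).
    All three variables decay at the same first-order rate k.
    """
    steps = int(round(t_end / dt))
    x = y = z = 0.0
    z_trace = np.empty(steps)
    for i in range(steps):
        dx = 1.0 / (1.0 + z**n) - k * x
        dy = x - k * y
        dz = y - k * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        z_trace[i] = z
    times = np.arange(1, steps + 1) * dt
    return times, z_trace


def count_peaks(trace):
    """Number of local maxima in a sampled trace."""
    inner = trace[1:-1]
    return int(np.sum((inner > trace[:-2]) & (inner >= trace[2:])))


times, z = goodwin()
late = z[z.size // 2:]  # discard the initial transient
print(f"peaks in second half: {count_peaks(late)}, "
      f"range of z: {late.min():.2f} to {late.max():.2f}")
```

  Rerunning with a shallower repression, for example goodwin(n=4), should instead show oscillations that damp out toward a steady state, which illustrates why steep, delayed repression is the ingredient the reviewers ask the PTFL model to specify.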
  
  Minor weaknesses:
  
  (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?
  
  We have revised the discussion accordingly.
  
  (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21–27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.
  
  These large fluctuations stem from experiments performed in different seasons (higher temperature and humidity in the summer months, lower temperature and humidity in winter). Throughout each individual experiment, conditions were stable. We clarified the Methods section accordingly.
  
  Recommendations for the authors:
  
  The authors should post the code for their computational model to a repository like GitHub.
  
  The code for the computational model is now available at https://github.com/a-c-schneider/VijayanForlinoEtAl2025_Model.git
  
  References
  
  Benton R, Sachse S, Michnick SW, Vosshall LB. 2006. Atypical Membrane Topology and Heteromeric Function of Drosophila Odorant Receptors In Vivo. PLOS Biology 4:e20. DOI: https://doi.org/10.1371/journal.pbio.0040020
  
  Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. DOI: https://doi.org/10.1371/journal.pone.0036784
  
  Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. Journal of Experimental Biology 206:1575–1588. DOI: https://doi.org/10.1242/jeb.00302
  
  Dolzer J, Krannich S, Stengl M. 2008. Pharmacological Investigation of Protein Kinase C- and cGMP-Dependent Ion Channels in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Chemical Senses 33:803–813. DOI: https://doi.org/10.1093/chemse/bjn043
  
  Dolzer J, Schröder K, Stengl M. 2021. Cyclic nucleotide-dependent ionic currents in olfactory receptor neurons of the hawkmoth Manduca sexta suggest pull–push sensitivity modulation. European Journal of Neuroscience 54:4804–4826. DOI: https://doi.org/10.1111/ejn.15346
  
  Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Frontiers in Cellular Neuroscience 12:218. DOI: https://doi.org/10.3389/fncel.2018.00218
  
  Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. DOI: https://doi.org/10.1371/journal.pone.0058889
  
Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Current Biology 34:1414-1425.e5. DOI: https://doi.org/10.1016/j.cub.2024.02.042, PMID: 38479388
  
  Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. DOI: https://doi.org/10.3390/insects15121016
  
  Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proceedings of the National Academy of Sciences 108:8821–8825. DOI: https://doi.org/10.1073/pnas.1102425108
  
  Kaissling KE, Hildebrand JG, Tumlinson JH. 1989. Pheromone receptor cells in the male moth Manduca sexta. Archives of Insect Biochemistry and Physiology 10:273–279. DOI: https://doi.org/10.1002/arch.940100403
  
  Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. Journal of Experimental Biology 172:345–354. DOI: https://doi.org/10.1242/jeb.172.1.345
  
  Krannich S, Stengl M. 2008. Cyclic Nucleotide-Activated Currents in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Journal of Neurophysiology 100:2866–2877. DOI: https://doi.org/10.1152/jn.01400.2007
  
  Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. DOI: https://doi.org/10.1038/22566
  
  Lee JK, Strausfeld NJ. 1990. Structure, distribution and number of surface sensilla and their receptor cells on the olfactory appendage of the male moth Manduca sexta. Journal of Neurocytology 19:519–538. DOI: https://doi.org/10.1007/BF01257241
  
  Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. Journal of Biological Rhythms 22:502–514. DOI: https://doi.org/10.1177/0748730407307737
  
  Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. DOI: https://doi.org/10.1371/journal.pone.0062648
  
  Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. DOI: https://doi.org/10.1371/journal.pone.0166060
  
  Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. Journal of Biological Rhythms 22:43–57. DOI: https://doi.org/10.1177/0748730406295462, PMID: 17229924
  
  Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. Journal of Biological Rhythms 29:318–331. DOI: https://doi.org/10.1177/0748730414546133
  
  Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. Journal of Biological Rhythms 27:388–397. DOI: https://doi.org/10.1177/0748730412456265
  
  Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. DOI: https://doi.org/10.1371/journal.pone.0121230
  
  Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. DOI: https://doi.org/10.1523/ENEURO.0376-24.2024, PMID: 39880675
  
  Stengl M. 2010. Pheromone Transduction in Moths. Frontiers in Cellular Neuroscience 4:133. DOI: https://doi.org/10.3389/fncel.2010.00133
  
  Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. Journal of Comparative Physiology A 174:187–194. DOI: https://doi.org/10.1007/BF00193785
  
  Stengl M. 1993. Intracellular-Messenger-Mediated Cation Channels in Cultured Olfactory Receptor Neurons. Journal of Experimental Biology 178:125–147. DOI: https://doi.org/10.1242/jeb.178.1.125
  
  Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. Journal of Comparative Physiology A 199:897–909. DOI: https://doi.org/10.1007/s00359-013-0837-3
  
  Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. Journal of Neuroscience 10:837–847. DOI: https://doi.org/10.1523/JNEUROSCI.10-03-00837.1990, PMID: 2319305
  
  Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Frontiers in Physiology 14:1243455. DOI: https://doi.org/10.3389/fphys.2023.1243455
  
  Takagi S, Abuin L, Mermet J, Lee D, Benton R. 2025. A GPCR signaling pathway in insect odor detection. DOI: https://doi.org/10.1101/2025.10.03.680299
  
  Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Current Biology 14:638–649. DOI: https://doi.org/10.1016/j.cub.2004.04.009, PMID: 15084278
  
  Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS (Eds). Insect Biology in the Future. Academic Press. p. 735–763. DOI: https://doi.org/10.1016/B978-0-12-454340-9.50039-2
  
  Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell and Tissue Research 383:7–19. DOI: https://doi.org/10.1007/s00441-020-03363-x
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.06.17.659282v3

www.medrxiv.org

Heterogeneity of use, access and retention of insecticide-treated nets: implications for subnational tailoring to maximise malaria control

1. Public_Reviews 03 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper provides a novel method to improve the accuracy of predictions of the impact of ITN strategies, using sub-national estimates of the duration of ITN access and use over time derived from cross-sectional survey data and the annual number of ITNs received by each country.
  
  Strengths:
  
  The approach is novel, makes use of available data, and has considered all of the relevant components of ITN distributions.
  
  Weaknesses:
  
  (W1.1) The main message of the paper was not very clear, and did not seem to fit the title. The title focuses on sub-national tailoring of ITN, but the abstract did not feature results directly about SNT. It was not very clear what the main result of the paper was - there are several ITN observations in the results and discussion. Most did not seem to be directly about SNT, but rather sub-national differences in use and access were accounted for in the analyses. It was not clear if the same conclusions would be reached without accounting for sub-national differences, but the estimates and predictions could be expected to be more accurate.
  
  Thank-you for highlighting this. We agree that the title could be improved to better reflect the main messages of the paper and have now updated it to “Heterogeneity of use, access and retention of insecticide-treated nets: implications for subnational tailoring to maximise malaria control”. All parameters are estimated at a subnational level; this is not always the case at a national level. We therefore do not have national-level models without subnational differences that our results could be compared to.
  
  (W1.2) Some of the results seemed to me to be apparent even without a modelling exercise (eg high coverage could not be maintained between campaigns, use would be higher with 2-yearly distributions rather than 3-yearly) or were not in themselves new insights (eg estimates of the duration of use). It would be helpful to clearly state what the novel results are in the abstract, the first paragraph of the discussion and the conclusions, and to make sure that the title is consistent.
  
  It is our understanding that assessments of ITN coverage are often made from infrequent surveys, for example Malaria Indicator Surveys (MIS). These are typically conducted six months post-campaign and may miss notable reductions in use and access beyond this. Comparisons of ITN use and access are also frequently made directly between DHS surveys, which can be misleading in isolation if the time between campaigns and surveys is not considered. We have tried to highlight this more clearly in relation to Burkina Faso with the following text:
  
  “The observed decrease in use and access across many regions in Burkina Faso may therefore be a by-product of DHS surveys being conducted at progressively later dates relative to the most recent campaign; this does not necessarily indicate an underlying trend in decreasing use or access over longer timescales.”
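
  To illustrate the survey-timing effect described in this quoted text, the sketch assumes that every campaign restores use to the same level and that use then decays exponentially. The initial level, decay rate and survey lags are hypothetical values chosen for illustration, not estimates from the manuscript. Surveys taken progressively later after the most recent campaign then show a steady decline even though nothing changes from one campaign cycle to the next.

```python
import math


def use_at(months_since_campaign: float, u0: float, decay_rate: float) -> float:
    """Proportion of people using an ITN, assuming exponential decay from u0 after a campaign."""
    return u0 * math.exp(-decay_rate * months_since_campaign)


# Hypothetical values: every campaign restores use to 70%, which then decays at 3% per month.
U0, RATE = 0.70, 0.03

# Hypothetical survey timings: each survey is conducted later after its most recent campaign.
survey_lags_months = [6, 14, 22, 30]
observed = [use_at(lag, U0, RATE) for lag in survey_lags_months]

for lag, u in zip(survey_lags_months, observed):
    print(f"survey {lag:>2} months after campaign: observed use = {u:.2f}")
```

  The post-campaign level is identical in every cycle here, so the apparent downward trend comes entirely from when the surveys happen to fall.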
  
  We do believe that modelling exercises, such as the methodology presented here, can generate better estimates of ITN use and access over time than surveys alone, which can be biased by the relative timing of campaigns. It is also our understanding that previous studies have generated national estimates of ITN retention. We are not aware of any previous studies that have estimated the duration ITNs continue to be used for, which is arguably of greater epidemiological importance than retention time. To the best of our knowledge, these have also not previously been estimated at subnational scales.
  
  We acknowledge that the novelty of some results was not clearly presented previously and are grateful to the reviewer for highlighting this. We have now highlighted some of the novel findings more clearly in the abstract, with the following text:
  
  “However, subnational variation in ITN retention and the duration that ITNs remain in use have not previously been quantified.”
  
  “Our results highlight that although transmission intensity remains an important factor for subnational tailoring of malaria control interventions, other factors, such as ITN use given access, meaningfully influence optimal deployment strategies.”
  
  We have also highlighted the novelty and relevance of our findings more clearly in the first paragraph of the discussion, with the following text:
  
  “Funding constraints have also increased the need for consideration of subnational tailoring, with many recommendations being made on the basis of transmission intensity in the World Health Organization (2025) Subnational Tailoring Reference Manual. However, a key uncertainty in assessing the potential impact of different ITN interventions has been how long nets remain in use rather than how long they are retained, and how this varies between regions. Here, to the best of our knowledge, we present the first estimates of subnational variation in ITN retention and the duration that ITNs remain in use, and also quantify for the first time how ITN use, access and retention vary between subnational regions across multiple African countries. Our work supports the change in guidance to optimal coverage, as it highlights that ITN interventions have notable differences in impact between settings, and that distributing fewer but more effective ITNs, particularly pyrethroid–chlorfenapyr products, is likely to be more impactful than maximising long-term coverage through increased campaign frequencies with pyrethroid-only ITNs. Our work also broadly supports World Health Organization (2025) recommendations for subnational tailoring, particularly the consideration of deprioritisation of ITN distribution in very low transmission settings. However, our results provide new indications that deprioritisation of areas with higher ITN use given access may lead to greater resurgences in cases, highlighting that subnational tailoring decisions could be optimised further by considering factors beyond transmission intensity alone.”
  
  The novelty and relevance of our results are also now highlighted in the following text, which has been incorporated into the concluding paragraph:
  
  “In conclusion, the work indicates that universal coverage targets of 80% are unlikely to be consistently met due to waning overall ITN use in the intervening years between triennial mass campaigns. Improved coverage can be achieved through more frequent biennial distributions, though this is unlikely to be feasible at scale given the current funding landscape. Indeed, when resources are constrained, deprioritisation of ITN mass campaigns in certain settings is being increasingly considered through subnational tailoring of malaria control interventions. Our work highlights that the relationship between transmission intensity (whether measured in terms of prevalence or clinical cases) and intervention impact is non-linear, and notable resurgences in cases may follow when campaigns are deprioritised in all but very low transmission settings. This broadly supports WHO subnational tailoring guidance, which suggests consideration of deprioritising distribution of ITNs in regions with PfPR2-10 < 1% (World Health Organization, 2025). However, while the World Health Organization (2025) Subnational Tailoring Reference Manual proposes that the withdrawal of ITNs in favour of indoor residual spraying should be considered in areas with low ITN use, here we estimate that ITN use alone appears to be a notably poorer predictor of the impact of ceasing mass campaigns than use given access. Our findings suggest that regions with higher use given access may experience disproportionately greater resurgences in cases following deprioritisation. This implies that regions with low use given access may warrant consideration for cessation of ITN distribution, rather than decisions being based solely on low overall ITN use irrespective of whether communities have sufficient ITN access.
  However, subnational differences in ITN use, access and retention are key knowledge gaps in many settings, and when estimated from infrequent surveys they are highly sensitive to bias arising from the timing of surveys relative to when campaigns were conducted. To our knowledge, this study is the first to estimate subnational variation in ITN retention and the first to estimate the duration that ITNs remain in use, which is of greater epidemiological relevance than retention time. It also provides a novel framework to correct for biases in estimates of ITN use and access arising from when campaigns were conducted. Although campaigns have historically aided increasing ITN use and access over time, we estimate the mean duration of ITN use is consistently shorter than mean retention times in all regions. This raises questions about whether punctuated distribution of ITNs through campaigns is the optimal mechanism for maximising their effectiveness and cost-effectiveness. Maximising the cost-effectiveness of interventions has become increasingly pertinent in the current funding context, and consideration of alternative distribution strategies, such as increased distribution through continuous distribution channels, including school- or community-based distribution, may be warranted. Frameworks such as the one presented here, which take into account the potential for impact from different net types and the high variability of ITN duration and use, could support NMP decision making on how best to maximise impact from available funds. Whilst such frameworks may be a useful tool, local knowledge of factors impacting ITN access and use as well as operational decision making will be paramount for NMP-led tailoring of subnational strategies.”
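
  A back-of-the-envelope illustration of why shorter campaign intervals raise average use: if use returns to the same level u0 after each campaign and then decays exponentially at rate r with no replacement in between, average use over an interval of length T is u0 * (1 - exp(-r * T)) / (r * T), which falls as T grows. The sketch assumes hypothetical values of u0 and r and ignores continuous distribution channels, so it is not intended to reproduce the estimates reported in the manuscript.

```python
import math


def mean_use_over_cycle(u0: float, rate: float, interval: float) -> float:
    """Closed-form average of u0 * exp(-rate * t) over one campaign interval [0, interval)."""
    return u0 * (1.0 - math.exp(-rate * interval)) / (rate * interval)


def mean_use_numeric(u0: float, rate: float, interval: float, steps: int = 10_000) -> float:
    """Midpoint-rule average of the same curve, used as a check on the closed form."""
    h = interval / steps
    return sum(u0 * math.exp(-rate * (i + 0.5) * h) for i in range(steps)) / steps


# Hypothetical values: use returns to 70% after each campaign and decays at 0.5 per year.
U0, RATE = 0.70, 0.5
triennial = mean_use_over_cycle(U0, RATE, 3.0)
biennial = mean_use_over_cycle(U0, RATE, 2.0)
print(f"average use, triennial: {triennial:.3f}; biennial: {biennial:.3f}")
```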
  
  (W1.3) On L236, the link to SNT is stated: "the models indicate trends that can support subnational tailoring of ITNs". They could indeed, but SNT itself is not done in this paper. It seems to be about improving sub-national predictions of the impact of single ITN strategies, by taking into account sub-national variation in access and use duration. This is useful, and the model developed has novel aspects.
  
  Thank-you for highlighting this. We hope our updated title and response to W1.12 below help address this. Where relevant, we have also framed our findings in relation to the World Health Organization’s Subnational tailoring of malaria strategies and interventions: reference manual, which was published following our original submission; examples of this are highlighted in our response above to W1.2.
  
  (W1.4) Individual countries may have records on when nets were distributed to the regions rather than needing to use the annual country number of nets together with the DHS data. It could be helpful to say what the analysis steps would be in that case.
  
  We have now added the following text to appendix 3.2 to clarify how the methodology could be adapted:
  
  “In contexts where national malaria programmes or other stakeholders have knowledge of the timings of mass campaigns (i.e. when there is no uncertainty in ɸ_ij), the methodology can be adapted by deterministically evaluating the time since the last campaign (equation S18) for each time point.”
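
  When campaign dates are known, the time since the last campaign becomes a simple lookup rather than an uncertain quantity. A minimal sketch, assuming sorted campaign times in months for a single region; the function name and the dates are hypothetical, and this is not a transcription of equation S18.

```python
import bisect
from typing import List, Optional


def time_since_last_campaign(t: float, campaign_times: List[float]) -> Optional[float]:
    """Time elapsed since the most recent campaign at or before t.

    `campaign_times` must be sorted in ascending order. Returns None if no
    campaign has taken place by time t.
    """
    i = bisect.bisect_right(campaign_times, t)  # number of campaigns at or before t
    if i == 0:
        return None
    return t - campaign_times[i - 1]


# Hypothetical known campaign months for one region.
campaigns = [0.0, 36.0, 72.0]
print([time_since_last_campaign(t, campaigns) for t in (-1.0, 36.0, 50.0, 100.0)])
# prints [None, 0.0, 14.0, 28.0]
```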
  
  (W1.5) There were several assumptions that needed to be made in building the model. There is some validation of the timing of the distributions (L633 "verified where possible through discussion with interested parties nationally and internationally") and the fit of estimated access and use to survey data, and agreement between predictions of prevalence and MAP estimates. It would be helpful to say which assumptions are important for the results (and would be key knowledge gaps) and which would not make a difference. It might be possible to validate the net timing model using a country where net distributions are known reasonably well.
  
  Thank-you for raising this. We acknowledge that, to investigate which assumptions are less likely to make a meaningful difference, we would ideally have conducted a full sensitivity analysis on them. This would, however, be challenging, since many of these are structural rather than numerical assumptions (for example, the assumption of an exponential decay in use and access), and a sensitivity analysis on them would require the entire methodology to be adapted. We did validate our estimated campaign timings against some known subnational campaign timings for Senegal. However, we could not source data, to the nearest month, on when all campaigns were conducted in all regions of Senegal, so a full validation against these was not possible. We were also unable to source use and access data independent of the DHS with which to validate our discrete-time models of historical use and access. PfPR2-10 estimates are, however, fitted to equivalent MAP estimates. These were validated against DHS estimates of PfPR6-59mo, which were not used at any stage to fit our models. We have made slight changes to the original wording in relation to this at the end of appendix 5.2.
  
  (W1.6) What was assumed about what happens to old nets after a mass campaign was not clear. This assumption is likely to affect the predictions of access for the biennial distributions.
  
  To generate our initial estimates of the mean duration of use and retention time with our hierarchical model, we assume nets are only distributed to individuals who do not already have ITNs (appendix 2). This initial step is necessary for our methodology, but is relaxed later in our discrete-time model, where we assume ITNs are distributed at random such that individuals with an ITN are equally likely to receive a new ITN (and replace their existing one) following a mass campaign (appendix 4). Much of the text in these appendices has been rewritten, and we hope this is now clearer.
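
  The two allocation assumptions described here can be contrasted using simple population proportions. In this sketch, `reach` is a hypothetical fraction of the population that receives a new ITN in a campaign; the formulas are illustrative simplifications, not the equations in appendices 2 and 4.

```python
from typing import Tuple


def access_after_targeted(access_before: float, reach: float) -> float:
    """Nets go only to people without one, so the campaign fills gaps until everyone is covered."""
    return access_before + min(reach, 1.0 - access_before)


def access_after_random(access_before: float, reach: float) -> Tuple[float, float, float]:
    """Nets go to a random fraction `reach` of people, replacing any net they already hold.

    Returns (total access, share holding a new net, share still holding an old net).
    """
    new = reach
    old = access_before * (1.0 - reach)  # people who kept an old net and were not reached
    return new + old, new, old


before, reach = 0.4, 0.5  # hypothetical pre-campaign access and campaign reach
print(access_after_targeted(before, reach))
print(access_after_random(before, reach))
```

  Under random allocation, part of each campaign replaces nets people already hold, so post-campaign access is lower for the same reach and a share of the population continues with older nets.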
  
  (W1.7) L312 and elsewhere: That use given access declines with net age is plausible. However, I wondered if this could be partly a consequence of the assumptions in the model (eg the two exponential decays for access and use, the possible assumption that new nets displace the current ones when there is a mass campaign).
  
  Declining use given access as nets age is not affected by model assumptions. Because use and access are fitted independently of each other, there are no constraints that would prevent a faster decay in access than in use. Had the data supported this, it would have led to use given access increasing over time since the last campaign. The data did not support this. Further clarification that use and access are fitted independently of each other has now been provided in the following text:
  
  “All subsequent analyses described are conducted independently for use and access”
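
  Why independent fitting leaves the direction of change open can be seen from the ratio of two exponential decays: use given access is proportional to exp(-(rate_use - rate_access) * t), so it declines only if use decays faster than access. A sketch with hypothetical initial levels and rates (per year):

```python
import math


def use_given_access(t: float, u0: float, a0: float, rate_use: float, rate_access: float) -> float:
    """Ratio of use to access when each decays exponentially at its own, independently fitted rate."""
    use = u0 * math.exp(-rate_use * t)
    access = a0 * math.exp(-rate_access * t)
    return use / access


years = [0.0, 1.0, 2.0, 3.0]
# Hypothetical case 1: use decays faster than access, so use given access falls.
falling = [use_given_access(t, 0.45, 0.75, rate_use=0.5, rate_access=0.4) for t in years]
# Hypothetical case 2: access decays faster than use, so use given access rises.
rising = [use_given_access(t, 0.45, 0.75, rate_use=0.4, rate_access=0.5) for t in years]
print([round(x, 3) for x in falling])
print([round(x, 3) for x in rising])
```

  Over the three years shown, both hypothetical cases stay below 1; which direction the data favour is then an empirical question rather than a consequence of the functional form.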
  
  (W1.8) The Methods section on Estimating historical use and access seemed to be aimed at readers familiar with formulae, but I think it could lose other interested readers. It could be useful to explain a little more about what is happening at each step and also why.
  
  Thank-you for highlighting this. We have re-written this section in the main manuscript, now named ‘Historical use, access and retention times’, where we now only highlight key equations and provide a high-level overview of the methodological steps. We have sought to provide clearer explanations here behind the rationale for each step to ensure maximum accessibility for interested readers. The original wording was used as a basis for the newly provided series of appendices which provide further technical detail; this wording has also been heavily re-drafted to improve clarity of each step.
  
  (W1.9) The model was fitted to MAP estimates of PfPR2-10, which themselves come from a model. It may be that there is different uncertainty in the MAP estimates for different regions. I couldn't see this on the graph, but maybe the uncertainty is small. Was this taken into account in the fitting?
  
  We only used median MAP estimates of PfPR2-10 to calibrate the baseline EIR for each region in our model. We have clarified our rationale in appendix 5.2:
  
  “Since the relationship between baseline EIR and PfPR2-10 here is specific to malaria simulation, MAP uncertainty estimates were not propagated through to our estimates in baseline EIR since these would not faithfully represent its true uncertainty.”
  
  (W1.10) Was uncertainty from each estimated component integrated into the other components?
  
  Thank-you for highlighting this, as it indicates that we had not made this sufficiently clear. To confirm, we propagate uncertainty in each component through to our estimates of cases averted. We have added the following text to clarify this:
  
  “Region-specific uncertainty in ITN efficacy, use, retention, and the relative contributions of continuous and campaign channels is therefore propagated through to our estimates of cases averted.”
  
  Further details are also provided in the preceding text of the same paragraph. The central 95% credible intervals of cases averted shown in figures 5.C and 6 and associated figure supplements are reflective of this uncertainty.
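
  Propagating uncertainty in this way means evaluating the outcome once per joint posterior draw and summarising the resulting distribution, rather than combining separate intervals for each component. A minimal sketch: the Beta distributions for use and efficacy, the baseline case count, and the product used as a stand-in for cases averted are all hypothetical toy choices, not the transmission model used in the manuscript.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
N_DRAWS = 4000
BASELINE_CASES = 500.0  # hypothetical cases per 1,000 people per year without ITNs

cases_averted = []
for _ in range(N_DRAWS):
    # One joint draw of every uncertain component (hypothetical posteriors).
    use = rng.betavariate(30, 20)       # mean 0.6
    efficacy = rng.betavariate(20, 30)  # mean 0.4
    # Toy stand-in for the transmission model, evaluated once per draw.
    cases_averted.append(BASELINE_CASES * use * efficacy)

cut_points = statistics.quantiles(cases_averted, n=40)  # 2.5%, 5%, ..., 97.5%
lower, upper = cut_points[0], cut_points[-1]
median = statistics.median(cases_averted)
print(f"median {median:.1f}, central 95% interval {lower:.1f} to {upper:.1f}")
```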
  
  (W1.11) Eyeballing Figure 2 (Burkina Faso), there is a general pattern of decline in all the regions, some differences between the regions and some differences in how well the model fits between the regions. If possible, it could be helpful to say how much better the fit was when using region-specific compared to countrywide parameter values for access and use, and how different the results would be.
  
  In the “Universal coverage: was it achievable under triennial mass campaigns” results section, we have now provided further emphasis that the observed decrease from DHS data may be driven by surveys being conducted progressively later in relation to the last campaign:
  
  “The observed decrease in use and access across many regions in Burkina Faso may therefore be a by-product of DHS surveys being conducted at progressively later dates relative to the most recent campaign; this does not necessarily indicate an underlying trend in decreasing use or access over longer timescales.”
  
  In the case of Burkina Faso (figure 2.A), aside from months in which very small numbers of individuals were surveyed and either 0% or 100% use or access was reported, no data lie outside our 95% credible interval for any region.
  
  We are unable to generate comparisons with countrywide parameters as these are not generated when fitting our discrete-time model, even though they are a by-product of the initial hierarchical model used to generate initial estimates of region-specific ITN retention, which was a necessary methodological step. We hope the extensive revision of the text in the methods and appendices helps to improve the clarity on this. Where national estimates are provided, these are population-weighted means of the subnational median posterior estimates. New text is included in appendix 1 to clarify this:
  
  “National and continental values are reported as population-weighted summaries of the median subnational estimates generated from the discrete-time models”
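
  A minimal sketch of a population-weighted summary of subnational medians, as described in this quoted text. The region names, draws and populations are hypothetical. Note that this summarises regional medians, so it generally differs from the median of a pooled national posterior and carries no interval of its own.

```python
import statistics
from typing import Dict, List


def population_weighted_summary(
    draws_by_region: Dict[str, List[float]], population_by_region: Dict[str, float]
) -> float:
    """Population-weighted mean of each region's posterior median."""
    total_population = sum(population_by_region[r] for r in draws_by_region)
    weighted = sum(
        statistics.median(draws) * population_by_region[r]
        for r, draws in draws_by_region.items()
    )
    return weighted / total_population


# Hypothetical posterior draws (e.g. mean duration of use, in years) and populations.
draws = {"region_A": [1.0, 1.5, 2.0], "region_B": [2.0, 2.5, 3.0, 3.5]}
population = {"region_A": 100.0, "region_B": 300.0}
print(population_weighted_summary(draws, population))  # prints 2.4375
```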
  
  (W1.12) The question of moving from a campaign every three to every two years may not be the most pertinent question in the current funding landscape. I realise that a paper is in development for a long time, but it would be helpful to comment on what else the model could be used for when fewer rather than more nets are likely to be available.
  
  We acknowledge the funding landscape has changed substantially, but we still believe this work has important implications in the current context. We have emphasised this further in the following text:
  
  “If budget constraints necessitate the deprioritisation of campaigns, our results highlight that this should be avoided, if possible, in regions with moderate to high transmission intensity, particularly those with mean annual incidence exceeding 100–150 clinical cases per 1,000 people. Shortening campaign intervals from three to two years in moderate- and high-transmission regions is projected to avert more cases than the additional cases that may arise from ceasing campaigns in some lower-transmission settings. Additionally, although pyrethroid–chlorfenapyr ITNs are more costly, the additional cases projected to be averted by them relative to pyrethroid-only and pyrethroid–PBO ITNs are substantial. In certain national contexts it may be more cost-effective for biennial pyrethroid–chlorfenapyr campaigns to be conducted in fewer subnational regions even under reduced budgets. However, more thorough economic analyses will be needed to understand this fully. Moreover, as ITNs remain one of the most cost-effective malaria control interventions, improving their impact could still be more cost-effective than the introduction of new untested interventions (Topazian et al., 2023; Schmit et al., 2024).”
  
  We have also related some of our findings to the WHO Subnational Tailoring Reference Manual (as highlighted in W1.2), which we hope better relates our findings to the current context.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage level formerly targeted by WHO) for any of the regions, even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.
  
  Strengths:
  
  The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of a mass campaign from publicly available data instead of assuming fixed dates. The probability of use given access allows for determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.
  
  Weaknesses:
  
  Since the models employed are rather complex, the description of the methodology may be hard to follow for most readers. In addition, the models rest on many assumptions, including:
  
  (W2.1) Exponential decay of ITN use/access.
  
  We acknowledge that different modelling studies have typically assumed either an exponential decay or an “S-shaped” smooth-compact loss function, with many of these studies having been validated against cluster-randomised trial data for both functional forms. We believe the ITN age distribution data across the DHS surveys inspected provide reasonable evidence to support the use of an exponential decay function here. We have now included a proof (appendix 2.1) demonstrating that an exponential decay function yields an exponentially distributed ITN age distribution with the same rate parameter; this is true under periodic ITN distribution and becomes an approximation for a finite number of surveys. We have also included additional text (appendix 2.2) highlighting that the empirical ITN age distributions appear to support our exponential decay assumption.
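
  As a numerical check of this property (not the proof in appendix 2.1), the sketch assumes idealised conditions: equal-sized campaigns every `interval` years, each net still retained with probability exp(-rate * age), and surveys spread evenly over many campaign cycles. Pooling the ages of retained nets across surveys then gives a weighted mean age close to 1/rate and a fraction older than x close to exp(-rate * x), as for an exponential distribution with the same rate. All parameter values are hypothetical.

```python
import math
from typing import List, Tuple


def pooled_net_ages(rate: float, interval: float, n_campaigns: int, n_surveys: int) -> Tuple[List[float], List[float]]:
    """Ages of nets seen across evenly spaced surveys, each weighted by its retention probability.

    Equal-sized campaigns occur at 0, interval, 2*interval, ...; a net of age `a`
    is still retained with probability exp(-rate * a).
    """
    campaign_times = [k * interval for k in range(n_campaigns)]
    # Surveys are evenly spaced over the final 10 cycles, long after the first campaign.
    start = (n_campaigns - 10) * interval
    step = 10 * interval / n_surveys
    ages, weights = [], []
    for i in range(n_surveys):
        t = start + (i + 0.5) * step
        for c in campaign_times:
            if c <= t:
                ages.append(t - c)
                weights.append(math.exp(-rate * (t - c)))
    return ages, weights


def weighted_mean(values: List[float], weights: List[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_fraction_above(values: List[float], weights: List[float], x: float) -> float:
    return sum(w for v, w in zip(values, weights) if v > x) / sum(weights)


RATE = 0.5  # hypothetical loss rate per year
ages, weights = pooled_net_ages(RATE, interval=3.0, n_campaigns=60, n_surveys=1200)
print(weighted_mean(ages, weights), 1 / RATE)
print(weighted_fraction_above(ages, weights, 2.0), math.exp(-RATE * 2.0))
```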
  
  (W2.2) The decay rates for the probability of the ITN repelling and killing a mosquito are the same.
  
  Although the same decay rate parameter (\gamma_N) is present in our expressions for the probability of repellency and mortality (equations (53) and (54)), the half-life of the latter is shorter, since repellency is assumed to decay towards a constant value. These structural forms are not unique to this paper but are shared among all malaria simulation-based studies with ITN interventions. This decay rate parameter has been estimated in previous studies (Sherrard-Smith et al., 2022; Churcher et al., 2024), and we carry through uncertainty estimates from those previous studies into the work presented here; additional text has been added to clarify this:
  
  “Uncertainty in ITN repellency and mortality parameters (equation (53) and (54)) is also propagated forward to this study by simulating random draws from previous posterior distributions (Sherrard-Smith et al., 2022; Churcher et al., 2024) across each distribution event and realisation.”
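
  The link between a shared decay rate and different half-lives can be checked directly. The sketch assumes, purely for illustration, that mortality decays as d0 * exp(-gamma * t) and repellency as r_min + (r0 - r_min) * exp(-gamma * t); these generic forms and the parameter values are hypothetical and are not a transcription of equations (53) and (54). Because repellency decays towards a floor, it takes longer to halve, so the mortality half-life is the shorter of the two.

```python
import math


def half_life_to_zero(gamma: float) -> float:
    """Time for d0 * exp(-gamma * t) to fall to half its initial value."""
    return math.log(2.0) / gamma


def half_life_to_floor(gamma: float, initial: float, floor: float) -> float:
    """Time for floor + (initial - floor) * exp(-gamma * t) to fall to initial / 2.

    Returns math.inf if the floor is at or above half the initial value.
    """
    if not 0.0 <= floor < initial:
        raise ValueError("expected 0 <= floor < initial")
    target_fraction = (initial / 2.0 - floor) / (initial - floor)
    if target_fraction <= 0.0:
        return math.inf
    return -math.log(target_fraction) / gamma


GAMMA = 0.5  # hypothetical shared decay rate per year
mortality_half_life = half_life_to_zero(GAMMA)
repellency_half_life = half_life_to_floor(GAMMA, initial=0.6, floor=0.2)
print(f"mortality: {mortality_half_life:.2f} y, repellency: {repellency_half_life:.2f} y")
```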
  
  (W2.3) Given a time instant, all individuals in the same administrative unit have the same probability of using a net.
  
  Our discrete-time model estimates the proportion of the population with use and access at each time instant. We purposefully do not conflate this with the probability of use and access, which can vary between individuals within the same subnational unit of analysis (urban and rural regions of each administrative-one area). We are grateful that this point has been raised, as it indicates that we had not communicated this sufficiently clearly before. We hope the extensive re-draft of the ‘Historical use, access and retention times’ methods section has helped address this, in particular in the following text preceding equation (7):
  
  “We do not assume the probability of access is the same for all individuals in a region at a given point in time. Instead, we assume the probability any given individual has access to an ITN at time t_j can be described by a Beta distribution”
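
  The distinction between the population proportion with access and each individual's probability of access can be illustrated with a Beta distribution. This sketch assumes a mean and concentration parametrisation (alpha = mean * concentration, beta = (1 - mean) * concentration), which may differ from the parametrisation used in equation (7); the numbers are hypothetical. Two populations with the same proportion of access can differ markedly in how that access is spread across individuals.

```python
import random
import statistics
from typing import List


def sample_access_probabilities(mean: float, concentration: float, n: int, seed: int = 1) -> List[float]:
    """Draw individual-level access probabilities from Beta(mean * k, (1 - mean) * k)."""
    rng = random.Random(seed)
    alpha, beta = mean * concentration, (1.0 - mean) * concentration
    return [rng.betavariate(alpha, beta) for _ in range(n)]


# Same hypothetical regional proportion (60%), different individual-level heterogeneity.
homogeneous = sample_access_probabilities(0.6, concentration=200.0, n=20_000)
heterogeneous = sample_access_probabilities(0.6, concentration=2.0, n=20_000)
for name, probs in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    print(f"{name}: mean {statistics.mean(probs):.3f}, sd {statistics.stdev(probs):.3f}")
```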
  
  (W2.4) ITN use/access decay models do not depend on the distribution strategy (e.g. biennial vs triennial distribution).
  
  We may not have fully understood this point, but in terms of our historical models of use and access, assumptions are not imposed on the frequency of previous campaigns. Instead, historical campaign timings are estimated from DHS survey data and the AMP Net Mapping Project (now detailed in appendix 3.1); historical estimated intervals could be two or three years (or indeed any interval), as informed by these data. The duration of use and the retention time are estimates of how long a net would continue to be used, or provide access, if an individual were not to replace it at an earlier date; these estimates are therefore independent of campaign intervals, and we have now added text to provide additional clarity:
  
  “However, throughout this study, the durations of use and retention time are always estimates of how long an individual continues to use or have access to a net in the absence of future replacement; estimates of these are therefore reflective of behaviour or ITN durability and not distribution patterns themselves.”
  
  We do acknowledge that, under our approach, use immediately following a campaign is agnostic of campaign frequency; however, given an absence of data on how use changes following a switch from triennial to biennial campaigns, we believe this was a reasonably conservative assumption. Further confirmation is now provided in the following text, with additional preceding context:
  
  “Future campaigns, whether conducted every two or three years, are therefore assumed to achieve a consistent initial level of use.”
  
  (W2.5) The Bayesian model assumes some narrow prior distributions.
  
  Thank you for highlighting this. We acknowledge the need for further justification for the choice of priors. We have provided this in depth for the hierarchical model of the mean duration of use and access (in appendix 2.2). Further justification for the choice of priors for the discrete-time model is also now provided (in appendix 4.2).
  
  The impact of these hypotheses on the estimated parameters is not explored in the paper, and no sensitivity analyses are performed, although some limitations are discussed.
  
  We fully acknowledge we had not conducted sensitivity analyses for many of our assumptions, and we have now tried to provide better justification for them. The assumptions most likely to influence inference are structural components of the modelling framework rather than scalar parameters that can be varied independently in a conventional sensitivity analysis. Many of the assumptions highlighted above are structural, such as the assumption of exponential decay (W2.1). In this case, multiple elements of the methodology are restricted by the assumption (for example, when correcting for biases that arise from nets being lost between campaigns and survey times when estimating the timing of campaigns in appendix 3.1). Investigating the sensitivity of this assumption over an assumed smooth compact function would require extensive adaptation of the methodology that would be beyond the scope of this paper. Other assumptions, such as the same decay rate parameter for repellency and mortality (W2.2), come from the previous studies referenced, where the parameters were estimated and validated against cluster-randomised controlled trials. We nevertheless recognise our justification of some assumptions could have been expanded upon previously, and we hope the changes highlighted above go some way towards addressing this.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (R1.1) I looked for the reference WHO 2024b for the recent optimal allocation guideline, but there were just three WHO 2024 references in the bibliography. In addition, what exactly the 80% rule applies to is not clear - this could be explained so it is clearer what result to compare to it (or explain that the rule itself is not clear).
  
  We have used the eLife LaTeX/BibTeX template for citations throughout and acknowledge this doesn’t show letter suffixes in the reference list for multiple author-year entries. We are unsure how to address this, given the reference list is generated by the official template, though we note that when citations are clicked on in the document, the relevant citation is then shown at the top of the page on the web version.
  
  (R1.2) L24 'estimated', but this seems more like a prediction. The words 'estimated' and 'predicted' should be carefully used throughout when combining statistical and mechanistic modelling.
  
  This has now been changed.
  
  (R1.3) The point estimates should always have measures of uncertainty.
  
  The rationale for the omission of credible intervals for some point estimates has now been clarified in the manuscript (appendix 1). The following text has been added:
  
  “Additionally, in relation to uncertainty estimates, credible intervals are shown for all subnational quantities that are directly estimated in our models. National and continental values are reported as population-weighted summaries of the median subnational estimates generated from the discrete-time models (appendix 4) and therefore do not correspond to explicitly estimated model parameters, so credible intervals are not shown for these aggregated estimates.”
  
  (R1.4) It would be helpful to justify the choice of ADM1 as the geographical unit.
  
  We have clarified the rationale for this in the following text:
  
  “Here, (subnational) regions are defined as the first administrative unit below the country level and are further divided into rural and urban areas to align with DHS stratification”
  
  (R1.5) The terminology was slightly confusing: in some places, it sounded as if regions were the sub-national regions, in others as if they were different things (eg L74, L105). L45 'and' seems odd here.
  
  ‘Region’ is used interchangeably with ‘subnational region’ at points in the paper to aid the flow of the text. We hope the use of parentheses around (subnational) in the updated text quoted above (and in the following text) helps provide clarity:
  
  “here, the units of analysis are consistently referred to as (subnational) regions”
  
  (R1.6) Spurious accuracy in some estimates, e.g. L52.
  
  This was a result cited from Bertozzi-Villa et al. (2021) for which uncertainty estimates were not available. We hope the response to R1.3 above helps clarify the rationale for omitting credible intervals for some estimates generated here.
  
  (R1.7) L68 'lose' instead of 'loose'.
  
  Now corrected.
  
  (R1.8) L534. I suspect that the model was actually fitted in Stan via the R interface rstan.
  
  Language adjusted accordingly.
  
  (R1.9) L633 'through' rather than 'though'.
  
  This section has been heavily redrafted and we have checked for typos.
  
  Reviewer #2 (Recommendations for the authors):
  
  The paper is well-written and presents an important contribution to better aid interventions. The proposed models are reasonable, but because of their complexity, even readers who work with epidemic modelling might have issues understanding the methodology.
  
  We thank the reviewer for highlighting that the methodology may be difficult to follow. The methods section has now been substantially rewritten to provide a clearer conceptual description of the modelling framework, with detailed model specification and derivations moved to the appendices. We hope this restructuring will allow readers to follow the modelling approach at a high level in the main text with technical details contained in the appendices.
  
  To improve the clarity of the methods section, I suggest:
  
  (R2.1) Include a list of symbols with the meaning of each variable defined in the text.
  
  Definitions for symbols are now also shown in appendix 1 – tables 1-5.
  
  (R2.2) Include a centralized full description of each model, clearly stating the priors and likelihood (similarly to a Stan code).
  
  There are two models that are fitted with Stan (the hierarchical retention model and discrete-time use/access model). To improve clarity for the hierarchical model, priors are now presented in a single block (equations 11 – 17) in appendix 2.2, with the likelihood (equation 18). For the discrete-time model, we have split the presentation of the priors (equations 37 – 42) and the likelihood expressions (equations 43 – 45) into different subsections (respectively appendices 4.2 and 4.3).
  
  (R2.3) If needed, include additional data preprocessing in the form of an algorithm.
  
  Although we have not included an algorithm outlining the preprocessing steps, we have ensured sufficient detail has been provided to facilitate replicability. For example, in appendix 1, we now outline how use and access are inferred from DHS data:
  
  “ITN use is inferred from DHS data (ICF, 2025) on whether individuals slept under an ITN the previous night, while all individuals who used an ITN are assumed to have access; when fewer than two individuals used an ITN, the ITN is assumed to be able to provide access at random to up to two individuals in a household.”
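One possible household-level reading of this rule is sketched below: everyone who slept under a net has access, and each net used by fewer than two people has spare capacity that can cover non-users, up to two people per net. The helper name and the per-net input format are hypothetical; the exact DHS processing is the one described in appendix 1.

```python
def household_access_count(users_per_net: list[int], household_size: int) -> int:
    """Number of household members assumed to have ITN access (hypothetical helper).

    users_per_net[i] is how many people slept under net i the previous night.
    All users have access; each net used by fewer than two people can also
    cover non-users, up to two people per net in total.
    """
    users = sum(users_per_net)
    spare_capacity = sum(max(0, 2 - u) for u in users_per_net)
    non_users = household_size - users
    return users + min(non_users, spare_capacity)

# Two nets: one used by three people, one unused; household of six.
print(household_access_count([3, 0], household_size=6))  # 5
```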
  
  (R2.4) Mention the main hypotheses and limitations of the model in the main text.
  
  We have ensured key assumptions of the model are stated in the re-written ‘Historical use, access and retention times’ methods subsection; for example, in the following text:
  
  “Due to the sparsity and irregularity of DHS and MIS surveys, we were unable to investigate seasonal fluctuations in either access or use; we therefore assume that nets provide access or are used continuously over some period of time.”
  
  (R2.5) Including a flowchart or diagram that provides an overview of the proposed framework could be helpful.
  
  We have now included a flowchart of methodological steps in appendix 1 – figure 1.
  
  (R2.6) Line 89: Define NMP before presenting the acronym.
  
  We have ensured this is defined in the first instance on line 39.
  
  (R2.7) Equation (1): Explain why you chose the Exponential distribution (e.g. constant hazard), as this is one of the main hypotheses of the model.
  
  As highlighted in our response to W2.1, we have now included justification of this assumption in the final paragraph of appendix 2.2.
  
  (R2.8) Equation (2): Although Equation (2) passes a clear message of how alpha_i^x is distributed, I wonder if it is mathematically correct to express the limit this way, since the argument of the limit is a random variable. Maybe the limit should be applied to gamma_i^x instead.
  
  Thank you for highlighting this. We acknowledge the limit behaviour was expressed in a shorthand manner that is not strictly mathematically correct. Indeed, the limit should be applied to the decay rate parameter gamma (now shown in equation 10). In appendix 2.1, we have now provided a proof demonstrating that the rate parameter of the pooled ITN age distribution should tend to the same decay rate as the assumed exponential loss function.
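A quick Monte Carlo check can make this limiting behaviour intuitive. The sketch below assumes nets are received at a constant rate over a window of length T before a survey and are lost with a constant hazard γ; the ages of nets still retained at the survey then follow an exponential distribution truncated at T, whose rate approaches γ as T grows. It is an illustrative simulation under those assumptions, not the proof in appendix 2.1, and all names and values are hypothetical.

```python
import random

def surviving_net_ages(gamma, window, n_nets, seed=1):
    """Ages, at the survey, of simulated nets that are still retained.

    Nets are received uniformly at random over the `window` years before the
    survey, and each is kept for an Exponential(gamma) time (constant hazard).
    """
    rng = random.Random(seed)
    ages = []
    for _ in range(n_nets):
        age = rng.uniform(0.0, window)      # time since the net was received
        retention = rng.expovariate(gamma)  # how long it would be kept
        if retention > age:                 # still retained at the survey
            ages.append(age)
    return ages

gamma = 0.4  # hypothetical loss rate per year
for window in (2.0, 10.0, 50.0):
    ages = surviving_net_ages(gamma, window, n_nets=100_000)
    # Method-of-moments rate estimate (1 / mean age); tends to gamma as window grows.
    print(window, round(len(ages) / sum(ages), 3))
```

With a short window the estimated rate sits well above γ because the age distribution is truncated; it falls towards 0.4 as the window lengthens.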
  
  (R2.9) I think the difference between rho_i^x (Equation (1)) and alpha_i^x (Equation (2)) is not very clear in the text.
  
  In the context of access, rho_{i(l)} and alpha_{i(l)} are respectively the duration an ITN l is retained for and its age at the time of a survey. We hope the redrafted appendices make this clearer, in addition to the inclusion of the new parameter tables in appendix 1.
  
  (R2.10) Line 479: Typo (and or).
  
  Updated wording is now contained in appendix 2.
  
  (R2.11) Line 711: Typo (The limit is equal to infinity).
  
  This has now been corrected.
  
  (R2.12) Equation (15): I could not understand this equation. What is rho(s) and rho(s \in I), where I is one of the intervals mentioned in this equation?
  
  Rho(tau_ik) was introduced as simplified notation for the probability density of the timing of campaign k in region i (tau_ik), but we acknowledge this was not explained clearly. We also acknowledge this equation presented a lot of concepts at once. The equation attempted to describe the probability density of the last campaign in region i relative to time t_j, denoted phi_ij. We no longer use this previous notation (rho) for the probability density. This equation has been updated to equation (30), with an incremental explanation of its construction now provided in appendix 3.2.
  
  (R2.13) Line 642: What is t?
  
  The use of $t_j \ni t$ was previously used to indicate that the discrete time point t_j lies within continuous time t. We acknowledge this was a non-standard use of notation and was not clearly explained. This section (now in appendix 4) has been rewritten without this notation. The use of t and t_j to denote continuous time and discrete time points respectively is now defined in the core notation table (appendix 1 – table 1).
  
  (R2.14) The proposed model has narrow hyperhyperpriors because of convergence issues. Are the estimated parameters sensitive to the choice of hyperhyperpriors?
  
  We acknowledge limited justification was previously provided for the choice of hyperhyperpriors. We have now provided additional justification within appendix 2.2.
  
  (R2.15) Since the proposed Bayesian models are relatively complex, it might be useful to provide convergence diagnostic plots in the supplement.
  
  Convergence diagnostics were inspected using the ShinyStan package. Chains showed satisfactory convergence based on standard diagnostics. We have not included diagnostic plots due to the large number of parameters in the fitted models. Under the hierarchical model (appendix 2) for ITN use, 146 region-specific parameters (one for each region), 12 country-level hyperparameters (two for each country), and four hyperhyperparameters were estimated. Under the discrete-time model (appendix 4), a further 876 parameters (six for each region) were estimated. In total, 1,038 parameters were fitted for the ITN use models. The same number of parameters were estimated for the ITN access models, giving a total of 2,076 estimated parameters.
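As a quick consistency check, the totals above can be recomputed from the per-region and per-country counts. The variable names are hypothetical, and the six countries are inferred from 12 hyperparameters at two per country.

```python
regions = 146
countries = 12 // 2  # two country-level hyperparameters per country

hierarchical = regions * 1 + countries * 2 + 4  # region, country, hyperhyper
discrete_time = regions * 6                      # six parameters per region
per_outcome = hierarchical + discrete_time       # ITN use (or ITN access)
total = 2 * per_outcome                          # use and access models

print(hierarchical, discrete_time, per_outcome, total)  # 162 876 1038 2076
```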
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2025.08.27.25334550v2
www.biorxiv.org www.biorxiv.org

Systematic Analysis of Network-driven Adaptive Resistance to CDK4/6 and Estrogen Receptor Inhibition using Meta-Dynamic Network Modelling

1
1. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  eLife Assessment
  
  This manuscript presents a useful computational framework for systematically characterising how heterogeneity in initial conditions or biophysical parameters shapes the dynamic behaviour of protein signalling networks, with potential relevance to understanding adaptive drug resistance. While the approach represents a significant methodological contribution, the extent to which its conclusions are biologically informative remains debated, as the model is not qualitatively or quantitatively validated against experimental data. As a result, the strength of evidence supporting the mechanistic claims is viewed as incomplete.
  
  We thank the editors and reviewers for their further assessment of the manuscript. The revised public review raises several issues that overlap with points addressed in our previous response, particularly around the intended scope of MDN modelling, the interpretation of parameter sampling, and the qualitative nature of the experimental comparison. In this final revision, we have made targeted clarifications in the main text, Methods, figure legends, and Supplementary Information to make these points more explicit for readers. We emphasise that the present work is intended as a theoretical and exploratory framework for mapping the qualitative dynamic behaviours accessible to a fixed network topology, rather than as a quantitatively calibrated model of a specific tumour or cell line.
  
  Joint Public Review:
  
  In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.
  
  The authors study the Early Cell Cycle (ECC) network as a proof of concept, focusing on pathways involving PI3K, EGFR, and CDK4/6 with the aim of identifying mechanisms that may underlie resistance to CDK4/6 inhibition in cancer. The biochemical reaction model comprises 50 state variables and 94 kinetic parameters, implemented in SBML and simulated in Matlab. A central component of the study is the generation of large ensembles of model instances, including 100,000 randomly sampled parameter sets intended to represent intra-tumour heterogeneity. On the basis of these simulations, the authors conclude that heterogeneity in kinetic rate parameters plays a stronger role in driving adaptive resistance than variation in baseline protein expression levels, and that resistance emerges as a network-level property rather than from individual components alone. The revised manuscript provides additional clarification regarding aspects of the simulation and filtering procedures and frames the comparison with experimental data as qualitative. Nonetheless, the study is best interpreted as a theoretical and exploratory analysis of the model's behaviour under heterogeneous conditions. Consequently, questions remain regarding the biological grounding of the sampled parameter regimes and the extent to which the reported frequencies of resistance-associated behaviours can be directly interpreted in physiological terms.
  
  While the authors propose a potentially useful computational framework to explore how heterogeneity shapes dynamic responses to drug perturbation, a number of important conceptual and methodological concerns remain to be addressed:
  
  (1) The sampling of kinetic parameters constitutes the backbone of the manuscript, yet important concerns remain regarding its biological grounding and transparency. Although the revised version provides additional clarification on the exploration of "model instances", it is still not sufficiently clear how parameter values and initial conditions are generated, nor how the chosen ranges relate to biological measurements. The kinetic rates are sampled over broad intervals without explicit justification in terms of experimentally measured bounds or inferred distributions. As a consequence, it remains uncertain whether the ensemble of simulated behaviours reflects physiologically plausible cellular regimes or primarily the properties of the assumed parameter space. In this context, the large-scale sampling (100,000 parameter sets) resembles a Monte Carlo exploration of the model rather than a biologically calibrated representation of tumour heterogeneity.
  
  Parameters were sampled from a uniform distribution spanning values 10^-5 to 10^4. Conserved totals were sampled from the range 10^0 to 10^4. Each of these is roughly in line with measured spans of orders of magnitude for parameter values and protein expression (REF). Again, we would like to point out that we intentionally kept our ranges broad, and sampled from uniform distributions, to assess upper bounds of heterogeneity, not biologically informed heterogeneity. We also comment on the likely effects of expanding these ranges in our response to (26) in our original rebuttal.
  
  The main text has been updated to include this information. LINES: 175-179
  
  Furthermore, the adequacy of the sampling strategy in such a high-dimensional space (94 free parameters) remains open to question. In the absence of biologically informed constraints, the combinatorial space of possible parameter configurations is vast, and it is unclear to what extent the sampled ensembles can be considered representative. This issue is particularly relevant because the manuscript interprets the frequency of resistance-associated behaviours as indicative of their likelihood.
  
  This was addressed extensively in our original rebuttal, response to point (3). A new section was added to the supplementary text, along with new figures demonstrating the validity of the claims.
  
  The validation presented in Figure 7 does not fully resolve these concerns. The comparison with experimental data is qualitative, and the simulations are performed in arbitrary time units, which complicates direct interpretation alongside time-resolved experimental measurements. Moreover, certain qualitative discrepancies between simulated and experimental trends (e.g., persistent versus decreasing CDK4/6 activity) are not thoroughly discussed. As this figure represents the primary empirical reference point in the manuscript, the extent to which the model captures experimentally observed dynamics remains uncertain.
  
  This was addressed in the original rebuttal, response to point (12). The actual time units are arbitrary in the sense that they are determined by the units of the parameters in our model. It is important to understand that the meta-dynamic analysis is not calibrated to data and so the meaning of time units is far less important than the distribution of behaviours. We have updated the figure to reflect the arbitrary units of time in our simulations.
  
  Finally, aspects of presentation continue to limit transparency. Parameter ranges are described at different points in the manuscript but are not consolidated clearly in the Methods, and the definition of initial conditions remains ambiguous - particularly whether these correspond to conserved quantities or to the dynamic variables used to initialise simulations. In addition, the exact number of model instances underlying specific analyses and figures is not always explicit. Greater clarity on these issues is essential for assessing reproducibility and for interpreting the quantitative claims of the study.
  
  (2) A central conclusion of the manuscript is that heterogeneity in protein-protein interaction kinetics is a stronger driver of adaptive resistance than heterogeneity in protein expression levels. To assess the latter, the authors fix a nominal set of kinetic parameters and generate 100,000 random initial concentrations for the 50 model species. However, according to the simulation protocol described in the manuscript, each trajectory includes three phases: (i) simulation under starvation conditions to equilibrium, (ii) mitogenic stimulation to a second ("fed") equilibrium, and (iii) application of drug treatment. The equilibrium concentrations reached in phases (i) and (ii) are determined by the kinetic parameters of the model and are independent of the initial concentrations, provided the system converges to a stable steady state. In dynamical systems terms, stable equilibria are defined by the parameter set and attract all initial conditions within their basin of attraction. Since the kinetic parameters are fixed in this experiment, the pre-treatment equilibrium that serves as the starting point for drug application should likewise be fixed. Under these conditions, it is therefore not unexpected that sampling a large number of initial concentrations has limited influence on the treated dynamics.
  
  This raises conceptual questions about the interpretation of the comparison between kinetic and expression heterogeneity. If the system converges to a unique stable steady state prior to treatment, then variability in initial concentrations does not propagate into variability in drug response, and the observed dominance of kinetic heterogeneity may partly reflect this structural property of the model rather than a biological principle. Clarification is needed regarding whether multiple steady states exist under the nominal parameter set, and if so, how basins of attraction are explored.
  
  More broadly, it remains unclear why initial protein concentrations can be sampled independently of the kinetic parameters. In biological systems, steady-state expression levels are typically determined by the underlying kinetic rates. A more consistent approach might require constraining initial concentrations to correspond to equilibrium states of the chosen parameter set, thereby introducing relationships between at least some of the 50 initial conditions and the 94 kinetic parameters. Finally, the manuscript employs a non-standard terminology regarding "initial conditions," which may further obscure interpretation of these results and would benefit from clarification.
  
  This was addressed in the original rebuttal, response to point (4). Text was modified to clarify that ‘initial conditions’ refers to the conserved total of each protein species. A supplementary figure (supp. fig. 4) was added to demonstrate that changes to the conserved totals of protein species do, in fact, shift the dynamics and steady-state equilibria of protein species. Text was updated throughout the paper to ensure that our definition of ‘initial conditions’ was used consistently.
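The distinction can be illustrated with a minimal toy model, A ⇌ B with rates k_f and k_r and a conserved total A + B = T (a hypothetical example, not part of the ECC network). Starting points that share the same conserved total relax to the same steady state, B* = T k_f / (k_f + k_r), whereas changing T shifts that steady state even with the kinetic parameters fixed. Forward Euler is used only to keep the sketch dependency-free.

```python
def simulate(a0, b0, k_f, k_r, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of A <-> B; returns (A, B) at t_end."""
    a, b = a0, b0
    for _ in range(int(t_end / dt)):
        flux = k_f * a - k_r * b  # net conversion A -> B
        a -= flux * dt
        b += flux * dt            # A + B is conserved
    return a, b

k_f, k_r = 2.0, 1.0

# Same conserved total (T = 3): different starting splits, same steady state.
print(simulate(3.0, 0.0, k_f, k_r))
print(simulate(0.0, 3.0, k_f, k_r))

# Doubled conserved total (T = 6): the steady state doubles.
print(simulate(6.0, 0.0, k_f, k_r))
```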
  
  (3) The technical implementation of the modelling and simulation framework remains difficult to evaluate due to insufficient methodological detail. Although the authors state that kinetic parameters are randomly sampled, the manuscript does not specify the distributions from which parameters are drawn, nor whether potential correlations between parameters are considered or explicitly ignored. Without this information, it is not possible to assess how implicit modelling assumptions shape the ensemble of simulated behaviours. Given that the conclusions rely on frequency-based interpretations across sampled parameter sets, greater transparency regarding the sampling procedure is essential.
  
  We updated the main text to clarify that parameters were randomly sampled from a log-transformed uniform distribution. LINES: 175-179
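For readers unfamiliar with the scheme, the sketch below shows what log-uniform sampling over the stated kinetic-parameter range (10^-5 to 10^4) amounts to: base-10 exponents are drawn uniformly and then exponentiated, so each order of magnitude is equally likely. The count of 94 parameters is from the review; the function names and seed are hypothetical, and every parameter is drawn independently here, which is an assumption since correlations are not discussed in this response.

```python
import math
import random

def sample_log_uniform(low, high, n, rng):
    """Draw n values whose base-10 logarithms are uniform on [log10(low), log10(high)]."""
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** rng.uniform(lo, hi) for _ in range(n)]

def sample_parameter_sets(n_sets, n_params=94, low=1e-5, high=1e4, seed=42):
    """Independently sample n_sets kinetic parameter vectors (no correlations)."""
    rng = random.Random(seed)
    return [sample_log_uniform(low, high, n_params, rng) for _ in range(n_sets)]

sets = sample_parameter_sets(n_sets=1000)
flat = [k for s in sets for k in s]
# Roughly equal numbers of draws fall in each of the nine decades.
counts = [sum(1 for k in flat if 10 ** d <= k < 10 ** (d + 1)) for d in range(-5, 4)]
print(counts)
```

Under a plain (linear) uniform draw over the same interval, almost every value would exceed 10^3, which is why the log transform matters for spanning the full range.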
  
  A further concern relates to the parameter filtering step. The authors report that the "vast majority" of sampled parameter sets produced systems that were "too stiff," and that these were excluded on the grounds that stiff dynamics are not biologically plausible. However, the manuscript does not clearly define how stiffness is assessed, nor why stiffness is interpreted as biologically unrealistic rather than as a numerical property of the formulation. In standard practice, stiff systems are typically handled using appropriate implicit solvers rather than being discarded. Similarly, parameter sets that produce negative state values are excluded, yet such behaviour may arise from numerical artefacts rather than from intrinsic model inconsistency. The rationale for excluding these parameter sets, rather than adapting the numerical scheme, is not sufficiently justified.
  
  The reported rejection rate - approximately 90% of sampled parameter sets - is substantial and raises questions regarding the interplay between model structure, parameter ranges, and numerical methods. As currently described, the filtering step appears to select parameter sets based primarily on computational tractability rather than on experimentally motivated biological criteria. The manuscript would be strengthened by clarifying whether the retained parameter sets are representative of biologically meaningful regimes, and by distinguishing clearly between exclusions based on biological plausibility and those arising from numerical considerations.
  
  This was extensively addressed in the original rebuttal, response to points (6) and (7). The main text was updated to clarify that a solver specific to stiff systems was used. Furthermore, subsequent analysis after addressing this issue revealed that a lack of drug response, and failure to reach steady state within the simulated time period, now accounted for the majority of filtering. This had no effect on the distributions of behaviours identified in our analyses. The main text was updated to reflect these changes, and the rejection rate is explicitly discussed in the main text.
  
  Finally, important aspects of the simulation protocol require clarification. The model is simulated under "fasted" and "fed" conditions until equilibrium is reached, yet the criterion used to determine convergence is not specified. It would be important to describe how equilibrium is assessed (e.g., based on the norm of the time derivatives). Additionally, it remains unclear whether the mitogenic stimulus applied in the "fed" phase is assumed to be constant over time and, if so, how this assumption relates to biological experimental conditions. Greater detail on these implementation choices is necessary to ensure interpretability and reproducibility.
  
  This was addressed in the original rebuttal, response to point (8). Clarifications about the simulations were added to the main text, including explicitly stating that mitogenic and drug inputs were continuous stepwise functions and how steady-state equilibrium was defined and calculated.
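The convergence criterion itself is not reproduced in this response, so the sketch below only illustrates the general approach the reviewers describe: integrate until the norm of the time derivatives falls below a tolerance, then switch on a constant (step) input and repeat for the next phase. The one-species toy model, the tolerance, the 80% fractional inhibition and all names are hypothetical; for a single species the norm reduces to the absolute value, and forward Euler stands in for the stiff solver used in the study.

```python
def rhs(x, stimulus, inhibition, k_prod=1.0, k_basal=0.1, k_deg=0.5):
    """dx/dt for a toy species activated by a stimulus; the drug scales the
    activation rate law by the fixed fraction (1 - inhibition)."""
    return k_basal + k_prod * stimulus * (1.0 - inhibition) - k_deg * x

def run_to_steady_state(x, stimulus, inhibition, dt=1e-2, tol=1e-8, max_steps=10**6):
    """Step forward until |dx/dt| < tol; return (state, steps taken)."""
    for step in range(max_steps):
        dxdt = rhs(x, stimulus, inhibition)
        if abs(dxdt) < tol:  # derivative-norm convergence criterion
            return x, step
        x += dxdt * dt       # forward Euler (illustrative only)
    raise RuntimeError("no steady state within max_steps")

# Phase (i) starvation, (ii) constant mitogenic step, (iii) drug added on top.
starved, _ = run_to_steady_state(0.0, stimulus=0.0, inhibition=0.0)
fed, _ = run_to_steady_state(starved, stimulus=1.0, inhibition=0.0)
treated, _ = run_to_steady_state(fed, stimulus=1.0, inhibition=0.8)
print(round(starved, 4), round(fed, 4), round(treated, 4))  # 0.2 2.2 0.6
```

Each phase starts from the previous equilibrium, which mirrors the three-phase protocol summarised in the review.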
  
  (4) The manuscript states that the modelling conclusions are strongly supported by existing literature; however, the validation presented does not fully substantiate this claim. As noted above, the comparison with CDK2 and CDK4/6 experimental data remains qualitative, and the use of arbitrary simulation time units complicates interpretation of temporal agreement. The extent to which the model quantitatively or mechanistically recapitulates experimentally observed dynamics therefore remains uncertain.
  
  This was addressed in the original rebuttal, response to points (13) and (14). Wording was changed to remove the suggestion of strong evidence and the tone was shifted to reflect reasonable qualitative support for our analysis, not strong evidence.
  
  The claim that the model reproduces known resistance mechanisms is also difficult to assess in light of Figure S10, where a large fraction of network nodes (~80%) appear implicated in resistance under some conditions. If most components of the network can, in at least some parameter regimes, be associated with resistance phenotypes, the resulting lack of selectivity weakens the strength of model-based validation. It becomes challenging to distinguish specific mechanistic insights from generic consequences of network connectivity.
  
  In addition, the Supplementary Information notes that certain components of the mitogenic and cell-cycle pathways were abstracted or excluded in order to maintain computational tractability. While such abstraction is understandable in a large ODE framework, it raises interpretative questions. Proteins identified as potential resistance drivers within the model may, in some cases, represent aggregated or simplified pathway effects. Clarifying in the main text how such abstractions may influence the attribution of resistance mechanisms would strengthen the biological interpretation of the results.
  
  This was addressed in the original rebuttal, response to point (15). The discussion was significantly revised to reflect our reasoning with respect to our conclusions. We completely understand that more work could be done to verify our claims; however, our intention is to demonstrate the generalised relationship between network heterogeneity and drug resistance, not to predict patient-specific resistance mechanisms.
  
  Drug inhibition is central to the manuscript's conclusions. The revised version clarifies that inhibition is implemented as a fixed fractional modification of specific kinetic rate laws. This abstraction is appropriate for exploring network-level responses, but it represents a stylised perturbation rather than a pharmacologically calibrated model of drug action. For full interpretability and reproducibility, the mathematical form of the modified rate laws, as well as the timing of inhibition relative to network equilibration, should be specified unambiguously. The biological implications of the findings depend critically on understanding this modelling choice.
  
  All equations, including typeset ODEs, were included in the supplementary model files, as requested by the reviewers. R15 and R27 contain the relevant equations, which specify the exact implementation of drug inhibition. The number of time units per simulation phase is now stated in the main text. LINES: 166 – 168
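
  To make the "fixed fractional modification" concrete, the sketch below scales an otherwise unchanged rate law by (1 - f) once the drug phase begins. It is a minimal illustration under assumptions: the mass-action form, the names `mass_action_rate` and `inhibited_rate`, and the multiplicative (1 - f) form are hypothetical stand-ins, not the expressions in R15 and R27 of the supplementary model files.

```python
def mass_action_rate(k: float, enzyme: float, substrate: float) -> float:
    """Hypothetical uninhibited rate law: k * E * S."""
    return k * enzyme * substrate


def inhibited_rate(base_rate: float, inhibition_fraction: float, drug_on: bool) -> float:
    """Scale a rate by (1 - f) while the drug is present (assumed form, not R15/R27)."""
    if not 0.0 <= inhibition_fraction <= 1.0:
        raise ValueError("inhibition_fraction must lie in [0, 1]")
    if not drug_on:
        return base_rate  # before treatment the rate law is untouched
    return base_rate * (1.0 - inhibition_fraction)


rate = mass_action_rate(k=0.5, enzyme=2.0, substrate=3.0)
print(inhibited_rate(rate, 0.9, drug_on=False))  # 3.0
print(inhibited_rate(rate, 0.9, drug_on=True))
```

  Because the fraction is fixed rather than derived from a dose-response curve, this form matches the "stylised perturbation" reading of the model rather than a pharmacologically calibrated one.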
  
  The one-at-a-time perturbation analysis presented in Figure 5 provides an interpretable ranking of first-order control points across the ensemble and offers mechanistic insight into primary sensitivities of the network. However, many targeted therapies act on multiple components, and resistance frequently arises through combinatorial mechanisms. The reported rankings should therefore be interpreted as identifying primary influences under isolated perturbations, rather than as a comprehensive account of multi-target drug behaviour.
  
  Overall, the manuscript succeeds in presenting a conceptual and exploratory framework for analysing how signalling network topology can shape the qualitative landscape of adaptive responses under heterogeneous kinetic conditions. Its principal contribution lies in establishing a systematic platform for large-scale in silico exploration. At the same time, the current limitations in biological calibration, parameter grounding, and validation constrain the extent to which the conclusions can be interpreted as predictive or quantitatively representative of specific tumour contexts. Addressing these issues would further strengthen the connection between the theoretical landscape described here and experimentally observed resistance dynamics.
  
  Joint Recommendations for the authors:
  
  (1) Supplementary Figure S4 is not sufficiently explained in its current form. The structure of the figure, the meaning of its colour coding, and the intended interpretation are not clearly described, making it difficult for readers to extract the key message without substantial inference. Given that the manuscript relies heavily on large-scale ensemble analyses, clear visual communication is essential. A more detailed legend, explicit definition of axes and colour scales, and improved visual labelling would substantially enhance clarity, accessibility, and reproducibility.
  
  The Supplementary Figure S4 legend has been updated with additional detail. LINES: Supp. Text 256 – 263
  
  (2) The approximately 90% rejection rate of sampled parameter sets should be reported explicitly in the main text of the manuscript rather than only in the Supplementary Information. Given the central role of large-scale parameter sampling in the study, this level of exclusion is a critical aspect of the modelling workflow and directly affects the interpretation of robustness and representativeness. Clear disclosure in the main text would allow readers to properly evaluate the effective size of the analysed ensemble and the implications of the filtering procedure for the generality of the conclusions.
  
  This was explicitly addressed in the original rebuttal.
  
  (3) The model would benefit from quantitative validation against experimental data. In Figure 7C, the authors note in the response letter that the simulations are performed in arbitrary time units. However, the figure itself labels the time axis in hours, which may lead readers to infer a direct quantitative correspondence between simulated and experimental time courses. If the simulations are not calibrated to real time, this labelling is potentially misleading and should be corrected. Either the model should be explicitly time-calibrated and quantitatively compared to experimental measurements, or the figure should clearly indicate that the time axis is dimensionless. Clarifying this point is essential to avoid overinterpretation of the agreement between model and data.
  
  The time-axis label in Figure 7C has been updated.
  
  The following is the authors’ response to the original reviews.
  
  Joint Public Reviews:
  
  In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.
  
  The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.
  
  While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:
  
  (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. In particular, it is not evident how the kinetic rates relate to any biological data, or whether the parameter distributions used in this study have any biological relevance.
  (2) Related to this problem, it is not clear whether the 100,000 random parameter samples sufficiently explore parameter space, given the combinatorial explosion that arises from 94 free parameters, or whether 100,000 random initial conditions suffice for a system with 50 species (variables).
  (3) Moreover, the authors filter out all cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience rather than biological plausibility.
  (4) It is also unclear exactly how the drug effect is incorporated into the model (e.g., molecular inhibition?) and how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). In a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.
  (5) Along the same lines, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. Because stable steady states are determined by the model parameters, the details of how the perturbation effect is evaluated in the simulated dynamics are critical to interpreting the results.
  (6) The presented validation of the model results (Fig. 7) is only qualitative, and its interpretation is not carefully discussed in the manuscript, particularly the comparison between fold-change responses without specified baseline states.
  
  We thank the reviewers for their thoughtful and constructive comments. In response, we have undertaken a substantial revision that addresses all the comments and improves clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.
  
  Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.
  
  We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised, consistent with our aims, that the comparisons are qualitative.
  
  Finally, we have rewritten the Discussion to centre on interpretation, implications, and connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.
  
  We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.
  
  Joint Recommendations for the Authors:
  
  (1) It is unclear exactly which sets are evaluated in each case, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.
  
  In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.
  
  We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.
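
  As an illustration of the three designs quoted in comment (1), the sketch below enumerates model instances as (parameter set, IC set) pairs. The identifiers (such as `P0`, `IC_shared`, `P_nominal`) and the function names are hypothetical labels, not the authors' code, and the triplicate ensembles are not shown.

```python
from itertools import product


def vary_parameters(n_instances):
    """Design 1: a unique parameter set per instance, one shared IC set."""
    return [(f"P{i}", "IC_shared") for i in range(n_instances)]


def vary_initial_conditions(n_instances):
    """Design 2: nominal parameters (Table S1) with a unique IC set per instance."""
    return [("P_nominal", f"IC{j}") for j in range(n_instances)]


def crossed(n_param_sets, n_ic_sets):
    """Design 3: every IC set combined with every parameter set."""
    return list(product((f"P{i}" for i in range(n_param_sets)),
                        (f"IC{j}" for j in range(n_ic_sets))))


print(crossed(2, 2))  # [('P0', 'IC0'), ('P0', 'IC1'), ('P1', 'IC0'), ('P1', 'IC1')]
```

  At the scale quoted in the comment, the first two designs each yield 100,000 instances, while the crossed design combines 1000 IC sets with each of 1000 parameter sets, i.e. 1000 × 1000 = 1,000,000 simulations.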
  
  (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.
  
  We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.
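
  The replicate check described above can be sketched as follows: each independent ensemble is reduced to the frequency of each qualitative dynamic class, and the spread of those frequencies across ensembles is averaged over classes. This is a minimal sketch under assumptions; the class labels, the use of the population standard deviation, and the function names are ours, not taken from the manuscript.

```python
from collections import Counter
from statistics import pstdev


def class_frequencies(labels, classes):
    """Fraction of model instances in each qualitative class (e.g. 'rebound')."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def replicate_spread(replicates, classes):
    """Mean, over classes, of the standard deviation of class frequency across ensembles."""
    freqs = [class_frequencies(r, classes) for r in replicates]
    return sum(pstdev([f[c] for f in freqs]) for c in classes) / len(classes)


classes = ("rebound", "decreasing")
ensembles = [  # three toy "independent ensembles" (hypothetical labels)
    ["rebound", "decreasing", "decreasing", "decreasing"],
    ["rebound", "decreasing", "decreasing", "decreasing"],
    ["rebound", "rebound", "decreasing", "decreasing"],
]
print(round(replicate_spread(ensembles, classes), 4))  # 0.1179
```

  With 100,000 instances per ensemble, as in the study, this spread would be expected to be far smaller than in the four-instance toy example.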
  
  We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).
  
  (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
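
  The reviewers' order-of-magnitude estimate can be checked directly. The sketch counts only the "corners" of the space (one high and one low value per dimension), exactly as in the thought experiment; it is a back-of-the-envelope check, not a model of how parameters were sampled.

```python
def corner_fraction(n_dims: int, n_samples: int) -> float:
    """Fraction of the 2**n_dims high/low corners covered by n_samples distinct draws."""
    return n_samples / 2 ** n_dims


# 94 kinetic parameters (comment 3) and 50 initial concentrations (comment 5).
for n_dims in (94, 50):
    print(n_dims, f"{2 ** n_dims:.3e}", f"{corner_fraction(n_dims, 100_000):.1e}")
# 94 1.981e+28 5.0e-24
# 50 1.126e+15 8.9e-11
```

  Both fractions are consistent with the ~1/10^23 and ~1/10^10 figures quoted in comments (3) and (5).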
  
  We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.
  
  To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
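
  The N-versus-2N diagnostic can be sketched as follows. This is a minimal sketch under assumptions: the three class labels and their probabilities are hypothetical, each "simulation" is replaced by a categorical draw, and the first N draws are nested within the first 2N (the response does not say whether the two samples are nested or independent).

```python
import random

CLASSES = ("decreasing", "rebound", "sustained")  # hypothetical class labels
WEIGHTS = (0.6, 0.3, 0.1)                         # hypothetical class probabilities


def frequencies(labels):
    """Frequency vector over CLASSES for a list of per-instance labels."""
    return [labels.count(c) / len(labels) for c in CLASSES]


def mse(p, q):
    """Mean squared error between two frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def convergence_curve(n_values, seed=0):
    """MSE between the class distribution of the first N and the first 2N draws."""
    rng = random.Random(seed)
    labels = rng.choices(CLASSES, weights=WEIGHTS, k=2 * max(n_values))
    return {n: mse(frequencies(labels[:n]), frequencies(labels[: 2 * n]))
            for n in n_values}


for n, err in convergence_curve([100, 1_000, 10_000, 50_000]).items():
    print(f"N={n:>6}: MSE={err:.2e}")
```

  In the real analysis each label would come from simulating one model instance; the bookkeeping for comparing N and 2N is the same.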
  
  We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.
  
  Regarding the parameter range, we intentionally chose a broad, unbiased range (10^-5 to 10^4) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.
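
  The response gives the range but not the sampling distribution. Assuming that each kinetic parameter is drawn log-uniformly over 10^-5 to 10^4 (an assumption on our part), one instance's parameter vector could be generated as sketched below; `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

LOG10_LOW, LOG10_HIGH = -5.0, 4.0  # range quoted in the response: 10^-5 to 10^4


def sample_log_uniform(n_params, rng):
    """Draw n_params values whose base-10 logarithms are uniform on [-5, 4] (assumed scheme)."""
    return [10.0 ** rng.uniform(LOG10_LOW, LOG10_HIGH) for _ in range(n_params)]


rng = random.Random(42)
params = sample_log_uniform(94, rng)  # one model instance's 94 kinetic parameters
print(min(math.log10(p) for p in params), max(math.log10(p) for p in params))
```

  Sampling the exponent rather than the value gives each decade equal weight; a uniform draw on the linear scale over the same interval would place about 90% of samples above 10^3.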
  
  (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.
  
  We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
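
  A toy example illustrates the distinction between the initial split and the conserved total. The single phosphorylation cycle below (X ⇌ pX, with X_tot = X + pX held fixed) is a hypothetical stand-in for a multi-state species, integrated with a simple forward Euler scheme; it is not part of the ECC model. Changing the initial split leaves the steady state unchanged, whereas changing X_tot shifts it.

```python
def simulate_cycle(x_tot, px0, k_on=1.0, k_off=0.5, signal=1.0, dt=0.01, steps=5000):
    """Forward Euler for dpX/dt = k_on*S*X - k_off*pX with X = x_tot - pX (total conserved)."""
    px = px0
    for _ in range(steps):
        x = x_tot - px  # unphosphorylated form, from the conservation law
        px += dt * (k_on * signal * x - k_off * px)
    return px


def steady_state(x_tot, k_on=1.0, k_off=0.5, signal=1.0):
    """Analytical steady state pX* = k_on*S*X_tot / (k_on*S + k_off)."""
    return k_on * signal * x_tot / (k_on * signal + k_off)


same_total = [simulate_cycle(1.0, px0) for px0 in (0.0, 0.5, 1.0)]  # different splits
bigger_total = simulate_cycle(2.0, 0.0)                           # different total
print([round(v, 4) for v in same_total], round(bigger_total, 4))
# [0.6667, 0.6667, 0.6667] 1.3333
```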
  
  We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).
  
  (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.
  
  Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.
  
  (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.
  
  We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.
  
  Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered that it was not an intrinsic property of the parameter sets themselves but an artefact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps and was caused by the large initial discrepancy between the concentrations of active and inactive protein forms, which made the system too stiff to solve with the solver settings we had implemented. To overcome this problem, we set a large maximum number of solver steps for the first few time points, enabling the solver to pass through the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be simulated successfully. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.
  
  We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part of the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and our use of the references. Ref [38] was included to show that models of biological systems are stiff, a major conclusion of that paper, and [39] was originally included to show that solving ODEs relies on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.
  
  Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.
  
  (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.
  
  We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
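
  Why an implicit method matters for stiff systems can be seen on a scalar test problem. The sketch compares explicit Euler with backward Euler, which is the first-order member of the BDF family; CVODE itself uses variable-order (up to fifth-order), variable-step BDF with Newton iterations, so this illustrates the principle only, not the solver configuration used in the study.

```python
import math

LAM = 1000.0   # stiffness: y relaxes towards cos(t) on a timescale of 1/LAM
H = 0.01       # step size much larger than 1/LAM
N_STEPS = 100  # integrate from t = 0 to t = 1


def explicit_euler(y0=1.0):
    """y' = -LAM*(y - cos t), explicit update: unstable because |1 - H*LAM| = 9 > 1."""
    y = y0
    for n in range(N_STEPS):
        t = n * H
        y = y + H * (-LAM * (y - math.cos(t)))
    return y


def backward_euler(y0=1.0):
    """BDF1: solve y_new = y + H*(-LAM*(y_new - cos(t_new))); linear, so closed form."""
    y = y0
    for n in range(N_STEPS):
        t_new = (n + 1) * H
        y = (y + H * LAM * math.cos(t_new)) / (1.0 + H * LAM)
    return y


print(f"explicit: {explicit_euler():.3e}")
print(f"implicit: {backward_euler():.4f}  (solution stays close to cos(1) = {math.cos(1.0):.4f})")
```

  The explicit iterate grows by a factor of about 9 per step, whereas the implicit iterate tracks the slow solution with the same step size; this is the property that lets BDF solvers take large steps through stiff transients.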
  
  Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
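
  The rejection-sampling loop can be sketched generically: candidates are drawn and kept only if they pass every filter, until the target number of accepted instances is reached. In this sketch `draw` and `accept` are hypothetical callables standing in for parameter sampling and for the steady-state and drug-response checks; the toy usage accepts about 10% of draws, matching the acceptance rate quoted above.

```python
import random


def rejection_sample(n_accept, draw, accept, rng, max_draws=None):
    """Draw candidates until n_accept pass `accept`; return (accepted, total draws)."""
    accepted, n_draws = [], 0
    while len(accepted) < n_accept:
        if max_draws is not None and n_draws >= max_draws:
            raise RuntimeError("draw budget exhausted before reaching n_accept")
        candidate = draw(rng)
        n_draws += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, n_draws


# Toy filter that accepts roughly 10% of candidates.
accepted, n_draws = rejection_sample(
    1_000, draw=lambda rng: rng.random(), accept=lambda x: x < 0.1, rng=random.Random(0)
)
print(len(accepted), n_draws)
```

  At a ~10% acceptance rate, reaching N = 100,000 accepted instances requires on the order of 1,000,000 draws, so the pass rate directly sets the total simulation cost.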
  
  (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.
  
  We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
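
  One possible reading of this criterion is sketched below, assuming a trajectory stored as one row of species concentrations per time step. Whether "changed by < 1%" refers to the end-to-end change or to the full range within the window is not specified; the sketch uses the stricter range-based test, and the small `abs_floor` that guards species near zero is our addition.

```python
def reached_equilibrium(trajectory, window=100, rel_tol=0.01, abs_floor=1e-12):
    """True if every species varies by less than rel_tol over the last `window` time points.

    trajectory: sequence of rows, one row of species concentrations per time step.
    """
    if len(trajectory) < window:
        return False  # not enough history to judge
    recent = trajectory[-window:]
    for i in range(len(recent[0])):
        values = [row[i] for row in recent]
        span = max(values) - min(values)          # full range within the window
        scale = max(abs(values[-1]), abs_floor)   # relative to the final value
        if span / scale >= rel_tol:
            return False
    return True


flat = [[1.0, 2.0, 0.0]] * 150
drifting = [[1.0 + k / 100.0, 2.0] for k in range(150)]
print(reached_equilibrium(flat), reached_equilibrium(drifting))  # True False
```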
  
  Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.
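
  A constant step input of this kind can be written as a function of time that switches on once and is never removed. The phase boundaries `T_FED` and `T_DRUG` and the unit input levels below are hypothetical placeholders in arbitrary time units, not the values used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepInput:
    """Input that is 0 before `onset` and fixed at `level` for the rest of the simulation."""
    onset: float
    level: float

    def __call__(self, t: float) -> float:
        return self.level if t >= self.onset else 0.0


T_FED, T_DRUG = 1000.0, 2000.0               # hypothetical phase boundaries (arbitrary units)
mitogen = StepInput(onset=T_FED, level=1.0)  # on from the "fed" phase onwards
drug = StepInput(onset=T_DRUG, level=1.0)    # on from the treated phase onwards

for t in (0.0, 1500.0, 5000.0):
    print(t, mitogen(t), drug(t))
# 0.0 0.0 0.0
# 1500.0 1.0 0.0
# 5000.0 1.0 1.0
```

  Because neither input decays after onset, any recovery of downstream activity after `T_DRUG` in such a setup can only come from the network's internal dynamics.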
  
  (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).
  
  The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
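
  The quoted PI3K term can be transcribed directly to show how the denominator acts: pS6K divides only the pIR-driven term, so at pS6K = Ki2 that term is halved, while FGFR-driven activation is unaffected. The parameter values below are arbitrary illustrations, not values from the model.

```python
def pi3k_phosphorylation_rate(PI3K, pIR, pFGFR, pS6K, kc2f1, kc2f2, Ki2):
    """Transcription of the quoted term:
    ((kc2f1*pIR*PI3K) / (1 + pS6K/Ki2)) + (kc2f2*pFGFR*PI3K).
    pS6K attenuates only the pIR-driven term (S6K negative feedback)."""
    ir_term = (kc2f1 * pIR * PI3K) / (1 + pS6K / Ki2)
    fgfr_term = kc2f2 * pFGFR * PI3K
    return ir_term + fgfr_term


# Arbitrary illustrative values, not model parameters.
base = dict(PI3K=1.0, pIR=1.0, pFGFR=0.0, kc2f1=2.0, kc2f2=3.0, Ki2=0.5)
print(pi3k_phosphorylation_rate(pS6K=0.0, **base))  # 2.0
print(pi3k_phosphorylation_rate(pS6K=0.5, **base))  # 1.0 (pS6K = Ki2 halves the pIR term)
```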
  
  (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.
  
  The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The actual figure reference was to Supplementary Figure 2A, which shows the heatmap of all the frequencies for each protein dynamics for all the active protein forms. CDK4/6cycD shows a sustained decreasing dynamic for 59.93% of model instances, which is where this number was derived. We have also now explicitly referenced this number in the supplementary Figure 2A legend.
  
  We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.
  
  (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining every combination of inhibition patterns across all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning would be directly comparable to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.
  
  We appreciate the suggestion and the opportunity to clarify. Figure 5A depicts the multiple pathways that were interrogated, but in the analysis, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses” (page 6: lines 228 – 233) to the Methods to make this explicit. Moreover, some additional text in the main text has been slightly modified to make this clearer (page 11: lines 462-463, page 24: lines 856-857).
  
  We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
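The OAT design described here can be sketched in a few lines. Each parameter is reduced in turn while all others stay at baseline, and the resulting change in a readout is recorded and ranked. This is a minimal sketch under stated assumptions. The `readout` callable, the 90% knockdown level and the toy parameter names are hypothetical stand-ins, not the authors' model or settings.

```python
from typing import Callable, Dict, List

def oat_knockdown(params: Dict[str, float],
                  readout: Callable[[Dict[str, float]], float],
                  inhibition: float = 0.9) -> Dict[str, float]:
    """Reduce each parameter in turn by `inhibition` (0.9 = 90% knockdown),
    keeping all others at baseline; return the readout change for each."""
    baseline = readout(params)
    effects = {}
    for name in params:
        perturbed = dict(params)  # copy so only one parameter changes per run
        perturbed[name] = params[name] * (1.0 - inhibition)
        effects[name] = readout(perturbed) - baseline
    return effects

def rank_by_effect(effects: Dict[str, float]) -> List[str]:
    """Order parameters from largest to smallest absolute effect."""
    return sorted(effects, key=lambda k: abs(effects[k]), reverse=True)

# Toy readout (hypothetical): steady state of dx/dt = k_syn - k_deg * x,
# i.e. x* = k_syn / k_deg; k_unused does not enter the readout at all.
toy_params = {"k_syn": 2.0, "k_deg": 1.0, "k_unused": 5.0}

def toy_readout(p: Dict[str, float]) -> float:
    return p["k_syn"] / p["k_deg"]

print(rank_by_effect(oat_knockdown(toy_params, toy_readout)))
```

Because only one parameter moves per run, each effect can be attributed to a single control point. That is the first-order attribution the response describes. Interactions between co-inhibited targets are, by design, not captured.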
  
  (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.
  
  While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.
  
  The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.
  
  Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning/units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig. 7C) is a direct parallel to the rapid suppression seen in the experimental data (Fig. 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.
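The statement that simulated time is in arbitrary units follows from a general property of mass-action ODEs. Multiplying every rate constant by a factor c produces the same trajectory, run c times faster. The time axis therefore only acquires physical units once the rate constants do. The following minimal sketch illustrates this. It uses a hypothetical A + B <-> C binding reaction and a hand-written fixed-step RK4 integrator; neither is taken from the authors' code.

```python
def rk4(f, x0, t_end, n_steps):
    """Integrate dx/dt = f(x) with fixed-step classical Runge-Kutta; return final state."""
    h = t_end / n_steps
    x = list(x0)
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

def binding_model(k_on, k_off):
    """Mass-action A + B <-> C (hypothetical toy system, not the authors' network)."""
    def f(x):
        a, b, c = x
        v = k_on * a * b - k_off * c  # net forward flux
        return [-v, -v, v]
    return f

x0 = [1.0, 0.5, 0.0]
scale = 10.0
# Same system, rate constants scaled by 10 and simulated for 1/10 of the time.
slow = rk4(binding_model(0.3, 0.05), x0, t_end=24.0, n_steps=2400)
fast = rk4(binding_model(0.3 * scale, 0.05 * scale), x0, t_end=24.0 / scale, n_steps=2400)
print(slow)
print(fast)
```

The two end states agree to rounding error. A "24-unit" simulation therefore says nothing about hours until the parameters are calibrated.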
  
  We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.
  
  (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.
  
  Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.
  
  As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.
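The summary the reviewer proposes is straightforward to compute from single-cell traces: a median with a 10th-to-90th percentile band at each time point. The following minimal sketch uses only the Python standard library. It assumes all traces are sampled on a shared time grid. The toy traces are synthetic and do not represent the Yang et al. data.

```python
import statistics

def summary_bands(traces):
    """Per-timepoint (10th percentile, median, 90th percentile) across traces.

    `traces` is a list of equal-length lists, one per cell.
    """
    bands = []
    for values in zip(*traces):  # iterate over time points
        deciles = statistics.quantiles(values, n=10, method="inclusive")
        bands.append((deciles[0], statistics.median(values), deciles[-1]))
    return bands

# Synthetic traces: 11 cells, 2 time points each.
traces = [[1.0 + 0.1 * i, 0.5 + 0.1 * i] for i in range(11)]
for t, (p10, med, p90) in enumerate(summary_bands(traces)):
    print(t, round(p10, 3), round(med, 3), round(p90, 3))
```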
  
  The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.
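The distinction matters because a fold-change trace equals 1 at t = 0 for every cell by construction. A raw reporter ratio, by contrast, keeps each cell's pre-treatment offset. The following minimal sketch uses made-up intensities, not Yang et al. data. It follows the text's description of the y-axis as a nuclear-to-cytoplasmic ratio.

```python
def nc_ratio(nuclear, cytoplasmic):
    """Per-timepoint nuclear/cytoplasmic intensity ratio for one cell."""
    return [n / c for n, c in zip(nuclear, cytoplasmic)]

def fold_change(trace):
    """Normalise a trace to its own t = 0 value; the first point is always 1."""
    return [x / trace[0] for x in trace]

# Two hypothetical cells with different pre-drug reporter localisation.
cell_a = nc_ratio([1.8, 1.2], [1.0, 1.0])
cell_b = nc_ratio([1.0, 0.8], [1.0, 1.0])
print(cell_a, fold_change(cell_a))
print(cell_b, fold_change(cell_b))
```

A spread of starting values between 1 and 1.8 is therefore only possible on a ratio axis. It cannot occur on a fold-change axis.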
  
  The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited; a point we now mention in the revised Discussion (page 17: lines 716-727).
  
  It is worth noting again that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.
  
  The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.
  
  With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.
  
  (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.
  
  To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.
  
  CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.
  
  Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.
  
  It is, however, important to reiterate that our goal in Figure 7 is a qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.
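The following minimal sketch shows the difference between the two options. It assumes a bimolecular mass-action step; the actual rate laws for reactions R15 and R27 are given in the Supplementary Information and may differ. The fixed-fraction form, which the response describes, ignores dose. The Hill form, which the authors did not use, makes the degree of inhibition depend on drug concentration relative to an IC50. The IC50 value used in the example is a placeholder.

```python
def mass_action_rate(k, a, b, inhibition=0.0):
    """Bimolecular rate k*a*b scaled by a fixed fractional inhibition in [0, 1]."""
    return k * (1.0 - inhibition) * a * b

def hill_inhibited_rate(k, a, b, dose, ic50, hill_n=1.0):
    """Same rate scaled by a dose-dependent factor 1 / (1 + (dose / IC50)^n)."""
    return k * a * b / (1.0 + (dose / ic50) ** hill_n)

print(mass_action_rate(2.0, 1.0, 1.0))                    # uninhibited
print(mass_action_rate(2.0, 1.0, 1.0, inhibition=0.9))    # fixed 90% block
print(hill_inhibited_rate(2.0, 1.0, 1.0, dose=10.0, ic50=10.0))  # half-maximal at IC50
```

The fixed-fraction form has one extra number per drug and no dose axis. This suits a shape-based comparison, but it cannot predict how responses change with concentration.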
  
  (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.
  
  The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.
  
  Figure S10 is not a binary map in which every implicated node is equal; instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e., small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: 1) resistance likely occurs due to resistance hubs, not individual proteins, and 2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.
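One way to read a likelihood map like Figure S10 is as a per-interaction frequency. Under that reading, each interaction's value is the share of resistant model instances in which it belongs to the resistance-driving set. The following minimal sketch illustrates this. It assumes a hypothetical input format, with one set of edges per resistant instance, and uses toy node names. It is not the authors' pipeline.

```python
from collections import Counter
from itertools import chain

def edge_frequencies(resistant_instances):
    """Fraction of resistant instances in which each (source, target) edge participates.

    `resistant_instances` is a list of sets of edges, one set per instance
    (hypothetical input format).
    """
    counts = Counter(chain.from_iterable(resistant_instances))
    n = len(resistant_instances)
    return {edge: c / n for edge, c in counts.most_common()}

# Toy data: four resistant instances over hypothetical nodes A-D.
instances = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "D")},
    {("B", "C")},
]
print(edge_frequencies(instances))
```

Many edges can receive a non-zero weight while a few dominate. This is the distinction the response draws between "many routes" and "recurrent hubs".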
  
  Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.
  
  To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.
  
  Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.
  
  (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.
  
  We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.
  
  (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.
  
  Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.
  
  (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.
  
  We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now use the following three terms, each in a specific context:
  
  We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.
  
  We use “initial conditions” to refer to the initial values of the state variables in a model simulation. This term is related to protein abundance because setting the initial conditions of conserved species sets their total protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).
  
  We use “state variables” to refer to the time-dependent model species.
  
  We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.
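The link between initial conditions and conserved protein abundance can be seen in the smallest possible example. In a mass-action cycle X <-> pX, every flux that removes X adds pX. The total set by the initial conditions is therefore preserved throughout the simulation. The following minimal sketch illustrates this. It assumes a hypothetical two-state cycle with first-order rates and uses explicit Euler for brevity. Neither is taken from the authors' model.

```python
def simulate_cycle(x_tot, k_phos, k_dephos, t_end=10.0, n_steps=10000):
    """Explicit-Euler integration of X <-> pX with first-order mass action.

    The initial conditions (X = x_tot, pX = 0) fix the conserved total X + pX.
    """
    h = t_end / n_steps
    x, px = x_tot, 0.0
    for _ in range(n_steps):
        flux = k_phos * x - k_dephos * px  # net X -> pX flux
        x, px = x - h * flux, px + h * flux
    return x, px

x, px = simulate_cycle(x_tot=2.0, k_phos=1.0, k_dephos=1.0)
print(x, px, x + px)  # the total stays at the value set by the initial conditions
```

Sampling `x_tot` across instances is thus the same operation as sampling the initial condition of X. This is why the two terms are tied together in the convention above.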
  
  We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.
  
  (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?
  
  The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.
  
  This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) Our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.). Mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10⁻⁵ to 10⁴), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate-constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate-constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
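The trade-off can be checked directly. When substrate S is far below Km, the Michaelis-Menten rate Vmax·S/(Km + S) reduces to the first-order rate k·S with k = Vmax/Km. A single mass-action constant thus stands in for the Vmax/Km ratio. The two forms diverge only as S approaches or exceeds Km, which is where zero-order and switch-like behaviour arise. The following minimal sketch uses arbitrary illustrative values.

```python
def michaelis_menten(s, v_max, k_m):
    """Saturable rate: two parameters (Vmax, Km) per enzymatic step."""
    return v_max * s / (k_m + s)

def mass_action(s, k):
    """Linear rate: one parameter per step."""
    return k * s

v_max, k_m = 10.0, 100.0
k_equiv = v_max / k_m  # first-order constant matching the low-substrate limit

for s in (0.01, 100.0, 1000.0):
    print(s, michaelis_menten(s, v_max, k_m), mass_action(s, k_equiv))
```

At S = 0.01 the two rates agree to within about 0.01%. At S = 1000 the Michaelis-Menten rate is capped below Vmax while the linear rate keeps growing. That is the saturation regime the response concedes the model cannot capture.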
  
  For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).
  
  (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.
  
  Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or the stochastic simulation algorithm (SSA), is a possible extension for future work, although it is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.
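The ensemble idea can be sketched as follows. Each model instance is a fixed parameter set. Heterogeneity lives entirely in the spread across instances rather than in noise within a simulation. This is a minimal sketch under stated assumptions. It uses log-uniform sampling over the 10⁻⁵ to 10⁴ rate-constant range quoted in the response to comment (19); the exact sampling scheme the authors used is not stated here. The parameter names are hypothetical.

```python
import random

LOW_EXP, HIGH_EXP = -5, 4  # rate-constant range 1e-5 to 1e4, as quoted in the text

def sample_instance(param_names, rng):
    """Draw one model instance: each rate constant log-uniform on [1e-5, 1e4] (assumed)."""
    return {name: 10.0 ** rng.uniform(LOW_EXP, HIGH_EXP) for name in param_names}

def build_ensemble(param_names, n_instances, seed=0):
    """A reproducible list of parameter sets; each defines one deterministic model."""
    rng = random.Random(seed)
    return [sample_instance(param_names, rng) for _ in range(n_instances)]

ensemble = build_ensemble(["k1", "k2", "k3"], n_instances=5, seed=1)
for instance in ensemble:
    print({k: f"{v:.2e}" for k, v in instance.items()})
```

Sampling the exponent rather than the value gives each order of magnitude equal weight. Without that, a uniform draw on [1e-5, 1e4] would almost never produce constants below 1.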
  
  (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.
  
  We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.
  
  The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.
  
  We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.
  
  (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.
  
  We have now included a typeset list of state variable equations and ODEs, along with the original model files.
  
  (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.
  
  The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.
  
  Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.
  
  This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
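The response attributes the negative values to round-off near machine precision. A simpler way to see that a solver can produce values the exact ODE never does is explicit Euler applied to first-order decay with an over-large step. That is a different, step-size mechanism, chosen here only because it is easy to reproduce. The filter mirrors the discard procedure described above, with a hypothetical tolerance argument.

```python
def euler_decay(x0, k, h, n_steps):
    """Explicit Euler for dx/dt = -k*x. Each step multiplies x by (1 - k*h),
    so any step with k*h > 1 undershoots below zero, although the exact
    solution x0*exp(-k*t) never does."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] * (1.0 - k * h))
    return xs

def keep_physical(trajectories, tol=0.0):
    """Discard any trajectory containing a value below -tol."""
    return [t for t in trajectories if min(t) >= -tol]

stable = euler_decay(1.0, k=1.0, h=0.5, n_steps=10)   # k*h = 0.5
unstable = euler_decay(1.0, k=3.0, h=0.5, n_steps=3)  # k*h = 1.5
print(unstable)
print(len(keep_physical([stable, unstable])))
```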
  
  (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.
  
  The text has been updated to match the citation.
  
  (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.
  
  Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.
  
  (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.
  
  We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).
  
  The reason for this finite limit is mechanistic, as we found when the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. When we pushed parameter values to even greater extremes in exploratory simulations, they did not generate novel, complex dynamic shapes. Instead, they tended to drive network nodes into saturated states: either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.
  
  Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.
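The saturation argument can be illustrated with a single two-state node A <-> A*, activated at rate k_act·S and deactivated at rate k_deact. At steady state, the active fraction is k_act·S / (k_act·S + k_deact). Pushing either constant towards an extreme of the sampled range pins the node near fully on or fully off. Doubling the upstream signal S then barely moves it. The following minimal sketch uses a hypothetical node, not one from the authors' network.

```python
def active_fraction(k_act, k_deact, stimulus):
    """Steady-state A*/A_tot for activation k_act*S*A and deactivation k_deact*A*."""
    on = k_act * stimulus
    return on / (on + k_deact)

def response_to_doubling(k_act, k_deact, stimulus=1.0):
    """Change in the active fraction when the upstream stimulus is doubled."""
    return (active_fraction(k_act, k_deact, 2 * stimulus)
            - active_fraction(k_act, k_deact, stimulus))

for k_act, k_deact in [(1.0, 1.0), (1e4, 1e-5), (1e-5, 1e4)]:
    print(k_act, k_deact, response_to_doubling(k_act, k_deact))
```

The intermediate node shifts by about 0.17 when S doubles. Both extreme nodes shift by around 1e-9, which a shape classifier would label "no response".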
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.01.24.525460v2
www.biorxiv.org www.biorxiv.org

Pharmacological inhibition of MCL-1 disrupts mitochondrial cristae and depletes the human neural progenitor cell pool

1
1. EMBOpress 02 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  We thank all reviewers for the valuable feedback and critical insight on our study. We acknowledge the concern that the manuscript, in its initial form, appeared descriptive and did not clearly convey the mechanistic insight that can be inferred from the current data. In the revised manuscript, we will (i) more clearly delineate what mechanistic inferences can be drawn from the existing data, (ii) expand our discussion of the caspase-independent mechanisms, and (iii) incorporate additional experiments/analyses aimed at identifying downstream effectors that mediate the observed phenotypes. In this revision plan, we have included six new figures addressing some of the major issues raised by reviewers.
  
  Specifically, to address questions about mechanistic insight, we generated hESCs stably expressing ACSL1:HaloTag, currently presented as Figure 1A for reviewers. ACSL1 is a critical enzyme that catalyzes the first step of fatty acid oxidation at the outer mitochondrial membrane. Our previous analysis and work from the Opferman lab demonstrated that ACSL1 contains a BH3-like domain. Thus, we examined the effects of MCL-1 inhibition on the mitochondrial localization of this enzyme. Our findings indicate that MCL-1 inhibition causes the displacement of ACSL1 from the mitochondria (Figures 1B-C for reviewers). Our interpretation of the effects of MCL-1 inhibition is two-fold: 1) as we show in our data, MCL-1 inhibition causes disruption of the mitochondrial cristae, altering the microenvironment for fatty acid oxidation, and 2) as seen in cancer cells, the MCL-1 inhibitor may also displace ACSL1 from the mitochondria. In the new version of the manuscript, we will focus on these two mechanisms as mechanistic outcomes of MCL-1 inhibition.
  
  We have included data from cells treated with Perhexiline (CPT1/2 inhibitor) and Etomoxir (CPT1a inhibitor) (Figure 2 for reviewers). This experiment determines whether direct perturbation of the FAO pathway mimics the effects of the MCL-1i.
  
  We have assayed the effects of MCL-1 inhibition on oxygen consumption rates in NPCs. Currently presented as Figure 3 for reviewers.
  
  We will perform MCL-1:MICOS proximity ligation assays and/or immunoprecipitation assays to determine whether MCL-1 inhibitors disrupt the association of MCL-1 with MICOS. Preliminary data suggesting an association (albeit very weak) are shown in Figure 4 for reviewers.
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  Summary: This study claims that beyond its canonical anti-apoptotic function, MCL-1 has essential non-apoptotic roles in human neurodevelopment. Pharmacologic inhibition of MCL-1 in human neural stem cells disrupts mitochondrial inner membrane architecture by destabilizing cristae and the OPA1-MICOS complex, leading to swollen mitochondria with disorganized cristae. These structural defects impair fatty acid oxidation and lipid droplet homeostasis, linking cristae integrity to metabolic competence. Independently of apoptosis or proliferation, MCL-1 inhibition selectively depletes intermediate neural progenitors, indicating a direct role in lineage progression. Overall, the work positions MCL-1 as a key regulator of mitochondrial structure-metabolism coupling that instructs neural progenitor identity and human neurogenesis.
  
  Overall: The study does a good job of using (in most assays) caspase inhibition (e.g., QVD treatment) to block apoptotic responses induced by MCL-1 inhibition. As a result, many of the phenotypes caused by inhibition are likely to be independent of caspase activation. This manuscript would therefore be of interest to researchers who study the BCL-2 family and cell death signaling, mitochondrial bioenergetics and dynamics, neurodevelopment, and cellular metabolism. However, as currently presented, the manuscript is only descriptive and lacks mechanistic insight.
  
  We thank Reviewer 1 for the insightful evaluation of our work. We are encouraged that the reviewer finds the study relevant to investigators in the fields of BCL-2 family biology, mitochondrial dynamics and bioenergetics, neurodevelopment, and cellular metabolism. We also thank the reviewer for pointing out the need to increase the mechanistic insight of our findings. As mentioned above, in the revised manuscript, we are proposing to address this.
  
  Major Concerns:
  
  1) The authors only use a single MCL-1 inhibitor and never use other non-targeting BH3-mimetics (such as venetoclax) as negative controls. This seems like a missed opportunity to demonstrate that the phenotypes observed are MCL-1 dependent.
  
  This is an excellent point. We will include venetoclax (ABT-199) to examine its effect on intermediate progenitors (TBR2+) and early-born neurons (BIII tubulin+).
  
  2) There is no mechanism proposed in this study other than reliance upon QVD as not affecting the phenotypes. As submitted, the manuscript only can speculate that these phenotypes are due to non-apoptotic roles of MCL-1 inhibition. The authors have missed an opportunity to explore MCL-1's non-apoptotic functions directly.
  
  Mechanistically, we propose MCL-1 is acting in 2 ways: 1) as we show in our data, MCL-1 inhibition causes disruption of the mitochondrial cristae, altering the microenvironment for fatty acid oxidation, and 2) as seen in cancer cells, MCL-1 inhibitors may also displace ACSL1 from the mitochondria.
  
  In the past few weeks, since receiving the initial reviews, we have focused on testing the second possibility, since the accumulation of lipids was also seen in cancer cells (see PMID: 38503284). We have successfully generated hESCs stably expressing ACSL1:HaloTag (Figure 1A for reviewers). Our findings, included here, show that ACSL1 is displaced from the mitochondria by MCL-1 inhibition in NPCs (Figures 1B-C for reviewers).
  
  Other concerns exist that weaken the impact of the study.
  
  Figure 1 should include the fact that QVD inhibition (shown in Sup Fig 2) does not obviate the phenotype induced by pharmacological inhibition of MCL-1 on mitochondrial morphology.
  
  We would like to clarify that QVD does prevent the phenotypes induced by MCL-1 inhibition on mitochondrial morphology. In Fig 1B, we report an increase in volume and surface area at 24h and 48h, along with a decrease in mitochondrial content at 48h, when NPCs were treated with MCL-1i only. However, NPCs co-treated with QVD (Supp Fig 2B) did not exhibit any significant morphological phenotypes in either average or min/max values. Reviewer 1 may be referring to the min/max values corresponding to Fig 1B, presented in Supp Fig 2A, where we reported an increase in max volume.
  
  | Figure | Volume | Surface area |
  |---|---|---|
  | Fig 1B (MCL-1i only, avg values) | increase (avg) | increase (avg) |
  | Supp Fig 2B (MCL-1i+QVD) | no change | no change |
  | Supp Fig 2A (MCL-1i only, max/min values) | increase (max) | no change (max) |
  
  For clarity, we will move Supplementary Fig 2A into Supplementary Fig 1.
  
  Figure 2 would benefit from evidence that caspase inhibition does not repress the phenotype on mitochondrial cristae morphology (volume and area). Furthermore, the FIB-SEM data are very hard to appreciate as the size precludes visualization of individual mitochondria.
  
  While the original manuscript included visualizations of the segmented mitochondria and cristae (Figure 2C), as well as snapshots through the z-stack for segmented cristae only (Figure 2E) and segmented mitochondria only (Supp Figure 3A), we are now also attaching the FIB-SEM 3D reconstruction videos (New Supplementary Videos 1-2 for reviewers: 1. Mito and cristae, 2. Cristae only, 3. Mito only) to ease visualization.
  
  Figure 3 reports that MIC60 and OPA1 appear to be downregulated in response to MCL-1 inhibition, but these changes appear more significant only when QVD is added. Why would the phenotype be obscured in the non-QVD setting (Fig. 2B&C)? How does MCL-1 inhibition lead to changes in MIC60/MICOS/OPA1? This seems quite preliminary at this point.
  
  In Figures 3B and 3C, we report decreased protein levels of short-form OPA1 and MIC10 only, not MIC60. We argue that our QVD data show that the cell death function of MCL-1 (i.e., preventing cell death effectors from initiating the caspase cascade) is not the main trigger of the phenotypes we report (cristae dysregulation and fatty acid oxidation disruption). However, cells without functional cristae and/or with defects in FAO may not be able to survive long-term. Thus, QVD treatment preserves cells that might not otherwise survive the dismantling of such an essential structure. To confirm this, we have performed immunofluorescence for cleaved caspase 3 (Figure 5 for reviewers). These results show that MCL-1 inhibition at the time points of our study does not increase caspase-3 activation. We reported similar results of MCL-1 inhibition in oligodendrocyte precursor cells (Gil and Hanna et al., Glia, 2025, PMID: 41420072).
  
  The loss of MIC60 and OPA1 should repress electron transport chain function. Are such impacts observed in the cultured cells? This could be shown by assessing oxygen consumption, etc. Such data would strengthen the authors' conclusion that MCL-1 inhibition leads to defects in mitochondrial physiology.
  
  We completely agree with this comment by Reviewer 1. In our revision, we will include an assessment of the mitochondrial oxygen consumption rate (OCR) of NPCs treated with MCL-1i, using the Seahorse analyzer (mitochondrial stress test). Preliminary data (n=3) are currently presented as Figure 3 for reviewers. Interestingly, these data show a more nuanced cellular response. Consistent with our conclusion that MCL-1 inhibition does not cause apoptotic cell death, MCL-1i did not affect mitochondrial respiration at baseline. The specific deficits appear in spare respiratory capacity and maximal respiration, meaning that cells can sustain routine mitochondrial function but lose the ability to respond to increased energetic demand. This suggests that MCL-1 inhibition creates a mitochondrial reserve deficiency rather than a generalized bioenergetic failure. The results with caspase inhibitors show a near-zero OCR at both the 24h and 48h timepoints, and significant reductions in maximal respiration, spare respiratory capacity, and non-mitochondrial OCR. Remarkably, these conditions are not detrimental to newborn neurons, as shown in Figure 7. This is very interesting because it suggests that, under severe bioenergetic failure, neural stem cells (PAX6+) can differentiate into newborn neurons in a TBR2-independent manner. More relevant to this study, our results unequivocally demonstrate that TBR2-positive cells depend on the non-apoptotic function of MCL-1.
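  The derived stress-test parameters referred to above (basal, maximal, and non-mitochondrial OCR; spare respiratory capacity) are conventionally computed from the four phases of the assay. The sketch below assumes the standard Agilent-style definitions, in which spare respiratory capacity is maximal minus basal respiration, both corrected for non-mitochondrial OCR. The function name and all OCR values are hypothetical and are not data from this study. The second call illustrates the "reserve deficiency" pattern: basal respiration unchanged, maximal respiration and spare capacity reduced.
  
  ```python
  from dataclasses import dataclass
  
  
  @dataclass
  class MitoStressParams:
      non_mito: float        # minimum OCR after rotenone/antimycin A
      basal: float           # last pre-oligomycin OCR minus non-mitochondrial OCR
      maximal: float         # maximum post-FCCP OCR minus non-mitochondrial OCR
      spare_capacity: float  # maximal minus basal respiration
      atp_linked: float      # last pre-oligomycin OCR minus minimum post-oligomycin OCR
      proton_leak: float     # minimum post-oligomycin OCR minus non-mitochondrial OCR
  
  
  def mito_stress_params(baseline, post_oligo, post_fccp, post_rot_aa):
      """Derive stress-test parameters from per-phase OCR readings (e.g. pmol O2/min)."""
      non_mito = min(post_rot_aa)
      last_baseline = baseline[-1]
      basal = last_baseline - non_mito
      maximal = max(post_fccp) - non_mito
      return MitoStressParams(
          non_mito=non_mito,
          basal=basal,
          maximal=maximal,
          spare_capacity=maximal - basal,
          atp_linked=last_baseline - min(post_oligo),
          proton_leak=min(post_oligo) - non_mito,
      )
  
  
  # Hypothetical readings (three measurements per phase), not data from this study.
  control = mito_stress_params([100, 98, 100], [40, 38, 39], [180, 190, 185], [20, 22, 21])
  reserve_deficit = mito_stress_params([100, 98, 100], [40, 38, 39], [110, 112, 108], [20, 22, 21])
  print(control.basal, control.spare_capacity)                  # 80 90
  print(reserve_deficit.basal, reserve_deficit.spare_capacity)  # 80 12
  ```
  
  Because every parameter is referenced to non-mitochondrial OCR, a drop in spare capacity with unchanged basal respiration points to a loss of reserve rather than of routine respiration.
  
  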
  
  In Figure 4, the differences between transcript (qPCR) and protein (immunoblot) data are often confusing and not well explained. Why do the authors propose that mRNA expression is decreasing whereas protein expression is increasing (for example, CPT1)? Furthermore, it is unclear what these data mean functionally. Is this reflective of enhanced lipid oxidation or simply a response to inhibition of fatty acid oxidation? Clarification of the impact of these findings is necessary.
  
  We agree with Reviewer 1 that the results could be hard to interpret. However, effects of MCL-1 inhibitors on the transcription of fatty acid oxidation genes have been reported in work by Opferman and Walensky (PMID: 36198266). We speculate that the effects on transcription are triggered by mitochondrial signaling. Gaining mechanistic insight into this phenomenon would be an interesting next step.
  
  In the case of CPT1, we looked into this comment and found that the difference is due to differential expression of isoforms. The RT-qPCR shown in Figure 4 targets CPT1c, whereas the western blot detects CPT1a. Unfortunately, after trying several products, we determined that there are no good antibodies for CPT1c. Thus, since we cannot compare gene and protein expression for CPT1c, we will include CPT1a RT-qPCR data to complement the western blot.
  
  The increase in lipid droplet number induced by MCL-1 inhibition has been previously documented, but it is unclear whether this increase is related to an inability to oxidize lipid (defective fatty acid oxidation) that leads to increases in the cellular abundance or whether this indicates that MCL-1 inhibition leads to enhanced storage. Do other inhibitors of fatty acid oxidation lead to similar increases in lipid droplet size and abundance? Does QVD inhibition affect this phenotype?
  
  This is a great point raised by Reviewer 1, and one we have also wondered about. We conducted an experiment using C16 BODIPY to address it (Figure 6 for reviewers). We observed no changes in C16 lipid droplet count, volume, or surface area when cells were treated with MCL-1 inhibitor for 24 hours, with or without starvation during the last 6 hours of treatment. However, we observed significant pan-lipid droplet accumulation under the same conditions. This contrast suggests that FAO of exogenous long-chain (LC) fatty acids is not reliant on MCL-1. This finding does not rule out a requirement for MCL-1 in other FAO processes, especially given a major limitation: the amount of C16 BODIPY (fluorescent palmitate) that can be administered to the cells (10 µM) is 10-fold less than the amount we supplied exogenously for the pan-BODIPY experiment (100 µM, see Figure 5). It is entirely possible that this small dose was not enough to detect any lipid droplet accumulation.
  
  We have now also included experiments using etomoxir and perhexiline to assess their effects on TBR2/PAX6 (Figure 2 for reviewers). The results indicate that inhibiting the FAO pathway does not fully mimic the effects of MCL-1i on TBR2. However, we show that MCL-1i displaces ACSL1 from the mitochondria, a step that is upstream of CPT1/2. We suggest a model in which the coordinated non-apoptotic function of MCL-1 promotes ACSL1 activity at the outer mitochondrial membrane and regulates cristae morphology at the inner mitochondrial membrane. While our data point to this model, the tools currently available limit further investigation, and it will be a great direction for future experiments.
  
  For Figure 6, while these data may be very meaningful, as presented they are very hard to appreciate. Insets that show the neuronal populations would help to convey the point that differentiation is impacted. Also, are there other methods that could confirm these observations (e.g., qPCR to show changes in differentiation)?
  
  We agree with Reviewer 1. In the new version of the manuscript, we will include panels that zoom in on the cell populations we quantified. The current panels will move to a new supplementary figure. We will also add TUBB3 to the qPCR panel in the new version.
  
  Figure 7 is also very hard to appreciate. What is the reader meant to see? Can these data be quantified? It seems that QVD may be rescuing in this figure; does this suggest that MCL-1 inhibition might be inducing death? All of this needs to be quantified.
  
  We will provide quantification of BIII tubulin branching, and it will be included next to the images provided.
  
  BCL-XL has also been implicated in affecting mitochondrial electron transport chain function (See PMID: 19255249, 21926988, 21987637). Can BCL-XL inhibitors affect any of the phenotypes associated here?
  
  We will include experiments to test the effects of BCL-2 and BCL-XL inhibitors on TBR2-positive cells to address this comment.
  
  Please be careful to avoid using the term "MCL-1 loss" when talking about pharmacological inhibition. Only genetic ablation (e.g., knockout, silencing) should be termed loss.
  
  We have now removed the reference to MCL-1 loss in line 199.
  
  Reviewer #1 (Significance (Required)):
  
  The study advances our understanding of the impacts of MCL-1 inhibition in human cells. The authors replicate many impacts previously observed in mouse systems and refine analyses of impacts on the MICOS complex, lipid droplet storage, and neuronal differentiation. While these findings are important and would be well received by a wide audience, the study provides almost no mechanistic insight into how these phenotypes are induced. The only common theme is that blocking caspase activation in many assays fails to block the phenotype.
  
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Summary: This manuscript by Hanna et al. investigates non-apoptotic roles of MCL-1 in human neural stem cells and connects MCL-1 inhibition to mitochondrial cristae formation and beta-oxidation. Connecting these roles to brain development, the authors also show a reduction in the number of progenitor cells upon MCL-1 inhibition, independently of caspase activity. Throughout their work, the authors make use of an impressive array of imaging techniques. While the methods used offer sufficient evidence to connect MCL-1 inhibition to cristae architecture, the mechanistic underpinnings of this effect remain unexplored.
  
  We thank Reviewer 2 for the thoughtful and positive assessment of our manuscript. We appreciate the reviewer's recognition that our study reveals non-apoptotic roles of MCL-1 in human neural stem cells. We are also grateful for the acknowledgment of the imaging approaches employed, which allowed us to connect MCL-1 function to cristae architecture with multiple complementary techniques. We acknowledge the reviewer's point that the mechanistic basis by which MCL-1 influences cristae structure remains insufficiently defined. In the revised manuscript, we will clarify the limitations of the current data, expand our discussion of potential mechanisms, and incorporate additional analyses to identify downstream effectors that mediate these structural and metabolic changes.
  
  Major comments:
  
  - In Fig. 1B, the very same representative images are shown for both conditions (DMSO and S63845) at 48 hours.
  
  We deeply appreciate Reviewer 2 for catching this unintentional duplication that occurred during figure preparation. We have now corrected this issue.
  
  - For Western Blot analysis, it looks like the authors only quantified the band density of their proteins of interest without considering varying levels of control protein (Actin) levels. Normalizing the protein levels to actin would account for any differences in loaded protein amounts (although a Ponceau staining might be preferable still to exclude this). This is especially relevant for Fig. 4E, where actin levels visibly differ between the conditions.
  
  All WB quantifications were normalized to actin (this detail has now been added to the y-axis labels of all band density graphs and to the figure legends). In addition, we will transform the data to a logarithmic scale to "normalize" for gel-to-gel variability.
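  One common way to implement the normalization and log transform described above is sketched below. It assumes that each gel carries its own DMSO control lane and that the transform is a log2 ratio; the exact transform is not specified here, and the function name and densitometry values are hypothetical. The two hypothetical gels differ in raw band intensity and actin loading, yet yield the same log2 fold change after normalization.
  
  ```python
  import math
  
  
  def log2_normalized_band(target, actin, ctrl_target, ctrl_actin):
      """log2 fold change of an actin-normalized band relative to the control lane on the same gel."""
      # Correct each lane for loading by dividing by its own actin band.
      treated_ratio = target / actin
      control_ratio = ctrl_target / ctrl_actin
      # log2 puts a 2-fold increase (+1) and a 2-fold decrease (-1) on a symmetric scale.
      return math.log2(treated_ratio / control_ratio)
  
  
  # Hypothetical densitometry values (arbitrary units) from two gels with different
  # overall signal and loading; not data from this study.
  gels = [
      {"target": 3.0, "actin": 1.5, "ctrl_target": 1.0, "ctrl_actin": 1.0},
      {"target": 0.8, "actin": 0.8, "ctrl_target": 0.5, "ctrl_actin": 1.0},
  ]
  values = [log2_normalized_band(**gel) for gel in gels]
  print(values)  # [1.0, 1.0]
  ```
  
  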
  
  - The authors offer evidence that MCL-1 inhibition impedes proteolytic cleavage of OPA1-L into the OPA1-S isoforms, yet do not explore the mechanism behind this. Since OPA1 is cleaved by both OMA1 and YME1L, determining the levels of these proteases could help shed some light on the mechanism leading to cristae reorganization.
  
  We will follow up on Reviewer 2's comment with a WB analysis of OMA1 and YME1L in cells treated with an MCL-1 inhibitor.
  
  - Generally speaking, while the authors show all these effects (cristae defects, FAO dysfunction) upon MCL-1 inhibition, it would be interesting to see whether any of them can be rescued by blocking FA import, e.g., through carnitine palmitoyltransferase 1a (CPT1a) inhibition with etomoxir, to understand whether they are downstream of altered FA supply. This could affect cristae morphology through altered cardiolipin biogenesis.
  
  This is an excellent point, which was also raised by Reviewer 1. We have now included experiments using etomoxir and perhexiline to assess their effects on TBR2/PAX6 (Figure 2 for reviewers). As mentioned above, the results indicate that inhibiting the FAO pathway does not fully mimic the effects of MCL-1i on TBR2. However, we show that MCL-1i displaces ACSL1 from the mitochondria, a step that is upstream of CPT1 and 2. We suggest a model in which the coordinated non-apoptotic function of MCL-1 promotes ACSL1 activity at the outer mitochondrial membrane and regulates cristae morphology at the inner mitochondrial membrane. While our data point to this model, the tools currently available limit further investigation, and it will be a great direction for future experiments. Reviewer 2's suggestion that the effects on FAO could impact cardiolipin biogenesis is a very exciting possibility; however, it is difficult to test with the tools available.
  
  - In line 262 the authors discuss that mitochondria lose metabolic function upon MCL-1 inhibition. This claim would require additional experiments. While the authors look at lipid droplet accumulation and FAO enzymes, there are many more aspects to mitochondrial metabolic function that should be investigated. While measuring the oxygen consumption rate via Seahorse might require additional resources (optional), measurements of ATP production, ROS generation or determination of the mitochondrial membrane potential should be feasible.
  
  We fully agree with Reviewer 2's comment, which was also raised by Reviewer 1. In our revision, we will include an assessment of the mitochondrial oxygen consumption rate (OCR) of NPCs treated with MCL-1i, measured using the Seahorse analyzer (mitochondrial stress test). These data are presented as Figure 3 for reviewers. Interestingly, these data show a more nuanced cellular response. While MCL-1i does not globally collapse mitochondrial respiration at baseline, the specific deficits appear in spare respiratory capacity and maximal respiration, meaning that cells can sustain routine mitochondrial function but lose the ability to respond to increased energetic demand. This suggests that MCL-1 inhibition creates a mitochondrial reserve deficiency rather than a generalized bioenergetic failure. The results with caspase inhibitors show a near-zero OCR at both the 24h and 48h timepoints, and significant reductions in maximal respiration, spare respiratory capacity, and non-mitochondrial OCR. These conditions are detrimental to TBR2-positive NPCs (Figure 6) but not to newborn neurons (Figure 7).
  
  - While the authors "propose a model in which MCL-1 associates with MICOS", they do not offer direct scientific evidence to support this hypothesis. Co-immunoprecipitation experiments or, e.g., proximity ligation assays would better support the proposed model.
  
  We agree with this statement. In preliminary experiments, we performed proximity ligation assays and immunoprecipitation analyses to test for this interaction (see below and Figure 4 for reviewers), and the results indicate an interaction, albeit a very weak one. In the revised version of the manuscript, we will attempt to repeat these experiments with MCL-1i.
  
  - While Fig. 7 shows representative images, quantification e.g. for the truncation of neuronal processes is missing.
  
  We will provide quantification of BIII tubulin branching, which will be included alongside the images provided.
  
  - In lines 219f. the authors state that they "observed a significant downregulation of PAX6 and EOMES at 24 hours that was not rescued by QVD co-treatment". While there is still a trend towards a downregulation, there is no statistical significance anymore. In fact, PAX6 levels almost mirror those of SOX2 which is not described as "downregulated" by the authors. In order to be more consistent, I would suggest rephrasing this part, or at least reword it to be less absolute.
  
  In the new version, we will clarify that while QVD rescued TBR2 and PAX6 transcript levels at 24h, it did not rescue them at 48h. We will also mention the downregulation of SOX2 at 48h that persists with co-treatment.
  
  - Brinkmann et al. (2025) also investigated cristae structure upon MCL-1 deletion in vivo and found no effect when MCL-1 was replaced with other Bcl-2 family members. It would be interesting to combine MCL-1 inhibition with overexpression of MCL-1 versus BCL-XL to reconcile some of the discrepant findings.
  
  While this is a great suggestion for future studies, there are some complications. Specifically, it is likely that the inhibitor may also target the overexpressed MCL-1 and thus, a mutant form is needed.
  
  To address this, we generated a Flag-tagged MCL-1 construct with a mutated BH3 domain, previously described by Kotschy et al. Nature 2016. We validated the construct in HeLa cells, but unfortunately the mutant protein appears to be significantly less stable than the WT construct, complicating analysis of this experiment.
  
  Minor comments:
  
  - In Supp. Fig. 1C the MCL-1 protein is shown both to run above 37kDa (upper panel) and below 37 kDa (lower panel). Could the authors please comment on why this is the case?
  
  The observed variation is caused by drift in the gel during electrophoresis. In Fig 1C, the protein ladder is on the edge of the gel, whereas in Fig 1E, the protein ladder is in the middle of the gel, and the last sample is on the edge and also exhibits edge drift.
  
  - In line 64 of the introduction the authors mention clinical trials yet do not give a citation for these trials making it hard to judge whether the content of these trials is actually related to the brain.
  
  This information is anecdotal, based on an Amgen press release.
  
  - MCL-1 as well as ACSL-1 are sometimes written without the hyphen both in the text and figures.
  
  We will carefully check the manuscript before submission.
  
  - Lines 92-94 and 106-108 essentially highlight the same existing knowledge gap. Maybe the content of these two paragraphs could be combined in order to avoid repetition.
  
  We thank Reviewer 2 for this suggestion. We will do this in the new version of the manuscript.
  
  - In Fig. 1A, the authors provide a schematic for their experimental design. While the figure legend is very thorough, some of this information (like the days of collection) could also be included in the figure itself. The same is true for schematics in the following figures.
  
  We agree with this and will incorporate the suggestion in the new version.
  
  - Fig. 2A includes a typo (analyze) but would maybe also be more suitable for the supplement figures or could even be combined with Fig. 1A as not much new content is added.
  
  We have already incorporated these changes in the new version of the manuscript.
  
  - Regarding statistical analysis, could the authors please comment on why they did not consider one-sample t-tests suitable for the cases where control values were set at 1 (e.g. Fig. 4B, C for the relative expression).
  
  This is a valid suggestion. We will reanalyze the RT-qPCR data using a one-sample t-test.
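  A one-sample t-test asks whether the mean relative expression of the treated samples differs from the fixed control value of 1. The sketch below computes the t statistic and degrees of freedom with the Python standard library only. The function name and the three relative-expression values are hypothetical, not data from this study.
  
  ```python
  import math
  from statistics import mean, stdev
  
  
  def one_sample_t(values, mu=1.0):
      """t statistic and degrees of freedom for H0: population mean == mu."""
      n = len(values)
      standard_error = stdev(values) / math.sqrt(n)  # sample SD (n - 1 denominator)
      return (mean(values) - mu) / standard_error, n - 1
  
  
  # Hypothetical relative-expression values (control set to 1) from three biological replicates.
  rel_expr = [0.6, 0.7, 0.8]
  t, df = one_sample_t(rel_expr)
  print(round(t, 3), df)  # -5.196 2
  ```
  
  With df = 2, |t| must exceed about 4.30 for a two-sided p < 0.05. `scipy.stats.ttest_1samp(rel_expr, popmean=1.0)` returns the same statistic together with its p-value. If the fold changes are log-transformed first, the equivalent test compares them with 0 instead of 1.
  
  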
  
  - In lines 247f. the authors state that "inhibition of MCL-1 leads to [...] and disassembly of the MICOS complex as well as OPA1". This sounds like OPA1 is still cleaved upon MCL-1 inhibition, which is not at all what the authors showed and further discuss. Rewording of the sentence would help in avoiding any misunderstandings.
  
  We agree with this comment and have now reworded the paragraph: "Inhibition of MCL-1 leads to structural collapse of the cristae, likely due to disassembly of the MICOS complex, as suggested by decreased MIC10 levels, and interruption of OPA1 cleavage, as suggested by decreased short-form OPA1; MICOS and OPA1 are two scaffolds required for cristae maintenance."
  
  - In lines 210f. the authors state that "quantitative imaging increased the average and maximum volume of lipid droplets". While there is definitely a trend towards an increase for the maximum volume, the increase is in fact not statistically significant. This should be reflected in the wording.
  
  We have reworded this to "Quantitative imaging revealed a significant increase in average lipid droplet volume and a trending increase in maximum volume of lipid droplets."
  
  - In Fig. 6 the overlap between TBR2 and PAX6 is hard to judge when printed out. Including a zoom-in may make it easier to judge.
  
  We agree with Reviewer 2. In the new version of the manuscript, we will include panels that zoom in on the cell populations we quantified. The current panels will move to a new supplementary figure. We will also add TUBB3 to the qPCR panel in the new version.
  
  - In Fig. 7 the color-coding is listed in the figure legend but is missing from the figure itself. If the authors could include this, as they did for the other figures, it would further improve this figure.
  
  We agree. We have specified the channel color in the new figure.
  
  - Line 238 should reference Fig. 7A, as Fig 7B does not exist.
  
  Thanks for catching this. It has already been corrected.
  
  - In the figure legends the authors state that biological replicates were used. Were technical replicates also performed?
  
  Yes, technical replicates were performed for RT-qPCR.
  
  Reviewer #2 (Significance (Required)):
  
  The authors make use of a wide array of imaging techniques to further elucidate non-apoptotic roles of MCL-1. The study has the potential to offer new insights into mitochondrial biology on the level of basic research rather than translational. While the methods used offer sufficient evidence to connect MCL-1 inhibition to cristae architecture, the mechanistic underpinnings of this effect remain unexplored. Nevertheless, the study offers additional knowledge on the role of MCL-1 in human neural stem cells, whereas previous research mostly focused on cardiomyocytes or cancer cells.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  Summary: In this study, Gama et al. describe a non-canonical role for the anti-apoptotic protein Myeloid Cell Leukemia-1 (MCL-1) in mitochondrial cristae organization and suggest a role of MCL-1 in regulating metabolism and neuronal differentiation. Using fluorescence microscopy imaging and electron microscopy, the authors show changes to mitochondrial morphology upon treatment with MCL-1 inhibitor S63845. MCL-1 inhibition results in altered protein and transcript levels of some key proteins involved in mitochondrial cristae organization and fatty acid metabolism. While some of the findings are interesting and indeed point towards a non-canonical role of MCL-1, several key conclusions of the authors are not sufficiently supported by the data shown in the manuscript.
  
  We thank Reviewer 3 for the careful evaluation of our manuscript. We appreciate the reviewer's recognition that our study identifies a potential non-canonical role for MCL-1 in mitochondrial cristae organization, metabolism, and neuronal differentiation. As with Reviewers 1 and 2, we are encouraged that the reviewer finds these observations interesting and suggestive of previously unappreciated functions for MCL-1. We agree that stronger evidence is required to firmly link MCL-1 inhibition to specific changes in MICOS organization and metabolic regulation. In the revised manuscript, we will (i) more clearly distinguish between observations and mechanistic inferences, (ii) temper conclusions where appropriate, and (iii) incorporate additional analyses and controls to better substantiate the proposed model.
  
  Major comments:
  
  The authors try to disentangle the apoptotic and non-apoptotic roles of MCL-1 through addition of a caspase inhibitor. However, I am not convinced that phenotypes found under the addition of caspase inhibitor are necessarily caused by non-canonical functions independent of apoptosis. It could also be that the observed changes happen upstream of caspase activation. In addition, many of the described findings, such as CPT1 expression changes, only happen in the presence of the caspase inhibitor. If one follows the logic of the authors, changes associated with non-canonical MCL-1 functions should happen under MCL-1 inhibition and caspase inhibition, but not with MCL-1 inhibition only.
  
  The reviewer is right that we expected non-canonical functions to manifest under combined MCL-1 inhibition and caspase inhibition. Our QVD data show that the cell death function of MCL-1 (i.e., preventing cell death effectors from initiating the caspase cascade) is not the main trigger of the phenotypes we report (cristae dysregulation and fatty acid oxidation disruption). However, cells without functional cristae and/or with defects in FAO may not be able to survive long-term. Thus, QVD treatment preserves cells that might not otherwise survive the dismantling of such an essential structure. To confirm this, we performed immunofluorescence for cleaved caspase 3 (Figure 5 for reviewers). These results show that MCL-1 inhibition at the time points of our study does not increase caspase-3 activation. We reported similar results of MCL-1 inhibition in oligodendrocyte precursor cells (Gil and Hanna et al., Glia, 2025, PMID: 41420072).
  
  The authors show no data on the viability of the cells in response to the MCL-1 inhibitor. To exclude secondary effects of the inhibitor, at least some of the results should be validated with an MCL-1 knock down.
  
  We will include this experiment in our revised manuscript. To check the effects of MCL-1 knockdown on TBR2-positive cells, we tested 5 different ASOs for MCL-1. Knockdown efficiency with ASOs was very low (on average
  
  In Figure 1, the authors show immunofluorescence data of mitochondria and nucleus staining and conclude that MCL-1 inhibition alters mitochondrial morphology. Based on the images shown in Fig. 1, I do not think that individual mitochondria can be segmented to measure their volume and length. In addition, some metrics, such as mitochondrial content, are not explained in the text or methods.
  
  We can achieve mitochondrial segmentation with a SoRa spinning disk confocal microscope, which has a lateral (XY) resolution of approximately 120-150 nm and an axial (Z) resolution of approximately 300-320 nm. All images are first denoised and then sharpened using the Richardson-Lucy deconvolution algorithm. Additionally, the FIB-SEM data are consistent with the IF data (both show an increase in mitochondrial volume and surface area).
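  The Richardson-Lucy algorithm mentioned above iteratively refines an estimate of the unblurred image by comparing a re-blurred copy of the estimate with the observed image. The one-dimensional, pure-Python sketch below is only illustrative. It assumes a known, normalized, odd-length point-spread function (PSF) and zero padding at the borders, whereas microscope software applies the method in 3D with measured or theoretical PSFs and typically adds acceleration or regularization. The function names and the PSF are hypothetical.
  
  ```python
  def convolve_same(signal, kernel):
      """1D convolution returning an output the size of `signal` (zero padding, odd-length kernel)."""
      half = len(kernel) // 2
      out = []
      for i in range(len(signal)):
          acc = 0.0
          for j, k in enumerate(kernel):
              idx = i + half - j
              if 0 <= idx < len(signal):
                  acc += signal[idx] * k
          out.append(acc)
      return out
  
  
  def richardson_lucy(observed, psf, iterations=30, eps=1e-12):
      """Multiplicative updates: u <- u * ((d / (u conv psf)) conv flipped_psf)."""
      psf_flipped = psf[::-1]
      # Start from a flat, non-negative estimate with the same total intensity as the data.
      estimate = [sum(observed) / len(observed)] * len(observed)
      for _ in range(iterations):
          reblurred = convolve_same(estimate, psf)
          ratio = [d / (b + eps) for d, b in zip(observed, reblurred)]
          correction = convolve_same(ratio, psf_flipped)
          estimate = [u * c for u, c in zip(estimate, correction)]
      return estimate
  
  
  # A single bright point blurred by a hypothetical 3-pixel PSF, then deconvolved.
  truth = [0.0] * 10 + [1.0] + [0.0] * 10
  psf = [0.25, 0.5, 0.25]
  observed = convolve_same(truth, psf)  # peak spread over three pixels: 0.25, 0.5, 0.25
  restored = richardson_lucy(observed, psf)
  print(round(observed[10], 3), round(restored[10], 3))
  ```
  
  The multiplicative update keeps the estimate non-negative and, for a normalized PSF, preserves total intensity while concentrating signal back toward the true point.
  
  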
  
  We agree with Reviewer 3 that we need to explain some metrics in the revised version. We will specify the meaning of mitochondrial content (count of all mitochondria in FOV, not normalized to Hoechst).
  
  In Fig. 2B-D, the authors show TEM and FIB-SEM imaging to demonstrate alterations in cristae architecture upon treatment with MCL-1 inhibitor. However, based on the images shown, cristae area and density appear reduced under S63845 treatment in TEM images, while the FIB-SEM data come to the opposite conclusion. In addition, the quantification of cristae volume as a percentage is unclear to me.
  
  We apologize for the confusion. No conclusions about the cristae area and density were made using the TEM data, because TEM data represent a single snapshot section of a mitochondrion without a discernible orientation. Cristae from TEM were described as "aberrant" and preliminarily revealed changes in cristae and were followed up with FIB-SEM, 3D reconstruction of intact mitochondria, and quantification of volume.
  
  In the new version of the manuscript, we will specify that the cristae volume is normalized to the volume of its respective mitochondria (i.e., how much of the mitochondrial volume is attributed to cristae).
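
  A minimal sketch of this normalization, assuming per-mitochondrion cristae and mitochondrial volumes are already available from the 3D reconstruction; the function name and the example volumes are hypothetical. Computing the percentage per mitochondrion before summarizing keeps large mitochondria from dominating the result; which summary statistic the manuscript uses is not stated here.

```python
def cristae_volume_percent(cristae_volume_um3: float, mito_volume_um3: float) -> float:
    """Cristae volume as a percentage of the volume of its own mitochondrion."""
    if mito_volume_um3 <= 0:
        raise ValueError("mitochondrial volume must be positive")
    if not 0 <= cristae_volume_um3 <= mito_volume_um3:
        raise ValueError("cristae volume must lie within [0, mitochondrial volume]")
    return 100.0 * cristae_volume_um3 / mito_volume_um3


# Hypothetical (cristae, mitochondrion) volume pairs in cubic micrometres.
mitochondria = [(0.12, 0.80), (0.05, 0.50), (0.30, 1.20)]
per_mito = [cristae_volume_percent(c, m) for c, m in mitochondria]
```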
  
  The change in CPT1/2 protein levels (Fig. 4) is interesting but does not directly prove that fatty acid oxidation is altered, as concluded by the authors. For this, the authors would need to measure fatty acid oxidation directly, for example using Seahorse or metabolic tracing experiments. Also, to prove that MCL-1 inhibition affects neural differentiation through fatty acid oxidation, a rescue experiment should be performed through CPT1 overexpression.
  
  We agree that this is an important point. We have optimized the fatty acid oxidation test using Seahorse and will make sure to include it in the revised version of the manuscript.
  
  In Figure 6, the authors show decreased numbers of intermediate progenitor cells after MCL-1 inhibition by immunofluorescence staining. I am not convinced that this can be concluded from the data shown, since the concentration of intermediate progenitor cells is very close to noise levels. Since the MCL-1-inhibitor-treated cultures look much sparser, I do not think the percentages can be compared (total counts are between 2 and 20). Although these data might give some indication that differentiation could be impaired, the measured effect could very well be due to lower viability of the cells. The authors need to control for this or come up with a different method for measuring differentiation.
  
  The number of TBR2-positive cells is low, but we disagree with the reviewer's assessment of noise levels. We focused on cells expressing only TBR2 and rigorously examined this population. The percentages are compared to account for the lower density of the MCL-1i-treated cultures, as the IPC counts are normalized to the Hoechst total cell count within the FOV. Moreover, the immunofluorescence images are complemented with RT-qPCR, which shows significant downregulation of EOMES (the gene encoding TBR2).
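
  The normalization described above can be sketched as follows; the counts are hypothetical and chosen only to show why raw counts and percentages can diverge when one culture is sparser than the other. In this made-up example the raw TBR2 counts differ five-fold, while the Hoechst-normalized percentages differ only two-fold.

```python
def percent_positive(marker_count: int, hoechst_count: int) -> float:
    """Share (%) of marker-positive cells among all Hoechst-stained nuclei in one FOV."""
    if hoechst_count <= 0:
        raise ValueError("need at least one Hoechst-positive nucleus")
    if not 0 <= marker_count <= hoechst_count:
        raise ValueError("marker count must lie within [0, total nuclei]")
    return 100.0 * marker_count / hoechst_count


# Hypothetical fields of view: a dense control culture and a sparser treated culture.
control_pct = percent_positive(marker_count=20, hoechst_count=400)
treated_pct = percent_positive(marker_count=4, hoechst_count=160)
```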
  
  Figure 7 is missing quantification.
  
  We will include this quantification in the revised version of the manuscript.
  
  Reviewer #3 (Significance (Required)):
  
  General assessment: The manuscript reports an interesting finding, which suggests a non-canonical role of MCL-1 in mitochondrial remodeling, regulation of fatty acid oxidation and neuronal fate. While this finding would be highly interesting and relevant, the presented data do not sufficiently support this conclusion. Further experiments would have to be performed to prove causality.
  
  Advance: Should the authors manage to prove their hypothesis with additional experiments, this would indeed advance the field of mitochondrial remodeling and its effect on neuronal differentiation by identifying a novel molecular player.
  
  Audience: mitochondrial biology, cell biology, developmental neuroscience
  
  Own expertise: mitochondrial biology, cell biology, advanced imaging techniques
  

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2025.12.12.694056
www.biorxiv.org

Specific Sensitivity to Rare and Extreme Events: Quasi-Complete Black Swan Avoidance vs Partial Jackpot Seeking in Rat Decision-Making

1. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  In this manuscript, the authors investigate the impact of rare and extreme events on rodents' decision-making under risk, in gain and loss contexts. They describe the behavior of 20 rats performing a four-armed bandit task, where probabilistic gains (sugar pellets) and losses (time-out punishments) can, in some arms, include extremely large but rare outcomes. They report that most rats are sensitive to rare and extreme outcomes despite their infrequent occurrence, and that this sensitivity is primarily driven by extreme loss events, which they try to avoid, rather than by extreme gains, which they seek to obtain.
  
  Finally, they propose a modification of standard reinforcement learning that features a specific sensitivity to rare and extreme outcomes and can account for the observed behavior.
  
  Strengths:
  
  The manuscript really taps into a surprisingly neglected but very relevant aspect of decision-making: the effect of rare and extreme events (REE). The authors have developed an experimental setup that seemingly allows investigation of this aspect, which is not trivial given the idiosyncratic properties of rare and extreme events.
  
  The parameters of the experimental setup also seem to be well thought out: in the absence of REE, some options are objectively better than others (because, in expectation, they deliver more food overall or minimize time-out punishments), but this ordering reverses if REE are taken into account. This allows for a clean test of the integration of REE into the rodents' decision-making model.
  
  The data are presented and analyzed in a very descriptive but exhaustive and transparent way, down to the description of individual rodents' behavior.
  
  Weaknesses:
  
  While the behavioral patterns are described and analyzed rigorously through the economic lens of risky decision-making, the authors' interpretation relies heavily on the assumption that the rodents built the correct model of the task during training. Extensive details are provided about the training procedure and the behavior observed at the end of training, but it remains virtually impossible to disambiguate choices due to imperfect learning from choices due to intrinsic preferences for risk or REE.
  
  As detailed in Material and Methods, the animals were progressively overtrained following standard behavioral procedures. During this process, they experienced all available options, including both positive and negative REE. We assume that repeated exposure to these REE supported learning, as would be expected for any event occurring throughout such an extended training phase. The rats ultimately displayed an asymmetric pattern of choices: they consistently avoided the Black Swan, indicating that they had learned its negative consequences, yet they did not systematically seek the Jackpot. If their behavior were driven solely by incomplete learning or by an inherent preference for risk or REE, we would expect the opposite pattern: systematic Jackpot seeking or inconsistent avoidance of the Black Swan.
  
  By nature, gains (food pellets) and losses (time-out punishments) are somewhat incommensurable, so the asymmetry attributed to outcome valence is itself open to interpretation. There might be additional subtleties, e.g. satiety resulting from REE gains (i.e. the delivery of 80 pellets from the Jackpot).
  
  As described in Material and Methods, we used mouse pellets (20 mg) instead of rat pellets (45 mg) to prevent satiety during Jackpot delivery (80 pellets). We also selected gains (sweet pellets) and losses (delays) that we have successfully used in previous rat decision-making paradigms, such as the rat gambling task (Adams et al., 2017; doi: 10.1523/ENEURO.0094-17) and the loss-chasing task (Breysse et al., 2021; doi: 10.1111/ejn.14895). Notably, if the Jackpot induced satiety, one would expect animals to stop seeking it, yet this was not systematically observed. Nonetheless, we added a sentence to the Discussion on page 18 of the manuscript to acknowledge that we cannot fully exclude the possibility that satiety contributed to the lack of systematic Jackpot Seeking.
  
  In its current form, the paper is quite hard to digest. This is naturally the case with interdisciplinary work (here mixing economists and neurobiologists). But I am afraid that with the current frame, the paper is going to miss its target, in terms of audience.
  
  We have entirely rewritten the manuscript, and the English was corrected with the help of ChatGPT. We hope that the paper is now easier to digest.
  
  The proposed model seems somewhat disconnected from the behavioral patterns: while the model suggests an effect of REE at the decision stage (i.e. with specific decision weights for those rare events), this formalism seems at odds with the observation that REE (notably in the loss domain) have an impact on subsequent behavior (Black Swans tend to reinforce Total Sensitivity to REE), which rather suggests an effect at the learning stage.
  
  We agree with the referee that this may appear surprising at first glance. However, we would first like to emphasize that the general model allows REE to influence learning—that is, to contribute to the updating of the Q-subvalues. Moreover, even when REE are incorporated only as decision weights, as is the case for most rats, this does not imply that REE are unimportant during learning. In fact, the model assumes that REE are learned once and for all when they first occur during a trial of the corresponding option. Unreported simulation exercises indicate that a more gradual learning of maximal and minimal values would likely yield similar results.
  
  Second, the Before/After analysis shows that the behavioral response to Black Swans is locally small in terms of both total and one-sided sensitivities. This suggests that such effects are likely too subtle to be captured by this class of models for most rats. We have added this clarification to the revised version (page 17).
  
  Discussion:
  
  This study convincingly demonstrates that REE are processed rather uniquely, which makes sense given their evolutionary relevance. REE have indeed been somewhat neglected in previous research, and this study therefore opens an interesting new front on the fundamental aspects of decision under risk. The authors have devised an original theoretical and empirical framework that will be useful for the community, and the combination of economic analysis and rodent behavior constitutes thought-provoking ground for thinking about the nature of risk preferences. The interpretation and mechanistic account of these aspects, as well as their generalizability outside the specific context of this study, remain to be strengthened.
  
  We have modified the Discussion to further emphasize the translational aspect of the study and its relevance to various populations (page 22). We hope that its generalizability is now strengthened.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and the presence of less-rare, less-extreme outcomes in the same conditions.
  
  Strengths:
  
  (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.
  
  (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.
  
  (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.
  
  (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.
  
  Weaknesses:
  
  (1) The primary issue with this work is that the main experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, in all the conditions with a rare extreme event (e.g., 80 pellets with probability epsilon), there is also a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. So any impact attributable to the rare, extreme event could be due to the less-rare event or to the difference in variance. The design does not support the conclusions. Finally, by deliberately confounding rarity and extremity, the design does not allow for assessing the impact of either aspect.
  
  We agree with the referee that both the REE and the rare (≈10% frequency) but non-extreme outcomes are present in the relevant options. However, the rare but non-extreme reward is not large enough to make the convex option attractive and to shift choice away from the concave option. In other words, unlike REE, these outcomes do not reverse stochastic dominance in our design (as noted in Material and Methods). We have explored modified designs for human subjects in which the rare but non-extreme outcomes are removed. Preliminary results indicate that the behavioral phenotypes observed in rats also emerge in humans under these modified conditions, suggesting that REE are the primary drivers. We have added a statement to the Discussion (page 22) to clarify this point.
  
  We elaborate further in our response to point (3) below on why analyses based solely on variance are insufficient when dealing with REE. To clarify the role of rare and extreme outcomes in distinguishing convex from concave options, we provide two new columns to Table 2 in the Materials and Methods, in our reply to point (3).
  
  Finally, although a detailed analysis of rare but non-extreme outcomes lies outside the scope of this paper, the symmetric treatment of extreme and frequent outcomes can be addressed straightforwardly using strong First-Order Stochastic Dominance. Classical decision-theoretic approaches indeed satisfy this property.
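
  For readers less familiar with the criterion, first-order stochastic dominance between two discrete outcome distributions can be checked by comparing their cumulative distribution functions at every support point. The sketch below takes "strong" dominance to mean F_a(x) <= F_b(x) everywhere with strict inequality at least once; that reading, the function names and the example distributions are our assumptions, not the task's actual schedules. Exact fractions avoid floating-point ties.

```python
from fractions import Fraction as F


def cdf(dist, x):
    """P(X <= x) for a discrete distribution given as {value: probability}."""
    return sum((p for v, p in dist.items() if v <= x), F(0))


def strongly_fosd(a, b):
    """True if a first-order stochastically dominates b: F_a(x) <= F_b(x) at every
    support point, with strict inequality at least once."""
    support = sorted(set(a) | set(b))   # step CDFs only change at support points
    diffs = [cdf(a, x) - cdf(b, x) for x in support]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


# Hypothetical gain distributions (pellets).
base = {0: F(1, 2), 2: F(1, 2)}
with_ree = {0: F(1, 2), 2: F(49, 100), 80: F(1, 100)}   # 1% of mass moved up to 80 pellets
spread = {0: F(1, 2), 4: F(1, 2)}
sure = {2: F(1)}
```

  Moving a small amount of probability mass to an extreme outcome is enough for dominance, while a mean-preserving spread (spread versus sure) is ranked in neither direction.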
  
  (2) The RL-modelling work also fails to show a specific impact of the rare extreme event. As best as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or the two options with high-variance losses (F and V in the paper). This parameter only depends on whether the option could possibly have yielded the rare, extreme outcome (i.e., based on the generative probability) and is not connected to its actual appearance. That makes it a free parameter that simply bumps up (or down) the probability of selecting a pair of options. In the case of the "black swan" or high-variance loss conditions, this seems very much like a loss aversion parameter, but an additive one instead of a multiplicative one.
  
  We agree with the referee that the additional parameters, compared to more standard Q-learning models, specifically capture the fact that some options deliver REE while others do not. In our estimation procedure, these parameters become nonzero as soon as REE are observed for the first time for a given option. Therefore, the first step is to estimate a baseline nested model in which REEs contribute only at the learning stage (i.e., they affect the updating of Q-subvalues), while the additional parameters are constrained to zero. The next step is to compare alternative models against this baseline, allowing REEs to enter through the additional parameters. In this respect, our specification is parsimonious, especially given that very little is known about REEs in computational neuroscience. More structural modeling is certainly a promising direction for future research, and this paper constitutes a first step toward that goal.
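
  The following Python sketch illustrates the kind of augmented Q-learning described above: delta-rule updates of separate gain and loss Q-subvalues, plus additive decision weights that switch on for an option the first time it delivers a Jackpot or a Black Swan, followed by a softmax choice rule. It is not the authors' Eq. 1 or Eq. 2: the class and parameter names (alpha, beta, kappa_gain, kappa_loss), the shared learning rate, and combining subvalues as gain minus loss with unit weights are all assumptions made for illustration.

```python
import math


class REEQLearner:
    """Hypothetical sketch: Q-learning with gain/loss subvalues and additive
    decision weights for options that have already produced an REE."""

    def __init__(self, n_options, alpha, beta, kappa_gain, kappa_loss):
        self.alpha = alpha              # learning rate (assumed shared by gains and losses)
        self.beta = beta                # softmax inverse temperature
        self.kappa_gain = kappa_gain    # weight once a Jackpot has been seen
        self.kappa_loss = kappa_loss    # weight once a Black Swan has been seen
        self.q_gain = [0.0] * n_options
        self.q_loss = [0.0] * n_options
        self.seen_jackpot = [False] * n_options
        self.seen_black_swan = [False] * n_options

    def update(self, option, gain, loss, jackpot=False, black_swan=False):
        # Standard delta-rule updates of the two Q-subvalues (loss as a positive magnitude).
        self.q_gain[option] += self.alpha * (gain - self.q_gain[option])
        self.q_loss[option] += self.alpha * (loss - self.q_loss[option])
        # REE flags switch on at the first occurrence and then stay on.
        self.seen_jackpot[option] |= jackpot
        self.seen_black_swan[option] |= black_swan

    def decision_values(self):
        return [
            self.q_gain[i] - self.q_loss[i]
            + self.kappa_gain * self.seen_jackpot[i]
            + self.kappa_loss * self.seen_black_swan[i]
            for i in range(len(self.q_gain))
        ]

    def choice_probabilities(self):
        v = self.decision_values()
        m = max(v)                                   # subtract max for numerical stability
        w = [math.exp(self.beta * (x - m)) for x in v]
        s = sum(w)
        return [x / s for x in w]


learner = REEQLearner(n_options=4, alpha=0.5, beta=1.0, kappa_gain=0.0, kappa_loss=-2.0)
learner.update(0, gain=1.0, loss=0.0)
learner.update(2, gain=0.0, loss=1.0, black_swan=True)
learner.update(2, gain=0.0, loss=0.0)   # no REE this time, but the flag stays on
probs = learner.choice_probabilities()
```

  Setting kappa_gain and kappa_loss to zero recovers the nested baseline in which REE enter only through the Q-subvalue updates.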
  
  We provide the BIC, in addition to the AIC, to account for the presence of additional parameters in model selection and to ensure that the observed improvement in fit is not merely driven by their inclusion.
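
  As a reminder of how the two criteria differ, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, n the number of observations and L the maximized likelihood; lower is better for both. The fits below are hypothetical and only show that BIC charges ln n per parameter instead of 2, so extra REE parameters must buy more likelihood to be preferred.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits for one rat over n = 2000 choices.
n_trials = 2000
baseline = {"loglik": -1000.0, "k": 3}    # nested model, REE weights fixed at zero
augmented = {"loglik": -990.0, "k": 5}    # two extra REE decision weights

delta_aic = aic(augmented["loglik"], augmented["k"]) - aic(baseline["loglik"], baseline["k"])
delta_bic = (bic(augmented["loglik"], augmented["k"], n_trials)
             - bic(baseline["loglik"], baseline["k"], n_trials))
```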
  
  Unlike most of the existing literature, our results extend the notion of loss aversion to extreme losses. The negative decision weight on options yielding the Black Swan can be interpreted as a differential treatment of negative REE, an issue we discuss extensively in the Discussion (page 20).
  
  (3) The paper presented the methods and results with lots of neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). That made it very hard to decipher exactly what was done and what was found. For example, on p. 4, the use of concave and convex was very hard to decipher; the text even has to repeat itself 3 times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably accurate) to say that the options varied along the variance dimension, separately for gains and losses. Option A was low-variance gains and losses. Option B was low-variance losses and high-variance gains. Option C was high-variance losses and low-variance gains, and Option D was high-variance losses and gains. That tells much more clearly what the animals experienced without the reader having to master a set of new terminologies around fragility and robustness, which brings a set of theoretical assumptions unnecessarily into the description of the experimental design. In terms of results, "Black Swan" avoidance is more simply known as risk aversion for losses.
  
  Because our experimental design focuses on REE, outcomes cannot be summarized only by their variance. This is well known from the large literature on so-called fat-tailed statistical distributions. Unlike the Normal distribution, which is entirely characterized by its expected value and variance, fat-tailed distributions have positive excess kurtosis. This implies that a fat-tailed distribution (e.g. exponential) with the same expected value and variance as a Normal distribution differs importantly in that extreme values are much more frequent. To illustrate, if the distribution of pellets was assumed to be Normal with expected value set at 3.89 and standard deviation set at 9.37, as for the convex option, the probability of getting 80 pellets would be about 2 × 10^-16, practically zero. In contrast, in our design this probability is just under 1%.
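
  The Normal-tail figure can be checked with the complementary error function, P(X ≥ x) = ½ erfc((x − μ) / (σ√2)). In the sketch below, the value of about 2 × 10^-16 is reproduced when 9.37 is taken as the standard deviation; if 9.37 were the variance, the probability would be smaller still (well below 10^-100). Either way the conclusion, that a Normal model makes an 80-pellet outcome practically impossible, is unchanged.

```python
import math


def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X >= x) for X ~ Normal(mean, sd**2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


mean = 3.89
p_if_sd = normal_upper_tail(80, mean, 9.37)               # 9.37 read as standard deviation
p_if_var = normal_upper_tail(80, mean, math.sqrt(9.37))   # 9.37 read as variance
```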
  
  In Material and Methods, we clearly explain how our novel approach in terms of convexity relates to the moments of the reward distributions, including but not limited to the variance. To clarify further, we provide two new tables (Author response table 2 and Author response table 3) to be compared with Table 2 of the manuscript, which reports the first four moments (mean, standard deviation, skewness and kurtosis) of the full concave and convex gain distributions and is reproduced below for convenience.
  
  Author response table 1.
  
  In Author response table 2 we report the first four moments when REE are truncated. Comparing convex and concave gains shows that the convex option has a smaller but still close mean compared to the concave option. In contrast, the former has larger variance, skewness and kurtosis than the latter. Therefore, interpreting the choice of the convex option as reflecting a “preference” for variance is at best incomplete.
  
  Author response table 2.
  
  First four moments of concave and convex gains when REE are removed
  
  Author response table 2 further shows that REE alone go a long way towards explaining the differences between convex and concave options in terms of the first four moments: removing the rare and extreme value results in the concave option now having a larger mean, while the convex option still has larger variance, skewness, and kurtosis, but by a smaller margin.
  
  In Author response table 3 we report the first four moments when both the rare but non-extreme outcomes (RE) and REE are truncated, which shows that the convex and concave options then differ only with respect to their mean (which is again larger for the concave option).
  
  Author response table 3.
  
  First four moments of concave and convex gains when both RE and REE are removed
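
  The kind of computation behind these tables can be sketched as follows. The distribution is hypothetical (not the task's schedule), and we assume that "truncating" an outcome means dropping it and renormalizing the remaining probabilities; kurtosis is reported in its non-excess form (3 for a Normal). In the example, removing the 80-pellet outcome lowers both the mean and the kurtosis sharply.

```python
import math


def moments(dist):
    """Mean, standard deviation, skewness and (non-excess) kurtosis of a
    discrete distribution given as {value: probability}."""
    total = sum(dist.values())
    probs = {v: p / total for v, p in dist.items()}   # renormalize after truncation
    mean = sum(v * p for v, p in probs.items())
    var = sum((v - mean) ** 2 * p for v, p in probs.items())
    sd = math.sqrt(var)
    skew = sum((v - mean) ** 3 * p for v, p in probs.items()) / sd ** 3
    kurt = sum((v - mean) ** 4 * p for v, p in probs.items()) / var ** 2
    return mean, sd, skew, kurt


def truncate(dist, values_to_drop):
    """Remove the listed outcome values (probabilities are renormalized in moments())."""
    return {v: p for v, p in dist.items() if v not in values_to_drop}


# Hypothetical convex gain distribution (pellets: probability).
convex = {1: 0.55, 2: 0.35, 12: 0.09, 80: 0.01}
full = moments(convex)
no_ree = moments(truncate(convex, {80}))
no_re_no_ree = moments(truncate(convex, {12, 80}))
```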
  
  In addition, our focus on REE implies that we go beyond mean-variance preferences that apply mostly to Gaussian distributions. It is not clear theoretically what type of utility functions would reflect preferences that combine a taste for variance, skewness and kurtosis, even though all those moments affect expected utility. See for example Phelps, C.E. “A user’s guide to economic utility functions”. J Risk Uncertain 69, 235–280 (2024) for a recent overview (on page 242, Phelps states that “In situations where risk is not normally distributed, it is ill-advised to ignore statistical parameters beyond variance, unless the deviations from normality are relatively small”).
  
  More importantly, our proposed measure of the convexity of the reward distributions, the Jensen gap, further reveals that even restricting the analysis to the first four moments is incomplete, in the sense that it fails to characterize the difference between options: the fifth moment of the concave option contributes more to the Jensen gap than even the kurtosis, while one needs to look at much higher moments to find significant contributions to the Jensen gap for the convex option. In that sense, there is no reason to restrict the comparison of options to variance, or even to skewness and kurtosis, either in general or in our particular setup. Note that introducing REE would result in convex distributions even in simplified designs, e.g. with a 3-value support. Studying REE implies the need to look beyond variance, and our proposal is to use the Jensen gap as a measure of convexity. In the Material and Methods section of the paper, we did not develop an in-depth analysis of the Jensen gap, so as not to burden readers already confronted with a rather technical paper.
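
  To make the idea of moment-by-moment contributions concrete, the sketch below uses one standard definition of the Jensen gap, E[f(X)] − f(E[X]), together with its Taylor expansion around the mean, in which the central moment of order k contributes f^(k)(μ)/k! · E[(X − μ)^k]. The choice f(x) = exp(c·x), the value of c and the distribution are assumptions for illustration; whether this matches the exact definition used in the manuscript is not established here. For this heavy-tailed example, the variance term accounts for only a fraction of the gap and higher-order terms carry the rest.

```python
import math


def jensen_gap(dist, f):
    """E[f(X)] - f(E[X]) for a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(f(v) * p for v, p in dist.items()) - f(mean)


def exp_gap_by_moment(dist, c, max_order=40):
    """Contribution of each central moment of order k >= 2 to the Jensen gap of
    f(x) = exp(c * x): f^(k)(mu) / k! * E[(X - mu)^k], where f^(k)(mu) = c**k * exp(c * mu)."""
    mean = sum(v * p for v, p in dist.items())
    contributions = {}
    for k in range(2, max_order + 1):
        central_moment = sum((v - mean) ** k * p for v, p in dist.items())
        contributions[k] = c**k * math.exp(c * mean) / math.factorial(k) * central_moment
    return contributions


# Hypothetical convex gain distribution and curvature parameter.
convex = {1: 0.55, 2: 0.35, 12: 0.09, 80: 0.01}
c = 0.05
gap = jensen_gap(convex, lambda x: math.exp(c * x))
parts = exp_gap_by_moment(convex, c)
```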
  
  We thank the referee for raising the issue of whether variance offers a simpler explanation of our results. To keep the main text as short as possible, we chose not to add technical complexity. We hope our reply makes clear that the analysis cannot be restricted to variance when studying REE, and we believe that the Jensen gap is a useful notion in this regard. As our replies will be made publicly available, we chose not to integrate the above discussion into the main text.
  
  (4) Were the probabilities shuffled or truly random (they seem to be fixed sequences, so neither)? What were the experienced probabilities? Given the fixed sequences, these experienced ("ex-post") probabilities could differ tremendously from the scheduled ("ex-ante") probabilities. It's quite possible that an animal never experienced the rare, extreme event for a specific option. It's even possible (if they only picked it on the 10th/60th choices by chance) that they only ever experienced that rare extreme event. This cannot be known given the information provided. The Supplemental Information on p. 55 only gives gross overall numbers but does not indicate what the rats experienced for each choice/option, which is what matters here. A simple table that indicates, for each of the 4 options, how often it was selected and how often the animals experienced each of the 6-8 possible outcomes would make it much clearer how closely the experience matched the planned outcomes. In addition, because the rare outcome is restricted to the 10th or 60th activation in a session, the rare outcomes are not random. Did the animals learn this association?
  
  Probabilities are not random: a limited number of fixed sequences was used, as stated in Material and Methods. We chose sequences that satisfy our assumptions about ex-post stochastic dominance reversal of convex over concave options when REE are added. We have added the choice frequencies for all four options to Table S4. If the animals had learned that REE occur at the 10th and 60th activations, their choices would follow a more optimized strategy than what is observed. For example, the options offering the possibility of obtaining the Jackpot are not optimal in terms of gains from frequent events, so the animals should tend to select these options only around the 10th and 60th choices, and most of their other choices should favor the options delivering larger gains in the frequent domain. This is not what is observed. We have added this important point to the Discussion (page 18).
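
  The per-option table requested by the referee (how often each option was chosen, and how often each outcome was actually experienced on that option) can be built directly from a trial log. The sketch below assumes a hypothetical log format of (option, outcome) pairs; the option labels and outcomes are made up.

```python
from collections import Counter, defaultdict


def experienced_frequencies(trials):
    """trials: iterable of (option, outcome) pairs for one rat.
    Returns (choice frequency per option, ex-post outcome frequencies per option)."""
    trials = list(trials)
    choices = Counter(option for option, _ in trials)
    outcomes = defaultdict(Counter)
    for option, outcome in trials:
        outcomes[option][outcome] += 1
    n = sum(choices.values())
    choice_freq = {opt: cnt / n for opt, cnt in choices.items()}
    outcome_freq = {
        opt: {out: cnt / choices[opt] for out, cnt in counter.items()}
        for opt, counter in outcomes.items()
    }
    return choice_freq, outcome_freq


# Hypothetical log: option label and pellets received on each trial.
trials = [("A", 1), ("A", 1), ("A", 80), ("B", 2)]
choice_freq, outcome_freq = experienced_frequencies(trials)
```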
  
  (5) The choice data are only presented in an overprocessed fashion with a sum and a difference (in both figures and tables). The basic datum (probability/frequency of selecting each of the 4 options) is not provided directly, even if it can theoretically be inferred from the sum and the difference. To understand what the rats actually do, we first need to see how often they select each option, without these transformations.
  
  As described in Material and Methods, the 4 options are combinations of 2 convex and concave sub-options for gains and losses, which is why our analysis of the behavioral data focuses on convexity-related total and one-sided sensitivities to REE. The third dimension needed to fully characterize rats’ behavior is simply 1 − f_FF, the fraction of non-Fragile choices. In addition, we provide in Table S4 of the Supplementary Material an alternative interpretation in terms of Black Swan Avoidance and Jackpot Seeking, together with the choice frequencies for all four options. Finally, all the raw data will be made openly available, with no access codes.
  
  (6) There is insufficient detail on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly which tests were run and how (bootstrapping is mentioned, but with little detail). Without code or data (only summary information is provided in the supplement), this is difficult to evaluate. In addition, the studies do not seem to be pre-registered in any way, leaving many researcher degrees of freedom. Were any alternative analysis pipelines attempted? Similarly, there were many sub-groupings of the animals and comparisons between them; were these post hoc?
  
  We understand the referee's concern regarding preregistration as an epistemic safeguard that makes empirical claims more falsifiable, more transparent, and less dependent on post hoc rationalization. However, although the contemporary push for preregistration is often presented as an “epistemic improvement,” in practice it functions largely as a norm of moral regulation rather than a scientific necessity. The rhetoric is moralistic: preregistered research is “clean,” “transparent,” and “credible,” while non-preregistered work is viewed with suspicion, even when the methodology is sound. This language is not epistemologically neutral; it enforces a view of what ought to be done, irrespective of the diversity of legitimate scientific practices.
  
  From a philosophy of science perspective, this is historically and conceptually problematic. Scientific progress has never followed a uniform, rule-based method. As Feyerabend, for example, has argued, major discoveries have emerged precisely because researchers were not bound by predetermined plans: they followed anomalies, improvised, reinterpreted data, and revised methods and hypotheses in light of new evidence. These are practices that a rigid preregistration ethos can suppress, and such an ethos is not aligned with how genuine discovery often occurs.
  
  Even from a statistical standpoint, preregistration is far from a panacea. It reduces some degrees of freedom (mainly in confirmatory statistics), but it does not eliminate flexibility; researchers can still choose models, transformations, exclusion rules, stopping rules, etc. And more importantly: reducing flexibility is not inherently epistemically virtuous. Flexibility is often necessary to understand data properly—especially in new paradigms or first-of-their-kind experiments, which is the case for this study. Science needs exploration, opportunism, and theoretical plasticity. Preregistration is compatible with these only if it is treated as one optional tool among many—not as a universal evaluative standard.
  
  As the referee pointed out, this study “taps into a surprisingly neglected but very relevant aspect of decision-making.” Our work is therefore mainly exploratory: the experimental paradigm reveals new behavioral patterns in how rats cope with rare and extreme events, and much of our analysis is necessarily descriptive. We conduct formal inference only where it is methodologically appropriate: for the short-term behavioral response to rare events (for which we now provide more details in the Material and Methods section, p. 35) and for the estimation of augmented Q-learning models, which follows a standard econometric approach (documented in the Material and Methods section; see also our response to recommendation 4). These inferential results support the descriptive patterns that motivate this new line of research.
  
  (7) On p. 17, there is an attempt to look at the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after the rare, extreme event. In the human literature, the main impact of experiencing a rare, extreme event is what is known as the wavy recency effect (See Plonsky et al. 2015 in Psych Review for example). What this means is that there tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). Using a 10-trial window would thus obscure any impact of this rare, extreme event. An analysis that looks at a time course trial-by-trial could reveal any impact.
  
  We thank the referee for drawing our attention to the wavy recency effect documented in human experiments. We have added the corresponding reference in the Discussion (page 20). Regarding rats, the Before/After analysis reported in the paper suggests that there is no sizeable immediate recency effect for Jackpots. Even for Black Swans, the immediate recency effect we report remains modest when using a 10-trial window, and the analysis of the choice immediately following a REE does not show evidence of immediate negative recency. This casts doubt on the presence of such an effect in rats.
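
  The difference between a pooled window and a trial-by-trial view can be illustrated with an event-triggered average: for each lag relative to an REE, average the choice indicator across events. In the hypothetical sequence below, choices dip right after the event and then rebound (a wavy pattern), yet the 4-trial before/after window difference is exactly zero. The data, the 0/1 coding and the function names are assumptions made for illustration.

```python
def event_triggered_choice_rate(choices, event_trials, max_lag):
    """choices: list of 0/1 (1 = chose the option of interest), indexed by trial.
    event_trials: trial indices at which an REE occurred.
    Returns {lag: mean choice} for lags in [-max_lag, max_lag] (lag 0 = REE trial),
    skipping lags that fall outside the recorded trials."""
    rates = {}
    for lag in range(-max_lag, max_lag + 1):
        vals = [choices[t + lag] for t in event_trials if 0 <= t + lag < len(choices)]
        if vals:
            rates[lag] = sum(vals) / len(vals)
    return rates


def window_difference(rates, width):
    """Mean rate over lags 1..width minus mean rate over lags -width..-1."""
    before = [rates[lag] for lag in range(-width, 0) if lag in rates]
    after = [rates[lag] for lag in range(1, width + 1) if lag in rates]
    return sum(after) / len(after) - sum(before) / len(before)


# Hypothetical session: REE at trial 10, dip at lags 1-2, rebound at lags 3-4.
choices = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
rates = event_triggered_choice_rate(choices, event_trials=[10], max_lag=4)
```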
  
  (8) As I understood the method (p. 31), the assignment of options to physical locations was not random or counterbalanced, but deliberately biased to have one of the options in the preferred location. This would seem to create a bias towards a particular option and a bias away from the other options, which confounds the preference data in subsequent analyses.
  
  We agree that the design incorporated an intentional bias toward the anti-fragile option as a proof of concept. Nevertheless, Figure 8 demonstrates that animals substantially altered their choices between training and final testing, with a median change of approximately 35% across sessions. This indicates that behavior was driven by the structure of possible outcomes rather than by a stereotyped location-based preference.
  
  (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience, which are not necessarily commensurable and can be manipulated independently. And, for the model, how were these delays transformed into outcomes? Eq 1 skips over that. Is there an assumption of linearity? In addition, it was not wholly clear to me whether the delays meant fewer trials in a session or merely extended the session, meaning longer delays until the next choice period.
  
  Consistent with established rodent decision-making paradigms (Adams et al., 2017 doi: 10.1523/ENEURO.0094-17; Breysse et al., 2021 doi: 10.1111/ejn.14895), we employed sweet pellets as gains and imposed delays as losses. Delays are operationalized as losses because they preclude the animal from engaging in reward-generating behavior; thus, increasing the delay duration proportionally increases the magnitude of the opportunity cost.
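
  Under the opportunity-cost reading above, a time-out can be expressed in pellets forgone. The sketch assumes a fixed-length session in which the rat would otherwise keep completing trials at a constant rate; the parameter names and values are hypothetical. Under these assumptions the cost is linear in the delay, which is exactly the linearity the referee asks about; any deviation from a constant trial rate would break it.

```python
def opportunity_cost_pellets(delay_s: float, trial_duration_s: float,
                             pellets_per_trial: float) -> float:
    """Pellets forgone during a time-out of delay_s seconds, assuming a fixed-length
    session and a constant rate of trial completion and reward."""
    if trial_duration_s <= 0:
        raise ValueError("trial duration must be positive")
    return delay_s / trial_duration_s * pellets_per_trial


# Hypothetical numbers: 10-s trials yielding 2 pellets on average.
short_timeout = opportunity_cost_pellets(60, 10, 2)
long_timeout = opportunity_cost_pellets(120, 10, 2)
```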
  
  (10) The paper does not represent the existing literature on human risky decision-making (with and without rare events) accurately enough. Here are a few examples of misrepresented and/or missing literature:
  
  It is not the case that most studies on decision-making rely only on p > 10% (as stated on p. 2). Maybe that is true with animals, but it is not a fair statement generally. Some do, and some don't. There is a substantial literature looking at rarer events both in description (most famously Kahneman & Tversky's work) and in experience (which is alluded to in reference 19). That reference is not only about the situation when choices are not repeated (e.g. the sampling paradigm), but also about partial-feedback and full-feedback situations.
  
  We have corrected that statement in the main text (page 3) and we thank the referee for pointing this out.
  
  The literature on learning from rewarding experiences in humans is obliquely referenced but not really incorporated. In short, there are two main findings: first, people underweight rare events in experience; second, people overweight extreme outcomes in experience (both contrary to description). Some related papers are cited, but their content is not used or incorporated into the logic of the manuscript.
  
  One recent study systematically examined rarity and extremity in human risky decision-making, which seems very relevant here: Mason et al. (2024). Rare and extreme outcomes in risky choice. Psychonomic Bulletin & Review, 31, 1301-1308.
  
  There is a fair bit of research on the human perception of the risk of rare events (including from experience) and of important events such as climate change. One notable paper is Newell et al. (2015) in Nature Climate Change.
  
  We agree with the referee that the related literature on REE in animal decision-making is scant and that it is more developed in humans. We thank the referee for pointing to Mason et al. (2024), who clarify where the literature on humans stands and why combining rarity and extremity, as we also do, is important and highly relevant. We have added a new statement and references in the Introduction and Discussion (pages 3, 20, 22).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations For The Authors):
  
  (1) As said above, I think the manuscript would really benefit from a rewriting, to replace some technical terms with more readable ones, and maybe rebalance the focus from the current focus on the framework (heavily loaded with economics concepts, which will be hard to digest for the eLife readership) to a higher weight on information that is critical to understand and interpret the behavior (e.g. information about training & training behavior, etc.).
  
  We have revised the entire manuscript to improve readability and have clarified in the main text: (1) why convexity of exposures to REE could, beyond variance, be useful for experiments in settings other than our own; (2) why the associated notion of antifragility may be applicable to other settings and therefore of broader interest; (3) what was done in the training sessions compared to the final sessions.
  
  (2) From Figure 8, it seems that rodent behavior is more clustered after the training (i.e. before the sessions) than after the sessions. Could that be a sign of imperfect learning?
  
  Figure 8 mostly suggests that there is some flexibility in the choices made and that the intended initial bias towards the antifragile choice in the design of the task could be overridden by the rats.
  
  (3) The modelling section seems incomplete. I think the authors want to tease apart where REE enters the model and should propose an alternative where REE affects the learning rather than the decision.
  
  In fact, the general model nests the case in which REE affect only the learning stage (i.e. they contribute only to the updating of the Q subvalues): this corresponds to setting both specific decision weights attached to options delivering REE to zero. However, our analysis shows that this restricted model is rejected by the behavioral data for all rats. We have clarified this point in the revised version.
  
  (4) Also, parameter and model recovery exercises seem mandatory (Wilson & Collins, 2019).
  
  We thank the referee for highlighting this valuable reference in computational modeling, particularly in the context of model identification and estimation in computational biology. In the present research, we adopted an econometric perspective on model identification—especially with regard to the integration of Q-values for gains and losses. The softmax choice function is formally equivalent to a multinomial logit model, and as is well known in econometrics, identification in such models presents non-trivial challenges. The standard approach in classical Q-learning is to multiply the Q-value by an inverse temperature parameter (also known as a precision parameter in random utility models). When extending the model to include separate Q-values for gains and losses, specifying the model in an identifiable way becomes more complex.
  
  To address this issue, we considered several alternative model specifications and conducted grid-based estimation of starting parameter values. This approach allowed us to examine the shape of the log-likelihood function and assess whether the parameters are globally identified, rather than only identifiable up to a linear combination. We found that the most parsimonious and empirically identified specification in our experimental paradigm is one in which Q-values for gains and losses are summed, each weighted by distinct decision weights (see our Equation 2 in the paper).
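  To make the structure of such a choice rule concrete, here is a minimal Python sketch. It rests on several assumptions that are not taken from Equation 2 itself. Each option's utility is assumed to be the gain Q-value times a gain decision weight, plus the loss Q-value times a loss decision weight, plus an option-specific REE decision weight. Choices are assumed to follow a softmax, that is, a multinomial logit. The function name `choice_probabilities` and all numerical values are hypothetical and only illustrative.

```python
import math
from typing import List, Sequence


def choice_probabilities(
    q_gain: Sequence[float],
    q_loss: Sequence[float],
    w_gain: float,
    w_loss: float,
    ree_weight: Sequence[float],
) -> List[float]:
    """Softmax (multinomial logit) choice rule over options.

    Hypothetical utility for option i (an assumption, not the paper's Eq. 2):
        u_i = w_gain * q_gain[i] + w_loss * q_loss[i] + ree_weight[i]
    The two decision weights take over the role that a single inverse
    temperature plays in classical Q-learning.
    """
    utilities = [
        w_gain * g + w_loss * l + c
        for g, l, c in zip(q_gain, q_loss, ree_weight)
    ]
    top = max(utilities)  # subtract the max before exponentiating for stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Illustrative values for four options, not estimates from the paper.
    q_gain = [1.0, 0.8, 0.5, 0.2]
    q_loss = [-0.2, -0.1, -0.6, -0.3]  # losses stored as negative values
    no_ree = [0.0, 0.0, 0.0, 0.0]      # REE weights at zero: REE act only via learning
    with_ree = [0.0, 0.5, 0.0, -0.5]   # hypothetical REE weights on two options
    print(choice_probabilities(q_gain, q_loss, 2.0, 1.5, no_ree))
    print(choice_probabilities(q_gain, q_loss, 2.0, 1.5, with_ree))
```

  Setting every REE weight to zero gives the nested case in which REE can influence choices only through the learned Q-values. Nonzero REE weights shift choice probabilities directly at the decision stage.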
  
  The inclusion of decision weights for REE for each option (Equation 2) is then structurally equivalent to introducing constant terms in a logit model. The identification of these parameters follows standard econometric results on discrete choice models (e.g., Davidson & MacKinnon, 2003): since we model choices among four options, three free parameters can be estimated, leaving one degree of freedom in the specification. As mentioned in the "Modelling and Statistical Analysis" section, we further guarded against local maxima by applying a two-step estimation procedure that combines two optimization algorithms with multiple sets of starting values for the baseline model (i.e., the model without decision weights for REE). We also tested the addition of a global optimization method, simulated annealing, but found that it did not significantly improve upon our two-step procedure. This is not surprising, as our preliminary investigation of model identification, based on grid searches over starting parameter values, confirmed that all parameters were identified in our simple specification. Our intuition is that simulated annealing may yield different estimates than gradient-based methods primarily when the model is not theoretically identified, which suggests that the need for such global optimization techniques can be indicative of underlying identification issues in Q-learning models.
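  The identification point about constant terms can be checked numerically. The Python sketch below is an illustration only. The names `logit_probabilities` and `normalize_constants` and all values are hypothetical. It shows that adding the same amount to every option-specific constant leaves softmax choice probabilities unchanged. With four options, the data can therefore pin down at most three constants, and one constant must be fixed. Here it is fixed by setting a reference option's constant to zero, which is one common convention.

```python
import math
from typing import List, Sequence


def logit_probabilities(utilities: Sequence[float], constants: Sequence[float]) -> List[float]:
    """Multinomial logit / softmax with option-specific constant terms."""
    shifted = [u + c for u, c in zip(utilities, constants)]
    top = max(shifted)  # numerical stabilization; does not change the result
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_constants(constants: Sequence[float], reference: int = 0) -> List[float]:
    """Fix the reference option's constant at zero (one common normalization)."""
    ref = constants[reference]
    return [c - ref for c in constants]


if __name__ == "__main__":
    utilities = [0.4, 0.1, -0.2, 0.3]   # illustrative Q-based utilities, four options
    constants = [0.2, 0.7, -0.1, 0.0]   # illustrative option-specific constants
    shifted = [c + 5.0 for c in constants]  # add the same amount to every constant
    p_a = logit_probabilities(utilities, constants)
    p_b = logit_probabilities(utilities, shifted)
    p_c = logit_probabilities(utilities, normalize_constants(constants))
    # All three give the same choice probabilities: a common shift is not
    # identified from choices, so only 4 - 1 = 3 constants are free.
    print(max(abs(a - b) for a, b in zip(p_a, p_b)))
    print(max(abs(a - c) for a, c in zip(p_a, p_c)))
```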
  
  Regarding model comparison, we have used penalized information criteria to account for additional parameters. Although we do not report confusion or inversion matrices for our nested models, we verified that the estimated models replicate observed behaviors across all phenotypes, as shown in the main text (see bottom left panel of Figure 5 for the Total and One-Sided sensitivities). Most importantly, we conducted 100 additional simulations of 40 artificial sessions for each phenotype using the “winning” models and the median fitted parameters. These simulated rats—playing the task 100 times over 40 sessions—offer strong evidence that the selected models are valid: they quantitatively capture the behavior of all phenotypes in terms of our key metrics, Total and One-Sided sensitivities (see bottom right panel of Figure 5).
  
  Taken together, this methodical econometric approach to model specification and estimation gives us strong confidence in the identification and robustness of our model. Overall, while Wilson & Collins (2019) provide an interesting framework for model estimation in computational biology, we believe that a more formal theoretical analysis of model identification in Q-learning models would be a valuable addition to the field—though it lies beyond the scope of the present work. In our view, computational biologists should complement simulation-based validation and empirical fit with formal methods for assessing theoretical identifiability, particularly when estimating complex choice models.
  
  Davidson, R., & MacKinnon, J. G. (2003). Econometric theory and methods. Oxford University Press, New York.
  
  Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547
  
  Reviewer #2 (Recommendations For The Authors):
  
  (1) The paper confuses risk sensitivity and exploration in the opening lines. These are not the same.
  
  What we have in mind here is that uncertainty about outcomes is one of the main drivers of exploration, in the sense that there would be no need to explore in a counterfactual world with deterministic gains and losses. We have modified the opening lines of the paper to better reflect this dimension (page 2).
  
  (2) p. 9. "awfully long" is an unnecessary descriptor. Descriptions of methods should be more factual.
  
  The manuscript has been entirely rewritten.
  
  (3) p. 13. Most points lie on the left of the square (not right?).
  
  We thank the referee for pointing out this typo, which is now corrected in the text (page 8).
  
  (4) p. 13. Last line. "obviously" is patronizing to the readers.
  
  The manuscript has been entirely modified to address related points.
  
  (5) p. 23. The avoidance of black swans by not choosing that option sounds like a hot-stove effect (see Denrell & March, 2001). Is this evidenced here?
  
  To the best of our knowledge, the statement that “people tend to avoid activities they have had a negative experience of, resulting in a negativity bias” (from Jerker Denrell’s website) does not explicitly concern REE. Instead, it appears to refer broadly to reinforcement learning mechanisms driven by negative outcomes, irrespective of their magnitude or frequency. In our task, animals encounter both negative rare events (RE) and negative rare and extreme events (REE; Black Swans). Notably, the task design does not allow rats to completely avoid negative RE unless they cease performing the task altogether—a pattern typically seen in paradigms involving aversive stimuli such as electric foot shocks. The fact that all 20 rats maintained stable performance across the 41 sessions provides evidence against a pronounced hot-stove effect. This point has been incorporated into the revised discussion (page 20).
  
  (6) "menus" is an odd term. Better described as reward schedules?
  
  “Menu” has been replaced by “option” in the main text.
  
  (7) Why are they 20-minute sessions? I thought it was 120 trials per session? And 41 sessions? Or was this only in training?
  
  Each session ended after 20 minutes, which led to approximately (but not always exactly) 120 trials. The 20-minute limit was chosen to cap the number of trials and thus prevent satiety. The total number of final-testing sessions run with all 20 animals was 41; this is an odd number, but there was no justification for removing one session from the analysis. Training was much longer and is not included in the 41 sessions.
  
  (8) Really not clear why these Jensen inequalities were relevant or even calculated for these options? How is it relevant to what animals chose or experienced? They seem to be based on the generative probabilities for different options, which is not what happened in reality.
  
  We propose the Jensen gap as a general measure of convexity that relates to all moments of the probability distribution, as described in more detail in our answer to point (3) above. As such, we think it is a characterization of options with stochastic outcomes that could prove useful to other experimenters in alternative settings beyond our own.
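  For readers unfamiliar with the term, the Jensen gap of a function f under a random outcome X is E[f(X)] minus f(E[X]). For convex f it is non-negative. The Python sketch below computes it for a discrete outcome distribution. The transformation f and the two outcome distributions are hypothetical illustrations. They are not the options or the exact convexity measure used in the paper. With f(x) = x², the gap reduces to the variance. Other convex choices of f, such as the exponential, also reflect higher moments of the distribution.

```python
import math
from typing import Callable, Sequence


def jensen_gap(outcomes: Sequence[float], probs: Sequence[float],
               f: Callable[[float], float]) -> float:
    """Return E[f(X)] - f(E[X]) for a discrete outcome distribution.

    For convex f the gap is non-negative (Jensen's inequality); a larger gap
    indicates a more convex exposure to the outcome distribution.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for p, x in zip(probs, outcomes))
    mean_of_f = sum(p * f(x) for p, x in zip(probs, outcomes))
    return mean_of_f - f(mean)


if __name__ == "__main__":
    # Two hypothetical options with the same mean outcome (1.0).
    safe = ([1.0], [1.0])                      # deterministic outcome
    rare_extreme = ([0.0, 10.0], [0.9, 0.1])   # rare, large outcome
    for name, (xs, ps) in [("safe", safe), ("rare_extreme", rare_extreme)]:
        print(name, jensen_gap(xs, ps, lambda x: x * x), jensen_gap(xs, ps, math.exp))
```

  Both hypothetical options have the same mean. Only the one with a rare, large outcome has a positive gap, which is the kind of distinction the Jensen gap is meant to capture.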
  
  (9) Only some summary data in supplemental materials. No open data or code for recreating the experiment or analyzing the data.
  
  The data are available on GitHub (see page 38), and the code will be available upon request.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2021.11.01.466806v4
www.biorxiv.org

Genotype-phenotype correlations and de novo induction of cancer stem cells in Wilms tumor initiation.

1
1. EMBOpress 02 Jun 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  We thank the reviewers for their time and constructive comments on our manuscript.
  
  Reviewer #1 recognizes the importance of the question we address (namely, the early consequences of Wilms' tumour-inducing mutations on kidney development in two models of different Wilms' tumour-initiating mutations) and provides useful suggestions for improving the manuscript.
  
  Reviewer #2 raises concerns regarding the novelty of the study. We appreciate these comments, which indicate that, in the revision, we need to make mainly textual changes to highlight the novel aspects of our study and findings and their significance.
  
  Reviewer #3 offers a generally positive assessment of the data, while suggesting that the work may be interpreted primarily from a developmental perspective rather than a Wilms' tumour-focused one. In the revision, we need to better emphasize how closely these perspectives are interconnected in the context of Wilms' tumour biology.
  
  Reviewer #1 (Evidence, reproducibility and clarity (Required)):
  
  This manuscript addresses an important gap in Wilms tumor (WT) biology: what are the earliest pathogenic events following WT driver mutation induction, and how do these early developmental trajectories differ across genotypes? The authors provide a carefully staged and comparative analysis of two WT-associated genetic contexts-conditional Wt1 loss (using lineage-specific Cre drivers targeting nephrogenic (Six2-Cre) versus stromal (Foxd1-Cre) compartments, as well as a temporally controlled Wt1CreERT2 model targeting both lineages upon tamoxifen induction) and inducible LIN28B overexpression, and relate the resulting developmental phenotypes to two CSC marker paradigms derived from patient-based studies. A major strength is the precise, time-resolved description of the earliest initiating phenotypes (E12.5 and E18.5, with additional postnatal analysis for LIN28B) and the direct side-by-side comparison of how each genotype perturbs nephrogenesis. The authors conclude that Wt1 loss (especially in the nephrogenic lineage) leads to a severe developmental block accompanied by a disturbance of lineage identity ("lineage confusion"), whereas LIN28B overexpression causes a disturbed transition between uninduced and induced nephron progenitor cell (NPC) states, producing blastemal-like regions that persist postnatally. Using immunostaining for NCAM1, SIX2, CITED1, and ALDH1A2, the authors map marker combinations during normal kidney development and across mutant contexts, and propose that tumor-initiating alterations, most clearly in the LIN28B model, and more suggestively in the Wt1CreERT2 (Wt1CE) context, promote the emergence of a CSC-like population inferred to co-express all four markers (NCAM1+SIX2+CITED1+ALDH1A2+), a state not observed in normal kidneys.
  
  We thank Reviewer #1 for this correct and complete summary of our manuscript. This reviewer recognizes the current gap in our understanding of the origins of Wilms' tumors and appreciates the approach we have chosen to start filling this gap using two different mouse models.
  
  Overall, this study provides a particularly clear direct comparison of the earliest tumor-initiating events triggered by distinct WT-relevant driver alterations. While the manuscript does not yet offer a detailed molecular mechanistic framework explaining why these two mutations produce such divergent developmental and marker-state outcomes (which would further strengthen the work), the careful comparison and the conclusions drawn from it are meaningful and make an important contribution to our understanding of the developmental processes that can lead to Wilms tumor initiation.
  
  We thank this reviewer for recognizing the importance of a direct comparison of the early consequences of two different Wilms' tumour mutations. We agree that we do not yet provide a mechanistic framework for these differences. Although such studies are ongoing, they are outside the scope of this manuscript.
  
  *Major comment: 1. A central and highly emphasized conclusion of this manuscript is that tumor-initiating alterations induce a CSC-like population co-expressing all four markers (NCAM1, SIX2, CITED1, and ALDH1A2), and that this state is not observed during normal kidney development. Because this "quadruple-positive" population is a key mechanistic take-home message and closely linked to the overall conceptual model, the manuscript would be substantially strengthened by a direct, same-cell demonstration of co-expression of all four markers, rather than inference from consecutive sections. The authors state that they were unable to do so due to a technical limitation, namely, antibody host-species constraints that prevent co-detection of CITED1 and ALDH1A2 within the same section. *
  
  We agree that not being able to show co-expression of all 4 CSC markers is a serious limitation for the interpretation of our data. The reviewer suggests the following alternatives:
  
  *Several feasible approaches could address this limitation, for example: - Identify an alternative antibody reagent from a different host species.*
  
  The 'problematic' antibodies are those staining for ALDH1A2 and CITED1, which are both rabbit IgG. Alternative antibodies for ALDH1A2 are all raised in rabbit, so these will not solve this problem. For CITED1, we have now identified a biotin-conjugated antibody that could be used for additional co-staining. We propose to test this antibody for the revision of this manuscript.
  
  *- RNAscope / smFISH for in situ single-cell co-detection. *
  
  We are aware of these techniques as alternatives to antibody staining. However, we have no experience with them, nor do we have access to the required technologies. After discussions with collaborators who have extensive experience with these techniques, we concluded that the potentially extensive optimization and associated costs make this an unsuitable alternative for the limited samples we have available.
  
  *- Single-cell RNA-seq (scRNA-seq) to test whether a bona fide quadruple-positive transcriptional state exists. *
  
  This could be an option, but it is a major project in itself and therefore outside the scope of this manuscript. We note that the known sparsity of single-cell data might still complicate the detection of each marker in individual cells, especially for lowly expressed TFs such as Six2 and Cited1.
  
  *Overall, resolving this technical limitation would markedly increase confidence in one of the manuscript's most important claims and strengthen the proposed genotype-phenotype/CSC-marker framework.*

  As discussed above, we propose to try the biotin-conjugated CITED1 antibody.
  
  *It is somewhat unexpected that the Six2-specific Wt1 deletion appears to produce a more severe phenotype than the tamoxifen-inducible Wt1CreERT2 approach, which is intended to target a broader Wt1-derived lineage (both nephrogenic and stromal). The Discussion offers several plausible, non-mutually exclusive explanations for this observation (e.g., timing, recombination efficiency/mosaicism, and the rescue contribution of "escaping" wild-type cells). It would be helpful to support at least one of these explanations experimentally. For example, the authors could quantify the extent of "escape" (percentage of non-recombined cells within the lineage) across embryos/timepoints to validate that mosaicism is indeed the cause of the milder phenotype.*
  
  We can address this experimentally by making use of the tdTomato Cre reporter included in our model, which allows us to follow the fate of mutated and non-mutated cells in the lineage. We propose to combine Six2 antibody staining with the tdTomato signal to quantify the percentage of cells that have maintained Six2 expression and are therefore likely escaping cells/nephrons.
  
  Minor comments 1. Please clarify whether the difference shown in Fig. 2C is statistically significant, and report n, error bars/variation, the statistical test used, and p-values (if applicable).
  
  These details will all be added.
  
  *The authors note the presence of some SIX2+; tdTomato+ cells in Foxd1GC control kidneys. Given the expected stromal restriction of Foxd1 lineage labeling, please clarify the likely explanation and, if possible, indicate how frequent this is.*
  
  The reviewer here raises the important question of the origin of, and potential overlap between, the stromal and nephrogenic lineages. This question is highly relevant not only for the origin and biology of Wilms' tumours, but also for normal kidney development. Kobayashi et al. (2014) reported some contribution of the Foxd1 lineage to the Six2 lineage. Magella et al. (2018) also found some evidence for this, as did a recent preprint (Haghighitalab et al. 2026). There are even data suggesting that (part of) the renal stroma is derived from the paraxial instead of the intermediate mesoderm in chicken (Guillaume et al. 2009), with some supportive data from mouse development as well (Levinson et al. 2005). The latter is especially interesting given the ectopic muscle differentiation commonly found in WT1-mutant Wilms' tumours (Miyagawa et al. 1998; Schumacher et al. 2003; Gadd et al. 2012). However, if a common, potentially Foxd1+/Six2+ double-positive progenitor exists, in the normal developing kidney it would be present before E11.5. Therefore, neither the data in our current manuscript nor our unpublished scMultiome data are informative on this point. We propose to discuss this in detail in the Discussion of the manuscript and to speculate on its relevance for Wilms' tumours.
  
  It is somewhat unexpected that Six2-specific Wt1 deletion appears to produce a more severe phenotype than the tamoxifen-inducible Wt1CreERT2 approach, which is intended to target a broader Wt1-derived lineage. The Discussion offers several plausible, non-mutually exclusive explanations (e.g., timing and/or recombination efficiency/mosaicism and the contribution of "escaping" cells). It would be helpful to support at least one of these explanations experimentally, for example by quantifying the extent of "escape" across embryos/timepoints and tamoxifen dosing.
  
  This was addressed above.
  
  *4. A careful proofreading pass is needed to ensure text-figure consistency, particularly for arrow annotations. For example, the Results text refers to "Fig. 1F, arrows," but arrows are not apparent in that panel. Likewise, the Results text mentions a "white filled arrow" in Fig. 2H, whereas the figure appears to show only open arrows. Please align the wording with the annotations actually shown in the figures. *
  
  We apologize for these errors and thank the reviewer for pointing them out. These, and all other textual and graphical errors, will be corrected in the new version of the manuscript.
  
  Reviewer #1 (Significance (Required)):
  
  Overall, this study provides a particularly clear direct comparison of the earliest tumor-initiating events triggered by distinct WT-relevant driver alterations. While the manuscript does not yet offer a detailed molecular mechanistic framework explaining why these two mutations produce such divergent developmental and marker-state outcomes (which would further strengthen the work), the careful comparison and the conclusions drawn from it are meaningful and make an important contribution to our understanding of the developmental processes that can lead to Wilms tumor initiation.
  
  We thank the reviewer for this comment and would like to emphasize that this is precisely the scope we intended for the current manuscript.
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  *Wilms' tumor, a pediatric kidney cancer, is associated with gain or loss of activity of a number of genes, including loss of activity of the nucleic acid binding protein WT1 and gain of activity (enhanced expression at the mRNA level) of the RNA binding protein Lin28, which negatively impacts the maturation of the microRNA let-7, elevating levels of let-7 targets. Previous mouse studies have examined the impact of loss of Wt1 within the nephron progenitor and interstitial cell compartments of the cap mesenchyme, which is thought to be the source of the tumor, and of broadly elevated Lin28 expression in all kidney progenitors.*
  
  *In this manuscript, the authors have refined the loss of Wt1 to nephron or stromal progenitors and compared the phenotype to loss of Wt1 in both lineages, examining cultured kidneys over a 72 hr period in addition to uncultured kidneys examined at E18.5. A similar analysis was performed on Lin28 mutants. The analysis itself consisted of video imaging, limited immunostaining and histochemistry.*
  
  Reviewer #2 provides, in our opinion, a very limited overview of the contents of our manuscript. Our work presented here shows:
  
  A detailed analysis of the effects of Wt1 loss or LIN28B activation in the following systems:

  E12.5 embryonic kidneys

  E18.5 embryonic kidneys

  P19 postnatal kidneys (for the LIN28B model)

  In vitro cultured kidneys

  Time-lapse analysis of in vitro cultured kidneys

  In the case of the Wt1 knockout, this was studied in the nephrogenic lineage, the stromal lineage, and both lineages combined.
  
  Whereas our previous work (Berry et al. 2015) focused on different stages of nephron development, we now focus on the different lineages.
  
  For the first time, we study the different marker sets for Wilms' tumour cancer stem cells in their developmental context. Important take-home messages are:
  
  The two published marker sets behave differently in the normal developing kidney, and no cell type or developmental stage in the normal developing kidney expresses all four markers.
  
  In contrast, after either of the two Wilms' tumour mutations is induced, we have strong, though not yet conclusive, evidence that this event induces cells that are positive for all four CSC markers, suggesting that these quadruple-positive cells could be the functional CSCs. This mutation-dependent appearance of CSCs would be a completely different mechanism for the origin of CSCs than is believed for, for instance, leukemia and colorectal cancer. In those cancers, an existing cell type with stem- or progenitor-cell characteristics that already expresses the CSC markers picks up the tumour-initiating mutations and thus starts behaving as a CSC. The cascade our data suggest for Wilms' tumour CSCs is much more complex.
  
  To our knowledge, this is the first direct, side-by-side comparison of the early effects of different Wilms' tumour mutations. This analysis clearly shows the differences in underlying biology between these two situations. This can have important consequences for the interpretation of patient data, which were historically almost always generated without knowing the initiating mutation, and it opens up mutation-specific therapeutic possibilities and requirements. This is fundamentally different from the current patient stratification based on clinical outcome (favorable vs non-favorable histology) or very general molecular markers with clear biological consequences (like chr 1p status).
  
  With respect to the mutation-dependent accumulation of CSC markers, although in both Wt1 and LIN28B models this seems to be happening, for the LIN28B model this seems to be the result of a simple developmental block, whereas for the Wt1 mutants this appears to be a lineage conversion phenotype. This is again something that has to our knowledge never been suggested for the origin of CSCs and even in the context of normal kidney development is almost unprecedented.
  
  We optimized the use of the Wt1CreERT2 driver to target different lineages in the developing kidney by using different timepoints for tamoxifen treatment. Beyond its technical use, this also illustrates the complex role of Wt1 in the earliest stages of kidney development.
  
  Although the data presented are descriptive and do not yet provide a complete molecular mechanism, we believe they offer novel, unexpected and important insights that merit publication. We acknowledge that these aspects may not have been sufficiently clear in the original version of the manuscript and may therefore not have been picked up by the reviewer. In response to Reviewer #2's comments, we propose a thorough rewrite of the Discussion of the manuscript to emphasize these aspects more strongly.
  
  *While wholly qualitative and largely observational and descriptive, the limited data are of good quality and the conclusions drawn are reasonable. *
  
  We thank the reviewer for their compliments on the quality of the data and the conclusions drawn. While we acknowledge the reviewer's characterization of the study as qualitative and descriptive, we respectfully do not consider this to diminish its suitability for publication. We believe the dataset provides substantial and meaningful insights (definitely not limited), and we have clarified and expanded upon the novel aspects and significance of our findings as outlined above.
  
  *For the Wt1 study, most interesting would be in the loss of Wt1 from the NPC lineage. Clearly, there is already a significant phenotype at the time of study (E12.5) hence there is no strong insight into the earliest effects of Wt1 loss and how this might contribute to tumor formation. Quite what happens to these cells phenotypically is unclear given the limited set of markers used to look at the cells. Specific removal of Wt1 from the stromal lineage generates a milder phenotype, indicating a role for Wt1 there, but without a mechanistic analysis of the resultant products, the underlying mechanisms remain unclear. *
  
  As discussed in our response to Reviewer #1, we agree that the current study lacks a mechanism. We emphasize here as well that, although this is the topic of ongoing follow-up studies, it is outside the scope of the current manuscript. We refer to the same response for our proposed additional experiments for the revised version of the manuscript.
  
  *Wt1 removal from both lineages generated a phenotype less severe than removal from nephron progenitors (and previous data on "double lineage removal" with a Nestin1 cre), an indication that the genetic approach was not up to the task. *
  
  Respectfully, we would like to emphasize the practical challenges associated with the use of genetically modified mouse models for developmental (and other) studies. We doubt there are many Cre drivers that do exactly what they were intended to do, do only that, and do so with 100% efficiency. Many Cre drivers, when originally described, are characterized only for the cell type they were intended for, and any other activities or limitations are missed or ignored. One could rightly argue that this is bad science, but unfortunately it is often the reality and the starting point for many in vivo analyses. Moreover, these are only the complications regarding the behaviour of the Cre driver. They do not even touch on issues such as the biological processing of tamoxifen and the stability of already existing mRNA and protein of the gene of interest in the context of, in this case, a rapidly developing organ. Simply dismissing technical complications as 'not up to the task' is, in our opinion, not the way forward for studying the origin of diseases.
  
  What is important, and what we demonstrate, is to recognize the limitations of a system, test them and, where possible, take them into account in the interpretation of the data. In this case, instead of hiding the incompleteness of the Cre activity, we actually demonstrate it using retained Wt1 staining and discuss this in the context of the different phenotypes. We have carefully tried not to overinterpret our data, and note that this reviewer does not give any specific examples of where this could be affecting our manuscript.
  
  We would also like to stress that, in the context of Wilms' tumour development, the incomplete activity of this Cre driver could even increase the relevance of this model, since the early stages of Wilms' tumourigenesis in the (future) patient occur in a few mutant cells within an otherwise normally developing kidney. The effect of the normal cells in our model that we speculate about could also be important in the patient; we just don't have the technical possibilities to test this yet.
  
  *In some sense, one could regard this work as a pilot study, looking to optimize expensive and time-consuming mouse experiments to maximize insight (ie choose optimum model, address most informative time points, decide on analytical approaches). As a stand-alone paper, the work may not significantly advance our understanding of the topic. *
  
  As argued above, in our opinion this does not do justice to the work we describe in our manuscript.
  
  *For example, can simple loss of Wt1 tell us anything about WT? Yes, Wt1 is lost in a subset, but even in these there are additional genetic mutations.*
  
  Of course, even in WT1-mutant tumours there will be additional mutations found in the tumour. In fact, it has been known for a long time that WT1-mutant Wilms' tumours select for oncogenic mutations in β-catenin, with a surprising preference for specific mutations affecting Ser45. However, it is clear that in these tumours the loss of WT1 is the first, rate-limiting step (Fukuzawa et al. 2004; Li et al. 2004; Zirn et al. 2006; Uschkereit et al. 2007). These β-catenin mutations are selected for in an already WT1-mutant context. If we want to understand the full biology of WT1-mutant tumours, including the β-catenin mutation, we first need to understand the effect of losing only WT1, because that is what provides the selective pressure for the next step (an oncogenic mutation in β-catenin). The work described here is an essential first step in that.
  
  *For Lin28b, there is no significant advance beyond the studies of the Daley lab.*
  
  As argued above, this is not correct. The following aspects were not covered in the original paper describing this model:
  
  The in vitro analysis of control and LIN28B embryonic kidneys, including the time-lapse analysis demonstrating how the phenotype develops over time
  
  The expression of the Wilms' tumour cancer stem cell markers and how these change as a result of the LIN28B activation
  
  The direct comparison to the Wt1 loss phenotypes, and the demonstration these different mutations lead to fundamentally different biological phenotypes despite both eventually being classified as Wilms' tumours.
  
  *I have no useful suggestions for improvement, which would require a completely different approach to the problem from the start.*
  
  We respect this reviewer's opinion, but based on the above we do not agree and maintain a different interpretation.
  
  Reviewer #2 (Significance (Required)):
  
  *The authors set out the goal in the introduction: to obtain a better understanding of the origins of Wilms tumor. There doesn't appear to be an insight of cancer-relevant significance beyond earlier studies.*
  
  Our work studies the very first steps in the development of Wilms' tumours. It will never be possible to study these in the (future) patient, as they happen around weeks 8-10 of pregnancy. By instead analyzing this in mouse models, we show fundamental biological differences between different Wilms' tumour-inducing mutations, which is certainly relevant for patients and for the interpretation of patient data (or rather, for the difficulties in interpreting patient data if the initiating mutation or tumour class is not known). Moreover, the data provide new insights into Wilms' tumour cancer stem cells, a preferred target for any therapy, and suggest that the combination of all four known markers might be required to identify and study the true WT CSC. In our opinion, such findings provide extremely relevant insights for the field.
  
  *To a readership now accustomed to analysis at genome scale (genomic, transcriptional), this study might appear modest.*
  
  While we agree that genome-wide approaches can provide valuable insights, this doesn't mean that work that doesn't use them cannot provide important insights nor does it mean that every piece of work that does use them provides any new insights. We respectfully emphasize that the merit of a study should be assessed based on the data presented and their interpretation, rather than on the techniques that were used to obtain them.
  
  *The target audience is unclear.*
  
  Our target audience for this manuscript is everybody who is interested, for whatever reason, in the biology of Wilms' tumours.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  *Wilms tumor arises from disrupted kidney development. Progenitor-like populations and cancer stem cell (CSC) fractions have been described in patient tumors, but how specific mutations alter embryonic programs to generate these states remains unresolved. *
  
  *Pop et al. model genotype-phenotype relationships during kidney embryogenesis. Using Six2- and Foxd1-driven Cre lines, they test the effects of Wt1 loss-of-function and Lin28b gain-of-function in nephron and stromal progenitors. Through explant imaging, histology, and immunofluorescence, they define mutation-specific effects on ureteric branching, cap mesenchyme organization, stromal composition, and nephron differentiation. *
  
  *Lineage-restricted Wt1 deletion produces distinct outcomes depending on whether nephron progenitors, stromal progenitors, or both are targeted. Lin28b overexpression causes delayed nephrogenesis and lobular organization resembling human Wilms tumor morphology, with expansion of blastemal-like populations. *
  
  This is a correct summary of this part of our data.
  
  *These genetic removals of Wt1 and overexpression of Lin28b are useful for the field in understanding where and how Wt1 functions and whether Lin28b could be a model for Wilms' tumor.*
  
  We agree that our data on Wt1 loss focus on the role of Wt1 in normal kidney development, how its loss disrupts normal kidney development and how this could be important for Wilms' tumourigenesis. This is relevant for, but goes beyond, the function of Wt1: it informs on the biology of WT1-mutant Wilms' tumours.
  
  In our mind, there is no doubt that the LIN28B model is a model for Wilms' tumours. Activating mutations in LIN28B are found in human patient tumours, and the original publication of this model (Urbach et al. 2014) already convincingly showed that the phenotype in the kidneys after less than 3 weeks (when the animals have to be culled for animal welfare reasons) represents early stages of Wilms' tumours. Our data presented here confirm this and extend it with respect to the behaviour of the CSC markers and the comparison to the Wt1 loss phenotypes.
  
  *Whether the use of the previously defined markers NCAM and ALDEFLUOR serves the authors well or is a distraction remains to be determined, but it is unclear how useful these have been for understanding WT biology thus far. The authors describe these in the developing kidney in explants and in vivo.*
  
  *Overall, the data support the view that distinct mutations generate different forms of lineage derailment, but it is unclear how this links to Wilms tumor. It is better suited to describe the role of an interesting protein, Wt1, in kidney development and the lineages therein. Connecting it to tumor biology would require further scrutiny of tumors.*
  
  Since CSCs are, according to the cancer stem cell model, the cells in a tumour that should preferentially be targeted, the exact identification of the CSC markers is directly important for the treatment of tumours. Our data analyze two different sets of CSC markers; we show these label non-overlapping cell types in the normal kidney, but that after mutation induction their expression changes and they potentially become co-expressed in a single cell type (see our response to Reviewer #1 for more details on this). Identifying the developmental origins of CSCs in a tumour that is the direct result of disturbed normal embryonic development (Hohenstein et al. 2015; Li et al. 2021) can be used as an entry point into understanding the biology of these tumours. Based on this, we argue that although our analysis is on embryonic kidneys, its implications are highly relevant for the actual tumours and their treatment. We propose to further emphasize this in the Introduction and Discussion of our manuscript.
  
  *The study shows that removal of Wt1 in the stromal compartment has distinct phenotypes, which could be important for Wilms tumor biology, as this is a poorly understood part of this tumor.*
  
  As already discussed in our response to Reviewer #1, we agree this is a potentially important and poorly understood part of Wilms' tumours, directly so for WT1-mutant tumours, which are stromal-predominant, but potentially also for other tumours. We propose to further address this in the Discussion of the manuscript.
  
  *Major comments: *
  
  *This manuscript uses elegant genetics to scrutinize the role of Wt1 and Lin28b. These stand out as difficult experiments to conduct and are of high value.*
  
  We thank this Reviewer for their appreciation of the design, challenges and value of our data.
  
  *In contrast, the section on ALDH1A2 and ALDEFLUOR activity is less integrated with the developmental framework.*
  
  We discussed our reasons for focusing on the normal developmental context of the cells expressing the CSC markers in the previous section. Since the originally described CSC marker was activity in the AldeFluor enzymatic assay (Pode-Shakked et al. 2013), which we could not use on sections or kidney rudiments, we had to conclusively identify which ALDH isozyme is responsible for this signal in this context. There is much inconsistency about this in the literature, and whichever isozyme is important in these tumours might not be the causative factor in other tumours where AldeFluor labels the CSCs. We therefore used previously published microarray data from the group that originally identified the NCAM1/AldeFluor combination as Wilms' tumour CSCs to identify ALDH1A2 as the culprit in this cancer type. With this knowledge, we could move our analyses to antibodies, allowing co-staining with the other markers. Note that if the signal in these CSCs had been the result of ALDH1A1 or ALDH1A3, which we show are expressed in the developing ureteric bud, the implications for the biology of the tumours would be totally different. We propose to discuss these aspects and their importance in more detail in the revised manuscript.
  
  *Much is unclear here, e.g., antibody validation, the rationale for performing these assays in explants rather than in vivo tissue, and the shift in Aldh1a2 staining pattern between E12.5 and E18.5, including the reported nuclear localization.*
  
  We need to correct the reviewer on this remark: part of our data uses in vivo samples (E12.5 and E18.5) and part uses cultured kidney rudiments. We will clarify which technique we use in the figure legends. We prefer to use this combination of techniques for several reasons: 1) the additional 3D information obtained from kidney rudiments can help with identifying specific developmental stages in the developing kidney; 2) due to the different fixation, more antibodies work reliably in cultured rudiments than on paraffin or frozen sections; 3) this is an important extra factor in the validation of antibodies; and 4) the possibility of culturing kidney rudiments on a time-lapse imaging system allows us to study phenotypes over time (this also greatly reduces the number of animals we need to study multiple timepoints in a developing system, an important aspect for the 3Rs). A good example of this is the timelapse data shown for the nephrogenic Wt1 knockout. The extreme outward migration of the mutant cells (which we show using the tdTomato reporter) could only be identified in timelapse experiments, but is fully consistent with the corresponding E12.5 and E18.5 in vivo sections.
  
  We have no explanation for the shift to nuclear localization for ALDH1A2. We are not aware of any other publications showing this. We cannot rule out that this is a technical artifact, but based on all other expression data obtained with this antibody and their consistency with other publications, we don't think this is very likely.
  
  *It is unclear how the manuscript is strengthened by this component. NCAM1 is referenced in the context of Wilms tumor CSCs, but unlike the rest of the manuscript, which is mechanistic, it is unclear whether NCAM1 represents a mechanistic node in tumor initiation or merely a surface marker used for cell isolation. If NCAM1 functions just as a proxy for a progenitor-like state rather than a driver of tumor biology, surely Wilms tumors will be full of progenitors or blastemal cells and many surface markers. It is unclear what strong evidence shows NCAM1 to be useful; this distinction should be stated.*
  
  Cancer stem cells are defined based on functional characteristics, i.e. the capability of reconstituting a complete tumour with all of its complexity after transplantation into immune-compromised mice. The markers are usually, indeed, merely proxy markers for a specific cell type in the tumour with this functional capacity. The same can be said in this case for AldeFluor activity: it is used as a CSC marker for many cancer types, but we are not aware of any data on a functional role for this pathway in any of them. It would be a really interesting experiment to combine our models with an additional conditional knockout of Ncam1 or Aldh1a2 to see if the phenotype we describe here changes. The genetics of such an experiment with so many alleles are, however, horrendous; it would come with an enormous surplus of animals and would take too long for the average project.
  
  *The developmental framework presented argues that mutation-specific lineage derailment underlies tumor formation. Marker identity alone does not define pathogenesis. Perhaps reorganizing this section to align it with the lineage-confusion model, or removing it altogether, would make the manuscript punchier?*
  
  We propose to rewrite these parts to make this clearer.
  
  *The manuscript is highly focused on the nephrogenic compartment, yet removes Wt1 from the stroma as part of one of the main lines of experiments. On several occasions, stromal changes are described qualitatively but using quantitative terms. As such, the manuscript currently comes across as having a bit of a black box where we cannot see the stroma beyond H&E stains. Could additional antibody stains for stromal markers, e.g., Pdgfra, Pdgfrb, or Meis1, better visualize this compartment and perhaps enable quantification of changes?*
  
  We agree that this lack of additional stromal markers is a limitation of the current manuscript. Our reason for not including these so far was our doubt about the usefulness and relevance for the complete renal stroma of many commonly used markers. The scarcity of detailed studies on the developing stroma was a big part of this doubt. Some preliminary tests show that Meis1 is not exclusively found in the developing stroma of the mouse kidney but is also expressed in early stages of the nephrogenic lineage, and is therefore not a good marker for this purpose. Pdgfra and Pdgfrb, however, seem to be expressed throughout the complete stroma and not in the other lineages. __We propose to analyze these two additional markers for the revised manuscript.__
  
  *Minor comments: *
  
  *Page 4, Lines 89-95: Remove the repeated sentence beginning "Although best known as a transcription factor...". *
  
  *Page 8, Line 164: Arrows referenced in Figure 1F are not visible. *
  
  *Page 8, Lines 164-166: The sentence may refer to Figure 1G; this figure is not otherwise cited. *
  
  *Page 18, Lines 413-414: (Pode-Shakked et al., 2013) is cited twice. *
  
  *Figure 2C: Error bars are missing. Indicate number of biological replicates. *
  
  *Gene nomenclature should be consistent throughout the manuscript. A mouse protein/gene is Six2/Six2. *
  
  *Use precise language when referring to protein detection rather than "expression." *
  
  *Standardize corticomedullary orientation across figures. *
  
  *Page 7, Lines 160-161: Provide immunostaining supporting WT1+/Tdtomato− stromal identity. Co-staining with Foxd1 would clarify lineage assignment. *
  
  *At E18.5 in the Six2-driven Wt1 mutant, WT1 signal is absent despite earlier stromal WT1+ cells. Clarify the fate of these cells. *
  
  *Comment on the lower recombination efficiency observed in Wt1CE at E11.5. *
  
  *Page 14, Lines 321-322: Determine how long CITED1 persists in WTCE mice. Co-staining with later differentiation markers would clarify whether progenitor retention coexists with nephron maturation. *
  
  *Page 15, Lines 352-353: Clarify whether the sentence describing blastemal-like regions should reference Figure 5D.*
  
  We thank the reviewer for these corrections and other minor comments. We will address them in the revised manuscript. With respect to the remark regarding gene nomenclature, until recently we were also under the assumption that mouse protein symbols only have the first character capitalized. However, to our surprise, we recently realized that the official mouse nomenclature states that the protein (but not the gene) symbol is in fact in all capitals. We refer for this to section 1.5.2 at https://www.informatics.jax.org/mgihome/nomen/gene.shtml.
  
  References.
  
  Berry RL, Ozdemir DD, Aronow B, Lindstrom NO, Dudnakova T, Thornburn A, Perry P, Baldock R, Armit C, Joshi A et al. 2015. Deducing the stage of origin of Wilms' tumours from a developmental series of Wt1-mutant mice. Dis Model Mech 8: 903-917.
  
  Fukuzawa R, Breslow NE, Morison IM, Dwyer P, Kusafuka T, Kobayashi Y, Becroft DM, Beckwith JB, Perlman EJ, Reeve AE. 2004. Epigenetic differences between Wilms' tumours in white and east-Asian children. Lancet 363: 446-451.
  
  Gadd S, Beezhold P, Jennings L, George D, Leuer K, Huang CC, Huff V, Tognon C, Sorensen PH, Triche T et al. 2012. Mediators of receptor tyrosine kinase activation in infantile fibrosarcoma: a Children's Oncology Group study. J Pathol 228: 119-130.
  
  Guillaume R, Bressan M, Herzlinger D. 2009. Paraxial mesoderm contributes stromal cells to the developing kidney. Dev Biol 329: 169-175.
  
  Haghighitalab A, Nosrati F, Dehghani-Ghobadi Z, Sayed M, Ahn C, Hu Y-C, Chung E, Lim H-W, Park J-S. 2026. A knock-in Six2Cre line reveals transient interstitial potential in nephron progenitors. bioRxiv: 2026.02.04.703893.
  
  Hohenstein P, Pritchard-Jones K, Charlton J. 2015. The yin and yang of kidney development and Wilms' tumors. Genes Dev 29: 467-482.
  
  Kobayashi A, Mugford JW, Krautzberger AM, Naiman N, Liao J, McMahon AP. 2014. Identification of a Multipotent Self-Renewing Stromal Progenitor Population during Mammalian Kidney Organogenesis. Stem Cell Reports 3: 650-662.
  
  Krishna A, Meynert A, Dolt KS, Kelder M, Mesropian A, Ewing A, Brouwers C, Claassens JW, Linssen MM, Sheraz S et al. 2026. Mutational scanning reveals oncogenic CTNNB1 mutations have diverse effects on signaling. Nat Genet 58: 366-375.
  
  Levinson RS, Batourina E, Choi C, Vorontchikhina M, Kitajewski J, Mendelsohn CL. 2005. Foxd1-dependent signals control cellularity in the renal capsule, a structure required for normal renal development. Development 132: 529-539.
  
  Li CM, Kim CE, Margolin AA, Guo M, Zhu J, Mason JM, Hensle TW, Murty VV, Grundy PE, Fearon ER et al. 2004. CTNNB1 mutations and overexpression of Wnt/beta-catenin target genes in WT1-mutant Wilms' tumors. Am J Pathol 165: 1943-1953.
  
  Li H, Hohenstein P, Kuure S. 2021. Embryonic Kidney Development, Stem Cells and the Origin of Wilms Tumor. Genes (Basel) 12.
  
  Magella B, Adam M, Potter AS, Venkatasubramanian M, Chetal K, Hay SB, Salomonis N, Potter SS. 2018. Cross-platform single cell analysis of kidney development shows stromal cells express Gdnf. Dev Biol 434: 36-47.
  
  Miyagawa K, Kent J, Moore A, Charlieu JP, Little MH, Williamson KA, Kelsey A, Brown KW, Hassam S, Briner J et al. 1998. Loss of WT1 function leads to ectopic myogenesis in Wilms' tumour. Nat Genet 18: 15-17.
  
  Pode-Shakked N, Shukrun R, Mark-Danieli M, Tsvetkov P, Bahar S, Pri-Chen S, Goldstein RS, Rom-Gross E, Mor Y, Fridman E et al. 2013. The isolation and characterization of renal cancer initiating cells from human Wilms' tumour xenografts unveils new therapeutic targets. EMBO Mol Med 5: 18-37.
  
  Schumacher V, Schuhen S, Sonner S, Weirich A, Leuschner I, Harms D, Licht J, Roberts S, Royer-Pokora B. 2003. Two molecular subgroups of Wilms' tumors with or without WT1 mutations. Clin Cancer Res 9: 2005-2014.
  
  Urbach A, Yermalovich A, Zhang J, Spina CS, Zhu H, Perez-Atayde AR, Shukrun R, Charlton J, Sebire N, Mifsud W et al. 2014. Lin28 sustains early renal progenitors and induces Wilms tumor. Genes Dev 28: 971-982.
  
  Uschkereit C, Perez N, de Torres C, Kuff M, Mora J, Royer-Pokora B. 2007. Different CTNNB1 mutations as molecular genetic proof for the independent origin of four Wilms tumours in a patient with a novel germ line WT1 mutation. J Med Genet 44: 393-396.
  
  Zirn B, Samans B, Wittmann S, Pietsch T, Leuschner I, Graf N, Gessler M. 2006. Target genes of the WNT/beta-catenin pathway in Wilms tumors. Genes Chromosomes Cancer 45: 565-574.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.11.19.689177
www.biorxiv.org

Complementary vertebrate Wac models exhibit phenotypes relevant to DeSanto-Shinawi Syndrome

1. Public_Reviews 02 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors generated mouse and zebrafish models for DeSanto-Shinawi Syndrome, caused by loss-of-function variants in the WAC gene. Using these vertebrate systems, they demonstrate conserved craniofacial and social-behavioral phenotypes that parallel human clinical features, along with deficits in GABAergic markers. They observe increased seizure susceptibility and male-biased brain volumetric changes in Wac mutant mice. Together, these findings begin to define the biological consequences of Wac haploinsufficiency and provide valuable resources for future mechanistic studies.
  
  Strengths:
  
  WAC is a high-confidence neurodevelopmental disorder gene and one of the genes identified by large-scale exome sequencing efforts, including the Satterstrom et al. (2020) autism spectrum disorder cohort. This study establishes the first vertebrate Wac models, addressing a major gap in the understanding of DeSanto-Shinawi Syndrome, and provides a framework for studying other syndromic forms of autism. The models generated will be impactful and useful to the community to study and understand DeSanto-Shinawi Syndrome.
  
  The cross-species analysis is important and well executed, and reveals both conserved and divergent phenotypes. The behavioral and anatomical assays are rigorously executed and well-controlled, and the inclusion of RNA-sequencing analyses adds valuable insights into the mechanisms underlying brain function in Wac mutants. Notably, the RNA-seq data reveal upregulation of several clustered protocadherins, genes central to neuronal identity and cell-cell interactions, whose expression is known to depend on developmentally regulated changes in chromatin architecture. This observation provides an intriguing hint that could link Wac function to higher-order chromatin organization and neuronal connectivity.
  
  Weaknesses:
  
  The evidence is solid, but the study remains incomplete in its mechanistic depth and molecular interpretation. The authors compellingly describe behavioral, anatomical, and transcriptomic phenotypes associated with WAC loss, yet do not explore how WAC mechanistically regulates chromatin or transcription. Given prior evidence that WAC interacts with the RNF20/40 ubiquitin ligase complex and promotes histone H2B ubiquitination and transcriptional elongation, the paper would benefit from a discussion of these functions as a potential link between Wac haploinsufficiency and the observed changes in neuronal gene expression. Similarly, the authors mention WAC's WW and coiled-coil domains but do not consider how these domains could mediate nuclear interactions or recruitment of transcriptional cofactors that shape gene regulation and chromatin organization in neurons.
  
  We agree that many of the mechanisms by which loss of Wac function causes both the animal model phenotypes and the human symptoms still need to be worked out. Because a large amount of data was needed simply to describe these models in this manuscript, we plan to address the remaining mechanistic gap in follow-up papers. We have now added a paragraph to the Discussion on these important points regarding the roles of Wac during transcription and how its protein domains might be involved in these processes.
  
  The transcriptomic analysis is rich but largely descriptive. Although the upregulation of clustered protocadherins is particularly intriguing, these findings are not validated or localized to specific neuronal populations. The study would be strengthened by independently validating the most significant RNA-seq changes, such as protocadherin gamma genes, using in situ hybridization methods to confirm the spatial and cellular specificity of expression changes.
  
  We have greatly expanded the analyses of the bulk RNA-seq data, including a more rigorous look at differences in gene expression between sexes, which additionally revealed males to be more impacted by Wac loss of function. We have also added new western blot data for pan-protocadherin alpha, validating its upregulation in the cortex (new Figure 7I and 7J). We are holding back additional data from this report, as we have single-nucleus RNA-seq data that will be reported in follow-up papers with targeted conditional deletion models.
  
  Finally, while the behavioral and MRI results add valuable breadth, their interpretation would be improved by clearer reporting of sample sizes, statistical corrections, and effect sizes to support claims of sex-specific and regional brain volume differences.
  
  Some additional details have been added to the methods section. In addition, we have now provided sample sizes assessed in each figure legend.
  
  Reviewer #2 (Public review):
  
  The authors describe the first deep neurological characterization of WAC mutation in two vertebrate species (zebrafish and mouse). They examine these at various levels, guided by the work in humans that has associated a heterozygous WAC mutation with DeSanto-Shinawi Syndrome (DESSH). Therefore, they investigate the animals for a variety of phenotypes, following a template for what is seen when characterizing a new mouse/fish model of a developmental disability gene. Investigations include analysis of skull and jaw for abnormalities (both species), MRI of brain structure (mice), electrophysiology (mice), assessment of signaling pathways (by Western blot, in mice), cell counts (both, more in mice), transcriptomics (mice), and behavior (both).
  
  Generally, this describes an important first characterization of the consequences of the mutation. Most of the studies appear well-conducted and reasonably powered, thus solid or convincing. However, there are a few places where the data presentation could be improved for clarity, and a few concerns about some choices in analytical approach for a couple of the experiments, where alternative statistical approaches could improve their sensitivity and/or better rule out false positives; thus the support for some of these claims is currently incomplete. There is also some lack of clarity about the rationale for some decisions regarding the fish genetics. Nonetheless, this is an important and useful first characterization of many phenotypes of these lines. Such experiments form a baseline for future mechanistic studies in the same lines and a platform to test approaches to reverse phenotypes.
  
  Individual claims and their strength & weaknesses:
  
  (1) The authors developed mouse and zebrafish models of WAC deletion
  
  They used the existing KOMP floxed WAC line to generate a null allele. For the mouse, there is a Western blot showing that it is indeed null for the protein. The fish data are less robustly validated: they don't confirm that the allele is null at the protein or RNA level, and fish have two paralogs (waca and wacb), and this paper only characterizes one of these. So this evidence is less clear. The evaluated mice are heterozygous (Het), similar to patients, while the fish appear to be evaluated as homozygous mutants.
  
  We agree with the reviewer’s comments on zebrafish genetics. Since antibodies against zebrafish Wac proteins are not available, we could not examine protein levels in zebrafish. We predicted frameshift mutations based on DNA analyses of the waca and wacb KO zebrafish. We made waca KO, wacb KO, and waca/wacb double KO zebrafish. waca/wacb double KO zebrafish showed a lethal phenotype, similar to homozygous mutant mice. Since wacb KO zebrafish did not show any detectable phenotype, we do not report those here. However, we now show examples of the wacb and dKO zebrafish in Figure S1. Since waca KO zebrafish showed craniofacial and behavioral phenotypes comparable to Het mice and human patients, they are the focus of this report.
  
  (2) The authors show that both species show altered craniofacial features
  
  These data appear well powered, and the findings are robust.
  
  We appreciate this confirmation.
  
  (3) Each model altered GABAergic neurons
  
  In mice, the authors stained with PV antibodies and saw a decrease in cells positive for this staining. A second marker, Lhx6, does not show a difference, suggesting this might be a change in PV expression rather than cell number. They could maybe look into the literature to see if this loss of just the protein also occurs in other models. Overall, the sample size here is a bit smaller than in other parts of the paper (n=3), and the methods for the cell counts were less clear, so it is not as clear that this finding is robust. The authors counted several other broad classes of cells, and those appear normal. Interestingly, there might also be some TBR1 mislocalization in layer 6 that might be significant with added power.
  
  Thank you for these suggestions. Yes, other models also show this lack of PV expression even when MGE-lineage interneurons are present at normal levels. We mention in the Discussion a previous study on the ASD gene CNTNAP2 that showed this. We also agree that there is a trend in the Tbr1 population. We assessed another WT and Het pair for Tbr1 laminar distribution and determined that these changes held up and are now significantly different; the person counting these cells was blind to genotype. Finally, we added more details to the Methods to describe how the counting was performed.
  
  The fish data are based on an in situ hybridization for GAD. The measure shown is the width of the positive area in the forebrain. This measure is not one I have seen much before, and has potential to be driven by something unrelated to GABA (e.g., if the whole forebrain were simply a bit smaller). So this analysis could use a couple of other approaches (density of signal?) and/or a control probe for some other brain gene showing the measure is normal, and thus it is not just a size issue.
  
  To compare altered GABAergic neurons in mice and zebrafish, we tried to isolate zebrafish PV genes and examined their expression by whole-mount in situ hybridization (now included in Figure S3), but found no differences. However, we could not find any zebrafish PV gene that was useful as a marker for GABAergic neurons. We therefore examined the gad1b-positive area of the forebrain in WT and waca KO zebrafish and found differences in the brain area with gad1b expression. Because WT and waca KO brain sizes are generally the same, we believe this measurement is reasonable for this conclusion, and we have added text to the results section to justify it.
  
  (4) Mice were more susceptible to the seizure-inducing agent PTZ
  
  These data appear well powered, and the findings are robust. The authors also did a fair amount of useful electrophysiology that was all normal, but appeared to be well executed.
  
  Thank you, we appreciate this confirmation.
  
  (5) Mice had changes in brain volume that interact with sex
  
  The authors conducted an MRI on a good number of mice and reported a slight increase in global volume just in males. Sample size is fair, but the statistical approach here may be better if it puts males and females in the same model (to boost power and explicitly test for sex by genotype interaction that they report), and there is some chance that the brain region level differences that they report could include some false positives. They tested many regions, and it is not clear whether or not they corrected for the number of tests. Often, an FDR correction would be used in such imaging studies. It may be that only the most robust regional findings will survive those corrections. It is interesting data either way, but the analysis could be improved.
  
  Given the 80 regions (bilaterally) that we used and the number of mice, i.e. 6-7, we are underpowered to robustly undertake FDR-type corrections. In the data presented, we used t-tests between genotypes for each region within each sex to illuminate putative regional changes. However, we did revisit our MRI data and found three data sets that were not normally distributed. We thus changed our statistical test to the Mann-Whitney test for male retrosplenial cortex, male parietal cortex and female corpus callosum; these changes are now reflected in the figures, and the revised statistics are noted in the figure legends.
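
  The Benjamini-Hochberg procedure is the FDR correction usually meant in this context. A minimal sketch follows for illustration only; it is not the analysis used in the manuscript, and the regional p-values are hypothetical. BH controls the expected proportion of false discoveries rather than the family-wise error rate, so it is less conservative than a Bonferroni correction, although with 6-7 mice per group the loss of power the authors describe still applies.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    A test is called significant at FDR level alpha when its adjusted
    p-value is <= alpha.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-region p-values, for illustration only.
regional_p = [0.001, 0.02, 0.03, 0.04, 0.30]
q = benjamini_hochberg(regional_p)
significant = [qi <= 0.10 for qi in q]  # FDR of 10%
```

  If males and females were analyzed in one model, as the reviewer suggests, the correction would be applied once across all regional tests rather than separately per sex.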
  
  (6) Several behaviors are altered in the mice as well
  
  These studies were fairly well-powered (n=15,16), and they found several positive and negative results, including alterations in memory and sociability in both species. There is a minor statistical flaw in the three-chamber analysis (they don't actually compare the Hets directly to the wildtypes in their statistical testing, a common mistake in neuroscience that should be addressed), but the data look like they will probably still be significant when correctly analyzed. In the supplement, the authors could do a bit more with the data they have to look at hyperactivity (i.e., show total motion in the open field, not just time in center vs. periphery), and adding sex to their model might improve sensitivity for genotype effects.
  
  Thank you for these suggestions. We have done several things to address this behavioral paradigm. First, we increased the number of animals and switched from comparing time spent with the mouse vs. the object within each genotype to comparing the genotypes directly. In addition, we now quantify a social discrimination index, as described in Phillips et al., 2019 PMID: 31112129. These new data are shown in Figure 3A. Open field total distance traveled has now been added to Figure S2A. For all other measurements, we first assessed sex differences but found none, and thus pooled both sexes for the graphs.
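
  The reviewer's central point is that the genotypes should be tested against each other, rather than testing each genotype separately for a mouse-vs-object preference. A minimal sketch of that workflow follows. It assumes the discrimination index is (time with mouse - time with object) / (total time), which may differ in detail from the definition in Phillips et al., 2019, and it uses a permutation test as one possible direct two-group comparison; the revised manuscript may use a different test. The investigation times are hypothetical.

```python
import random
from math import fsum


def discrimination_index(t_mouse, t_object):
    """Assumed social discrimination index: (mouse - object) / (mouse + object).

    Ranges from -1 (object only) to +1 (mouse only); 0 means no preference.
    """
    total = t_mouse + t_object
    if total == 0:
        raise ValueError("no investigation time recorded for either stimulus")
    return (t_mouse - t_object) / total


def _mean(values):
    # fsum is correctly rounded, so reshuffled groups give reproducible means.
    return fsum(values) / len(values)


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling, so p is never exactly 0.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical (seconds with mouse, seconds with object) per animal.
wt = [discrimination_index(m, o) for m, o in [(180, 60), (150, 90), (200, 70)]]
het = [discrimination_index(m, o) for m, o in [(100, 110), (120, 100), (90, 95)]]
diff, p = permutation_test(wt, het)
```

  Because the index is computed per animal, the same values could also enter a model with sex as a second factor.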
  
  (7) Some biochemical signaling pathways are altered in the brain
  
  These are n=4 immunoblots, and show altered phospho ERK, but no changes in other signaling events predicted from prior WAC literature like H2B ubiquitination. They appear well done, and the authors share the full blots in the supplement.
  
  Thank you, we appreciate this confirmation. Because Wac is an adaptor protein, we needed to test in neurons the molecular changes that had previously been reported only in cell lines and Drosophila. We were not surprised that some of these previously reported changes were not reproduced in brain cells. However, these changes might arise in more discrete brain regions or at different times during development, which we will test in our future conditional knockout models.
  
  (8) WAC deletion also alters gene expression in the brain
  
  These studies were well-powered for RNAseq, with 10 and 14 samples, using only the forebrain of neonates (P2). The sequencing quality metrics all looked good, and the approach to analysis was okay. It would be stronger to again include sex in the model, rather than separating by sex. There were some typos in this part of the paper that made some of the conclusions unclear, but the RNAseq nicely confirmed the mutation in the mice and discovered many differentially expressed genes, consistent with the role of this gene as a regulator of transcription. The presentation could be expanded to make more use of the data. Overall, though, this is a useful first characterization of the transcriptome in the line.
  
  Thank you for the suggestions. We have greatly expanded our assessments of the RNA-seq data. On analyzing the data, we found many differences between males and females and now show both combined and sex-separated data. Our new data isolate several more extreme, and some unique, changes in males that are better shown as stand-alone figure panels. In addition to these edits, we have reworked all the text in this section of the results for readability.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The cause and timing of lethality in the homozygous Wac knockout should be reported or discussed. Investigating Wac homozygous knockout embryos, if viable at early stages, could provide valuable insight into the developmental origins of the neuroanatomical and behavioral phenotypes described in the heterozygous animals. Even a brief histological or transcriptomic characterization of embryonic brains would strengthen the mechanistic understanding of Wac function during neurodevelopment.
  
  We agree, and we collected embryos as early as embryonic day 12.5 from multiple litters but never detected a homozygous knockout. We have added this information to the animal methods section so that readers know we attempted to determine when death occurs. While we do not currently explore this further in mice, we now include zebrafish waca; wacb double knockouts. Notably, while we were able to generate a few of these mutants, most died. However, some zebrafish survived long enough for us to observe lethal deficits in heart formation and swim bladder development, suggesting that early loss of Wac could impair these critical organs and lead to death.
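
  One standard way to quantify the absence of knockouts among genotyped embryos is a chi-square goodness-of-fit test against the 1:2:1 ratio expected from a Het x Het cross. The sketch below assumes such a cross and uses hypothetical embryo counts, since the counts are not given here. For two degrees of freedom the chi-square upper tail is exactly exp(-x/2), so no statistics library is needed; the approximation assumes expected counts of roughly 5 or more per genotype.

```python
from math import exp


def mendelian_chi_square(wt, het, ko, ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit of genotype counts against a Mendelian ratio.

    The default 1:2:1 (WT:Het:KO) ratio is the expectation for a Het x Het
    cross. Returns (statistic, p-value) with 3 - 1 = 2 degrees of freedom.
    """
    observed = (wt, het, ko)
    n = sum(observed)
    if n == 0:
        raise ValueError("no embryos genotyped")
    weight = sum(ratio)
    expected = [n * r / weight for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    return stat, exp(-stat / 2)


# Hypothetical genotype counts from E12.5 embryos, for illustration only.
stat, p = mendelian_chi_square(wt=14, het=26, ko=0)
```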
  
  (2) A better description of the data reported in Supplementary Tables 3 through 5 is needed. Supplementary Table 3 does not report any statistically significantly differentially expressed genes in the FDR column, and Supplementary Table 5 reports only two, and the reader should understand what the columns are indicating.
  
  We have now added figure legend text to the supplementary file to explain each Table mentioned here.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Page 3, last paragraph. The description of wacb is confusing. I recommend that the authors provide the unshown data they mention and also further explanation of the breeding scheme and result. Indeed, if wacb is homozygous lethal, does that make it more like the mouse WAC gene, and thus potentially the more relevant paralogue to study? Are both waca and wacb expressed in the same tissues? How does that compare to mouse and human WAC expression? Such figures about gene expression (even when adapted with permission from public resources like Allen brain atlas or GTEX) are common in this sort of paper, as they can be helpful to understand when and where the gene is thought to act. For waca vs. wacb, they may help determine which gene is more relevant to the brain (for example, if only one is expressed in the brain).
  
  First, this is a great question, and we have now added whole-mount in situ hybridization for the waca and wacb genes as Figure S1. These data show low to no wacb expression in brain regions, while waca is highly expressed there. Since the waca mutants showed phenotypes relevant to DESSH but wacb mutants did not, this correlates with the observed expression patterns, without fully excluding a role for wacb. We also made waca/wacb double KO zebrafish, which showed a lethal phenotype, similar to homozygous mouse mutants. Only a few waca; wacb double knockouts survived for part of development, and these are now shown in Figure S1. Since wacb KO zebrafish did not show any detectable phenotype on their own, we did not include those data, as there are already several figures/tables in this manuscript. However, the waca KO zebrafish did show phenotypes similar to humans with DESSH, and these are the ones we focused on.
  
  (2) Why did the authors cross the mice into the outbred CD1 background? Usually, most labs keep the lines on an inbred background. Was there a particular rationale here? I am not saying that they could not outcross them. It is just a bit puzzling why. Perhaps a sentence of explanation in the methods section would be warranted.
  
  This is a great question, and we have now added text to the animal methods section. Many labs that study development, especially of genes critical for survival such as Wac, use a more robust strain such as CD-1. By doing this, we have a better chance of evaluating mutants at more mature ages and of obtaining enough progeny for more reproducible studies.
  
  (3) A typical first experiment in a new knockout (fish or mouse) is to establish that the deletion does indeed result in a loss of RNA and protein. In the absence of this, the rest of the paper cannot be as confidently interpreted.
  
  We did this for the mouse model and found reduced protein expression in the constitutive Het; however, these data are part of the western blots in Figure 5. We now mention early in the results section that protein levels were reduced in the Hets, but we maintain that the western blot is better presented in Fig. 5 alongside the other western blots. For zebrafish, this was attempted but was more difficult, as available antibodies do not work in zebrafish. We attempted to measure RNA expression in both models; however, because Wac is essential for life, there are compensatory checks in place that upregulate both faulty and normal RNA in the waca model. We screened for frameshift mutations in multiple KO lines and confirmed them by genomic DNA sequencing. In generating many KOs, including by large-scale mutagenesis in zebrafish, we usually rely on phenotype-genotype co-segregation under Mendelian inheritance over many generations.
  
  (4) Are these new lines indeed knockouts? I did find a WAC western as part of a later figure for the mouse. The authors may want to mention that earlier, or present at least that data right away. What about in the fish? Is there a way to confirm at the RNA or protein level that it is indeed a null allele?
  
  Yes, as mentioned in the above response we have now mentioned our Wac western blot results early when introducing the mouse mutants and the issues with doing this in fish are presented above as well.
  
  (5) Why are fish used that are KO while mice are Hets? Are WAC homozygous mice not viable? This should be mentioned. Regardless, the rationale for examining heterozygous mice and homozygous mutant fish should be provided. Each kind of experiment is useful, but they are interpreted in different ways. Hets match the genotype of the patients, who are generally heterozygous, while KOs are often useful for studying the essential roles of the genes, even if they do not really model the patient gene dose.
  
  In our hands, Wac homozygous mice are embryonic lethal (now mentioned in the animal methods section), but we found early on that the Hets mimic several features of human DESSH patients. In zebrafish it is more complicated. We analyzed waca and wacb hets in zebrafish but found no phenotypes. This could be due in part to some complementation between the waca and wacb genes. It is also possible that a full waca KO could resemble a human DESSH individual because wacb may complement somewhat, even though deleting wacb entirely does not produce a measurable phenotype. We have added more text to the discussion to explore these complexities. We also made waca/wacb double KO (dKO) zebrafish, but they showed a lethal phenotype, similar to homozygous mouse mutants, suggesting some complementation by the wacb gene even though its loss alone did not produce phenotypes.
  
  (6) Figure 3A: It does not appear that the authors are directly statistically comparing the two groups (genotypes) that they are drawing conclusions about. This is an unfortunately common mistake in the neuroscience literature across papers. There is a nice older review about it here: https://pubmed.ncbi.nlm.nih.gov/21878926/. To draw conclusions about the differences between the mouse genotypes, they need to compare the two genotypes directly with a statistical test. See Nygard et al. for a recommended approach, such as comparing social preference indexes (https://onlinelibrary.wiley.com/doi/abs/10.1002/aur.2154).
  
  Thank you for this information. Previous reviewers at a different journal asked for this particular evaluation. We have now revised the assessment, and the graphs now compare the genotypes directly instead of comparing time spent with the mouse vs. the object within a single genotype. We have also moved to using a social discrimination index to compare the genotypes, similar to the study mentioned.
  
  (7) MRI - it is a bit weird to separate the male and female brains just for the MRI. Was there a premise from human data to do so? If not, the authors should probably pool them. If they are concerned there are sex effects (or, more likely, a sex by genotype interaction) I recommend that they use a two-factor ANOVA and simply put both sex and genotype into the model. This will also have the advantage of increasing their statistical power for genotype effects a bit. If their current results are robust, they will still show up as a significant sex x genotype interaction.
  
  For all data in the manuscript, we initially compared the sexes to each other. We have now added the following to the animal section of the methods: for MRI, some zebrafish behaviors, and now the RNA-seq data, we observed sex differences, and the sexes were therefore (or are now) presented separately for these measurements. We also now state that when no sex differences were observed, the data were pooled.
  
  (8) Also, did the authors correct for multiple testing in the MRI analysis? Since they are testing many regions, there is a risk of false positives if they do not. This could be confounded further by their splitting the data by sex, thus doubling the number of tests.
  
  As noted above, we did not correct for multiple comparisons, given the large number of regions and the low number of replicates.
  
  (9) How many images per animal were analyzed for the cell counts? This detail is absent from the methods and would help with evaluating the robustness of these findings. What other approaches were used to make sure the counting was unbiased?
  
  We analyzed 3-4 images per animal for counts and counted hundreds of cells per image. In addition, the person counting was blinded to avoid any bias. These details have now been updated in the methods.
  
  (10) As with the MRI, for the DEG analysis, I recommend the authors simply put sex and genotype into the same model as two factors (with an interaction), to increase their sensitivity to genotype effects, as well as be able to report on robust genotype x sex differences, if there are any. They may also consider testing the model with and without excluding the three outlier animals on their PCA. It may be that the noise of those outliers is detracting from their sensitivity for DEGs somewhat.
  
  We greatly expanded our analyses and found more robust and unique changes in males, which are now added to Figure 7 and the supplemental files. After considering the data, we decided to present the sex differences separately.
  
  (11) A few more relatively simple things could readily be done with the RNAseq data to add some depth and interpretation. For example, do the hits here overlap other published IDD/autism DEG lists from mouse knockouts studies of genes like FoxP2, Chd8, Dnmt3a, Myt1l, Tcf4, etc? Do autism genes show up in the lists of hits here? And if so, more than expected by chance? Can they provide some visualization of their GO results in the main figure?
  
  When we looked into the sex differences more closely, we found that only the males showed significant upregulation of other autism risk genes, an increase that was previously unappreciated when the sexes were assessed together. Yes, several autism genes do show up, but they are heavily biased to males. Our main Figure 7 and new supplemental files show new GO term analyses and provide additional data looking not only at autism genes but also at other factors.
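
  The reviewer's question of whether autism genes appear among the hits "more than expected by chance" is commonly answered with a one-sided hypergeometric test. A minimal sketch follows; the gene-universe size, gene-set size, and counts are hypothetical placeholders, not values from this study.

```python
from math import comb


def overlap_enrichment_p(universe, gene_set, hits, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe: genes tested for differential expression
    gene_set: reference-list genes (e.g. autism risk genes) within the universe
    hits:     differentially expressed genes
    overlap:  DEGs that are also in the reference list
    """
    upper = min(gene_set, hits)
    tail = sum(
        comb(gene_set, x) * comb(universe - gene_set, hits - x)
        for x in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; only the final ratio is converted to float.
    return tail / comb(universe, hits)


def expected_overlap(universe, gene_set, hits):
    """Overlap expected by chance alone."""
    return hits * gene_set / universe


# Hypothetical numbers, for illustration only.
p = overlap_enrichment_p(universe=15_000, gene_set=1_000, hits=300, overlap=40)
fold = 40 / expected_overlap(15_000, 1_000, 300)
```

  The universe is best restricted to genes actually detected in the forebrain RNA-seq data, since using the whole genome as background tends to inflate apparent enrichment.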
  
  (12) It appears the IMPC has phenotyped this mouse somewhat, including craniofacial abnormalities. They also report on some blood cell differences. Anyway, if no one has written about that data yet (as it was generated in the context of a big consortium effort), their guidelines may allow you to include some of their data as Supplementary Figures here with proper attribution. It might help to at least summarize useful findings from there in your discussion.
  
  Due to the large number of figures/tables already in this report we don’t think this will be helpful. However, we do refer readers to the consortium in the animal methods section so they can explore data already generated by the IMPC.
  
  (13) Minor/Typos:
  
  (a) Figure 2K: I am confused by the description of three genotypes in the legend, but only two in the panel?
  
  Corrected.
  
  (b) I found it a little distracting that some results figures were embedded in the introduction.
  
  We have moved the figures further in the manuscript to start in the results section.
  
  (c) I don't understand this sentence: "Due to reduced sample size, sex-stratified DE was performed without model corrections at FDR < 0.1, 7 and found genes significantly upregulated and downregulated, respectively;" The sample size here seemed robust, so I am not sure what they were referring to. Are there missing numbers from this sentence? What is the 7? I think there are enough typos here that I am not sure how to evaluate this claim. Thus, the writing and clarity of this part could be improved.
  
  This section had several typos that have now been corrected.
  
  (d) "Marwan Shinawi, (unpublished results)" is a bit atypical of a citation. Are these results being reported with his permission? If so, then it should say 'personal communication' (if the journal permits this - some do not). If not, they should not report someone else's unpublished results without their explicit permission. It might upset some people to have their results presented this way.
  
  We have changed "unpublished results" to "personal communication." Marwan Shinawi is an author on this manuscript and has approved everything we have reported.
  
  (e) In all figures, consider shape or color coding for sex, even when pooling the data (e.g, the data points in the behavior figures).
  
  This is a good idea, but since we found no sex differences when analyzing these data, we do not think this extra work would change the interpretation. Because the methods now state that data were presented separately by sex only when sex differences were observed, we think this should be acceptable.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.26.595966v5
www.biorxiv.org

The neurotrophin DNT-2 via the Toll-2 receptor regulates neuronal survival and morphology during visual system development

1. EMBOpress 01 Jun 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  
  Referee #1
  
  Evidence, reproducibility and clarity
  
  Summary
  
  Alshamsi et al. investigate the role of Drosophila neurotrophins (DNTs) and their Toll receptors in regulating neuronal apoptosis during optic lobe development.
  
  The authors provide compelling evidence that different DNTs and their Toll receptors are expressed in optic lobe neurons, and their activity regulates neuronal survival during optic lobe development. They further show that disruption of DNT/Toll signaling impacts neuronal morphologies.
  
  Comments
  
  ABSTRACT
  
  "Over-expression of DNT-3 (spz-3) and DNT-2 (spz-5) could rescue natural occurring cell death, whereas their loss of function caused cell death, showing that DNT-3 and DNT-2 can, and are required to, promote cell survival during optic lobe development."
  
  I find it more appropriate to say that OE prevents naturally occurring cell death, because it inhibits a normal physiological process. "Rescue" would only be correct if there were an experimental or genetic loss (e.g., deletion of a survival factor) and you were restoring normal survival levels.
  
  "Importantly, DNT-2 is expressed in Mi1 neurons and Toll-2 in connecting L1 neurons. We show that DNT-2 functions in concert with Toll-2, as Toll-2 RNAi knock-down prevented the rescue of apoptosis by DNT-2 over-expression and all Toll-2+ neurons were lost in DNT-2 mutants."
  
  I find this sentence very difficult to follow.
  
  I suggest moving "Importantly, DNT-2 is expressed in Mi1 neurons and Toll-2 in connecting L1 neurons." to the next sentence, which begins "by specifically investigating the Mi1 (DNT-2+) and L1 (Toll-2) synaptic partners" and reports that alterations in DNT-2 or Toll-2 expression levels impaired connectivity of L1 neurons at the M1 medulla layer and altered the dendritic morphology of L1 neurons.
  
  "As DNT-3 (spz-3) and DNT-2 (spz-5) are expressed in the medulla and they could influence both lamina and medulla neurons, this suggests that their function maintaining cell survival could enable the stabilisation or alignment of connected neurons across medulla columns."
  
  Influence what? This is very vague. It requires a temporal understanding of when neurons die and when synapses are formed, and should consider the phenotypes of the mutants and RNAi experiments.
  
  INTRODUCTION
  
  "Neuronal survival is maintained by neurotrophic factors secreted in limited amounts by target cells, leading to the survival of only those neurons that receive trophic support (Levi-Montalcini, 1987, Davies, 2003)."
  
  Perhaps the authors can be more precise, e.g. "One mechanism by which neuronal survival is regulated is through neurotrophic factors secreted in limited amounts by prospective synaptic partners and adjacent cells."
  
  "In this context, if neurotrophism is fundamental for nervous system development, it could have been enabled by evolutionarily conserved molecular mechanisms."
  
  I think the authors want to suggest that given the fundamental and widespread role of neurotrophism in nervous system development, it remains unknown if it relies on evolutionarily conserved molecular players.
  
  "Neurotrophins - NGF, BDNF, NT3, NT4 - are the main growth factors maintaining neuronal survival in the vertebrate nervous system (Levi-Montalcini, 1987, Lu et al., 2005). Importantly, they can also promote cell death, depending on context (Lu et al., 2005). They can promote cell survival via their Trk receptors and ERK and AKT downstream and via p75NTR and NFB downstream, or cell death via p75NTR, Sortilin, and JNK signalling instead (Lu et al., 2005). mechanisms."
  
  The authors should avoid the repeated use of "they".
  
  "There are six spz and nine Toll paralogous genes in Drosophila, which could play distinct functions. In fact, at least full-length DNT-1 and Toll-1 can promote cell death instead, and at least Toll-6 can promote either cell survival or cell death, depending on context (Foldi et al., 2017, Singh et al., 2025, Zhu et al., 2008) . Importantly, mature DNT-1 and DNT-2 with Toll-6 and Toll-7 are required for and can promote neuronal survival during circuit formation in the embryonic ventral nerve cord (McIlroy et al., 2013, Zhu et al., 2008)."
  
  I find this paragraph difficult to follow. I suggest the following edit, which the authors might want to consider:
  
  "There are six spz and nine Toll paralogous genes in Drosophila, which could play distinct functions. In fact, at least full-length DNT-1 and Toll-1 can promote cell death instead, while at least Toll-6 can promote either cell survival or cell death, depending on context (Foldi et al., 2017, Singh et al., 2025, Zhu et al., 2008). Importantly, mature DNT-1 and DNT-2 with Toll-6 and Toll-7 are necessary and sufficient to promote neuronal survival during circuit formation in the embryonic ventral nerve cord (McIlroy et al., 2013, Zhu et al., 2008)."
  
  "During this time (24-50h APF), connectivity between photoreceptors, lamina and medulla neurons is established; this is followed by medulla neurons connecting to lobula neurons;"
  
  I find this sentence misleading, if not incorrect. If by connectivity the authors mean synaptogenesis, as far as is known, synaptogenesis has been shown to occur from mid-pupal development (P50) onwards. If by connectivity the authors mean the targeting of specific neuropiles and layer organization, it is also incorrect that lamina and medulla organization precedes the connectivity between medulla and lobula neurons. These processes are all concurrent. Can the authors please clarify?
  
  "and by 72h APF cell death has greatly diminished and synaptogenesis completes connectivity patterns, in preparation for adult eclosion at 96h APF (Millard and Pecot, 2018, Melnattur and Lee, 2011, Hadjieconomou et al., 2011, Kurmangaliyev et al., 2020) "
  
  Kurmangaliyev et al., 2020 is probably not an appropriate citation here, as it mostly deals with transcriptional programs of circuit assembly in the developing optic lobe.
  
  "Thus, 24-48h APF is a critical period to maintain necessary lamina and medulla neurons alive in the optic lobe."
  
  Perhaps the authors want to revisit this sentence and explicitly say that 24-48h APF is a period during which apoptosis defines cell numbers.
  
  "The development of the Drosophila visual system has been well described (Holguera and Desplan, 2018, Melnattur and Lee, 2011, Millard and Pecot, 2018, Hadjieconomou et al., 2011, Behnia and Desplan, 2015)."
  
  I find that Behnia and Desplan, 2015 is not appropriate, as it is a review that describes the characterization of neuronal circuits underlying visual modalities in the fly brain, not their development. The following reviews, dealing with different aspects of neurogenesis, neuropile development and circuit formation, are likely more relevant: Bakshi et al., Current Opinion in Neurobiology 2025; Malin et al., PNAS 2021; Ngo et al., Dev Biol 2017.
  
  "R7 and R8, together with lamina neurons target to medulla layers M6 and M3, respectively, organizing into medullar columns that respond to the same point in visual space and maintain retinotopy."
  
  I find this sentence misleading because lamina neurons do not target M6 and only L3 targets M3.
  
  "Medullar interneurons also form connections across multiple layers, where each layer represents different visual features (Fischbach and Hiesinger, 2008, Millard and Pecot, 2018)."
  
  Here, Behnia and Desplan, 2015; Matsliah et al., Nature 2024; Borst and Groschner, Annu. Rev. Neurosci. 2023; and even Schnaitmann et al., J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2020 are better references that review feature detection and circuit organization in the optic lobe.
  
  "Neurons within the lobula complex integrate signals from the medulla and project to the optic glomeruli in the central brain and motor outputs to enable appropriate behavior (Behnia and Desplan, 2015, Borst et al., 2020, Courgeon and Desplan, 2019b)."
  
  I find that Courgeon and Desplan, 2019b is not very appropriate here, as it reviews the coordination of neural patterning in the Drosophila visual system. More relevant manuscripts and reviews are Wu et al., eLife 2016; Tanaka and Clark, 2022; Klapoetke et al., Neuron 2022; and even Zhao et al., eLife 2024.
  
  "Spz-5 is well known as Drosophila neurotrophin-2 (DNT-2), and as Spz-3 has been proposed to have neurotrophin functions which we expand on and demonstrate here, we refer to Spz-3 as DNT-3 (Zhu et al., 2008, Coutinho-Budd et al., 2017, Sun et al., 2024, Ballard et al., 2014, Ulian-Benitez et al., 2017)."
  
  This last paragraph of the introduction seems out of place, and partly redundant with page 3. Perhaps the authors would like to finish the introduction with a paragraph highlighting the major findings and conceptual advance of the manuscript? This seems to be a good and natural way of following their previous sentence "Here, we asked whether neurotrophin family ligands encoded by DNTs (spzs) and their Toll receptors could regulate cell survival during neural circuit formation, in the Drosophila pupal optic lobe."
  
  RESULTS
  
  I suggest the following edit: "To ask whether DNTs (spzs) are expressed in the pupal optic lobe, we generated T2A-Gal4 driver lines for spz-1, -3, -4 and -5, crossed them to 10xUASmyrGFP or 20UAS6xmCherry reporter flies, and analysed the resulting progeny optic lobes during development and in the adult with anti-GFP antibodies, as required (Figure 1A)."
  
  Also, if the lines were also crossed with mCherry reporters, mentioning only anti-GFP antibodies is incomplete.
  
  "spz-1MIO2318-T2A>myrGFP and spz-1MIO2318-T2A>6xmCherry revealed expression in a few centrifugal neurons in the lobula complex that projected to the lamina,"
  
  At which stage?
  
  "subsequently medulla neurons and abundant arborisations into the lobula complex and medulla."
  
  Subsequent to what? I am sorry, but I don't understand this description.

  "Expression from spz4MI5678 -T2A->myrGFP was not detected until 72h APF"
  
  Data before 72h APF are missing. Where was it expressed? Which cell types? Which neuropiles?

  "And then was found in the medulla and lobula complex and followed by the trachea."
  
  At which stage? What was followed by the trachea? I think the authors mean that at later stages (in the adult), expression was restricted to the trachea in both medulla and lobula.
  
  "spz-3-T2A>6xmCherry (hereby named DNT-3) was highly expressed in non-neuronal retinal cells and medulla neurons;"
  
  At what stage?
  
  "And subsequently in the trachea and possibly glia."
  
  The authors could and should explain how they reach this conclusion. Given that no cell type specific markers were used, this identification was likely based on morphological features.
  
  "Finally, DNT-2-T2A>6xmCherry (spz-5) was found in medulla neurons, which could be tentatively identified as Mi1 medulla neurons (Nern et al., 2025) by 48h APF, and this pattern was maintained."
  
  On what basis do the authors identify these neurons as Mi1? Also, Fischbach's seminal paper (Cell Tissue Res, 1989) is probably worth mentioning.
  
  "Abundant cells expressed spz-1 in the lobula complex (Figure 1B, left) and medulla (Figure 1B, right), and DNT-3 (spz-3) and DNT-2 (spz-5) in the medulla (Figure 1B) during optic lobe development. DNT-3 (spz-3) and DNT-2 (spz-5) were expressed in distinct non-neuronal cells in the retina, seen in Multi Colour Flip Out (MCFO) clones (Figure 1C)."
  
  This sentence is misleading. These experiments allow the authors to conclude that spz+ neurons innervate these neuropils. They do not allow the authors to conclude that spz molecules localize to the neuropils. The authors should revise these claims in the main text and the relevant figure legends.
  
  "To visualize the distribution of Tolls in the optic lobes during pupal development, we used the GAL4 lines previously described (Li et al., 2020), driving expression of the reporter myrGFP (Figure 2A)."
  
  Same comment as above
  
  "Using MCFO clones as well as myrGFP, we could identify some of the Toll-8+ neurons as Lawf1, feedback neurons projecting from the medulla to the lamina (Figure 2B), and L2 and L4 lamina neurons (Figure 2B, C); Toll-6+ cells to include L3 and L4 lamina neurons (Figure 2B,D); and Toll-2+ neurons as L1 lamina neurons which target to M1 and M5 medulla layers and L3 lamina neurons that project to M3 (Figure 3B,E) (Hakeda-Suzuki and Suzuki, 2014, Behnia et al., 2014)."
  
  Hakeda-Suzuki and Suzuki (2014) and Behnia et al. (2014) are not the most appropriate references. Instead, I suggest the authors cite Fischbach's Cell Tissue Res (1989).
  
  "Overall, the expression in the scRNAseq dataset (Kurmangaliyev et al., 2020) of the spz ligands and Toll-8 (also known as Tollo) data were less consistent with the cell biology data, whereas the expression of Toll-1, -2 and -6 confirmed cells seen with the cell-biology based reporters"
  
  It may be more accurate to refer to the cell-biology based reporters as translation reporters, which is what T2A-based GAL4 drivers are.
  
  "Most particularly, Toll-2 mRNA (synonym 18w) was found in L1 and L3 lamina neurons over time, plus also in L5 at 24h APF, and Toll-6 mRNA was found in L2, L3, L4 over time, plus also in L1 at 24h (Supplementary Figure 1-6)."
  
  This sentence is difficult to read and could be considered poorly written for several reasons, including:
  - "Plus also" is redundant: "plus" and "also" serve the same function.
  - Inconsistent punctuation: the lack of a comma before "and Toll-6 mRNA..." makes the sentence feel unbalanced.
  - Vague time reference: "over time" is imprecise. It is unclear whether it means during development, at multiple timepoints, or something else.
  
  Also, regarding the scRNAseq analysis, the authors mention "We compared our reporter-based profiles with published scRNAseq datasets of the optic lobe through development (Kurmangaliyev et al., 2020, Ozel et al., 2021)"; however, the Ozel dataset does not seem to be used.
  
  Also, from the Material and Methods section:
  
  "The data were imported as a Seurat object, and cells corresponding to specific timepoints (e.g., 24 h, 36 h) were subsetted based on the provided metadata. Dimensionality reduction was carried out using principal component analysis (PCA), followed by Uniform Manifold Approximation and Projection (UMAP) embedding computed on the first 30 principal components. Cluster annotations provided by the original authors were used for all cluster-level analyses and visualisations."
  
  The Kurmangaliyev dataset is already processed. I am probably missing something here, but it is not obvious to me why the authors performed PCA again.
  
  "To conclude, at the time of naturally occurring cell death (0-48h APF), Toll-1 is highly expressed throughout the optic lobe; Toll-2, -6 and -8 are expressed in the medulla; Toll-8 and Toll-6 are prominently expressed in the lobula complex, and Toll-6 and Toll-2 are prominent in the lamina."
  
  This is misleading. These experiments allow the authors to conclude that Toll+ neurons innervate these neuropils. They do not allow us to conclude that Toll molecules localize to the neuropils. The authors should revise these claims in the main text and the relevant figure legends.
  
  "To ask whether DNTs can promote cell survival in developing optic lobes, we over-expressed DNT-2 (spz-5) and DNT-3 (spz-3) and visualized dying cells with the apoptotic marker anti-Dcp1 at the peak of naturally occurring cell death (24h APF)."
  
  These experiments were done using Toll-8-Gal4 and nsyb-Gal4 drivers. What is the expression pattern of nsyb-Gal4 during development? Is it consistent with the conclusions drawn from these experiments?
  
  "To test whether DNT-2 could promote cell survival during optic lobe development, we over-expressed full-length DNT-2FL or cleaved DNT-2CK in all neurons with nsybGAL4. This reduced the incidence of Dcp1+ apoptosis in the lamina and outside the lamina too (Figure 3C-D and Supplementary Figure S9C,D)."
  
  How do the authors explain that DNT-2CK reduced the number of Dcp1+ cells?
  
  "We generated DNT-3 (spz-3) loss of function mutants by P-element mobilization. DNT-2 and DNT-3 loss of function mutants caused considerable cell debris in the medulla and lobula complex, which compromised the analysis in this region, so we focused on the lamina."
  
  Perhaps, rather than stating that the mutants caused considerable cell debris, the authors could say that the mutants displayed considerable cell debris.
  
  More importantly, I have concerns about the data from these experiments (Figure 3), in which Dcp1 signal volume intensity was quantified using Imaris. In all panels (A, C, E), the segmented images do not match the raw Dcp1 staining, raising concerns about how much one can rely on this quantification. Could this be because the Dcp1 staining shown is a single z-plane, whereas the segmentation is a 3D rendering? The authors should carefully and robustly explain this discrepancy, which is present in all images where Dcp1 signal volume intensity was quantified.
  
  Also, could the authors explain why the quantifications in Figures 3B and 3C differ by an order of magnitude (10×) from those in panel 3D? In the WT control, for example, there is a 10× difference in signal volume intensity.
  
  "Toll-2pTVGAL4 flies are heterozygous mutant for Toll-2, and, remarkably, in combination with DNT-2 homozygous mutants resulted in semi-lethality, revealing a functional interaction between these two genes."
  
  I cannot entirely follow this conclusion. I understand the authors propose that the combination of partial loss of Toll-2 and full loss of DNT-2 affects viability, more than either mutation alone. Is this what they mean? Can the authors comment on the viability of DNT-2 mutants?
  
  "Macrophages loaded with HisYFP and distributed mostly between the retina and lamina could be observed across these samples (Figure 4C), suggesting they had engulfed dead cells." How do the authors identify these YFP+ cells as macrophages?
  
  "Together, these data show that DNT-2 functions as a ligand for Toll-2 to maintain the survival of neurons in the lamina, medulla and lobula complex during optic lobe development." While the results in Figure 4 showing that DNT-2 acts as a ligand for Toll-2 to support neuronal survival are solid (in particular panels C-F), they do not necessarily mean that all neurons die directly because of lost Toll-2 signaling. It is plausible that neurons expressing Toll-2 die because they lose critical survival signals, and that the death of these neurons then causes a cascade effect, in which neighboring or connected neurons die indirectly owing to loss of trophic support, disrupted circuits, or secondary damage. The cell death observed in multiple regions may therefore reflect a combination of direct effects on Toll-2-positive neurons and indirect effects on other neurons.
  
  "In the Drosophila pupa, connectivity of lamina to medulla neurons takes place at 30-48h APF, and between medulla and lobula complex at 60-70h APF (Kurmangaliyev et al., 2020, Millard and Pecot, 2018, Pecot et al., 2014, Hadjieconomou et al., 2011)." I have the same comment as mentioned above regarding the timing of connectivity. If by connectivity the authors mean the targeting of specific neuropils and layer organization, it is incorrect that lamina and medulla organization precedes the connectivity between medulla and lobula neurons; these processes happen concurrently. Can the authors please clarify?
  
  "Importantly, the expression of synaptic markers starts at 24h, peaks at 60h APF and spontaneous neuronal activity takes place at 48h APF, meaning that at least some neural circuits are already connected by this point (Kurmangaliyev et al., 2020)."
  
  The reference to Kurmangaliyev et al. should be placed after "peaks at 60h APF". References to Akin et al. and Bajar et al. are missing where spontaneous neuronal activity (PSINA) is mentioned.
  
  "Thus, the period of naturally occurring cell death overlaps with connectivity" I think the authors mean that the period of cell death is concurrent with the development of synaptic connectivity.
  
  "L1 neurons normally project along columns that can be labelled with mAb24B10, and target to layers M1 and M5 of the medulla." The authors should mention that mAb24B10 labels the photoreceptors, providing a spatial reference to identify medulla columns.
  
  "Interestingly, Toll-2RNAi knock-down did not alter the phenotype caused by DNT-2FL overexpression, and impaired targeting to the same extent as each genetic manipulation alone (Figures 5B,D)."
  
  How do the authors interpret these results? Perhaps the authors could also explain the rationale for overexpressing DNT-2FL in L1 neurons, which do endogenously express it.
  
  "To conclude, these data show that DNT-2 and Toll-2 are required for appropriate connectivity of L1 neurons to target Mi1 medulla neurons at M1 medulla layer."
  
  The authors characterize neuronal morphologies but do not directly assess connectivity using synaptic markers. While defective morphologies are likely to impact connectivity, the conclusion that DNT-2 and Toll-2 are required for appropriate connectivity should be tempered. The authors should revise their wording to reflect that their data support morphological defects rather than direct evidence of altered synaptic connectivity.
  
  DISCUSSION
  
  "In fact, throughout animal development, between 50% (e.g. in Drosophila) and 80% (e.g. in vertebrates) are lost to naturally occurring cell death"
  
  "of neurons" is missing before "are lost"
  
  "Consistently with these findings, we have shown that the survival of L1 neurons depends on DNT-2 functioning together with Toll-2."
  
  The authors state that "the survival of L1 neurons depends on DNT-2 functioning together with Toll-2." It seems that what they intend to convey is that DNT-2 acts as a ligand for Toll-2. The text should be clarified to explicitly indicate this ligand-receptor relationship rather than implying a cooperative function.
  
  "These data demonstrate that DNT-2 and Toll-2 function together in visual system development."
  
  Since the authors did not use tub-GAL80 or another temporal control to restrict gene expression specifically to development, the observed phenotypes could reflect combined developmental and adult effects. Throughout the text, the authors should revise their wording to acknowledge this limitation.
  
  "Finally, interference with the normal levels of DNT-2 and Toll-2 also impaired axon targeting and dendritic morphology, consistently with the coupling between cell survival with connectivity."
  
  The authors state that interference with normal levels of DNT-2 and Toll-2 "impaired axon targeting and dendritic morphology, consistently with the coupling between cell survival and connectivity." This statement seems tautological, as neurons that die cannot form connections. The authors should clarify whether they are referring to a specific mechanistic link beyond this obvious consequence.
  
  "Our findings are consistent with prior reports that had shown the maintenance of cell survival to be required during neural circuit formation."
  
  This statement seems tautological. It is generally expected that neurons must survive in order to contribute or be part of neural circuits. The authors should clarify if they are highlighting a specific mechanistic insight beyond this obvious requirement.
  
  "In the medulla, Dm8 medulla neurons are produced in excess and are eliminated during connectivity to their R7 inputs (Courgeon and Desplan, 2019a). This is enabled by the cell surface molecular tags DIP in yDm8 binding Dpr11 in yR7, during synaptic matching (Courgeon and Desplan, 2019a)."
  
  The statement that "Dm8 medulla neurons are produced in excess and are eliminated during connectivity to their R7 inputs" is both unclear and inaccurate. It is not evident what is meant by "during connectivity." Moreover, Courgeon and Desplan (2019a) show that Dm8 neurons undergo cell death before or by P40, prior to synaptogenesis. The authors should correct this statement and clarify the timing and mechanism of Dm8 neuron elimination.
  
  "Importantly, the maintenance of cell survival takes place during connectivity, and enables synaptic matching between connecting neurons."
  
  It is unclear what is meant by "during connectivity." Moreover, both Courgeon and Desplan (2019a) and Xu et al. (2018, 2022) show that these neurons (e.g., Dm8, Dm12, Dm14) undergo cell death before or by P40, prior to synaptogenesis. The authors should clarify the timing and mechanism of cell survival and revise this statement accordingly.
  
  "By contrast, it has also been proposed that apoptosis plays a minor role in cell number control during visual system development, depending instead on cell proliferation and spatial patterning through Dpp/BMP signalling (Malin et al., 2024). However, those findings were based on events taking place at the larval third instar wandering stage, when proliferation and spatial patterning are prevalent, whereas apoptosis peaks in pupa."
  
  It seems that the authors are trying to suggest that different mechanisms control cell numbers at different developmental stages: during larval neurogenesis (L3), cell numbers are regulated primarily by proliferation and spatial patterning, whereas in the pupal stage, neuronal survival via apoptosis plays a key role. If this is the intended point, it should be stated more clearly, as the current comparison to Malin et al. (2024) is confusing and does not make this distinction explicit.
  
  "However, Toll-2 mutant MARCM clones generated in the pupa result in a dramatic loss of lamina neuron dendrites and aberrant axonal navigation in the medulla, as well as widespread neuronal loss (Li et al., 2020)." This statement is puzzling. Neurogenesis occurs during the larval stage until P15, and MARCM requires progenitor cell division. The authors should clarify how MARCM clones were generated during pupation and provide the relevant experimental details in the Materials and Methods.
  
  "Importantly, connectivity between L1 and medulla neurons takes place between 20-48h APF, during the period of naturally occurring cell death, and spontaneous activity in the optic lobe takes place at 48h, meaning at least some circuits are connected by then (Kurmangaliyev et al., 2020)."
  
  The authors state that "connectivity between L1 and medulla neurons takes place between 20-48h APF," but no reference is provided for this timing. To my knowledge, no study has directly demonstrated this, so the authors should either provide supporting evidence or revise this statement. Additionally, citing Kurmangaliyev et al. (2020) for spontaneous activity in the optic lobe is not appropriate for this point, as PSINA was originally described by Orkun Akin.
  
  "When altering DNT-2 or Toll-2 levels, L1 axonal terminals in the medulla were misrouted, rather than being confined to a single column. This is reminiscent of the phenotypes caused by alterations in Dscam and Fez levels (Millard et al., 2007, Peng et al., 2018)."
  
  The authors note that altering DNT-2 or Toll-2 levels causes L1 axonal terminal phenotypes reminiscent of phenotypes caused by changes in Dscam and Fez levels (Millard et al., 2007; Peng et al., 2018). However, they only reference these previous studies without discussing whether there could be a shared mechanism. While these comparisons are interesting, the manuscript would benefit from either a deeper discussion of potential mechanistic links or a clear statement that the comparison is purely phenotypic.
  
  "As DNT-2 is secreted in medulla neurons and Toll-2 is expressed along neurons that connect in the medulla (e.g. L1, Mi1, Tm3, Dm9, T4), DNT-2 could help keep connecting neurons together during dynamic cellular events in development."
  
  This sentence is poorly written and vague. It is unclear what "keep connecting neurons together" means mechanistically. Likely, DNT-2 is secreted by postsynaptic medulla neurons (e.g., Mi1), whereas Toll-2 is expressed in neurons innervating the medulla (e.g., L1, Mi1, Tm3, Dm9, T4). The authors should rephrase this sentence to clearly convey their mechanistic and cellular interpretation.
  
  FIGURES
  
  Figure 1: What do the arrowheads point to? Optic lobe (OL) orientations should be described in the figure captions.
  
  Figure 5: The title of Figure 5 ("Altering the levels of DNT-2 and Toll-2 modifies L1 axon targeting at medulla M1 layer") is misleading. The correct layer targeting is preserved; what changes is the pattern of unicolumnar innervation. When DNT-2 or Toll-2 levels are altered, L1 neurons innervate multiple columns rather than maintaining their normal single-column specificity. The title should be revised to reflect that the defect is in columnar specificity rather than layer targeting.
  
  Supplementary Figures S1 to S6 Combined UMAPs showing the expression of spz-1, -3 (DNT-3),-4 and -5 (DNT-2) and Toll-1, -2 (18w), -6 and -8 (Tollo) in distinct cells over time.
  
  Information about which UMAP corresponds to which time point is missing
  
  MATERIALS AND METHODS
  
  "Genetics. Please see S1 Table for the list of the stocks used and Table S3 for full genotypes for each experiment."
  
  Table S3 does not list the full genotypes for each experiment. This information is only partly available in the Source Data Excel file.
  
  Significance
  
  During development, neurons are initially produced in excess. One mechanism by which the final neuronal numbers are refined relies on trophic support, which maintains the survival of necessary neurons, while excess neurons are eliminated.
  
  In the Drosophila optic lobe, a wave of apoptosis occurs during pupation, peaking at a critical period thought to be essential for establishing final neuronal numbers and supporting proper neural circuit formation. However, the mechanisms underlying this developmental process remain poorly understood. In this manuscript, Alshamsi et al. investigate this wave of apoptosis by examining the role of Drosophila neurotrophins (DNTs), which are encoded by spätzle (spz) paralogous genes and signal through Toll receptors to regulate neuronal survival during brain development.
  
  The authors use translation reporters to demonstrate that DNTs and Toll receptors are differentially expressed across various neuronal types innervating all optic lobe neuropils during development. They then focus on DNT-3 and DNT-2, which they show to be necessary for controlling neuronal numbers, likely by maintaining neuronal survival during pupal stages.
  
  Notably, the results reveal a previously uncharacterized interaction between DNT-2 and Toll-2. The findings suggest that DNT-2 acts as a neurotrophic factor produced by medulla intrinsic neurons, binding to the Toll-2 receptor expressed in other neurons innervating the medulla. By examining Toll-2+ L1 neurons, which are postsynaptic in the lamina and presynaptic in the medulla, the authors provide compelling evidence that DNT-2/Toll-2 signaling regulates L1 neuronal numbers.
  
  Interestingly, the authors also show that disruption of DNT-2/Toll-2 signaling affects L1 axonal and dendritic morphologies. However, the extent to which changes in neuronal survival and neuronal morphology are mechanistically or cellularly linked is not addressed. These findings are consistent with previous reports showing that DNTs and Toll receptors regulate neuronal survival in embryonic, larval, and pupal ventral nerve cords, as well as in the adult. Importantly, DNTs and Tolls can also promote cell death, highlighting their dual role in controlling neuronal number and circuit formation.
  
  While the data are for the most part solid, I have concerns regarding the execution, the interpretation of certain results and the conclusions drawn. Additionally, references to previous work are often incorrect or incomplete; I provide several examples above along with non-exhaustive suggestions for improvement. Finally, the manuscript would benefit from careful text revision and improvements to figure presentation, for which I also offer non-exhaustive guidelines.
  
  Overall, I would recommend this manuscript undergo revision before its publication and I would be happy to reassess a revised version that addresses the comments above.
  

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.07.18.665476
www.biorxiv.org

Microenvironmental arginine restriction sensitizes pancreatic cancers to polyunsaturated fatty acids by suppression of lipid synthesis

1. Public_Reviews 01 Jun 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this study, the authors set out to define how arginine availability regulates lipid metabolism and to explore the implications of this relationship in pancreatic ductal adenocarcinoma (PDAC), a tumor type known to exist in an arginine-poor microenvironment. Using a combination of rigorous genetic and metabolomic approaches, they uncover a previously underappreciated role for arginine in maintaining lipid homeostasis. Importantly, they demonstrate that arginine deprivation sensitizes PDAC cells to ferroptosis through lipidome perturbations, which can be exploited therapeutically via co-treatment with aESA and ferroptosis inducers (FINs). These findings have meaningful implications for the field. They not only shed light on the metabolic vulnerabilities created by nutrient restriction in PDAC, but also suggest a practical avenue for combination therapies that exploit ferroptosis sensitivity. This is particularly relevant in the context of pancreatic cancer, which is notoriously resistant to conventional treatments. The methods employed are broadly applicable to other nutrient-stress contexts and may inspire similar investigations in other solid tumor types.
  
  Strengths:
  
  One of the major strengths of the study is the use of complementary and well-controlled approaches, including metabolomic profiling, genetic perturbations, and in vivo models, to support the central hypothesis. The experiments are thoughtfully designed and clearly presented, and the conclusions are, for the most part, well supported by the data. The findings provide mechanistic insight into nutrient-lipid crosstalk and identify a potential therapeutic strategy for targeting arginine-deprived tumors.
  
  We thank the reviewer for their positive assessment of our manuscript.
  
  Weaknesses:
  
  A key weakness of the study lies in the mechanistic connection between arginine levels and SREBP1 activation. While the authors show that arginine restriction leads to reduced SREBP1 expression, the magnitude of this effect appears modest relative to the substantial changes observed in the lipidome. The study would benefit from a deeper analysis of SREBP1 regulation, particularly whether nuclear translocation or activation is affected. This could be addressed by examining the nuclear pool of SREBP1, using either subcellular fractionation or improved immunofluorescence imaging in both cell lines and tissue samples.
  
  We thank the reviewer for this comment and in our revised manuscript have undertaken several new studies to assess how the nuclear pool of SREBP1 is regulated by arginine starvation. We further identified one mechanism by which arginine starvation suppresses SREBP1 protein levels, namely GCN activation. We believe these additional studies strengthen the manuscript and appreciate the reviewer suggesting these studies.
  
  Another area where additional context would strengthen the manuscript is in the transcriptomic profiling of PDAC cells cultured in a tumor interstitial fluid mimic (TIFM). While the study emphasizes lipid-related pathways, highlighting the most significantly upregulated and downregulated pathways in Figure 1B would give readers a broader perspective on how arginine restriction reprograms the PDAC transcriptome. For instance, because polyamines are downstream of arginine and are known to influence lipid metabolism, it would be worth discussing whether these metabolites contribute to the phenotypes observed. Similarly, an evaluation of whether Dgat1/2 expression is altered could help delineate the full scope of lipid metabolic rewiring.
  
  We thank the reviewer for suggesting this change to our manuscript and we now provide much more extensive analysis of our transcriptomic analyses in Figure 1 – Figure supplement 1, which we think will make our manuscript more useful to readers.
  
  Finally, it is worth noting that the KPC mouse model used in this study is based on conditional deletion of p53, which leads to faster-growing tumors and a distinct tumor microenvironment compared to models harboring the p53^R172H point mutation. Including a brief discussion of this distinction would help readers contextualize the translational relevance of the findings.
  
  We have revised the manuscript to include a discussion of this point.
  
  Reviewer #2 (Public review):
  
  This study by Jonker et al. examines how the metabolic adaptations to the microenvironment by pancreatic ductal adenocarcinomas (PDAC) present vulnerabilities that could be used for therapeutic purposes. The evidence supporting the claims of the authors is mostly solid, and the multiplicity of models used, as well as the combination of in vitro and in vivo work, are appreciated, but some conclusions would benefit from additional substantiation. This work would be of interest to biologists working on the impact of microenvironment and metabolism in cancer, and especially those investigating pancreatic cancer.
  
  We thank the reviewer for their positive assessment of our manuscript.
  
  In this study, the authors mostly use "doublings per day" as an indicator of cell death, notably for Figures 4 to 6. However, proliferative arrest (or a decrease in the proliferative rate) is not necessarily synonymous with cell death. It might be nice to complement these experiments with a true measure of cell death (e.g., PI uptake).
  
  We thank the reviewer for this important comment and have performed extensive additional experiments to measure cell death directly via viability markers in addition to our indirect measurements of cell number at the start and end of experiments. We believe these additions strengthen our claims that PUFAs cause arginine starved PDAC cells to undergo ferroptotic cell death.
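  For readers less familiar with the metric, the sketch below shows how a "doublings per day" value is commonly computed from cell counts at the start and end of an assay, assuming exponential growth between the two counts. The helper name and the counts are hypothetical and are not taken from the manuscript. It also illustrates the reviewer's point: only a negative value implies net cell loss, whereas a value that is merely lower than the control is compatible with slower proliferation, partial death, or both.
  
  ```python
  import math
  
  
  def doublings_per_day(n_start: float, n_end: float, days: float) -> float:
      """Population doublings per day, assuming exponential growth between counts.
  
      Hypothetical helper for illustration only; n_start and n_end are cell
      counts at the start and end of an assay lasting `days` days.
      """
      if n_start <= 0 or n_end <= 0 or days <= 0:
          raise ValueError("counts and duration must be positive")
      return math.log2(n_end / n_start) / days
  
  
  # Hypothetical counts over a 3-day assay.
  print(doublings_per_day(10_000, 80_000, 3))  # 3 doublings in 3 days -> 1.0
  print(doublings_per_day(10_000, 20_000, 3))  # slower net growth -> ~0.333
  print(doublings_per_day(10_000, 5_000, 3))   # net cell loss -> negative
  ```
  
  Because the value reflects the balance of division and death, it cannot separate the two on its own, which is why a direct viability readout is a useful complement.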
  
  The composition of Tumor Interstitial Fluid Medium (TIFM) was published previously, but nonetheless a reminder of the composition of this medium in a Supplemental file of this study might be helpful. In particular, at the start of the Results section, the nature of serum/lipids in the different media should be specifically noted, especially given that the subsequent focus of the work is on lipids/SREBP. It is known that differences in the extracellular availability of lipids can profoundly alter de novo lipid biosynthesis pathways.
  
  We thank the reviewer for this comment. We have edited the text to provide additional context on the composition of TIFM, especially lipid availability. We further have provided a supplemental file with the composition of TIFM. We hope this will make the manuscript more useful and readily interpretable for readers.
  
  Reviewer #3 (Public review):
  
  This important study investigates the impact of nutrient stress in the tumor microenvironment (TME), focusing on lipid metabolism in pancreatic ductal adenocarcinoma (PDAC).
  
  Understanding TME composition is crucial, as it highlights cancer vulnerabilities independent of intracellular mutations, particularly because PDAC tumors are often exposed to limited nutrient availability due to reduced perfusion.
  
  By utilizing a medium that mimics the nutrient conditions of PDAC tumors, the authors convincingly show that TME nutrient stress suppresses SREBP1, leading to reduced lipid synthesis, with low arginine levels identified as a key driver of this suppression. Importantly, mice with arginine-starved pancreatic tumors respond to a polyunsaturated fatty acid-rich diet. This discovery uncovers a synthetic lethal interaction in the tumor microenvironment that could be leveraged through dietary interventions.
  
  The conclusions of this paper are mostly well supported by data; however, below are some aspects that could be further clarified.
  
  We thank the reviewer for their positive assessment of our manuscript.
  
  This study uses PDAC cells from the LSL-Kras G12D/+; Trp53; Pdx-1-Cre PDAC model. The authors convincingly demonstrate that the cell-extrinsic stimulus of low arginine availability suppresses lipid synthesis and thus exerts a dominant effect over the cell-intrinsic oncogenic Ras mutation, which is known to enhance fatty acid synthesis. Could the effect of low arginine on lipid synthesis be specific to certain mutations in PDAC? It would be interesting to investigate or discuss whether different mutations show the same SREBP1 reduction caused by low arginine levels, and whether these low SREBP1 levels can be ameliorated by arginine re-supplementation. Here, Jonker et al. show that human PDAC cells cultured in TIFM have reduced SREBP1 levels (Figure 1 - Figure supplement 1C). It would further support their conclusions if the authors could show that arginine re-supplementation is sufficient to restore SREBP1 levels in human PDAC cells.
  
  We thank the reviewer for this comment. In response, we have now shown that arginine supplementation increases SREBP1 levels and fatty acid synthesis in human PDAC cells (Figure 2 – Figure supplement 2). Further, we have also updated the manuscript to discuss that using the LSL-Kras G12D/+; Trp53; Pdx-1-Cre PDAC model limits our ability to assess how genetic differences influence the response to arginine starvation. We additionally discuss the genetic diversity of the human PDAC cell lines used in these studies, which do include different oncogenic mutations. We believe that these results provide some evidence that our findings regarding arginine deprivation and SREBP in our genetically defined murine PDAC cell line are applicable to human PDAC cells with more diverse oncogenic lesions.
  
  The authors demonstrate that mPDAC cells cultured in RPMI and subsequently implanted into an orthotopic mouse model exhibit reduced expression of SREBP target genes when compared to in vitro cultured mPDAC-RPMI cells. This finding is in line with the observation that culturing PDAC cells in TIFM downregulates SREBP target genes compared to PDAC cells cultured in RPMI. However, caution is needed when directly comparing mPDAC-RPMI cultured cells to those in the orthotopic model, as the latter may include non-tumor cells and additional factors that could confound the results. The authors should explicitly acknowledge this limitation in their study.
  
  We thank the reviewer for this important caveat, and we have revised the text to address this point. Importantly, we note that for all comparisons between in vitro and in vivo cultures, we carefully sort malignant cancer cells from orthotopic tumors prior to analysis. We believe this approach mitigates the impact of stromal contamination on our analyses.
  
  The in vivo evidence demonstrating that PUFA-rich tung oil reduces tumor size is compelling. However, the specific in vitro findings regarding its impact on doubling rates per day, particularly in the context of arginine-dependent PUFA supplementation, require further explanation. To enhance the robustness of their data and conclusions, the authors could consider conducting additional cell viability and proliferation assays. Moreover, it would be valuable to assess whether the observed effects on doubling rates per day remain significant after normalizing the data to the initial doubling time prior to PUFA supplementation. This is particularly important regarding the statement that "Addition of arginine significantly decreases sensitivity to α-ESA" as these cells already start with a higher doubling rate prior to α-ESA treatment.
  
  We thank the reviewer for this important comment and have performed additional experiments to measure cell death directly via viability markers, in addition to our indirect measurements of cell number at the start and end of experiments. Furthermore, to address the issue of different rates of cell growth in cultures affecting the response to perturbations, we also used growth-rate-corrected metrics (PMID: 27135972) to ensure that effects of perturbations on cell growth and viability are not confounded by the baseline proliferative kinetics of the cells under various media conditions. We believe these additions strengthen our claims that arginine starvation sensitizes PDAC cells to PUFAs.
  
  Overall, this paper presents a compelling study that significantly enhances our understanding of the PDAC tumor microenvironment and its complex interactions with the tumor lipid metabolism.
  
  We again thank the reviewer for their positive assessment of our manuscript.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  In this study, the authors employ rigorous genetic and biochemical (metabolomic) approaches to uncover a previously unappreciated role for arginine in regulating lipid homeostasis. They further demonstrate the relevance of this pathway in pancreatic tumors, a solid tumor type often characterized by limited access to extracellular arginine. The authors present compelling evidence that arginine deprivation creates a metabolic liability, rendering tumors more susceptible to lipidome perturbations. This vulnerability can be therapeutically exploited through co-treatment with α-ESA and FIN to induce ferroptosis. Overall, the conclusions are convincing, the manuscript is well-written, and the figures are clearly presented.
  
  We again thank the reviewer for their positive assessment of our manuscript.
  
  The key weakness of the study lies in the mechanistic link between arginine levels and SREBP1 expression. While the data support the authors' argument, the observed changes in SREBP1 expression following arginine restriction appear modest relative to the more pronounced changes in the lipidome. To strengthen this connection, the authors may consider performing cellular fractionation to focus their analysis on the nuclear (active) pool of SREBP1. Improved immunofluorescence imaging and quantification of nuclear SREBP1 levels in tissues would also provide additional support for their model.
  
  We thank the reviewers for this helpful comment. To strengthen this study, we both examined the nuclear levels of SREBP1 in TIFM-cultured cells and worked to identify the mechanistic link connecting arginine levels to SREBP1 expression.
  
  First, we found that arginine starvation does not lead to nuclear exclusion of SREBP1. We believe this finding strengthens our conclusion that arginine starvation regulates SREBP1 at the level of protein expression. We agree with the reviewer that the change in SREBP1 protein level is modest, but we show that the effects of arginine on PDAC cell lipid metabolism are SREBP1 dependent (Figure 3O-P, Figure 5F, Figure 5 – Figure supplement 2D). Thus, we interpret these data to indicate that even the relatively modest change in SREBP1 protein levels is sufficient to cause large changes in the output of this transcription factor and in the cellular lipidome.
  
  Second, we determined if the arginine-responsive GCN2 signaling pathway, which is known to regulate SREBP1, could contribute to the suppression of SREBP1 observed in PDAC cells. We found that GCN2 signaling is activated in PDAC cells in TIFM culture by arginine starvation and is active in animal tumors. We further found that activation of GCN2 is in part responsible for suppression of SREBP1, which is consistent with prior literature describing a role for GCN2 activation in suppressing SREBP1 translation (PMID: 17276353). Thus, while other mechanisms are at play in transducing arginine starvation to reduced SREBP1 protein levels, we have identified one mechanism (activation of GCN2) by which arginine starvation suppresses SREBP1, leading to the lipidomic changes we observed upon starvation of this amino acid.
  
  In addition, it would be helpful for the authors to highlight the most significantly upregulated and downregulated pathways in Figure 1B to give a more comprehensive view of transcriptomic changes in PDAC cells cultured under TIFM conditions. For example, since polyamines are downstream of arginine and known to regulate lipid metabolism, could some of the observed effects be attributed to changes in polyamine levels? Similarly, do arginine levels affect the expression of Dgat1 or Dgat2?
  
  We have added a Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA. We also added KEGG metabolic pathway analysis via GATOM (PMID: 35639928). We hope these additions will be useful for readers and point their attention to other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation, beyond those related to lipid metabolism that we investigated here.
  
  From this analysis, we did not specifically note strong changes in the expression of polyamine metabolic enzymes or DGATs.
  
  Finally, the KPC model used in this study involves conditional deletion of p53, which is known to produce tumors with a faster progression and a distinct tumor microenvironment compared to the more commonly used p53^R172H knock-in model. Including this point in the discussion would help contextualize the findings.
  
  We thank the reviewers for mentioning this limitation of our study. We now discuss the limitations of the mouse model used in this work in the Results section of the text. We also now highlight that, in addition to the murine p53 deletion model, our studies make use of human PDAC lines that contain p53 mutations. We believe these results provide some evidence that the findings we have made regarding arginine deprivation and SREBP in our genetically defined murine PDAC cell line are applicable to human PDAC cells with more diverse oncogenic lesions.
  
  Minor comments to improve clarity:
  
  (1) In Figure 3C, it would be helpful to annotate the PE-linked TG for clarity.
  
  We do not understand exactly what PE-linked TGs refer to. We note that in Fig. 3C, ether-linked triglycerides are labeled in orange and annotated as O-TG, and vinyl ether-linked triglycerides are labeled in grey and annotated as P-TG.
  
  (2) Is Figure 3P mislabeled? Both conditions are labeled as +Arg / -lipid.
  
  We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Figure 1B: Misspelling in Y axis "Normalized enrichment score".
  
  We thank the reviewer for catching this mistake and have corrected it.
  
  (2) Figure 1B: Could the authors elaborate on why they decided to focus specifically on these three hits, which are not the most downregulated genes (the "top hits") appearing in the GSEA?
  
  We chose to focus on lipid metabolism because multiple transcriptomic analysis tools, namely GSEA and GATOM (the latter specifically focusing on enrichment in KEGG-annotated metabolic pathways), highlighted lipid synthesis as the most transcriptionally regulated metabolic pathway in TIFM. To make this apparent to readers, we added a Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA and GATOM analysis. We hope these additions will make the logic for our focus on lipid synthesis clear and will be useful for readers in highlighting other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation.
  
  (3) Figure 1: It might improve the clarity of the text if the three pairs of murine cell lines (mPDAC1, mPDAC2, mPDAC3) were introduced in a bit more detail in the main text and not just in the figure legend.
  
  We have added more detail describing the three mouse cell lines used in the main text.
  
  (4) Figure 1E: The authors may wish to comment on why they chose to perform transcriptomic analyses with the mPDAC3 derived models, and not mPDAC1 or mPDAC2, given that mPDAC3 appears to exhibit the most distinct phenotype of the three, according to the results presented in Figure 1 J-L.
  
  The transcriptional analysis described in Fig. 1E was performed on a previously acquired dataset using mPDAC3 cell lines (PMID: 37254839), which is why this line was used. We have revised the text to make it clear that this transcriptional analysis uses pre-existing data from a previous publication.
  
  (5) Figure 1L: The authors may wish to clarify why they only show relative palmitate to assess global fatty acid biosynthesis in these cell lines. There is a decrease in labeled palmitate of mPDAC3 cells cultured in TIFM in comparison to the cells cultured in RPMI media, showing a decrease in the lipid biosynthesis of these cells in these conditions. However, there also seems to be lower palmitate levels in the TIFM-cultured mPDAC3 cells specifically, in comparison to their mPDAC1 and mPDAC2 counterparts. Why is that? Could the authors comment on this result?
  
  We thank the reviewers for this helpful observation. In Figure 1L (now Figure 1N), we wanted to show how culture conditions (RPMI/TIFM) affected both the total amount of palmitate in PDAC cells and the fraction that is labeled (i.e. arising from de novo synthesis). We think this provides more information for readers by allowing them to assess both changes in the pool size of palmitate and changes in the fraction of palmitate that is synthesized. We like this presentation because it clearly shows that, while total palmitate levels behave differently across cell lines (with TIFM culture reducing levels in mPDAC1-2 but increasing levels in mPDAC3), the amount of palmitate that is synthesized de novo is decreased in all three cell lines when cultured in TIFM. To highlight this, we also present the fraction of palmitate that is labeled in Fig. 1O.
  
  We are unsure why TIFM culture reduces total palmitate levels in some PDAC cell lines, while others are able to maintain total palmitate pools. We hypothesize that TIFM culture increases lipid uptake to compensate for the lack of synthesis, and that differences in lipid scavenging capacity between the lines could explain this difference. We are currently working on experiments to test these hypotheses and will present the results in a future study.
  
  (6) Figure 2 - Figure Supplement 1A: It would be informative and appreciated to know which nutrients are actually represented and correspond to certain points on the graph, in particular for the ones that are the most differentially present in the two different media.
  
  We have now updated this graph to highlight key metabolites that are most differentially abundant between the two media. We also now provide as a Supplementary file the composition of TIFM, which provides readers with all the information needed to understand which metabolites are differentially abundant in TIFM and any media they wish to compare.
  
  (7) Figure 2 - Related to Figure supplement 1D: It would be useful to know how or why arginine was selected for further investigation from the subset of amino acids. The authors could elaborate on this, by showing or highlighting the data that drew attention to this amino acid initially.
  
  We thank the reviewers for this note. We have tried to make Figure 2 – Figure supplement 1 clearer about how arginine was selected for further investigation. We have updated the figure to improve clarity for the comparisons of different media that enabled us to identify differences in amino acids between RPMI and TIFM as driving the difference in lipid metabolism. We have also highlighted in Figure 2 – Figure supplement 1A that arginine is the most differentially abundant amino acid, and edited the text to explain that this high degree of differential abundance is why we focused on arginine, amongst all the amino acids, as a likely candidate for regulation of SREBP1.
  
  (8) The legends for Figures 2G and 2H could be improved, i.e., making clearer that 2H shows incorporation in the circulating fatty acids, unlike 2G.
  
  We have updated the figure with improved labeling, as the reviewer suggested, to denote which panels correspond to which sample type.
  
  (9) Figure 3E and 3G: The heatmaps displayed here show that the addition of arginine to TIFM culture medium restores fatty acid synthesis; however, it appears that the nature of the lipids synthesized in this condition may differ from the ones synthesized in RPMI cultured conditions.
  
  We have added text highlighting that arginine supplementation to TIFM and RPMI cultures led to induction of different SREBP1-target genes, but that both lead to activation of fatty acid synthesis and desaturation genes, which motivates our focus on de novo synthesis of saturated and monounsaturated fatty acids in this study.
  
  (10) Figure 3O: The SREBP1 immunoblot still seems to show some residual bands for the cells transduced with SREBP1 targeting sgRNAs, therefore, the authors may want to be more nuanced and present this model as a KD, instead of a KO, as mentioned in the text?
  
  We agree with the reviewer’s suggestion, and we have changed the text to describe these as knockdowns rather than full knockouts.
  
  (11) Figure 3P: Is it possible that there is an error in the legend of the figure (Lipids + for the first bar and - for the second one?). The figure could also be improved by a legend that explains what the different colored bars represent.
  
  We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.
  
  (12) Figure 4: The authors state in Figure 4 - Figure supplement 1A-F that arginine-restricted mPDAC cells are not sensitized to xCT or GPX4 inhibitors that trigger ferroptosis, and that therefore SREBP1 suppression by arginine restriction in the TME does not sensitize PDAC cells to ferroptosis inducers. However, this does not appear to be so clear from the data shown. This might be due to the limitations associated with the population doubling measurements instead of the lethality measures noted above. Likewise, it is later proposed that arginine restriction sensitizes both mPDAC cells and human PDAC cells to α-ESA induced ferroptosis. These results would benefit from a direct measure of cell death. Related to the above point, it would be useful to better understand why cells cultured in arginine-deprived TIFM do not appear to be sensitized to ferroptosis inducers, but these same cells die from ferroptosis when treated with α-ESA. It would be useful to present some thoughts.
  
  We thank the reviewers for bringing up this important point. To the reviewers' first point, we repeated xCT and GPX4 inhibitor treatment experiments to include both growth-rate-corrected (PMID: 27135972) proliferation assays and Sytox-based viability assays. In both cases, we did not find consistent sensitization to xCT or GPX4 inhibitors across multiple PDAC lines when cultured in TIFM. In contrast, we found consistent sensitization to PUFA treatment across multiple murine and human PDAC cell lines cultured in TIFM. Together, this analysis suggests that arginine starvation specifically sensitizes PDAC cells to PUFAs, but not to other ferroptosis inducers.
  
  We agree with the reviewer that this is an interesting and unexpected observation. We do not have a mechanistic understanding of why this is the case. However, we believe this observation suggests that PUFAs may be a better method of inducing ferroptosis in certain conditions than other ferroptosis-inducing approaches. We have added text to the discussion to highlight this interesting and unexplained observation.
  
  (13) Figure 6: The authors mention that α-ESA is used here at sublethal doses, which do not affect viability or proliferation, but this is not shown in either the main or supplementary data. These data should be provided somewhere. It might also be nice to mention in the main text (not just in the legend) the dose of α-ESA used for the combination treatments.
  
  We thank the reviewers for this helpful suggestion. To illustrate that α-ESA is used at a sublethal dose, we altered each panel to use a linear rather than logarithmic x-axis, thereby including the DMSO control arm for each ferroptosis inducer in combination with α-ESA. We hope this now clearly illustrates that this dose of α-ESA does not perturb cell growth or viability in these assays.
  
  (14) Figure 6B: Fer-1 treatment does not seem to rescue the phenotype very clearly. This could again be because cell death is being conflated (to degree) with effects on proliferation, and Fer-1 is not expected to affect cell proliferation. Again, measuring cell death directly would be better than measuring population doublings.
  
  We thank the reviewers for this helpful comment. To address this concern, we have added Sytox-based viability assays to Figure 6. These assays indicate that Fer-1 treatment rescues the viability of PDAC cells treated with ferroptosis inducers, α-ESA, or the two in combination.
  
  Reviewer #3 (Recommendations for the authors):
  
  General notes:
  
  (1) It would be easier for the reader if one condition were consistently placed in the same position throughout the graphs. For example, RPMI results should always appear first and TIFM second. Currently, this is inconsistent throughout the manuscript (e.g., Figure 1 - Figure Supplement 1: RPMI is first and TIFM second; Figure 2 - Figure Supplement 1: TIFM is first and RPMI second).
  
  We thank the reviewers for this note. We have updated the figures to remain consistent in their ordering throughout the manuscript.
  
  (2) Please briefly explain the differences between PDAC1-3 and clarify why most follow-up experiments were conducted using PDAC1. Presumably, this was because PDAC1 showed the most robust effect on fatty acid synthesis.
  
  We have added text in the Results section of the manuscript describing the different murine PDAC lines used in this study. We performed most studies with mPDAC1, as this line has robust differences in fatty acid synthesis between culture conditions. However, murine PDAC lines recapitulate the transcriptional subtype diversity of PDAC (PMID: 29364867), so we repeated key experiments in multiple mPDAC lines to determine whether a given finding is translatable to other PDAC subtypes.
  
  (3) Are only SREBP1 protein levels affected or are SREBP1 RNA levels also decreased in low arginine TME?
  
  We appreciate this important comment. We have added SREBP1 RNA levels to Figure 1 to show that RNA levels do not differ between conditions, whereas protein levels of SREBP1 change significantly.
  
  (4) What was the rationale for investigating lipid metabolism even though it was not the top changed metabolic gene signature? It would be interesting to briefly discuss which pathways were the most enriched.
  
  We chose to focus on lipid metabolism because multiple transcriptomic analysis tools, namely GSEA and GATOM (the latter specifically focusing on enrichment in KEGG-annotated metabolic pathways), highlighted lipid synthesis as the most transcriptionally regulated metabolic pathway in TIFM. To make this apparent to readers, we added a Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA and GATOM analysis. We hope these additions will make the logic for our focus on lipid synthesis clear and will be useful for readers in highlighting other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation.
  
  Further comments:
  
  (1) Figure 1 Supplement 1A: It is not clear which SREBP target genes are significant. Please indicate this more clearly.
  
  The analysis in this panel compared the expression levels of all the indicated genes between groups (tumor/normal) rather than testing for significance of individual genes between the two groups. We have updated both the text and the figure legend to clarify the statistical analysis that was performed.
  
  (2) Figure 1J and 2C: The Western blot loading control (Actin) does not appear equal across all samples. It would be helpful to include a quantification normalized to the Actin loading control.
  
  We have included quantification of each western blot to help interpret these immunoblots.
  
  (3) Supplementary Figure 2: How often has this experiment been performed? The TIFM results appear to consistently show the same values. If this is the case, it needs to be labeled appropriately.
  
  Thank you for pointing out that our presentation of these data made it unclear how the experiment was performed. Initially, we performed multiple separate experiments to identify arginine starvation as the TIFM driver of SREBP1 suppression. To compare across all the separate media conditions, we performed one experiment with all the relevant media conditions together, which is the experiment described in the manuscript. Thus, there was one set of control TIFM/RPMI conditions to which we compared all of the different media conditions. As we initially presented the data, it appeared as if we had performed multiple experiments in which the TIFM/RPMI controls had exactly the same behavior, which is not the case. We have updated the data presentation in this figure to make this experimental design clear.
  
  (4) Figure 3P: Please add a legend for this panel.
  
  We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.
  
  (5) Figure 4 - Figure Supplement 1: Please review the legend carefully. The legend currently includes only circles, but some of the graphs (A and F) display squares.
  
  Thank you for catching this mistake. We have updated the panels and legends for this figure so they are concordant.
  
  (6) Figure 4D: The effect of α-ESA treatment on the doubling delta of arginine-treated versus non-treated TIFM cells looks similar. It looks like the difference is because cells treated with arginine start at higher doubling values from the beginning. I would suggest looking at the delta and subsequently toning down the statement: "Addition of arginine significantly decreases sensitivity to α-ESA."
  
  Thank you for this helpful comment. To avoid any confounding effects of differences in basal growth rate between mPDAC cells grown in different media, we have converted all of our data to GR values as described previously (PMID: 27135972), which enables us to take into account the basal growth rates of cultures when calculating the effects of treatments/perturbations on culture growth and viability. We hope this addition makes the effect that arginine has on α-ESA sensitivity clear beyond the impact that arginine has on basal growth rate.
  
  In addition, we also measured the viability of α-ESA treated mPDAC cells with and without supplemental arginine (current Fig. 5E) by Sytox-exclusion assay. We believe this new data supports the claim that arginine makes PDAC cells resistant to the addition of exogenous PUFAs.
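  For readers unfamiliar with GR values, the following is a minimal sketch of the growth-rate-normalized response as defined in the cited method paper (PMID: 27135972). It assumes only cell counts at the start of treatment (x0), at the end for the treated culture, and at the end for the untreated control grown in the same medium; the function name `gr_value` and the example counts are hypothetical, and this is not the authors' analysis code. It shows why normalizing to control doublings addresses the reviewer's concern about media with different basal growth rates.

```python
import math


def gr_value(x_treated: float, x_control: float, x0: float) -> float:
    """Growth-rate-normalized response for one treatment condition (sketch).

    x0:        cell count at the start of treatment
    x_control: cell count of the untreated control at the end of the assay
    x_treated: cell count of the treated culture at the end of the assay

    Returns 1 for no effect, 0 for complete cytostasis, and a negative
    value (bounded below by -1) when the treated population shrinks.
    """
    # Population doublings over the assay for treated and control cultures
    doublings_treated = math.log2(x_treated / x0)
    doublings_control = math.log2(x_control / x0)
    # Dividing by control doublings removes the influence of basal growth rate
    return 2 ** (doublings_treated / doublings_control) - 1


# Hypothetical counts: a fast-growing medium (3 control doublings) and a
# slow-growing medium (1.5 control doublings); in each, the treated culture
# completes one third of its control doublings.
fast = gr_value(x_treated=1000 * 2**1.0, x_control=1000 * 2**3.0, x0=1000)
slow = gr_value(x_treated=1000 * 2**0.5, x_control=1000 * 2**1.5, x0=1000)
print(round(fast, 3), round(slow, 3))  # both are about 0.26
```

  In this toy example the raw loss of doublings differs between the two media (2 versus 1), yet the GR value is identical, which is the kind of baseline-independent comparison the reviewer requested.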
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.10.642426v2
May 2026
www.biorxiv.org

Constraints on the G1/S transition pathway may favor selection of multicellularity as a passenger phenotype

1
1. Public_Reviews 29 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Ducrocq et al. present research exploring the genetic link between simple multicellular group formation (ace2Δ/ace2Δ) and its interaction with cell-cycle progression mutants (e.g., cln3Δ/cln3Δ), demonstrating that this combination can provide fitness benefits during fluctuating resource conditions, resulting in a rapid increase in the fraction of multicellular cell-cycle mutants over unicellular yeast without selection for multicellular size. Because both the multicellular phenotype and the regulatory link enabling faster escape from the stationary phase are controlled by the Ace2 transcription factor, this work demonstrates that multicellularity can arise as a side-effect of a completely independent fitness advantage unrelated to the benefits of group formation itself. As a "passenger phenotype," multicellularity could thus emerge for other selective reasons, potentially facilitating a later transition to more entrenched multicellularity if novel conditions arise where group formation becomes directly beneficial.
  
  Strengths:
  
  This work is novel and exciting for research exploring the very first steps of the transition from unicellularity to simple multicellularity. This is particularly significant because the formation of multicellular groups is almost always assumed to come at a cell-level fitness cost due to reduced reproductive fitness compared to remaining unicellular. This cell-level fitness cost generally needs to be outweighed by the benefits of multicellular group formation (e.g., large size escaping predation) for the multicellular phenotype to be stable, which is true for a large number of cases studied in the literature, where the multicellular phenotype can only evolve over unicellular competitors under strong selection for multicellular groups. However, this study presents an interesting case of a genetic and environmental condition under which individual cells (forming simple multicellular clusters) can actually have higher reproductive fitness than unicellular yeast. This demonstrates that the assumed cost at the single-cell level does not always apply. In summary, this work represents a unique example contrary to common assumptions regarding the costs of multicellular phenotypes, showing that simple multicellular phenotypes can evolve and remain stable without requiring strong selection for multicellular size or other benefits of group formation.
  
  The claims and interpretation of the results align well with the data presented. This is due to the careful and straightforward experimental design testing predictions with a clear, stepwise methodology, ruling out alternative explanations and providing support for the proposed link between the mutations (ace2, cln3, and others), their impact on faster exit from quiescence, and thus earlier entry into reproduction in fresh media, resulting in higher fitness in the snowflake yeast phenotype compared to unicellular yeast.
  
  Weaknesses:
  
  The authors show that the same multicellular phenotype with higher cell-level fitness due to faster exit from the stationary phase can also be observed with alleles found at other loci in non-laboratory yeast strains, implying that the results are likely not specific to a peculiar case genetically engineered in laboratory strains, but that similar phenotypes may be present in nature. However, this remains to be explored further by examining the natural ecology of commercially available or wild yeast isolates and their genomes. This is by no means a weakness of this study and, therefore, not necessarily something the current work can improve. It does mean, however, that the relevance of these findings for early multicellularity in yeast, and even more so for nascent multicellularity in distinct taxa, remains to be explored in the future. Until then, it is difficult to make strong claims about how applicable these results would be for non-laboratory yeast and other taxa. Regardless, this work does its part by representing a very exciting finding.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Here, the authors attempt to demonstrate that a simple model of multicellularity - snowflake yeast - exhibits key ecologically relevant changes in the regulation of the cell cycle. By examining the effects of the ace2 mutation in environments where multicellularity is not directly selected for or against, and combining it with mutations in key cell cycle regulators, they hope to show that mutations driving simple multicellularity can be selectively favored due to their effects on the release from quiescence rather than their effects on multicellularity itself.
  
  Strengths:
  
  The experiments performed are extensive and thorough. The yeast genotypes examined are judiciously chosen, so as to map out a functional model of the relationship between alterations to cell cycle control and changes to multicellularity phenotypes. Multiple possible interactions are examined, with the causal link and model of the relationship between the multicellular passenger phenotype and the selectable quiescence-release phenotype being well-supported. There are extensive controls demonstrating the separation between the 'passenger' multicellular phenotype and the cell cycle regulation phenotypes examined, including haploid/diploid strains with different multicellular phenotypes but similar cell cycle regulation phenotypes, and phenocopy strains in which downstream enzymes are deleted rather than key central regulators.
  
  Weaknesses:
  
  My only concerns about these results relate to the focus on selection on cell cycle control being examined in a model of multicellularity with key core cell cycle mutations rather than in a wild-type background, as this is a somewhat artificial system.
  
  I believe, however, that the authors convincingly make their case that this work on the multicellular phenotypes of yeast represents a potent proof-of-concept that simple multicellularity can be driven into existence or selected for as a passenger phenotype due to pleiotropic effects of mutations under selection from real-world ecological pressures. They are able to connect this phenotype back to known mutations of particular cell cycle regulators (RB) in other multicellular lineages and demonstrate that ecologically relevant changes to the cell cycle are connected to multicellular phenotypes. As a proof of concept of the connection between these phenotypes, rather than a study of a particular event in the past of a living lineage, it makes a strong case.
  
  A longstanding question in the field of multicellularity concerns the selective pressures that can drive simple multicellularity into existence and then act on simple multicells to increase their size and complexity. This work brings tangible evidence that, rather than being selected for on its own, simple multicellularity can be a side effect of selection on other key phenotypes.
  
  This separates the question of the origins of multicellularity and the forces that drive its further evolution. This separation can reframe how the field is studied, especially in the context of the apparent dichotomy between dozens of origins of 'simple' multicellularity across the tree of life and a few origins of 'complex' multicellularity in the history of Earth. Especially in light of other evidence that multicellularity is connected to changes in cell cycle regulation, I believe that this is an important insight that will alter the way we think about the origins of this key evolutionary transition.
  
  We thank the reviewers for their insightful comments on our work.
  
  We agree with reviewer #1 that further experiments would be needed to determine how the observations made on lab strains apply to yeast in various ecological conditions, particularly in the wild. Here we provide a proof of principle that multicellularity can be selected as a side effect. This does not prove that such selection took place during yeast evolution, but we would like to emphasize that resource fluctuations are very common in ecological conditions, making it highly likely that the environmental conditions needed to select for the side effects described here have arisen.
  
  We agree with reviewer #2 that our work on yeast strains is "somewhat artificial", as is often the case with model organisms under laboratory conditions. Importantly, though, we showed that the effect of the cln3 knock-out mutation can be phenocopied by overexpression of WHI5 (encoding the yeast equivalent of Rb). We propose that variations in the levels of cell cycle regulators during evolution may have contributed to the selection of multicellularity as a side effect. We agree that this is only a hypothesis to explain the selection of multicellularity (just like predator escape) and that there is no direct evidence that it occurred in the history of the lineage. Nevertheless, our work provides the first evidence that such selection of multicellularity as a side effect is possible, and gives a framework to understand how multicellularity can persist in the wild even when it is not the primary target of selection.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  As mentioned in my public review, I very much appreciate this work, its interpretation for early multicellularity as an example opposite to the assumed cost of multicellular phenotypes, and the robust design behind the premise and claims. Therefore, my suggestions below are mostly aimed at improving the readability and data presentation.
  
  (1) In the abstract, Lines 24-27 (the last sentence): This statement is worded too generally and therefore reads as too strong. I think the authors' work provides an example that multicellularity itself does not need to be beneficial all the time - this is really exciting and makes sense! However, there is a substantial body of work showing the origin and maintenance of multicellularity for its direct benefits. Relative to that body of work, this represents a special case, and therefore, while we should definitely reconsider the view that "multicellularity always comes at a cell-level fitness cost," we cannot overgeneralize these findings. Please consider reframing this statement.
  
  Done, now line 25 (addition of “in some cases”)
  
  (2) Line 48 (Introduction): "This mostly concerns two major regulators, RB and Cyclin D." Which organisms are you referring to? Please specify.
  
  Done.
  
  (3) In the Introduction, there are at least three sentences that need citations: L57-58, L59-60, and L65. For instance, I do not know what makes CLN3 the yeast functional equivalent of RB, and I wanted to verify this claim, but no references are cited. Please ensure citations are provided throughout the manuscript.
  
  Done: references 11, 12, and 13 were added.
  
  (4) This is my main request regarding data collection and presentation. The authors share some microscopy images of mutant strains in Figure 2 for different purposes (e.g., Figure 2B compares the fraction of budded cells between two genotypes). However, I would appreciate seeing a collected microscopy figure showcasing the phenotypes of all genotypes that went into competition experiments, including the planktonic (WT lab strain) yeast, either where they appear or in a supplementary figure, all presented with the same magnification and scale to make them comparable. Because cell size, shape, and multicellular phenotype are all key aspects of the competition experiments, being able to see all those genotypes/phenotypes would prepare the reader to make predictions about the fitness assays and other experiments.
  
  Done: Supplementary Figure 1B-E was added.
  
  (5) Related to my previous point, I would appreciate seeing cell size measurements for the different genotypes (both single cells of planktonic genotypes and single cells forming multicellular clusters). Cell size is a key trait that directly impacts the results shown in the paper, and summary statistics comparing them would be helpful for interpreting the results.
  
  Done: Supplementary Figure 1F was added.
  
  (6) In competition experiments, the authors mix unicellular and multicellular yeast clusters at 50/50 and measure the fraction of a phenotype of interest (usually the % of snowflake). It took me a while to understand what is being counted under the "% snowflake yeast" category. This is because, while each cell in unicellular yeast should be counted as one unit, one can count a snowflake yeast composed of 50 cells as 50 units or as 1 unit. Please clearly state what is being counted for the Y-axis labeled "% of snowflake yeast" (or relabel those Y-axes in plots to make this clear).
  
  Done: this is now stated in the Figure 1A legend and on the Y-axes of the competition figures.
  
  (7) I recommend editing the genotype labels in figures (see, for instance, Figure 1B, C, D). In Figure 1B, the bars are labeled as "CLN3/CLN3 co-culture" or "cln3Δ/cln3Δ co-culture," etc. These are actually co-cultures of SF vs. PK (with or without a CLN3 copy). Please consider using more representative labels that will be easier for readers to understand.
  
  Done: this has been changed in all concerned figures
  
  (8) In the Results, L225, you begin referring to AMN1368D as AMN1. I suggest using the full allelic form throughout the text so it will be clear each time that you are referring to that specific allele, as I was confused about whether you were discussing the allele or the gene AMN1 itself.
  
  This has been changed throughout the text.
  
  (9) Discussion, Lines 250-252, states that this is a "situation that is likely to happen very often under ecological conditions." Are there any examples you can cite?
  
  Done, as also requested by reviewer #2 (now lines 256-257).
  
  (10) Lines 272-275 contain a strong, general statement suggesting that co-evolution of cell cycle regulation and multicellularity could be more general (which is acceptable as speculation). However, the suggestion that this co-evolution could have "started very early in the evolution of eukaryotic cells" is too speculative. I would recommend sticking with the alternative, suggesting that the link between the two phenotypes may be a case of convergent evolution.
  
  Done
  
  (11) Lines 278-279 are both vague and too bold. The text mentions a link between cancer and multicellularity and then extends this link through cell cycle regulators. Without explaining the connection between cancer and multicellularity and then trying to link it to cell cycle regulators, all in a few words without background, this sentence is too vague. Please consider deleting this or spending more time clearly explaining the link, which would at best still be speculative.
  
  These speculative sentences were removed.
  
  (12) First, I wanted to note that I highlighted Lines 284-287, as this passage is clearly written and provides a nice argument. I also wonder if you could mention that your work shows simple multicellular cluster formation should not always come at a cost, contrary to the general assumption in the literature, and add a few citations to support that claim. This would highlight how significant this work is within the broader multicellularity literature.
  
  Changed in the Discussion (now lines 242-244, with additional references 30 and 31).
  
  (13) I recommend labeling the genotype of your "quintuple mutant" in Figure 3. You can refer to it as the quintuple mutant in the text, but I had to go back and forth to see what those mutations were when trying to think about potential genetic interactions. Even the legend of Figure 3 does not specify the genotype and refers to it only as the "quintuple mutant."
  
  The genotype is now explicitly stated in the figure title.
  
  Reviewer #2 (Recommendations for the authors):
  
  I find the presented research to be of high quality, with very important implications. I have suggestions for improvement of the manuscript, but they are largely stylistic, with one paper that I believe deserves citation regarding the proteins involved. I see little need for additional experiments or analysis, just a clearer description of the results and their significance.
  
  (1) Line 62: Yeast CLN3 definitely performs the same role as cyclin D in the cell cycle, but has an unclear phylogenetic relationship with the rest of the cyclins. See Cross, Buchler, & Skotheim 2011 ("Evolution of networks and sequences in eukaryotic cell cycle control"). This reference also covers the functional relationship between RB and Whi5, referred to in nearby sentences, as does Medina, Walsh, and Buchler 2019 ("Evolutionary innovation, fungal cell biology, and the lateral gene transfer of a viral KilA-N domain").
  
  The reference has been added
  
  (2) Line 69: Is the question whether the evolution of G1/S regulation favors multicellularity, or whether the two are connected such that the evolution of one can affect the other?
  
  It is clearly the first of the two questions.
  
  (3) Line 73: Comma after Ace2.
  
  Done
  
  (4) Line 76: It would be clearer to specify that snowflake and ACE2 yeast were co-cultured without settling selection or other selection that explicitly favors multicellularity, unlike in experiments where multicellular evolution is observed, as in Ratcliff publications.
  
  This is now specified.
  
  (5) Line 80: Specify which of the ace2 mutant phenotypes are observed, specifically both the multicellularity and the release from quiescence.
  
  Done
  
  (6) Line 146: This observation should be noted as another indication that the multicellular phenotype is not behind the selective pressure, because it is so different between unicells and multicells.
  
  Overall, you have very strong evidence that this is the case, and emphasizing this would benefit the paper!
  
  Done.
  
  (7) Line 151: specify that you are maintaining yeast in proliferation in coculture.
  
  Done.
  
  (8) Line 181: This is another key experiment showing that the multicellular phenotype is not the causal reason for the change in quiescence. It might make things clearer to bring all these confirmatory experiments together, particularly the haploids and the sonicated single cells.
  
  This is now clearly stated at line 195.
  
  (9) Line 225: The choice of referring to the non-laboratory strain as the 'AMN1' wild type default may be confusing to readers, who may treat the genetic background you are using as the ground truth wild type. I recommend throughout the paper always specifying the allele's amino acid to avoid any confusion.
  
  The genotype is now clearly presented throughout the text.
  
  (10) Line 238: I would continue to specify that the multicellular phenotype has no selective advantage, specifically when no selection for size is applied.
  
  See the added sentence at lines 242-244 (revised version).
  
  (11) Line 243: I would say that the evolution of cell cycle regulation may interact with the multicellular phenotype.
  
  This was changed (now line 248)
  
  (12) Line 244: Strike 'indeed' and the 'the' before AMN1 and ACE2.
  
  Done
  
  (13) Line 252: Suggest some ecological conditions under which quiescence exit is likely, such as boom and bust or moving from rotting fruit to rotting fruit.
  
  Done
  
  (14) Line 267: Are you suggesting that the specific genes AMN1 and ACE2 had particular effects on actual organisms in the past, or that it represents a broad pattern of evolution in which multicellularity could be more broadly related to exit from quiescence? I believe it is the latter, and I think that should be clearer.
  
  Modified as suggested
  
  (15) Line 280: In this paragraph, I think the point being made could be slightly clearer. If I am not mistaken, you are distinguishing between the appearance of multicellularity and its refinement under selection, and arguing that the former may be more common than previously believed, given this proof of concept. I think this can be made clearer. Furthermore, it is worth noting that all experiments showing effects of the multicellular phenotype are in mutant backgrounds, and explaining why this is still relevant to wild organisms. Some readers might take this as indicating that the multicellular phenotypes are not relevant to a wild population, but I think the connection to known RB mutations in multicellular lineages, and the fact that the phenotype is linked to a key aspect of cell cycle regulation, overcome this issue, and this should be made clear.
  
  Our study reveals a genetic link between multicellularity and Whi5 and Cln3, two important G1/S cell cycle regulators. Similar genetic interactions have been observed in phylogenetically distant species, reinforcing the idea that the interplay between cell cycle regulation and multicellularity is a general feature and not a mere artifact of mutant background.
  
  The neutral fitness effect of multicellularity in wild-type backgrounds is particularly of interest. By being maintained as a side effect of selection on fundamental cellular processes, the neutral effect of multicellularity may have provided “an evolutionary scheme” for its repeated emergence throughout the tree of life. As such, the "passenger selection" hypothesis fits well with the observations of phenotypic reversibility and facultative multicellularity, despite varying and specific selective pressures. Our work thus gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.
  
  (16) Line 314: What promoters are they driven by?
  
  Specified
  
  (17) Line 336: What was the culture volume, and the volume transferred?
  
  Specified
  
  (18) Line 362: How was the proportion of blue-stained cells scored? Manually, or with an imaging software cutoff?
  
  Specified
  
  (19) Figure 1: I think that the full genotypes of each strain should be specified, either in the legend or the key of the figure, rather than always specifying the ACE2 genotype and other mutations separately.
  
  Done as requested by reviewer #1
  
  (20) Figure 2E, 2F: Same as Figure 1, regarding genotypes.
  
  Done
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.20.671217v3
www.biorxiv.org

ARHGEF6-dependent cytoskeletal regulation underlies a conserved program of forebrain interneuron development

1
1. Public_Reviews 29 May 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.
  
  Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.
  
  There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.
  
  We thank the reviewer for their positive and thoughtful assessment of our manuscript. We appreciate their recognition of the technical breadth of the study, including the integration of mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models. We are also grateful that the reviewer highlights the value of our cross-species approach, as a major goal of the study was to determine whether ARHGEF6 loss produces convergent developmental and cellular phenotypes in both mouse and human systems.
  
  Weaknesses:
  
  Despite the strengths mentioned above, the study has some conceptual and experimental weaknesses that reduce its impact. The mechanistic insight is limited, as the research does not directly establish how ARHGEF6 regulates downstream signaling pathways.
  
  We appreciate the reviewer’s constructive comment. We agree that, although our data establish a phenotypic link between ARHGEF6 loss and interneuron development, they do not directly dissect the molecular mechanisms underlying the observed defects. Our interpretation that the mutant phenotype involves dysregulation of cytoskeletal dynamics is based on the directly observed defects in actin polymerization and organization in neural progenitor cells and neuronal growth cones, respectively, and is consistent with the abnormalities observed in neurite morphology and neuronal migration. This interpretation is further supported by the established role of Arhgef6 as a regulator of the small Rho GTPases Rac1 and Cdc42. Previous evidence shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Moreover, spine abnormalities in Arhgef6-knockdown ex vivo slice cultures can be rescued by expressing the active form of Pak3, a downstream effector of Rac1 and Cdc42 (Node-Langlois et al., 2006). Together, these findings support a model in which loss of the protein affects development through cytoskeletal dysregulation, likely involving altered Rho GTPase signalling. We nevertheless agree that further experiments would be required to establish a direct causal relationship between ARHGEF6 loss, Rho GTPase activity, cytoskeletal dysregulation, and the interneuron phenotypes described here. We will therefore revise the manuscript to clarify that this mechanistic link remains an interpretation supported by our data and the literature, rather than a direct demonstration within the present study.
  
  Also, there is insufficient evidence for interneuron specificity; although the central claim is that ARHGEF6 plays a selective role in interneurons, the data do not adequately exclude the possibility that the observed effects reflect broader neuronal defects. The study lacks critical controls across cell types, as several phenotypes observed in organoids and progenitors, including apoptosis, reduced neuronal output, and altered morphology, could also affect multiple neuronal populations without being directly tested.
  
  We agree that the current data do not exclude the possibility of alterations in other neuronal lineages, specifically the excitatory lineage. With regard to this, we would like to emphasize that the investigation of excitatory cell phenotypes was beyond the scope of the present study, as this aspect has previously been examined by Ramakers et al., 2012 and Node-Langlois et al., 2006, particularly in the context of hippocampal pyramidal cells, which are among the few cell types showing consistent expression of the gene in the adult mouse brain (Allen Brain Atlas; Yao et al., 2021). In this context, it is interesting to note that, in Ramakers et al., 2012 (Figure S1), MAP2 immunostaining of hippocampal formations revealed comparable distribution and intensity of neuronal cell bodies and dendrites throughout the hippocampus of both wild-type and Arhgef6-KO animals. With regard to morphological maturation of excitatory cells, whereas we observe a simplification of interneuron morphology in both mouse and human models, Ramakers et al., 2012 reported increased dendritic arborization complexity in hippocampal pyramidal cells. With regard to migration, a direct comparison with excitatory neurons would be intrinsically difficult, as excitatory and inhibitory neurons undergo highly distinct migratory processes and are therefore not directly comparable. We greatly appreciate the reviewer’s comment, as it gives us the opportunity to better discuss the relationship between our findings and previous studies in the Discussion. We will revise the manuscript and avoid implying that the phenotype observed is exclusive to interneurons.
  
  Furthermore, the data are predominantly descriptive, with many results remaining correlative and failing to establish causal relationships.
  
  We agree that our study primarily establishes a phenotypic framework and does not fully resolve the causal hierarchy among altered survival, migration, cytoskeletal morphology, and intrinsic excitability. We will revise the manuscript to make this limitation explicit, avoiding statements that imply direct causality beyond the data presented.
  
  Some more comments:
  
  (1) Given that ARHGEF6 is a guanine nucleotide exchange factor for Rac1 and Cdc42, the absence of direct measurements of GTPase activity or downstream signaling represents a significant gap. The interpretation that the observed phenotypes are mediated through specific cytoskeletal pathways, therefore, remains inferential.
  
  We appreciate the comment. The interpretation that our phenotype involves dysregulated cytoskeletal dynamics is based on the observed defects in actin polymerization and F-actin organization in neuronal growth cones and is consistent with the abnormalities in neurite morphology and neuronal migration. We will explicitly state in the Discussion that, because we did not directly measure Rac1 and Cdc42 activity in our models, our hypothesis that this molecular pathway contributes to the observed phenotype remains inferential, although it is supported by the current literature.
  
  (2) The manuscript repeatedly interprets the findings as interneuron-specific. However, several key observations are not demonstrated to be restricted to IN. Without direct comparison to excitatory neurons or other cell types, it is difficult to conclude that ARHGEF6 plays a selective role in interneurons rather than a more general role in neuronal development. The well-done analysis of the transcriptomic dataset is not sufficient to claim IN specificity. This issue is particularly important for the interpretation of the human organoid experiments, where reductions in SOX2⁺ progenitors and NEUN⁺ neurons, as well as increased apoptosis, could reflect global developmental defects. Similarly, in the mouse experiments, the reduction in GAD67⁺ cells is compelling, but it is not shown whether other neuronal populations are also affected.
  
  As previously mentioned, we understand the reviewer’s concern regarding the specificity of the observed phenotypes in interneurons and agree that the claims should be tempered. However, it is important to note that the interpretation of the human organoid experiments should be reconsidered. The use of specifically ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of defects such as the reduction in inhibitory progenitors’ neuronal output, the increased apoptosis, and the morphological abnormalities of inhibitory neurons. We will acknowledge in the Discussion the limitations of the study with regard to assessing the cell-autonomous nature of the observed migration defects.
  
  (3) The study provides a strong phenotypic description but limited causal resolution. For example, migration defects, altered growth cone morphology, and reduced branching are all consistent with impaired cytoskeletal regulation, but the links between these phenotypes are not directly established. Likewise, while the electrophysiological data convincingly show reduced firing in interneurons, the connection between altered cytoskeletal dynamics and intrinsic excitability is not explored.
  
  The observed migration defects, altered growth-cone morphology, and reduced branching are consistent with impaired cytoskeletal regulation. However, we acknowledge that the mechanistic links among these phenotypes remain to be directly demonstrated. Similarly, although our electrophysiological data show reduced firing in ARHGEF6-KO interneurons, the present study does not provide direct evidence linking impaired excitability to altered cytoskeletal dynamics. In the latter case, we think the underlying mechanisms should be further investigated at the subcellular level, particularly with respect to cytoskeleton-mediated intracellular trafficking and the localization and distribution of ion channels. One limitation of the present study, which may have masked electrophysiological alterations associated with differences in membrane composition (current Figure S1D–H), is that different interneuron subtypes with distinct intrinsic properties were pooled together in the analysis. We will expand the Discussion to address these limitations.
  
  (4) Several aspects of data presentation could be improved. In multiple figures (e.g., Figure 1A, D; Figure 4 and Video S1, 2), the images are difficult to interpret due to high cellular density, limited magnification, or lack of clear annotation. In some cases, it is not fully clear how quantifications were performed or which regions were analyzed. Improving the visual clarity with arrows, boxes, and high-magnification inserts of the data would strengthen confidence in the conclusions.
  
  We would like to thank the reviewer for pointing this out. We agree that some images and videos would benefit from clearer annotation. In the revised manuscript, we will add high-magnification insets, arrows or boxes highlighting the relevant regions/cells, and clearer descriptions of the quantified regions. We will also improve legends and video labels to indicate genotype, region, and tracked cells.
  
  Reviewer #2 (Public review):
  
  The authors investigate the impact of deleting the small GTPase regulator ARHGEF6 on the development and physiology of interneurons. Using public databases, they first show that ARHGEF6 is enriched in interneurons or in the areas that give rise to them, both in development and adulthood, in humans and mice. Using a previously reported complete KO mouse and a GAD67-GFP reporter mouse line, they show a notable reduction in GFP+ cells in the adult mouse cortex and hippocampus. These mice show increased numbers of apoptotic cells at different timepoints and in different brain areas during development. In the developing cortex of ARHGEF6-KO mice, there are fewer INs in all layers, and the cells have incorrectly oriented processes. Cultured hippocampal INs show reduced excitability and impaired neurite branching. The authors then established isogenic hiPSC lines to study ARHGEF6 deletion in human cells and differentiated them into ventral forebrain neurons, finding both interneuron-related and unrelated phenotypes. Most importantly, human interneurons grown in organoids show reduced branching and altered growth cone morphology. The authors claim that the novel interneuron phenotypes found in these models can partly explain the human intellectual disabilities associated with mutations in this protein. The study is well conducted and opens new avenues of research, not only into the role of small GTPase regulation in early nervous system development, but also into how interneuron deficiencies contribute to a wider range of intellectual disability syndromes in humans.
  
  We appreciate the reviewer’s positive evaluation of our manuscript and their recognition of this work’s potential to expand the focus of intellectual disability research on the development and function of the inhibitory system. We are particularly encouraged that the reviewer highlights the strength of our combined mouse and human cellular models, as well as the relevance of the interneuron-related phenotypes we identify across systems.
  
  However, most conclusions of the present version would be strengthened by considering the following comments:
  
  Major comments:
  
  (1) The biological processes evaluated at different developmental stages may be directly or indirectly related to ARHGEF6 function itself. As a model of a hereditary disease, whole-organism gene deletion is valid, since human patients carry the same condition. However, for investigating the roles of a protein, complete deletions may be imprecise, since they can give rise to phenotypes that are only indirectly related to the protein's function. Most conclusions of the present manuscript should either be discussed in this light or supported by evidence for a direct role of the protein. Such evidence is typically obtained with acute knockdowns in culture, or in developing brains by in utero electroporation. For example, Figure 1C shows that the principal excitatory neurons in the hippocampus do not express ARHGEF6. However, most electrophysiological and behavioral evidence of defects in ARHGEF6-KO mice comes from evaluating these cells (Ramakers et al., 2012). I am not suggesting that either the previous or the present evidence is wrong. But I believe readers would benefit from a clear distinction (or cautionary notes) between a functional consequence of the deletion (which can appear months later, and in cells other than those carrying the actual molecular defect) and a true cell biological function of the protein under study. In fairness to the authors, this concern applies to most conclusions derived from KO organisms.
  
  We agree with the reviewer that phenotypes observed in constitutive knockout models may, in some contexts, reflect indirect or compensatory consequences of long-term gene loss. Conditional and/or inducible knockout or knockdown approaches can certainly help dissect the nature of the observed defects and better define the effects of gene ablation at different developmental stages or in specific cell types. However, in the context of our study, it is important to note that the experiments performed in ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of very early developmental defects in the inhibitory lineage, in isolation from other cell types. These defects include reduced neuronal output from inhibitory progenitors, increased apoptosis, and morphological abnormalities in inhibitory neurons. Therefore, the phenotypes reported here are less likely to reflect effects originating in, or indirectly caused by, cell types that do not express Arhgef6.
  
  With regard to Figure 1C, we state in the Results that “among excitatory populations, only CA3 pyramidal neurons and mossy cells exhibited expression levels comparable to those observed in inhibitory clusters (Figure 1D, Table S2),” thereby not neglecting the potential effect of the lack of a functional protein in these populations.
  
  (2) Figure 1E–I. All conclusions are drawn from a GAD67-GFP reporter, which is a very powerful and reliable tool for large-scale screening. All the conclusions of the paper would be strengthened if immunohistochemical staining for specific interneuron markers in the same areas were added as complementary supporting evidence.
  
  We appreciate the insightful comment of the reviewer. Additional validation using established interneuronal markers will further strengthen the GAD67-eGFP analysis. We will perform complementary stainings (e.g., PVALB and CCK) and quantifications and include these data as a Supplementary Figure.
  
  (3) Cell death in development: It is surprising that the high amount of TUNEL staining during development does not translate into gross histological changes in the adult brain (studied elsewhere). Can the authors discuss possible explanations?
  
  We appreciate the thoughtful consideration of our findings. We think that possible explanations include partial compensatory mechanisms during development, which may mitigate the long-term anatomical consequences of increased cell death. In addition, the phenotype may be restricted to specific neuronal populations or developmental windows, thereby producing functional alterations without necessarily resulting in overt macroanatomical defects. Thus, although increased developmental cell death may contribute to altered circuit assembly and neuronal output, it may not be sufficient to produce gross histological changes detectable at the adult brain level.
  
  (4) Section 4 (Figures 2F-J): The authors present this staining as an analysis of migration. Normally, migration studies are performed with a "pulse-chase" paradigm, in which a single cohort is labeled and then followed over time (typically by in utero electroporation of a fluorescent protein). Tissue is then fixed at different time points, and migration can be followed. By contrast, the present evidence comes from a single time point, in an experimental setting in which all Gad67 INs are labeled; hence, one cannot infer a defect in migration. The differences between WT and ARHGEF6-KO are obvious and interesting; they simply cannot be attributed solely to a problem in migration.
  
  Also, a true migration phenotype in the current setting would be expected to show the cells that failed to migrate accumulating in deeper layers. My impression is that the changes in IN per layer are more easily explained by total cell number than by migration. Perhaps evaluating earlier time points could clarify this.
  
  We appreciate the reviewer’s suggestion to implement an additional time point in the in vivo migration analysis. Since an earlier in vivo time point would most likely not reveal migration-related defects, as most cells would still be confined to the ganglionic eminence (Liaci et al., 2022), we will include analyses performed at a later developmental time point as supplementary evidence. We will also revise the wording to clarify that the fixed-tissue data show altered distribution and orientation of GAD67-eGFP-positive interneurons, which are consistent with impaired migratory behavior when considered together with the in vitro live-imaging data. At the same time, we will acknowledge that reduced interneuron survival and/or neuronal output may also contribute to the observed phenotype.
  
  (5) It is known that ARHGEF6 deletion produces severe F-actin phenotypes in neurons. Have the authors confirmed that GAD67 cells in their hippocampal cultures ALSO show these phenotypes (stress fibers in somata, growth cones, and actin patches along neurites)?
  
  We did not directly assess F-actin organization in GAD67-eGFP murine primary cultures. Direct analyses of F-actin organization, growth-cone morphology, and cytoskeletal organization were performed only in the human system. To further assess this phenotype, we will perform phalloidin staining on GAD67-eGFP brain sections to evaluate F-actin organization in interneurons in vivo.
  
  (6) Section 4. The authors present data showing deficient migration of GFP-labeled interneurons. Is it possible to assess, in the same sections, whether other cell types are also affected? Although the hypothesis that ARHGEF6 deletion will have an impact on INs is well rooted in expression data, assessing other cell types could provide a positive control or even evidence for a cell-autonomous phenotype.
  
  We thank the reviewer for their thoughtful suggestions. We agree that extending the analysis to additional cell types would provide further insight into the specificity of the phenotype; however, a comprehensive evaluation of all neuronal populations falls beyond the scope of this research. The use of ventralized MGE-like organoids enabled us to examine whether key defects were cell-autonomous, including the reduced neuronal output of inhibitory progenitors, increased apoptosis, and abnormal inhibitory-neuron morphology.
  
  (7) ARHGEF6 deletion has an important impact on organoid development (size, shape, etc.). Have the authors analysed whether these organoids produced fewer interneurons?
  
  We would like to clarify that the organoids analyzed in the study are ventral MGE-like organoids and therefore the reduction in neuronal output (current Figure 4K) primarily reflects the ventral/interneuron lineage in this model.
  
  (8) In assembloids, the differences in migration parameters are very small between WT and ARHGEF6-KO, which reinforces that perhaps what is observed in the different layers of cortex during mouse development is likely not entirely due to migration, as concluded.
  
  We agree that the migration parameters in assembloids should not be interpreted in isolation. We will revise the text to emphasize that the reduction in the number of interneurons observed in the adult brains is part of a broader pattern that also includes altered neuronal output and reduced viability.
  
  (9) To properly weigh the present evidence (interneuron deficits) obtained with the ARHGEF6-KO model, the authors should include a deeper discussion in light of the substantial work already done with these mice. How does the finding of a diminished IN population in the brain of these mice explain the large body of electrophysiological and behavioral evidence previously obtained with these animals? Perhaps the most important work to discuss in this respect is the initial ARHGEF6-KO report by Ramakers and colleagues (2012), but there are others.
  
  We appreciate the reviewer’s emphasis on the importance of framing our findings within the broader context of the existing literature. We will expand the Discussion to better integrate previous work on ARHGEF6-KO mice. Specifically, we will discuss how reduced interneuron number and altered interneuronal function may contribute to previously reported electrophysiological and behavioral phenotypes, acting in concert with previously described alterations in excitatory neurons and synaptic plasticity (Ramakers et al., 2012).
  
  Minor comments:
  
  (1) Figure 1A. It looks clear that the GE shows the highest expression of ARHGEF6; however, the reader needs the reference levels where the log2 expression is calculated. What are the reference levels?
  
  We would like to thank the reviewer for pointing this out. We will clarify in the caption that the log2(RPKM+1) expression values are shown as absolute values and are not relative to a reference condition.
  
  (2) Have the authors compared the number of GAD67-eGFP cells in the hippocampal cultures between WT and ARHGEF6-KO mice?
  
  We did not rely on total GAD67-eGFP counts in dissociated hippocampal cultures because differences could reflect initial plating composition, survival, and maturation. In our experience, the MGE-like organoid system provides a more controlled in vitro context to assess neuronal output in the ventral lineage.
  
  (3) Section 3, as a caution note, authors should mention that it is not possible to know from the evidence provided which cells are dying.
  
  We agree with the reviewer and will add a cautionary statement noting that TUNEL staining alone does not identify the precise dying cell type. We will clarify that increased cell death in the ganglionic eminence and MGE-like organoids is consistent with a prominent involvement of the ventral/inhibitory lineage, while acknowledging the limits of the assay.
  
  (4) In the dorsal-ventral assembloids, it is expected that the ventral organoid would contain lots of GFP expression compared to the dorsal, but in the image shown (Figure 5A) both parts of the assembloid seem to have the same amount and distribution of GFP. How is that possible?
  
  We appreciate the thoughtful comment of the reviewer. After two weeks of fusion, a considerable number of interneurons are expected to have migrated from the ventral to the dorsal compartment of the assembloid (Birey et al., 2017; Sloan et al., 2018). In terms of distribution, we think that current Figure 5A shows a gradient of eGFP-positive cells within the dorsal compartment, with the number of labeled cells decreasing as the distance from the fusion interface between the two organoids increases. By contrast, a comparable gradient is not evident in the ventral compartment, where several labeled neurons remain present even in regions distal to the fusion site.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  ARHGEF6 is a RAC1/CDC42 guanine nucleotide exchange factor that has been proposed to be associated with X-linked intellectual disability, but its relevance to the pathology is not well established. ARHGEF6 has been assigned a role in spine density and plasticity of hippocampal pyramidal neurons, but nothing is known about its role in interneuron development. Here, the authors show that ARHGEF6 is expressed early in development in the inhibitory lineage during the peak of interneuron generation and migration. The aim of the study is therefore to investigate whether, in addition to its role in pyramidal neurons, ARHGEF6 could play a role in inhibitory neuron development. Using both ARHGEF6-KO mice and organoids from ARHGEF6-KO hiPSCs, the authors show that ARHGEF6 plays a critical role in interneuron development and function.
  
  Strengths:
  
  The major strength of the paper is the very detailed analysis of the role of ARHGEF6 using two different systems: ARHGEF6-KO mice and deletion of ARHGEF6 in human iPSC-derived organoids. Strikingly, deletion of ARHGEF6 in both systems induces similar defects such as an increase in apoptosis, reduced neuronal output, impaired neuronal morphology, and disrupted migratory dynamics. This compelling evidence demonstrates that ARHGEF6, in addition to its already well-described role in spine formation and plasticity, is playing a crucial role during embryonic development through its function in interneurons.
  
  We thank the reviewer for this positive assessment of our work and for highlighting the strength of our combined in vivo and human iPSC-derived organoid approaches. We are pleased that the reviewer recognizes the consistency of the phenotypes observed across both systems and acknowledges that our findings support a crucial role, during early stages of embryonic development, for a protein previously thought to be relevant primarily in the synaptic context.
  
  Weaknesses:
  
  (1) In Figure 1, the authors show that ARHGEF6 is expressed in different regions of the brain, including the interneuron lineage, and that depletion of ARHGEF6 reduces the number of GABAergic neurons in the adult cortex and hippocampus. To try to better characterize this defect, the authors in Figure 2 investigate whether deletion of ARHGEF6 affects interneuron migration and survival during embryonic development. To do so, ARHGEF6-KO mice were crossed with the GAD67-eGFP reporter line to follow the inhibitory lineage. The authors analyse apoptosis using TUNEL staining, and show that it is significantly increased in the ganglionic eminence of ARHGEF6-KO E14.5 embryos. The authors claim that this is not the case in the cortex. However, the image shown in Figure 2A really suggests that staining is increased. Which part of the neocortex is analysed for quantification? This should be clarified.
  
  We would like to thank the reviewer for pointing this out. The region analyzed was the same as that used to assess GAD67-eGFP-positive cells in Figure 2F. We will clarify the exact neocortical region used for TUNEL quantification and revise the figure and legend to make the analyzed area explicit. We will also analyze additional animals to improve the accuracy of the analysis.
  
  (2) In Figure 2F-J, the authors investigate the migration of interneurons by analysing the GAD67-eGFP staining, and clearly show that the migratory abilities of the depleted neurons are reduced. However, the authors do not discuss the fact that, because depletion of ARHGEF6 increases apoptosis, there are fewer neurons available for migration. This is important for the interpretation of the data. This point should be clarified.
  
  We appreciate this comment and believe that it is particularly relevant to the interpretation of the data shown in Figure 2F–G. We will clarify the limited interpretation of this specific analysis in the Results section. The altered directionality observed in vivo, together with evidence of impaired migratory behavior obtained through in vitro live imaging, supports the possibility that altered migratory dynamics contribute to the phenotype, although increased apoptosis and reduced neuronal output may also contribute.
  
  (3) In Supplementary Figure S2, the authors describe the establishment of the ARHGEF6-KO human iPSC line and test the ability of these cells to undergo correct development, especially for the generation of neural progenitor cells. I was wondering why the authors do not present the data of both control and ARHGEF6-KO cells.
  
  We thank the reviewer for pointing this out. All staining reported in the organoids and assembloids in this paper shows that the WT ATCC-DYS0100 cell line, as well as the mutant, efficiently differentiates into neuronal tissue. The Supplementary Figure was intended to validate the impact of the mutation on the ability of the iPSC line to retain its differentiation capacity as a preliminary step before proceeding with organoid differentiation. We will integrate stainings for NPC markers on the WT line in the Supplementary Figure.
  
  (4) At the molecular level, how ARHGEF6 depletion could affect neuronal survival is missing. In addition, as ARHGEF6 is a GEF for RAC1 and Cdc42 amongst other GEFs, I would have expected that the authors test how RAC1 activity (and Cdc42) is affected in ARHGEF6-depleted brains and in ARHGEF6-KO organoids. The measure of phalloidin staining and the anisotropy index are not really meaningful.
  
  We appreciate the thoughtful comment of the reviewer. Previous evidence already shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Regarding organoids, we agree that direct RAC1/CDC42 activity measurements would have strengthened the molecular mechanism. We will revise the manuscript to avoid implying that our phalloidin-based measurements alone establish the underlying dysregulated molecular pathway.
  
  (5) The authors show that ARHGEF6-KO forebrain organoids were markedly smaller than their isogenic controls, and their study suggests that ARHGEF6 expression impacts progenitor maintenance and neurogenesis. Although interneurons represent only a minority of the total neuronal population, I was wondering whether ARHGEF6-KO mice present brain morphology defects such as microcephaly.
  
  We appreciate the comment. We did not perform a morphometric analysis for microcephaly in the present study. We will add this limitation to the Discussion and note that gross brain morphology changes were not reported in the previously published ARHGEF6-KO mouse characterization (Ramakers et al., 2012). We will also clarify that the smaller organoid phenotype may reflect developmental defects that are not fully compensated in a reductionist in vitro model and therefore does not necessarily imply overt microcephaly in vivo.
  
  References
  
  Allen Institute for Brain Science. Allen Mouse Brain Atlas: Arhgef6 ISH data. Available from: Allen Brain Map.
  
  Birey, F., Andersen, J., Makinson, C. D., Islam, S., Wei, W., Huber, N., Fan, H. C., Metzler, K. R. C., Panagiotakos, G., Thom, N., O’Rourke, N. A., Steinmetz, L. M., Bernstein, J. A., Hallmayer, J., Huguenard, J. R., & Pașca, S. P. (2017). Assembly of functionally integrated human forebrain spheroids. Nature, 545(7652), 54–59. https://doi.org/10.1038/nature22330
  
  Liaci, C., Camera, M., Zamboni, V., Sarò, G., Ammoni, A., Parmigiani, E., Ponzoni, L., Hidisoglu, E., Chiantia, G., Marcantoni, A., Giustetto, M., Tomagra, G., Carabelli, V., Torelli, F., Sala, M., Yanagawa, Y., Obata, K., Hirsch, E., & Merlo, G. R. (2022). Loss of ARHGAP15 affects the directional control of migrating interneurons in the embryonic cortex and increases susceptibility to epilepsy. Frontiers in Cell and Developmental Biology, 10, 875468. https://doi.org/10.3389/fcell.2022.875468
  
  Nodé-Langlois, R., Muller, D., & Boda, B. (2006). Sequential implication of the mental retardation proteins ARHGEF6 and PAK3 in spine morphogenesis. Journal of Cell Science, 119(23), 4986–4993. https://doi.org/10.1242/jcs.03273
  
  Pelkey, K. A., Chittajallu, R., Craig, M. T., Tricoire, L., Wester, J. C., & McBain, C. J. (2017). Hippocampal GABAergic inhibitory interneurons. Physiological Reviews, 97(4), 1619–1747. https://doi.org/10.1152/physrev.00007.2017
  
  Ramakers, G. J. A., Wolfer, D., Rosenberger, G., Kuchenbecker, K., Kreienkamp, H.-J., Prange-Kiel, J., Rune, G., Richter, K., Langnaese, K., Masneuf, S., Bösl, M. R., Fischer, K.-D., Krugers, H. J., Lipp, H.-P., van Galen, E., & Kutsche, K. (2012). Dysregulation of Rho GTPases in the αPix/Arhgef6 mouse model of X-linked intellectual disability is paralleled by impaired structural and synaptic plasticity and cognitive deficits. Human Molecular Genetics, 21(2), 268–286. https://doi.org/10.1093/hmg/ddr457
  
  Sloan, S. A., Andersen, J., Pașca, A. M., Birey, F., & Pașca, S. P. (2018). Generation and assembly of human brain region-specific three-dimensional cultures. Nature Protocols, 13(9), 2062–2085. https://doi.org/10.1038/s41596-018-0032-7
  
  Yao, Z., Nguyen, T. N., van Velthoven, C. T. J., Goldy, J., Sedeno-Cortes, A. E., Baftizadeh, F., Bertagnolli, D., Casper, T., Chiang, M., Crichton, K., Ding, S.-L., Fong, O., Garren, E., Glandon, A., Gouwens, N. W., Gray, J., Graybuck, L. T., Hawrylycz, M. J., Hirschstein, D., … Zeng, H. (2021). A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell, 184(12), 3222–3241.e26. https://doi.org/10.1016/j.cell.2021.04.021
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.09.710568v2
www.biorxiv.org

Contractile perinuclear actomyosin network promotes peripheral and polar chromosome interaction with the mitotic spindle

1
1. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Reviewer #1 (Evidence, reproducibility and clarity):
  
  Summary
  
  Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.
  
  In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.
  
  Major Comments
  
  A. Structural overhaul and figure reorganization
  
  The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.
  
  Move methodological and descriptive details (especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.
  
  In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.
  
  Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).
  
  As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).
  
  Figure 4I: This panel is currently unclear and should be drastically simplified.
  
  Following this suggestion, we simplified Figure 4I by removing the column of ‘Start’, which is easily deduced from the ‘Duration’ results and therefore does not provide much new information.
  
  I recommend reorganizing the figures as follows:
  
  Figure 1: Keep as a single figure but simplify. Figures 1D and 1E could be combined; move the unnormalized SCV to Supplementary Materials. The same goes for 1F.
  
  We have reorganized Figure 1, as suggested, and moved unnormalized data to supplemental materials.
  
  New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules, which increases the speed of congression initiation.
  
  If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).
  
  New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.
  
  If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figure 5 and 6 as they are, although we respect this suggestion from the reviewer.
  
  New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward, which is important for efficient chromosome congression in diverse cellular contexts.
  
  We have conducted new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We have combined the new results with the original Figure S7 to create Figure 8 in line with this suggestion.
  
  On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.
  
  B. Specificity and redundancy of actin perturbation
  
  To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:
  
  Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one the authors used arise from different cell types.
  
  We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as the reviewer noted. One possibility is that such differences may have arisen from different cell types – this could be important, especially given that some cells form the PANEM and others do not (Figure 8A). A second possibility is that cytokinesis, mitotic rounding and PANEM formation may rely on actin polymerization to different extents. For example, the same concentration of global actin polymerization inhibitors may affect cytokinesis, but may still allow PANEM formation to proceed without observable effects on early chromosome movements. As suggested, we discussed this topic in the Discussion (page 16, third paragraph).
  
  Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes during mitosis in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for the effects on congression.
  
  As the reviewer implies, we cannot rule out that we could not detect actin associated with the spindle or centrosomes because of the difference in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 14, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 14, second paragraph). Regarding Myosin10, azBB (blebbistatin variant) should have negligible effects on class-X myosin, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB that we observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 14, second paragraph).
  
  C. Expansion of PANEM functional analysis
  
  To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]
  
  Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.
  
  As suggested, we have studied the effect of PANEM contraction in cell lines other than U2OS. We have found that when PANEM contraction was inhibited, the reduction in chromosome scattering was diminished in RPE1 cells (new Figure 8B, C). Moreover, we have found that inhibition of PANEM contraction increased polar chromosomes during prometaphase/metaphase in RPE1 and HCT116 cells (which form PANEM), but not in HeLa cells (which do not form PANEM) (new Figure 8D, E). These results suggest that the effects of PANEM contraction, originally observed in U2OS cells, are also present in other cell lines (RPE1 and HCT116) that form PANEM.
  
  Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.
  
  This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.
  
  Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.
  
  This is an interesting suggestion, but such a study would be time-consuming and goes beyond the scope of this paper.
  
  Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).
  
  In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not in Phase 2 or Phase 4. Mild microtubule perturbation by itself could affect chromosome motions in all four Phases. We therefore do not think it would be particularly informative to study what additional effects reduced PANEM contraction has when combined with mild microtubule perturbation.
  
  Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.
  
  Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.
  
  Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.
  
  It is difficult to count individual chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. signals outside the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).
  
  D. Conceptual integration in Introduction and Discussion
  
  The manuscript should better situate its findings within the context of early mitotic chromosome movements:
  
  Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.
  
  Since 2006 (Kapoor et al Science 2006), it has been widely accepted in the field that chromosome congression precedes biorientation. Very recently, this view has been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as the reviewer indicates. We have mentioned this new model and discussed a new interpretation of our results based on it in the Discussion (page 15; ‘It has been a widely accepted view…’).
  
  To explain the new interpretation of our results more clearly, we have added a new diagram as a supplemental figure (Figure 9 – figure supplement 1) in the revised manuscript.
  
  Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).
  
  We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.
  
  Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.
  
  Following this suggestion, we discussed three possible mechanisms, based on previous literature, that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression (Page 17): 1) an enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently, promoting congression towards the spindle midplane, and 3) the balance between CENP-E, dynein and chromokinesin activities may be tilted towards greater chromosome-arm ejection forces directed to the spindle midplane.
  
  Minor Comments
  
  These issues are more easily addressable but will significantly improve clarity and presentation.
  
  Introduction
  
  Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.
  
  As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.
  
  Results (by subheading)
  
  First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).
  
  As suggested, we cited these references at the indicated part of the first section of the Results (page 5).
  
  Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).
  
  Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) were already cited in the second section of the Results in the original manuscript. We agree that all the other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).
  
  Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.
  
  We interpret that these are background signals in the cytoplasm, which do not come from kinetochores, because 1) before NEBD, they were outside of the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.
  
  Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).
  
  Relevant to this comment, there was an error regarding the congression speed of central kinetochores (original Figure 4H). The congression speed of peripheral kinetochores was shown correctly, but that of central kinetochores was shown incorrectly in µm per time interval (30 s) rather than in µm per minute. We have corrected this error in the revised manuscript (new Figure 4H). Based on the corrected data, the speed of congression is similar between peripheral and central kinetochores. The original Figure 3G (the speed of poleward motion for central kinetochores) had a similar error, which we have also corrected in the revised manuscript. We apologize for these errors and the confusion they may have caused.
  
  Regarding this comment, if biorientation is achieved more rapidly for central kinetochores, one would expect Phase 3 (rather than congression speed) to be shorter for central kinetochores. Indeed, Phase 3 is slightly shorter for central kinetochores (control) than for peripheral kinetochores (control) (Figure 4C), but the difference is not statistically significant (t test; p = 0.21).
  
  Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.
  
  These two references were already cited and discussed in this Results section of our original manuscript. However, in light of this suggestion, we have expanded our discussion of the polar chromosome movements reported by Koprivec et al (page 11). The reviewer is also correct about Figure 5F, and we have clarified this point in the Figure 5F legend.
  
  Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.
  
  As suggested, we have moved the final paragraph of the Discussion in the original manuscript to form a new final section of the Results in the revised manuscript. Moreover, as suggested, we have studied the outcome of inhibiting PANEM contraction in cell lines other than U2OS (Figure 8 B–E), and have described the new results in the new final section of the Results.
  
  Discussion
  
  When discussing cortical actin, cite key reviews on its presence and function during mitosis: Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).
  
  As suggested, we have cited all these review papers in the Discussion (page 17), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).
  
  Significance
  
  Advance
  
  This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts during early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]
  
  We have addressed the underlined criticisms as detailed above.
  
  Audience
  
  The primary audience of this study will be researchers working on cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.
  
  Expertise
  
  My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.
  
  Reviewer #2 (Evidence, reproducibility and clarity):
  
  In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.
  
  Significance
  
  While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]
  
  The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is.
  
  Reviewer #3:
  
  Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.
  
  Major points
  
  (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (eg Phase 3-like). It is unclear whether all kinetochores go through phases 1, 2, 3 and 4 in sequence or whether a few deviate from this pattern. A comment on this would be helpful. It may also be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.
  
  To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. For example, related to the next comment of this Reviewer, we did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 25).
  
  With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though their extents were varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Then, Phase 1 and Phase 3 were defined as intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued till the end of tracking. We have added this comment to the Method section (page 25-26).
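  The phase-assignment rule described above can be summarized in a few lines. The sketch below is illustrative only and is not the authors' analysis code: it assumes that the start and end of Phase 2 (poleward motion) and Phase 4 (congression) have already been identified from each track, and the names `assign_phases` and `KinetochorePhases`, as well as the time unit (minutes after NEBD), are hypothetical.

  ```python
  from dataclasses import dataclass
  from typing import Optional, Tuple

  Interval = Tuple[float, float]  # (start, end), assumed to be minutes after NEBD


  @dataclass
  class KinetochorePhases:
      phase1: Interval
      phase2: Interval
      phase3: Interval
      phase4: Optional[Interval]  # None if congression was never observed


  def assign_phases(nebd: float, phase2: Interval, phase4_start: Optional[float],
                    phase4_end: Optional[float], tracking_end: float) -> KinetochorePhases:
      """Hypothetical helper: derive Phases 1 and 3 from the directly observed
      Phase 2 (poleward motion) and Phase 4 (congression)."""
      p2_start, p2_end = phase2
      if not nebd <= p2_start <= p2_end <= tracking_end:
          raise ValueError("Phase 2 must fall between NEBD and the end of tracking")
      # Phase 1: the interval from NEBD until poleward motion begins.
      phase1 = (nebd, p2_start)
      if phase4_start is None:
          # No congression observed (possible with azBB): Phase 3 runs to the end of tracking.
          return KinetochorePhases(phase1, phase2, (p2_end, tracking_end), None)
      if not p2_end <= phase4_start <= tracking_end:
          raise ValueError("Phase 4 must start after Phase 2 ends")
      # Phase 4 ends at the onset of oscillation; if the kinetochore was lost in the
      # crowded mid-plane before that, the end of tracking is used instead.
      end4 = phase4_end if phase4_end is not None else tracking_end
      return KinetochorePhases(phase1, phase2, (p2_end, phase4_start), (phase4_start, end4))
  ```

  For example, `assign_phases(0.0, (1.5, 3.0), None, None, 9.0)` describes a kinetochore that never congresses, so its Phase 3 spans 3.0 to 9.0 and its Phase 4 is `None`.
  
  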
  
  (2) Would peripheral kinetochores close to the poles behave differently from peripheral kinetochores close to the midplane (figure S4)? In figure 3D, are they separated? If not, would it look different?
  
  Since we did not include kinetochores close to spindle poles (at NEBD), for which it was difficult to define Phase 2 (see our response to the above major point 1), in our analysis, the suggested comparison is not feasible.
  
  (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.
  
  In contrast to CENPE-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, although both showed uncongressed polar chromosomes. This difference may reflect the smaller number of uncongressed polar chromosomes in azBB-treated cells. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), the modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Method section (page 24).
  
  (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.
  
  As suggested, we have included kinetochore tracking data as supplemental data in the revised manuscript (Figure 3 – source data 1–4; Figure 5 – source data 1, 2).
  
  Minor points
  
  (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.
  
  As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, and supplemental figures.
  
  (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.
  
  Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.
  
  (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?
  
  We understand that the Reviewer refers to the kinetochore pivoting mechanism around a spindle pole, which was recently reported by the Tolic group (Koprivec et al., 2026). Such a pivoting mechanism would work only when the spindle elongates (i.e. the distance between spindle poles is enlarged) after NEBD. Therefore, to address this Reviewer’s question, we tried to assess how PANEM contraction contributes to relocating polar chromosomes when the spindle elongates before or after NEBD in asynchronous U2OS cells (i.e. in the situation where the kinetochore pivoting mechanism is applied or not), as we noted above in response to Point 2. However, spindle elongation after NEBD was rare and mild, and we were unable to address this issue (see our response to Point 2). We discussed this matter in the Discussion section.
  
  (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines.
  
  Because blebbistatin could kill cells by inhibiting cytokinesis, the blebbistatin sensitivity of cell growth may not necessarily reflect how essential the PANEM contraction is for chromosome congression.
  
  Instead, we addressed more directly how essential PANEM contraction is for chromosome congression. We analyzed chromosome congression in RPE1 and HCT116 cells (both N-CIN-) in the presence and absence of pnBB, the inhibitor of PANEM contraction (new Figure 8D, E). With pnBB, these cells showed congression defects, suggesting that PANEM contraction is essential for chromosome congression in these N-CIN- cells.
  
  (5) Are congression times delayed in lines that naturally lack PANEM?
  
  For example, it takes 10-20 min for HeLa cells (lacking PANEM) to complete chromosome congression after the NEBD (Bancroft et al 2025: https://doi.org/10.1242/jcs.163659). This is not significantly different from the time (8-18 min) for chromosome congression we observed in U2OS cells (which form PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression – we have discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 17).
  
  (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?
  
  The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In some cases where the kinetochore came close to the midplane (< 2.5 µm), it was not possible to track it further due to kinetochore crowding around the spindle mid-plane; in such cases, the end of Phase 4 was assigned as the end of tracking. These definitions were not made clear in the original manuscript. Moreover, the original manuscript did not clearly state that the end of Phase 4 was defined in the same way for both non-polar and polar kinetochores. We have now clarified these points in the Method section (page 25).
  
  (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?
  
  In Figure S2E (Figure 1 – figure supplement 6 in the revised manuscript), we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time point. The smallest p-value was 0.094 at 6.0 min. As suggested, we have explained this in the legend for this supplementary figure. Completion of Phase 4 is highly variable across different kinetochores within the same cell; thus, a general comment on its completion timing in cells is not feasible.
  
  Significance:
  
  The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.19.677380v6
www.biorxiv.org

Disentangling Cephalopod Chromatophores Motor Units with Computer Vision

1. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Renard, Ukrow et al. applied their recently published computational pipeline (CHROMAS) to the skin of Euprymna berryi and Sepia officinalis to track the dynamics of cephalopod chromatophore expansion. By segmenting each chromatophore into radial slices and analyzing the co-expansion of slices across regions of the skin, they inferred the motor control underlying chromatophore groups.
  
  Strengths:
  
  The authors demonstrate that most motor units of cephalopod skin include a subregion of multiple chromatophores, creating "virtual chromatophores" in between the fixed chromatophores. This is an interesting concept that challenges prevailing models of chromatophore organization, and raises interesting possibilities for how chromatophore arrays may be patterned during development.
  
  This study introduces new analyses of cephalopod skin that will be valuable for the quantitative study of cephalopod behavior.
  
  Weaknesses:
  
  The authors chose to image spontaneous skin changes in sedated animals, rather than visually-evoked skin changes in awake, freely-moving animals. Spontaneous chromatophore changes tend to be small shimmers of expansion and contraction, rather than obvious, sizable expansions. This may make it more challenging to distinguish truly co-occurring expansions from background activity. The authors don't provide any raw data (videos) of the skin, so it is difficult to independently assess the robustness of the inferred chromatophore groupings.
  
  The patch-clamp experiments in E. berryi are used to test the validity of their approach for inferring motor units. The stimulations evoke expansions of sub-regions of each chromatophore, creating "virtual chromatophores" as predicted from the behavioral analysis. However, the authors were not able to predict these specific motor units from behavioral analysis before confirming them with patch-clamp, limiting the strength of the validation. It would be informative to quantify the results of the patch-clamp experiments - are the inferred motor units of similar sizes to those predicted from behavior?
  
  The authors report testing multiple experimental conditions (e.g., age, size, behavioral stimuli, sedation, head-fixation, and lighting), but only a small subset of these data are presented. It is difficult to determine which conditions were used for which experiments, and the manuscript would benefit from pooling data from multiple experiments to draw general conclusions about the motor control of cephalopod skin.
  
  The authors use a different clustering algorithm for E. berryi and S. officinalis, but do not discuss why different clustering approaches were required for the two species.
  
  Impact:
  
  The authors use their computational pipeline to generate a number of interesting predictions about chromatophore control, including motor unit size, their spatial distribution within the skin, and the independent control of subregions within individual chromatophores by putatively distinct motor neurons. While these observations are interesting, the current data do not yet fully support them.
  
  The CHROMAS tool is likely to be valuable to the field, given the need for quantitative frameworks in cephalopod biology. The predictions outlined here provide a useful foundation for future experimental investigation.
  
  We thank the reviewer for the thoughtful and detailed evaluation of our work and for recognizing the potential of the CHROMAS pipeline for studying chromatophore control.
  
  We agree that some aspects of the manuscript required clarification and additional explanation, and we have revised the text accordingly. We also now provide access to representative raw video recordings in the Data Availability section. In the E. berryi patch-clamp experiments, single motor neurons evoked expansions of sub-regions of chromatophores, consistent with the “virtual chromatophore” concept. We have now quantified the size of motor units across patch-clamp sessions, and the results show that the inferred motor-unit sizes broadly match those predicted from behavioral recordings, supporting the validity of our approach.
  
  We agree that pooling data across individuals would provide valuable insight into variability across animals. In practice, we recorded chromatophore activity from several animals (14 Euprymna berryi and 12 Sepia officinalis) under different experimental conditions during development of the experimental pipeline. However, acquiring long, stable, artifact-free recordings suitable for motor unit analysis is technically challenging. We now clarify this point in the manuscript. Specifically, we explain that multiple animals were recorded during pipeline development, while the analyses presented focus on recordings with the highest signal quality. We anticipate that the framework introduced here will enable future studies to collect larger datasets and compare motor unit organization across individuals, developmental stages, and species.
  
  HDBSCAN was used for E. berryi during initial exploratory analyses, and Affinity Propagation was adopted for S. officinalis because it better captured the correlation structure of those recordings. We did not re-analyze the E. berryi data with Affinity Propagation, and the implications of algorithm choice are now discussed in the Discussion.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Overall, this is an excellent paper, making use of a newly developed system for monitoring the behaviour of chromatophores in the skin of (mostly) free-swimming bobtail squid and European cuttlefish. The manuscript is very well-written, clearly presented and very well-structured. The central finding, that individual chromatophores are connected to multiple motor neurones, is not new. Novelty instead comes from the ability to measure the actuation of chromatophore sections across wide areas of skin in free-swimming animals, showing the diversity of local motor units and reinforcing the notion that individual chromatophores are not necessarily the individual units of colour change, but rather local motor units that cover multiple neighbour and near-neighbour chromatophore muscles. This is an excellent finding and one that will shape our understanding of the neural control of cephalopod skin colour.
  
  Strengths:
  
  The methodological approach to collecting large amounts of data about local variations in the expansion of sections of chromatophores is exciting, and the analysis pipeline for clustering sections of chromatophores whose spontaneous activity correlated over time is powerful.
  
  Weaknesses:
  
  Some minor edits and typographical errors need correcting. I also had some concerns about whether the preparation used for the electrophysiological section of the manuscript complies with the journal's ethical requirements, so I would urge that this be carefully checked.
  
  We thank the reviewer for the positive evaluation of our work and for recognizing the value of the methodological approach and the clarity of the manuscript.
  
  We have carefully reviewed the manuscript and corrected minor typographical errors.
  
  Regarding the ethical considerations raised for the electrophysiological experiments, we have carefully verified that the experimental procedures comply with the journal's ethical requirements and relevant institutional guidelines.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study uses high-resolution videography and a custom computer-vision pipeline to dissect the motor control of cephalopod chromatophores in Euprymna berryi and Sepia officinalis. By quantifying anisotropic chromatophore deformations and applying dimensionality reduction methods, the authors infer that individual chromatophores can be a part of multiple motor units. Clustering analyses reveal putative motor units that often span multiple chromatophores, with diverse and overlapping geometries. Chromatophore expansion dynamics are faster and more stereotyped than relaxation, consistent with active neural contraction followed by passive recoil. Together, the results show that chromatophores function not as uniform pixels but as fractionated, coordinately controlled elements that enable flexible pattern generation.
  
  Strengths:
  
  The authors present compelling, direct evidence that (a) chromatophore deformations are anisotropic, and indirect evidence that (b) individual chromatophores can be split across multiple putative motor units. This evidence is provided through data collected over large spatial scales, but also at a sub-chromatophore resolution. This combination of scale and resolution is not possible using traditional neuroanatomical and physiological approaches alone.
  
  The authors also develop a new non-invasive, image analysis approach to extract information about chromatophore deformation across large spatial scales on the organism's body. In principle, this approach is applicable across species and may allow for further comparative characterization of chromatophore motor control. It is therefore a promising new tool and useful resource for the community.
  
  Weaknesses:
  
  An important weakness of the work is that the methods the authors develop can only be applied during resting, spontaneous 'flickering' activity of chromatophores. The inability to reliably apply their technique during any kind of realistic camouflage is a large limitation, as it means this method cannot be used to study the dynamics of motor control during realistic camouflage behaviors.
  
  Another weakness of this paper is the rather limited electrophysiological validation of the computational findings. The authors present only one electrophysiology experiment in E. berryi, the species that they used only for 'methodological development' and not for detailed characterization. A complementary electrophysiological experiment in S. officinalis, or some visualization of neuron morphology confirming that motor neurons do indeed project to multiple chromatophores, would strengthen the generalizability of their computational analysis. This would be particularly pertinent to validate the authors' claim that some motor units contain chromatophores that are quite distant from one another on the animal.
  
  Overall, the authors' technical contributions and method development are an important advance. This work serves as an excellent proof of concept that their method can extract useful information about chromatophore motor control. Further validation of their method is needed to fully trust the fine-scale conclusions drawn about the distribution and composition of multi-innervated chromatophores. Furthermore, the authors raise many interesting ideas about developmental constraints on circuit wiring and potential adaptive significance of multi-innervated chromatophores for certain features of camouflage patterning. Their method may be able to help resolve some of these questions in the future if it is refined and applied across developmental stages, regions of the animal, and across species.
  
  We thank the reviewer for their thoughtful evaluation and for recognizing the potential of the computational approach introduced in this study.
  
  Regarding the focus on spontaneous chromatophore activity, we have clarified earlier in the Results section why these events are necessary to isolate individual muscle activations. While large camouflage patterns are visually striking, they involve the coordinated activation of many groups of chromatophores by premotor circuits simultaneously, making the identification of individual motor units, our goal here, impossible. Our approach can, however, also be applied during active behavior, including camouflage; the questions addressed there would be different, focusing on how multiple motor units are coordinated to generate the resulting skin patterns, rather than resolving the structure of single motor units. This could be challenging if the patterns of premotor control are highly variable, thus making the detection of meaningful or interpretable motion correlations difficult. This remains to be tested.
  
  We also acknowledge that electrophysiological validation remains limited. Patch-clamp experiments were performed in Euprymna berryi to test predictions generated by the computational analysis, and these experiments confirmed that activation of single motor neurons can produce anisotropic expansion of chromatophore subregions. We now provide the associated datasets in the Data Availability section. We agree that complementary electrophysiological or anatomical experiments in Sepia officinalis would further strengthen the conclusions. Such experiments represent an important direction for future work.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  General points:
  
  (1) Given all the experimental conditions and animals tested, the manuscript would be much stronger if the figures represented pooled data from many animals and experiments (e.g. Figure 1C).
  
  We agree that pooling data from multiple animals would strengthen the manuscript. In practice, we tested these experimental conditions across several animals (14 Euprymna berryi and 12 Sepia officinalis), but we selected the segments shown in the figures for their minimal artifacts and errors. Acquiring high-quality, stable recordings of this type is extremely challenging, and the presented data represent the clearest examples suitable for analysis and visualization. We hope that in the future these methods will enable not only the collection of a larger, high-quality dataset, but also comparisons across individuals, ages, species, and different regions of the mantle.
  
  (2) It's very unclear what animals were used for each experiment:
  
  (a) E. berryi: L677 states that 14 animals were filmed, and L684 implies that non-sedated individuals were used in addition to sedated animals, but it appears all the data is from a single E. berryi with sedation?
  
  The original wording was unclear, so we modified the sentence for clarity. The Methods now specify that 14 animals were filmed to refine the experimental pipeline and explore different conditions, while the data presented in the Results are from a single lightly sedated individual chosen for quality and stability of chromatophore activity.
  
  (b) S. officinalis: L692 onwards states that lots of different conditions and animals were explored, but only minimal data from a couple of animals is described in the figures. L156 states that all (?) the data comes from one head-fixed animal and one sedated and head-fixed animal. L549: The conclusion states that the pipeline was used in freely moving animals, but it appears that all of the S. officinalis were head-fixed? This is very confusing. Rather than describing the conditions of every experiment ever performed, the manuscript would benefit from explicitly stating the experimental conditions used for each figure.
  
  The original text was unclear. We have clarified in the manuscript which animals and experimental conditions were used for the analyses in each figure. In brief, E. berryi was recorded without head fixation, whereas S. officinalis data were obtained under head-fixed conditions. We did film 11 S. officinalis without head fixation, and data can in principle be extracted from these recordings. Head fixation was used both to minimize visual artifacts and to enable longer, stable recordings, which was important for capturing the highest level of apparent noise in motor unit activation: information that is critical for our analyses of motor-unit organization, though not necessary for studies of broader camouflage patterns. The statement in the Conclusion was meant to convey that our computational pipeline enables large-scale analyses that would be very difficult or impossible with traditional electrophysiology, not that all data were acquired from freely behaving animals. While fully unconstrained recordings remain technically challenging due to optical and logistical constraints, we maintain that our approach provides a valid framework for analyzing freely behaving animals.
  
  (c) Additionally, there is a claim that the sedated condition represents the unsedated one (e.g. L151 and L643), but no data is shown to support this. L173 references Figure 6d as evidence, but 6d doesn't exist. Only L210 provides sedation/no sedation statistics for the number of components per motor unit. However, in L643 it says "and motor unit organization remained unchanged". This data needs to be shown to justify that statement.
  
  The reference to the nonexistent Figure 6d was removed. L170 provides statistics for the number of principal components per chromatophore, and L210 provides statistics for the number of components per MU. We do not think a sub-figure is necessary. We do, however, agree that “motor unit organization” at L643 is potentially misleading, as we only compared the number of chromatophores belonging to a single MU and not the MU shape or distribution. We therefore changed “organization” to “size (in chromatophores)”.
  
  (3) The text needs considerable revision. There are many typos (including multiple instances of "refs" instead of the actual references being inserted). These issues make the manuscript much more difficult to evaluate.
  
  Our apologies. We have now added the missing refs.
  
  (4) It is not clear how convincing the chromatophore groups are. For instance, Figure 4h could alternatively be interpreted as a group of 5 chromatophores in a motor group that happen to co-vary with a sixth one at a great distance. Without seeing some of the raw data (videos), it's difficult to assess how convincing it is that these chromatophores belong to the same group. I recommend analyzing: when multiple chromatophores expand together, what is the likelihood that other chromatophores also happen to expand at the same time (given the frequency that they're all changing shape spontaneously)?
  
  We appreciate the reviewer’s concern. Chromatophores are assigned to the same cluster because their activity, or that of their slices, covaries consistently over time. It is, of course, possible that what appears as a single motor unit may reflect two or more motor neurons acting simultaneously during the recording. Longer video segments increase confidence in the integrity of inferred motor units, but in the absence of a ground truth for motor unit spatial organization in this species at this age, it is difficult to quantify the likelihood that two motor units are being conflated. Raw video data is provided in the Data Availability section. We note, however, that most of the time motor units cannot be readily discerned by eye, because individual chromatophores and their constituent slices fluctuate continuously, and motor-unit correlations are subtle and distributed across multiple chromatophores.
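
  One way to put a number on the reviewer's question is to compare how often two chromatophores expand in the same frame with the rate expected if each expanded independently at its own frequency, using circular shifts of one event train as a null. The sketch below is a hypothetical illustration, not an analysis reported in the manuscript; it assumes expansion events have already been binarized per frame, and the function names, the number of shifts and the toy data are illustrative choices.

```python
import random


def coincidence_rate(a, b):
    """Fraction of frames in which both binary event trains contain an event."""
    return sum(1 for x, y in zip(a, b) if x and y) / len(a)


def expected_rate(a, b):
    """Coincidence rate expected if the two trains were independent."""
    return (sum(a) / len(a)) * (sum(b) / len(b))


def shift_p_value(a, b, n_shifts=1000, seed=0):
    """Circular-shift test: rotating b keeps its event frequency and temporal
    structure but breaks its alignment with a. Returns the (+1-corrected)
    fraction of shifts whose coincidence rate is at least the observed one."""
    rng = random.Random(seed)
    observed = coincidence_rate(a, b)
    n = len(b)
    hits = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)  # non-zero rotation
        if coincidence_rate(a, b[k:] + b[:k]) >= observed:
            hits += 1
    return (hits + 1) / (n_shifts + 1)


# Toy example: chromatophore B expands on exactly the same frames as A.
rng = random.Random(1)
events_a = [1 if rng.random() < 0.05 else 0 for _ in range(2000)]
events_b = list(events_a)
print(coincidence_rate(events_a, events_b), expected_rate(events_a, events_b))
print(shift_p_value(events_a, events_b, n_shifts=200))
```

  A small p-value indicates co-expansion beyond what the individual event frequencies would produce by chance; with continuous traces, a correlation coefficient could replace the binary coincidence rate.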
  
  (5) The rationale for focusing on spontaneous activity is introduced relatively late in the manuscript and would benefit from being stated earlier. Examples should be provided of what this looks like (as opposed to regular chromatophore expansion). It would be valuable to see measurements across many experiments of how expanded the chromatophores are - what is the change in surface area? And what is the frequency of expansion for each chromatophore?
  
  Thank you for the remark. This is true. We have added a paragraph at the beginning of the Results section to clarify the rationale for focusing on spontaneous activity.
  
  This section now reads:
  
  “Because our primary aim was to describe the composition and coordination of chromatophore motor units, it was important to examine animals in the absence of the descending commands that occur during active behavior. Spontaneous activity, typically mild and “noisy”, was thus ideal to enable measurements of the motion correlations between chromatophores that reflected shared motor neuron drive, rather than shared correlations due to upstream motor neuron groupings by premotor circuits.”
  
  We added an example video recording of spontaneous activity to the Data Availability section.
  
  While quantifying expansion magnitude and frequency across experiments would indeed be valuable, these questions fall outside the primary focus of the present study, which centers on resolving motor unit organization. In the section “Dynamics of chromatophore expansion and contraction,” we analyze the speed of expansion and contraction to demonstrate that such kinetic features can be reliably detected with the temporal resolution of our video imaging approach. By isolating single muscle activations, we establish a methodological framework that can be used in future work to quantify expansion amplitude, rate of change and frequency across preparations.
  
  (6) Chromatophore expansion was only measured in anesthetized E. berryi, and L679 states that chromatophore expansion was triggered by shining light on the skin. However, light-mediated chromatophore expansion may be mediated by a different mechanism, so chromatophore correlations do not necessarily reflect the underlying motor control.
  
  We agree that there is, in principle, a theoretical risk of direct light-mediated activation of chromatophores. However, the kinetics of such light-mediated activation are very different, and they are the object of a separate, ongoing investigation by our groups. In our experiments, the illumination was applied to the whole animal rather than locally to the skin, ensuring that all chromatophores and the eyes were exposed to the same light source. By transitioning from darkness to light, we created a window in which chromatophores were partially expanded; in either the fully contracted or the fully expanded state, chromatophores would show little to no decorrelation. Within this window, we observed spontaneous fluctuations in chromatophore activity, which formed the basis for our correlation analyses. To our knowledge, direct light-mediated expansion of chromatophores has not been reported in E. berryi, although it may exist there. Finally, the size, shape, and orientation of the inferred motor units align with electrophysiological evidence, supporting the validity of our motor unit inferences.
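
  The reasoning about the partially expanded window can be illustrated with a toy model in which a chromatophore's measured expansion is an underlying drive plus fluctuations, clipped between fully contracted (0) and fully expanded (1). The model, its [0, 1] scaling, the noise level and the function name are assumptions for illustration only. The point is that near either limit the measured trace barely varies, leaving no fluctuations from which correlations between chromatophores could be estimated.

```python
import random
import statistics


def clipped_trace(drive, noise_sd=0.1, n_frames=1000, seed=0):
    """Toy expansion trace (hypothetical model): constant drive plus Gaussian
    fluctuations, clipped to [0, 1] (0 = fully contracted, 1 = fully expanded)."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, drive + rng.gauss(0.0, noise_sd)))
            for _ in range(n_frames)]


for drive in (-0.5, 0.5, 1.5):  # contracted, partially expanded, expanded
    trace = clipped_trace(drive)
    print(f"drive={drive:+.1f}  sd of measured trace={statistics.pstdev(trace):.3f}")
```

  Only the intermediate drive yields a trace whose fluctuations survive the clipping, which is why the correlation analyses depend on chromatophores being partially expanded.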
  
  (7) Some figures might be better suited for the supplement. For instance, it's not clear what the significance of Figure 5 is (it's not currently sufficiently justified in the text).
  
  We have clarified the purpose of Fig. 5 in both the Results and Discussion sections. In the Results, we now explain that events are separated by amplitude to show that expansion–contraction kinetics can be reliably measured across a full range of chromatophore events, validating the precision of our videographic approach. In the Discussion, we highlight that this precision allows measurement of radial muscle speeds and opens avenues to study chromatophore biomechanics, including the contributions of intertwined forces such as radial muscles, elastic pigment sacs, and intercellular coupling.
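
  As a hedged illustration of how expansion and contraction kinetics can be compared for a single event, the sketch below measures a 10–90% rise time and a 90–10% decay time in frames. The 10% and 90% thresholds, the use of the trace minimum as baseline, the function name and the synthetic trace are assumptions, not necessarily the definitions used in the manuscript. Dividing the results by the recording frame rate converts them to seconds.

```python
def rise_and_decay_frames(trace, lo=0.1, hi=0.9):
    """Return (rise, decay) in frames for a single expansion event.

    rise: frames from first reaching lo*amplitude to first reaching
    hi*amplitude before the peak; decay: frames from falling to hi*amplitude
    to falling to lo*amplitude after the peak. Baseline = trace minimum.
    """
    base = min(trace)
    peak_i = max(range(len(trace)), key=trace.__getitem__)
    amp = trace[peak_i] - base
    lo_v, hi_v = base + lo * amp, base + hi * amp
    r_lo = next(i for i in range(peak_i + 1) if trace[i] >= lo_v)
    r_hi = next(i for i in range(peak_i + 1) if trace[i] >= hi_v)
    d_hi = next(i for i in range(peak_i, len(trace)) if trace[i] <= hi_v)
    d_lo = next(i for i in range(peak_i, len(trace)) if trace[i] <= lo_v)
    return r_hi - r_lo, d_lo - d_hi


# Synthetic event (arbitrary units): fast expansion, slower relaxation.
event = [0, 0, 0, 2, 5, 8, 10, 9.5, 8.5, 7.5, 6.5, 5.5, 4.5, 3.5, 2.5, 1.5, 0.5, 0, 0]
rise, decay = rise_and_decay_frames(event)
print(rise, decay)  # 3 8
```

  Grouping events by peak amplitude before comparing these two numbers keeps large and small events from being mixed in a single kinetic estimate.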
  
  (8) Multiple chromatophores can belong to multiple clusters - this study reveals that this is because subsections of a chromatophore are controlled separately. But do the same sections (slices) of chromatophores ever belong to multiple clusters?
  
  Yes, it is possible. Dubas (1985) used videographic recordings to show that the same chromatophore muscle fibers could be activated by stimulation of different nerve bundles, supporting Florey’s (1969) electrophysiological evidence for polyneuronal excitatory innervation. From Dubas: "Usually, different muscle fibres were recruited by each nerve but sometimes a single muscle fibre responded to stimulation of each nerve. Variations of the stimulus voltage also produced gradation of the amplitude of shortening of individual muscle fibres. This supports the evidence above for multiple innervation of single muscle fibres."
  
  The petal-like distribution of motor-neuron influence shows overlapping territories, suggesting that some chromatophore sections may be influenced by multiple neurons. However, this overlap could arise from polyinnervation of individual muscles, the presence of gap junctions between muscles, or passive mechanical coupling due to the elastic properties of the pigment sac.
  
  With the present approach, it is not possible to disentangle the relative contributions of these mechanisms, which will require targeted physiological or anatomical experiments. For this reason, we adopted a hard clustering approach for individual chromatophore slices.
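
  To make the distinction between hard and overlapping (soft) membership concrete, the sketch below assigns each slice to the single cluster whose template trace it correlates with most strongly, or labels it -1 (outlier) when no correlation reaches a threshold. The template traces, the 0.5 threshold and the function names are hypothetical. The sketch illustrates hard assignment in general and is not the clustering step of the CHROMAS pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def hard_assign(slice_traces, cluster_templates, min_corr=0.5):
    """Give each slice exactly one label: the index of its best-correlated
    template, or -1 (outlier) if no correlation reaches min_corr."""
    labels = []
    for trace in slice_traces:
        scores = [pearson(trace, t) for t in cluster_templates]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best if scores[best] >= min_corr else -1)
    return labels


# Hypothetical template traces for two clusters and three slices.
t0 = [0, 1, 0, 1, 0, 1, 0, 1]
t1 = [1, 1, 0, 0, 1, 1, 0, 0]
s0 = [0, 2, 0, 2, 0, 2, 0, 2]  # follows cluster 0
s1 = [2, 2, 0, 0, 2, 2, 0, 0]  # follows cluster 1
s2 = [1, 0, 0, 1, 1, 0, 0, 1]  # uncorrelated with both
print(hard_assign([s0, s1, s2], [t0, t1]))
```

  A slice genuinely driven by two motor neurons would still receive only one label under this scheme, which is the limitation that a soft (overlapping) assignment would address.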
  
  (9) All time should be labeled in seconds, not in frames, and all distances should be measured in um or mm, not in pixels.
  
  We chose to present figures in pixels and frames to reflect the native units of our recordings and analyses, which preserves fidelity and reproducibility of the computational pipeline. For biological interpretation, corresponding values are converted to µm in the main text, providing the relevant real-world scale. A scale for conversion is provided in the figure legends.
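
  Converting native units to physical units only requires the spatial calibration and the frame rate. In the minimal sketch below, both values are placeholders rather than the recording parameters of the study, and the helper names are illustrative.

```python
def frames_to_seconds(n_frames, fps):
    """Duration of n_frames at a frame rate of fps frames per second."""
    return n_frames / fps


def pixels_to_um(n_pixels, um_per_pixel):
    """Length in micrometres for a given spatial calibration."""
    return n_pixels * um_per_pixel


def speed_um_per_s(px_per_frame, um_per_pixel, fps):
    """Convert a speed measured in pixels/frame to micrometres/second."""
    return px_per_frame * um_per_pixel * fps


# Placeholder calibration values (hypothetical, not from the study).
FPS = 25.0
UM_PER_PIXEL = 5.0
print(frames_to_seconds(50, FPS), pixels_to_um(8, UM_PER_PIXEL),
      speed_um_per_s(2.0, UM_PER_PIXEL, FPS))
```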
  
  Specific comments:
  
  (1) L36: I'm not sure the description of virtual chromatophores here is clear enough to make sense to a more general audience.
  
  Addressed. We retained the concept of ‘virtual chromatophores’ in the abstract and added a brief clarifying phrase to indicate that these are functional groupings of adjacent chromatophore territories that act as single units.
  
  (2) L50: "Rimmed by" - consider rephrasing.
  
  Addressed. Replaced with “surrounded”.
  
  (3) L64: "refs" - actual references aren't inserted. There are multiple other examples of this.
  
  Addressed. Added missing references.
  
  (4) L100: This section could use rewriting. Some of the text reads more like a figure legend.
  
  Addressed. We have streamlined the main text to reduce redundancy with the figure legend.
  
  (5) L101: Consider the opening sentence/s providing a more general introduction to the question and approach.
  
  Addressed.
  
  (6) L104: This implies that the data presented are from 14 animals of many ages. This is only relevant if the pooled data is analyzed and presented.
  
  We agree that the original phrasing was ambiguous. We have modified the sentence for clarity, and explain in the Methods that 14 animals were filmed to refine the pipeline and explore experimental conditions, while the analyses shown are from a single animal.
  
  (7) L111: HDBSCAN should be defined.
  
  Addressed. The acronym has been expanded.
  
  (8) L173: Figure 6D doesn't exist.
  
  Addressed. The reference to the nonexistent Figure 6d was removed.
  
  (9) L193: "excluding negative (contraction) phases" This phrase requires clarification.
  
  Addressed. Added “see Methods” in the legend and added clarification on the reasoning in Methods.
  
  (10) L204: Should explain why the switch to affinity-propagation clustering was made when a different method was used for E. berryi.
  
  Addressed in the Discussion.
  
  (11) Figure 3: I recommend including a diagram or image of a whole cuttlefish and showing what the corresponding imaging area was in relation to the animal so the reader gets an intuitive sense of scale.
  
  Thank you. We have added a supplementary figure to give the reader a sense of scale.
  
  (12) L221/Fig 3b: These colors are supposed to represent clusters of 3 to 5 chromatophores? The clusters look much bigger.
  
  The figure shows clusters of 3 to 5 chromatophores, but many adjacent clusters were assigned the same color. We have changed the colors to remove this ambiguity.
  
  (13) Figure 3c: This would be more powerful if it represented the combined data of many experiments to draw a general conclusion. Also, shouldn't these cluster sizes match those in 2e, e.g. they get as big as 40?
  
  We assume the reviewer is referring to a comparison between Figures 3c and 2e. For visualization purposes, the graph in 3c was truncated to display over 90% of the data, which explains why the largest clusters appear smaller than in 2e. We modified the legend accordingly. We agree that the results would be strengthened by pooling data from additional experiments; however, acquiring high-quality, artifact-free recordings suitable for motor unit analysis is extremely challenging. We hope that our framework will enable future studies to extend this analysis.
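
  Choosing an axis limit that displays at least 90% of the data amounts to taking the smallest observed value at or below which 90% of the cluster sizes fall. The minimal sketch below uses made-up cluster sizes, and the function name is illustrative.

```python
import math


def display_cutoff(values, coverage=0.9):
    """Smallest observed value c such that at least `coverage` of values are <= c."""
    ordered = sorted(values)
    k = math.ceil(coverage * len(ordered))  # number of values that must be shown
    return ordered[k - 1]


cluster_sizes = [3, 3, 4, 4, 4, 5, 5, 5, 6, 40]  # hypothetical sizes (chromatophores)
cutoff = display_cutoff(cluster_sizes)
shown = sum(1 for s in cluster_sizes if s <= cutoff) / len(cluster_sizes)
print(cutoff, shown)  # 6 0.9
```

  Values above the cutoff (here the single cluster of 40) remain in the data but fall outside the plotted range, which is why the largest clusters in a truncated histogram appear smaller than in an untruncated one.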
  
  (14) Figure 4: I would show some of these examples earlier, to give the reader an intuitive sense of the data and claims (though it doesn't need its own figure - provide a couple of examples, and the diagram of how much of the mantle you're sampling) then put the rest in the supplement, and include some videos too.
  
  We agree that providing spatial context is important for readers to develop an intuitive understanding of the dataset. However, introducing examples of motor units earlier in the manuscript would, in our view, interrupt the logical progression of the Results, where motor unit identification builds on prior analyses. To address the reviewer’s concern, we have added a new supplementary figure (Fig. S1) illustrating the size and location of the sampled mantle region. In addition, we now provide representative videos in the Data Availability section to give readers direct visual access to the underlying dynamics.
  
  (15) Figure 4f: Is the location of the split color in each dot accurate? It's surprising that each one is split down the middle, and the pink side is always on the right - this is unintuitive given where the motor neuron is likely to be located.
  
  The dots and half dots represent the membership of a chromatophore in a particular cluster.
  
  (16) Figure 5: I didn't find this figure sufficiently justified in the text. I would move this to the supplement.
  
  Addressed in General point #7.
  
  (17) L350: States that 12 animals were patched, but the data isn't shown. It's important to show all of this data (some of which can be in the supplement).
  
  Addressed. We provided the data in the Data Availability section.
  
  (18) Figure 5: I would quantify how many chromatophores were in each motor group across all the recording sessions, and compare this to the equivalent behavioral analysis.
  
  We assume the reviewer means Fig. 6. We calculated and stated the size of motor units across patching sessions.
  
  (19) Figure 5c: I recommend labeling each panel with a different number so you can refer to specific data.
  
  We assume the reviewer means Fig. 6c. We consider the figure layout clear enough to allow readers to follow the data without additional panel numbers.
  
  (20) L379: Typo: repeat of "quantitative"
  
  Addressed.
  
  (21) L576: Salinity should be 33-36 ppt, not %
  
  Addressed.
  
  (22) L877: The salinity units are sg? That should be stated. Though I would use the same units for salinity throughout.
  
  Addressed.
  
  Overall, this work introduces a potentially valuable quantitative framework for studying chromatophore dynamics. Addressing the points above would substantially strengthen the manuscript and clarify the scope and support for its conclusions.
  
  We thank the reviewer for these many helpful comments.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Line 64 - missing references for chromatophore colour with age.
  
  Addressed. Added missing refs.
  
  (2) Line 64-65 - would be good to have a little more detail about what is meant by 'migrating through the skin'. Is this a lateral process, or depth in the skin?
  
  Addressed. Replaced “migrating in the thickness...” with “through the thickness...” to emphasize verticality.
  
  (3) Line 72 - typo, should read '...individual and groups...'
  
  Addressed.
  
  (4) Remove 'In Fig 1, ...' from line 104.
  
  Addressed.
  
  (5) Figure 1 - It's unclear why some chromatophores are uncoloured with a red dot in the centre. Are these chromatophores that do not share a cluster with neighbours? If so, wouldn't it make more sense to colour the chromatophore with a unique colour of its own? Or, at the very least, make a note in the caption to indicate that all white chromatophores are not clustered with neighbours.
  
  Segmented chromatophores are shown in white, with coloured slices highlighting cluster membership. Uncoloured slices represent outliers. Addressed in the figure legend.
  
  (6) Line 119 - the concept of a 'closed virtual chromatophore' needs a few more words of explanation. The way I interpret the text as it is, is that the motor units driving colour change are not necessarily the individual chromatophores, but a motor region containing a mixture of whole and partial chromatophores innervated by the same motor neuron. If this is the case, a few extra words of description would help here to remove any ambiguity as I think this is an important concept for the paper.
  
  Addressed. We added a sentence clarifying the concept.
  
  (7) Line 173 - Figure 6d doesn't exist in the paper. Was a different panel intended? If so, please make sure to number the figures in order of appearance in the manuscript.
  
  The reference to the nonexistent Figure 6d was removed.
  
  (8) Figure 3b is very difficult to see. Perhaps consider lightening the background image. Please also indicate whether the individual colours refer to individual clusters. If this is the case, then some of these clusters look much larger than the 3-5 suggested in the caption.
  
  This issue has been corrected.
  
  (9) Line 210 - remove the bold type.
  
  Addressed.
  
  (10) Line 211 - please specify which 'two groups' you are referring to here. Presumably, this is anaesthetised and non-anaesthetised.
  
  Addressed.
  
  (11) I think that the text is missing any indication of the pixel sizes involved in extracting slice metrics, particularly from the S. officinalis data. It would be great to include some data on how many pixels span the radius of an expanded chromatophore. There is some small indication of this in Figure 2a, but a panel or two with details about the pixel size of S. officinalis chromatophores and their slices would be welcome. This would help with the judgment of the robustness of the resolution of the analysis. Looking at the y-axis in Figure 5a, there is some indication that the chromatophore radius is only 1 to 8 pixels. Is this the case?
  
  Figure 5a does not show chromatophore radius but rather the relative change in peak amplitude during an expansion event. At the peak, the chromatophore likely has a larger radius: its baseline radius plus the size of the peak.
  
  (12) Line 246-7 - reword this sentence to avoid referring to Figure 3d in the narrative. Include it in parentheses instead.
  
  Addressed.
  
  (13) Lines 408 and 409 - missing references.
  
  Addressed.
  
  (14) Line 576 - salinity should be reported in parts per thousand, not per cent.
  
  Addressed.
  
  (15) Line 593 - how were animals <50mm fed?
  
  Animals smaller than 50 mm were fed Neomysis spp. or small Palaemonetes spp., as noted a few lines above the description for animals larger than 50 mm.
  
  (16) Line 847 - typo - '...putative motor units' ramifications...'
  
  Addressed.
  
  (17) Line 854 - better to write out the [chrom_id, label] info as narrative text rather than using the variable names.
  
  Addressed.
  
  (18) Line 876 - two typos '...were reared in an artificial...'
  
  Addressed.
  
  (19) Line 877 - please use the same salinity metric as used in the earlier part of the methods.
  
  Addressed.
  
  (20) Section 898-910 - equipment details would ideally include the location of the company. E.g. (BX51W1, Olympus, Tokyo, Japan).
  
  Addressed.
  
  Reviewer #3 (Recommendations for the authors):
  
  I am left with a number of questions that arise from the authors' work, some of which the authors themselves briefly mention in the technical limitations section.
  
  (1) In relation to the first weakness, do the authors know if the recruitment patterns they identify are likely to be the same when octopi perform visually-mediated camouflage to their environment?
  
  Thank you for this comment. We assume the reviewer is referring to S. officinalis. There seems to be a misunderstanding: our approach is designed to reveal the smallest independent functional units (motor units) that together generate skin patterns. The technique is fully applicable to an animal displaying camouflage, but the results would necessarily differ. Camouflage patterns are composed of relatively large shapes compared to individual motor units and arise from the coordinated activation of multiple units. Disentangling motor units requires decorrelated activity, whereas visually evoked camouflage inherently drives correlated motor-unit activation through premotor control. To use an analogy, if our goal were to map the distribution and wiring of pixels on a screen, it would be more informative to broadcast a noise signal rather than display coherent images, as the noise produces decorrelated activity that allows the underlying structure to be resolved. We have clarified this important point early in the Results section.
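
  The screen analogy can be made concrete with a toy simulation, offered as an illustration under stated assumptions rather than a model from the manuscript. Each hypothetical motor neuron drives several slices, and its drive mixes its own independent noise with a shared premotor signal whose weight is a free parameter. All names, unit counts and noise levels are hypothetical. With independent (noise-like) drive, slices of the same motor unit are far more correlated with each other than with slices of other units, so the units can be separated; with strongly shared drive, as during coordinated patterning, the two kinds of correlation become almost indistinguishable.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(premotor_weight, n_units=2, slices_per_unit=3, n_frames=2000, seed=0):
    """Toy model: each slice = its motor neuron's drive + measurement noise.
    Drive = (1 - w) * the neuron's own noise + w * a shared premotor signal."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]
    traces, owners = [], []
    for unit in range(n_units):
        own = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]
        drive = [(1 - premotor_weight) * o + premotor_weight * s
                 for o, s in zip(own, shared)]
        for _ in range(slices_per_unit):
            traces.append([d + rng.gauss(0.0, 0.3) for d in drive])
            owners.append(unit)
    return traces, owners


def within_minus_across(traces, owners):
    """Mean correlation of same-unit slice pairs minus that of different-unit pairs."""
    within, across = [], []
    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            r = pearson(traces[i], traces[j])
            (within if owners[i] == owners[j] else across).append(r)
    return sum(within) / len(within) - sum(across) / len(across)


print("noise-like drive: ", round(within_minus_across(*simulate(0.0)), 2))
print("coordinated drive:", round(within_minus_across(*simulate(0.9)), 2))
```

  A large within-versus-across contrast is what allows correlation-based clustering to recover motor units; when that contrast collapses under shared premotor drive, the same analysis would instead describe how units are coordinated.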
  
  (2) The authors provide indirect evidence that motor neurons innervate multiple chromatophores. Can sets of radial muscles within a chromatophore be innervated by multiple motor neurons? Is there neuroanatomical evidence or experiments that could perhaps shed light on this?
  
  Addressed above. Same question as #1(8).
  
  (3) Are multi-innervated chromatophores evenly distributed across the octopus's body? For instance, could the authors compare chromatophore recruitment over multiple patches on the animal from multiple regions?
  
  At present, we do not have sufficient data to quantitatively compare motor-unit structure or the distribution of multi-innervated chromatophores across different body regions of cuttlefish. However, we would not necessarily expect uniformity across the skin, as distinct body regions are associated with characteristic pattern elements (e.g., the white square on the central mantle or the thicker zebra stripes along the sides). It is therefore plausible that different motor-unit geometries and densities are differentially represented across regions to support these region-specific patterns. Future recordings spanning multiple patches and body locations will be required to test this question directly.
  
  (4) Relatedly, is there any idea of whether chromatophore size or age corresponds with the number of motor units within a single chromatophore?
  
  At present, our analyses are limited to single developmental time points, and we therefore cannot directly assess whether chromatophore size or age correlates with the number of motor neurons innervating an individual chromatophore. However, this is a question that our analysis framework is explicitly designed to address. Our custom pipeline, CHROMAS (Ukrow, Renard et al., 2025), includes tools for longitudinal image alignment that allow chromatophores to be tracked within the same animal across development. Applying these scripts to developmental datasets will enable future analyses linking chromatophore growth or age to changes in the motor innervation of single chromatophores.
  
  I understand that a full resolution to the issues raised above may require substantial additional experiments. At a minimum, further discussion of these points with integration of existing literature would elevate the paper.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2025.11.30.691401v2
www.biorxiv.org www.biorxiv.org

When word order matters: human brains represent sentence meaning differently from large language models

1. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.
  
  The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We apologize for the confusion. We have clarified this on page 3:
  
  “Results for the ‘Transformers’ model are obtained by computing correlations separately for five different transformer models and then taking a simple average of these correlations. Results for each individual transformer are presented in Supplementary Information Figure S2.”
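  The distinction drawn here (averaging correlations rather than averaging embeddings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, similarity is assumed to be cosine, and Pearson correlation over the upper triangle stands in for whichever RSA correlation the manuscript actually uses. Because each model meets the brain data only through its own sentence-by-sentence similarity matrix, models with different embedding sizes never need to share a space.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities for an (n_sentences, n_features) array."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, i.e. one value per sentence pair."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]


def rsa_score(model_sim, brain_sim):
    """Pearson correlation between two similarity matrices (assumed metric)."""
    return float(np.corrcoef(upper_triangle(model_sim), upper_triangle(brain_sim))[0, 1])


def averaged_transformer_score(model_embeddings, brain_sim):
    """Correlate each model with the brain separately, then average the correlations.

    Embeddings are never averaged, so models with different dimensionalities
    or unaligned embedding spaces can still be combined.
    """
    scores = [rsa_score(cosine_similarity_matrix(e), brain_sim) for e in model_embeddings]
    return float(np.mean(scores)), scores


# Hypothetical usage with random stand-ins for five models of different widths.
rng = np.random.default_rng(0)
brain_sim = cosine_similarity_matrix(rng.normal(size=(12, 40)))
models = [rng.normal(size=(12, d)) for d in (384, 768, 1024, 2048, 4096)]
mean_score, per_model = averaged_transformer_score(models, brain_sim)
```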
  
  (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.
  
  Following the suggestion, we have implemented two syntactic models and discuss the results on page 10:
  
  “We also found that purely syntactic models based on constituency parses (see Benepar and CFG) show poor correlations with brain activity (see Supplementary Information Figure S2). Examining the corresponding RSA matrices (see Figure S1), this seems to be due to such models being overly sensitive to syntactic form, and relatively insensitive to which words are assigned to different nodes within the syntactic tree. This is most evident for the edit-distance similarity metric, and to a lesser extent also for the subtree similarity metric. This finding highlights the value of hybrid approaches designed to appropriately balance sensitivity to lexical, syntactic, and compositional information in representing semantic information at the sentence level.”
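  To make the insensitivity described above concrete, here is a minimal sketch of one purely syntactic similarity: unit-cost ordered tree edit distance (the classic rightmost-root recursion, memoised) between constituency trees whose nodes carry only category labels. It is an illustration only, not the implementation behind the Benepar or CFG models: the trees are hand-built rather than parsed, the unit cost scheme is an assumption, and the names are hypothetical. Because words never enter the trees, ‘the boy kicked the ball’ and ‘the ball kicked the boy’ receive identical trees and a distance of zero.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees;
# a forest is a tuple of trees. Tuples keep everything hashable for caching.


def forest_size(forest):
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        # Delete the rightmost root of f; its children take its place.
        forest_distance(f[:-1] + v_kids, g) + 1,
        # Insert the rightmost root of g.
        forest_distance(f, g[:-1] + w_kids) + 1,
        # Map the two rightmost roots onto each other (relabel if needed).
        forest_distance(v_kids, w_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def leaf(label):
    return (label, ())


NP = ("NP", (leaf("DT"), leaf("NN")))  # "the boy" or "the ball"
boy_kicked_ball = ("S", (NP, ("VP", (leaf("VBD"), NP))))
ball_kicked_boy = ("S", (NP, ("VP", (leaf("VBD"), NP))))  # same labels, words swapped
boy_slept = ("S", (NP, ("VP", (leaf("VBD"),))))
```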
  
  (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.
  
  Although the behavioural judgements were made by different participants and involved a different task from the neuroimaging experiment, we agree that the difference is surprising and warrants more detailed consideration. We have included a more detailed discussion of this issue on page 11:
  
  “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task, participants read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”
  
  (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.
  
  While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. Sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.
  
  (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.
  
  We agree that placement of figures was not ideal in the previous draft. We have reworked the manuscript so that all figures appear closer to where they are first mentioned in the text, and this figure (now Figure 3) appears in the correct order. We have also substantially revised the discussion and added subheadings to help guide the reader through the various issues it addresses.
  
  (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.
  
  We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.
  
  (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.
  
  We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Consider including a purely syntactic baseline model. For instance, parse each sentence into a constituency tree and compute tree edit distances between pairs of trees. This would allow you to construct a sentence similarity matrix based solely on syntactic structure, and may clarify the role of syntax in sentence representations.
  
  See our response to Public Review comment 2.
  
  (2) Instead of averaging embeddings across different transformer-based models, I recommend reporting RSA results for each model individually. For instance, compare one sentence-level model (e.g., SentBERT or SimCSE) and one general-purpose language model (e.g., GPT-2 or Llama).
  
  See our response to Public Review comment 1.
  
  (3) I suggest revisiting the structure of the Results section to improve the clarity and impact of your key findings. Consider which results are most central to the paper's claims and ensure they are presented in the main text. Less central analyses (e.g., the analysis on the grid-like pattern) might be better suited for the supplementary information. Presenting behavioral results prior to neuroimaging results could also improve logical flow by first validating model similarity estimates behaviorally.
  
  As mentioned in our response to Public Review comment 5, we have revised the ordering of the figures to improve the flow of the main manuscript. We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript. In addition, we believe that presenting the neuroimaging results first is appropriate as this is the primary and most important contribution of our study.
  
  Reviewer #2 (Public review):
  
  (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.
  
  The reviewer rightly argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs; however, this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than those in traditional minimal pair studies, but which are also tailored to highlight differences of particular theoretical interest.
  
  Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies.
  
  We have added the following paragraph on pages 9-10 contrasting our approach to previous minimal-pair studies:
  
  “Another approach that has seen widespread use is the presentation of minimal sentence pairs that differ only in one specified aspect, for example, interchanging subject and object in a sentence (Frankland 2015, Wang 2016, Frankland 2020, Giglio 2024), or altering adjective-noun phrases to influence composition (Graves 2010, Schell 2017, Fyshe 2019, Ciapparelli 2025). Our approach is an extension of these approaches utilising more naturalistic and complex sentences, designed to facilitate comparison of a wider range of structural manipulations (see Table 1). In more completely characterising the representational structure of various computational models in response to different structural contrasts, we can more comprehensively evaluate their adequacy as models of semantic processing in the brain.”
  
  (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.
  
  The reviewer notes that low RSA correlations do not necessarily imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning.
  
  The reviewer also notes that transformer embeddings are highly anisotropic; however, we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli, as shown by the pattern of results for all models in Figure S2.
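  One common way to implement per-feature normalisation is to z-score each embedding dimension across the stimulus set before computing cosine similarity. The sketch below assumes exactly that; the procedure on page 14 of the manuscript may differ in detail, and the function names are hypothetical. It illustrates why the step matters: a shared offset in every embedding (a simple form of anisotropy) pushes all raw cosine similarities towards 1, and standardising removes it.

```python
import numpy as np


def standardise_features(embeddings, eps=1e-12):
    """Z-score each feature (column) across sentences (rows)."""
    x = np.asarray(embeddings, dtype=float)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def cosine_similarity_matrix(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def mean_off_diagonal(sim):
    """Average similarity over all distinct sentence pairs."""
    n = sim.shape[0]
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


# Random stand-in embeddings that share a large common offset (anisotropy).
rng = np.random.default_rng(1)
emb = rng.normal(size=(20, 64)) + 10.0
raw = mean_off_diagonal(cosine_similarity_matrix(emb))
normalised = mean_off_diagonal(cosine_similarity_matrix(standardise_features(emb)))
```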
  
  (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).
  
  The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We have clarified this in a modified paragraph on page 11:
  
  “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure (Chang 2024), and probing studies have found that transformers represent information about syntax and word order (Clark 2019, Manning 2020). This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Supplementary Information Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”
  
  We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.
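  Controlling for sentence length when correlating similarity matrices is often done with a partial correlation: regress the covariate similarity out of both the model and the brain pairwise similarities, then correlate the residuals. The sketch below assumes that approach and a length covariate defined as the negative absolute difference in word count; neither is confirmed as the method described on page 19, and all names are hypothetical. The same function accepts further covariate matrices, for example a participant-averaged visual-cortex similarity matrix.

```python
import numpy as np


def upper_triangle(matrix):
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return np.asarray(matrix, dtype=float)[i, j]


def length_similarity(lengths):
    """Sentence pairs of similar length get higher (less negative) values."""
    n = np.asarray(lengths, dtype=float)
    return -np.abs(n[:, None] - n[None, :])


def residualise(y, covariates):
    """Residuals of y after ordinary least squares on an intercept plus covariates."""
    design = np.column_stack([np.ones_like(y)] + list(covariates))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def partial_rsa(model_sim, brain_sim, covariate_sims):
    """Correlation between model and brain similarities with covariates partialled out."""
    covs = [upper_triangle(c) for c in covariate_sims]
    m = residualise(upper_triangle(model_sim), covs)
    b = residualise(upper_triangle(brain_sim), covs)
    return float(np.corrcoef(m, b)[0, 1])


# Toy check: model and brain share only a length effect plus independent noise.
rng = np.random.default_rng(2)
lengths = rng.integers(5, 16, size=30)
length_sim = length_similarity(lengths)
model_sim = length_sim + rng.normal(size=(30, 30))
brain_sim = length_sim + rng.normal(size=(30, 30))
raw_r = float(np.corrcoef(upper_triangle(model_sim), upper_triangle(brain_sim))[0, 1])
partial_r = partial_rsa(model_sim, brain_sim, [length_sim])
```

  In this toy setting the raw correlation is driven entirely by length, so it largely disappears once length is partialled out.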
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Model dimensionality: the interpretability of cosine similarity diminishes as the dimensionality increases, and there are some math tricks to work around it. To make a fair comparison among models with different dimensionalities, it would be better to apply some dimensionality-insensitive distance metrics.
  
  We thank the reviewer for this suggestion. We repeated all vector-based similarity calculations using the Dimension Insensitive Euclidean Metric (DIEM). As shown in Figure S9, the results are broadly similar, though with overall somewhat lower brain correlations for most transformers compared to cosine similarity.
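  For readers unfamiliar with DIEM, the sketch below shows the idea as we understand the published definition: take the Euclidean distance between two vectors, subtract the distance expected between random vectors of the same dimension whose entries are uniform on [v_min, v_max], and scale by (v_max - v_min) divided by the variance of that random distance. Here the expected value and variance are estimated by Monte Carlo rather than in closed form, the value range is an assumption that must be chosen for unbounded embeddings, and the names are hypothetical; please check the scaling against the original DIEM paper before relying on it. Lower values indicate more similar vectors, so a similarity score would use the negative.

```python
import numpy as np


def random_distance_stats(n_dims, v_min, v_max, n_samples=5000, seed=0):
    """Monte Carlo mean and variance of the Euclidean distance between two
    random vectors with entries drawn uniformly from [v_min, v_max]."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(v_min, v_max, size=(n_samples, n_dims))
    b = rng.uniform(v_min, v_max, size=(n_samples, n_dims))
    d = np.linalg.norm(a - b, axis=1)
    return float(d.mean()), float(d.var())


def diem(a, b, v_min, v_max, stats):
    """Centred, rescaled Euclidean distance (DIEM-style); lower means more similar."""
    expected, variance = stats
    d = np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return (v_max - v_min) * (d - expected) / variance


# Illustrative check in 300 dimensions with entries assumed to lie in [-1, 1].
stats = random_distance_stats(300, -1.0, 1.0)
rng = np.random.default_rng(3)
pairs = rng.uniform(-1.0, 1.0, size=(2000, 2, 300))
random_scores = [diem(p[0], p[1], -1.0, 1.0, stats) for p in pairs]
self_score = diem(pairs[0, 0], pairs[0, 0], -1.0, 1.0, stats)
```

  Random pairs score close to zero on average whatever the dimension, which is the property that makes the metric comparable across models of different widths.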
  
  (2) Depending on the scope of the current study, if the authors would like to establish whether transformers are inferior to graph-based models in representing syntax, a linear classifier using the model embeddings would be sufficient. I think this would be a more direct assessment of model syntax ability than correlation with brain data.
  
  As we discuss in our previous responses, our objective in this study was not to assess how well transformers can represent syntax. Rather, the goal was to assess whether internal transformer representations have similar geometric properties to patterns of brain activation. Our results indicate that transformers do represent sentence structure, but in a different manner to the human brain.
  
  Reviewer #3 (Public review):
  
  (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.
  
  The reviewer argues that we overstate some of our conclusions, as several transformers achieve higher brain correlations than the hybrid model when computed over all sentence pairs, as well as on the behavioural data. In response, we first note that our primary interest in this paper is in the block-diagonal sentence pairs, as these were specifically designed to interrogate how different models represent sentence structure. Results over all sentence pairs are presented for comparison but are not our primary focus, as also reflected in our pre-registered prediction that the VerbNet-CN hybrid model would show higher brain correlations than transformers over this block-diagonal subset.
  
  Second, we have included a new analysis in the revised manuscript (Figure S9) where we compute brain correlations controlling for the pattern of similarities observed in the primary visual cortex (averaged over participants), as a way to control for visual similarity. This added control substantially reduces the brain correlations of the transformers, such that they all have lower correlations than VerbNet-CN and AMR-smatch even over the set of all sentence pairs. We provide interpretation of this result in the discussion.
  
  Third, we would like to note that one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, which makes them useful for determining which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We have added a short discussion of this issue in the revised manuscript (page 10).
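  The difference between the block-diagonal analysis and the all-pairs analysis comes down to which sentence pairs enter the correlation. The sketch below assumes, for illustration only, that the 108 sentences are ordered as six contiguous subsets of 18 sentences each; the manuscript describes six subsets built around base sentences, but equal block sizes and this ordering are assumptions, and the function names are hypothetical.

```python
import numpy as np


def sentence_pairs(n_sentences=108, n_blocks=6):
    """Indices of all unordered sentence pairs and a mask of within-block pairs."""
    block_of = np.repeat(np.arange(n_blocks), n_sentences // n_blocks)
    i, j = np.triu_indices(n_sentences, k=1)
    return i, j, block_of[i] == block_of[j]


def rsa_on_pairs(model_sim, brain_sim, within_blocks_only=True):
    """Pearson correlation over either the within-block pairs or all pairs."""
    i, j, within = sentence_pairs(model_sim.shape[0])
    keep = within if within_blocks_only else np.ones_like(within)
    return float(np.corrcoef(model_sim[i[keep], j[keep]], brain_sim[i[keep], j[keep]])[0, 1])


i, j, within = sentence_pairs()
n_all, n_within = len(i), int(within.sum())
```

  Under these assumptions the block-diagonal analysis uses 918 of the 5,778 possible sentence pairs.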
  
  (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, there could still be residual effects that remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.
  
  We agree with the reviewer that this is a potential confound. As noted in the previous response, we have implemented a new control analysis in which we directly control for visual similarities as reflected in participant-averaged similarities of primary visual cortex activations in response to all stimuli. These results are shown in Figures S8-S11 in the SI. We show that this control reduces transformer correlations much more than those of graph and hybrid models. We also note that the AMR-smatch graph model shows high correlations with other brain regions even after removing correlations with the visual cortex (Figure S10). This indicates that the model represents a range of sentence features, including both superficial visual or length-related features and semantic features that are represented in common with language and other cortical regions.
  
  (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.
  
  The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. In the revised manuscript we have incorporated an entirely new similarity metric for vector-based models (DIEM similarity), as well as an extended discussion of the effect of different similarity metrics for graph and hybrid models.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Compute separate RSAs on each sentence pair type (especially Swapped), to quantify how each sentence type manipulation contributed to the divergence between model and brain. Although the manuscript is already brimming with analyses, I think squeezing this in would be helpful because the results currently rely on qualitative inspection of group-average scatter plots to interpret how sentence pair manipulations contributed to the divergence between Transformers and humans. The Swapped condition would appear to be the centrepiece of the title and manuscript, and potentially the only condition for which confounds associated with the surface form of sentence are controlled for (because sentences should be the same words in different orders). Thus, this analysis might see to the inconvenient visual cortex correlations in Figures 3d/e.
  
  We respectfully disagree that computing separate RSAs for each sentence pair type would be a useful additional analysis. The motivation for the construction of our stimulus set was to provide a range of variants of a given base sentence that alter the semantic meaning and lexical content (somewhat) independently. The purpose of the ‘modified’ sentences, for instance, is to construct sentences with a similar overall meaning but lower lexical similarity due to the inclusion of many modifier words. It is precisely the comparisons across the different pair types that provide information about how each model represents sentence semantics, so restricting an analysis to a single subset would not be very informative. Another problem with this approach is that it would dramatically reduce the number of sentence pairs analysed, thereby decreasing statistical power. In the revised manuscript we have provided additional details regarding the motivation and rationale for how our stimulus set of 108 sentences was constructed, which should help to clarify this point. The following excerpt is from page 3:
  
  “Within each of the six subsets, we begin with a base sentence such as ‘the cameraman brought the equipment to the director’, which we then systematically modified in various ways to create different combinations of lexical and compositional similarity, in order to dissociate these two aspects of meaning (see Table 1 for further details).”
  
  (2) Explaining the motivation for the sentence stimulus types. I appreciated the careful design of the dataset, but I couldn't immediately work out the motivation for all the different sentence types, and why this selection was ideal to identify divergences with Transformers. For instance, given the goal of (approximately) controlling for lexical similarity whilst varying sentence meaning, I couldn't immediately see why stimulus blocks weren't all built from rearranging the same content words (as in the Swapped condition). The negative RSA correlation with the Mean model also made me stop and think - it seems like the more similar the words in a sentence, the more different their structure, and vice versa, but I wasn't clear that this was a design feature. Thus, a few extra words motivating the conditions could be helpful for the reader, and these might helpfully lead them to anticipate the negative RSA correlation.
  
  As noted in the previous response, in the revised manuscript we have expanded our explanation of the rationale for the construction of our 108 sentences. In particular, Table 1 in the methods section now includes two additional columns which summarise the combinations of lexical and overall sentence similarity that our sentence pairs are intended to satisfy.
  
  (3) Explanation for why different implementations and similarity computations between variants of ostensibly equivalent Graph / Hybrid models yielded widely divergent positive vs negative brain correlations, despite both positively capturing behavioural ratings. This might incorporate a brief intuitive explanation of how Graph model similarities were computed (e.g., what SMATCH and WWLK do). In light of the above, why do different similarity algorithms applied to the Graph model yield positive and negative correlations on the same brain (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Same goes for why Hybrid and Hybrid-AMR yielded positive vs negative correlations (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Acknowledge that the brain results are sensitive to similarity computations in the Discussion.
  
  We appreciate this suggestion. We have added an extended consideration of these issues to the discussion (pages 10-11), as well as some additional details regarding the differences between the Smatch and WWLK metrics in the methods section (page 17).
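  As a brief intuition for Smatch, the sketch below scores two AMR graphs written as triples by finding the variable mapping that maximises the number of matching triples and reporting the resulting F1. The real Smatch tool searches mappings with hill-climbing restarts and also adds a TOP triple; this illustration instead tries every mapping exhaustively (feasible only for tiny graphs) and omits TOP. The AMRs and function names are hypothetical. Swapping agent and patient keeps all three concepts but reverses both role triples, so no mapping recovers more than three of the five triples.

```python
from itertools import permutations


def amr_variables(triples):
    return sorted({source for rel, source, _ in triples if rel == "instance"})


def rename(triple, mapping):
    """Apply a variable mapping; concept names in instance triples are left alone."""
    rel, source, target = triple
    if rel == "instance":
        return (rel, mapping[source], target)
    return (rel, mapping[source], mapping.get(target, target))


def smatch_f1(g1, g2):
    """F1 of matching triples under the best injective variable mapping."""
    v1, v2 = amr_variables(g1), amr_variables(g2)
    if len(v1) > len(v2):
        return smatch_f1(g2, g1)  # the score is symmetric, so map the smaller graph
    target_set = set(g2)
    best = 0
    for image in permutations(v2, len(v1)):
        mapping = dict(zip(v1, image))
        best = max(best, sum(rename(t, mapping) in target_set for t in g1))
    return 2 * best / (len(g1) + len(g2))


# "The boy kicked the ball" vs "The ball kicked the boy" (hand-written AMRs).
boy_kicks_ball = [
    ("instance", "k", "kick-01"), ("instance", "b", "boy"), ("instance", "b2", "ball"),
    ("ARG0", "k", "b"), ("ARG1", "k", "b2"),
]
ball_kicks_boy = [
    ("instance", "k", "kick-01"), ("instance", "b", "boy"), ("instance", "b2", "ball"),
    ("ARG0", "k", "b2"), ("ARG1", "k", "b"),
]
```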
  
  (4) Acknowledgement and explanation of why the human similarity ratings were poor at explaining brain data in Figure 2a,b (right column diag-pairs). The poor behaviour vs brain match is indirectly implied in the Discussion as "the comparison between behavioural and fMRI data is somewhat difficult owing to the difference in task structure." However, I would suggest being upfront and explicitly mentioning and explaining the poor brain match in Figures 2a and b, because the reader will notice and wonder - especially because the models correlate strongly with the behavioural data without the models doing the human behavioral task (though this could be a possibility, see later).
  
  As suggested, we have included a passing reference to this in the presentation of our main results on page 5, and a lengthier discussion on page 11:
  
  “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task participants (who were not the same as the behavioural task participants) read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”
  
  (5) Brief explanation of why model vs brain correlations tended to be strongest in the visual cortex (Figure 3d,e). Currently, this issue is only mentioned in passing, however, it seems worthy of further comment.
  
  We thank the reviewer for highlighting this issue. We have added discussion of potential visual confounds at several points in the revised manuscript, including the ‘Neuroscience of semantics’ subsection on page 11. As noted, we have also added a new analysis in which we compute correlations controlling for the average RSA similarities of the primary visual cortex. We find that this additional control significantly reduces correlations for most transformer models, but produces only a modest reduction in the correlations for most of the graph and hybrid models, particularly VerbNet-CN (see Figures S8-S11).
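  For readers unfamiliar with this kind of control, one common way to compute a correlation "controlling for" a covariate region is a partial correlation: regress the covariate's similarity values out of both the model and the brain similarity vectors, then correlate the residuals. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes Pearson correlation, a single V1 covariate, and similarity matrices vectorised by their off-diagonal upper triangle; the function names and the inputs `model_sim`, `region_sim` and `v1_sim` are hypothetical.

```python
import numpy as np


def upper_triangle(sim: np.ndarray) -> np.ndarray:
    """Vectorise the off-diagonal upper triangle of a square similarity matrix."""
    sim = np.asarray(sim, dtype=float)
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    return sim[rows, cols]


def partial_corr(x, y, z) -> float:
    """Pearson correlation of x and y after regressing z (plus an intercept) out of both.

    Assumption: a single covariate z (e.g. hypothetical V1 similarities) and Pearson r.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])
    # Residualise x and y on the covariate via ordinary least squares.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])
```

  With three hypothetical sentence-by-sentence similarity matrices over the same sentences, the controlled correlation would be `partial_corr(upper_triangle(model_sim), upper_triangle(region_sim), upper_triangle(v1_sim))`. Correlating residuals after regressing on an intercept plus the covariate is equivalent to the standard first-order partial correlation.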
  
  (6) Softening/clarifying some statements that could be misconstrued as suggesting Transformers were universally inferior models. Statements made in the Abstract/Discussion initially came over to me as implying that Transformers were universally inferior models when compared to the Graph/Hybrid models - but this appears only to be true when one looks at analyses conducted within block diagonal sentence subsets. Otherwise, when analyses are conducted on all sentences (between and within blocks, Figure 5) Llama 3 L2 provides by far the strongest brain model. Transformers also appear to yield the strongest accounts of the behavioural data, whether tested on block diagonal or all sentence pairs (Figure S3). To remedy this, I would suggest softening some statements in the Abstract/Discussion that could be misconstrued as suggesting that Transformers were universally inferior. I would also suggest explicitly acknowledging that when the entire dataset was analyzed, Transformers were most accurate, and that (some) Transformers best accounted for the behavioural data.
  
  We agree that certain sections of the previous draft lacked precision about the conclusions to be drawn regarding the representational capacities of transformers. We have revised the abstract and conclusion to better reflect our intended message, which is that transformers certainly can represent sentence structure and semantic roles, but that the way in which they do this (through vector representations in their hidden layers) is significantly different from how such features are represented in the human brain. In particular, we have included this new text on page 10:
  
  “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure, and probing studies have found that transformers represent information about syntax and word order. This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”
  
  (7) Given that GPT-4 was already deployed to parse semantic roles for the hybrid model, and GPT-4 should be able to generate reasonable similarity ratings between sentence pairs, it struck me that an interesting addendum could be to use GPT-4 similarities derived from the human behavioral task to interpret both brain and human behavioral data. This might also help support the case for conducting analyses within a similarity-based framework.
  
  We appreciate this suggestion. We have added this model (GPT-4 ratings of sentence similarity) to the revised manuscript (see Figures S1-S3).
  
  Other changes
  
  As noted by reviewer 3, the full set of sentence pairs was missing from the previous draft. They have been added to the SI of the revised manuscript.
  
  We have renamed the Graph and Hybrid models in the manuscript to AMR-Smatch and VerbNet-CN respectively, for greater clarity as to which models these terms refer to, and also to better differentiate them from the newly added constituency parse graph models.
  
  We have thoroughly revised the discussion section, incorporating feedback from all reviewers regarding areas needing additional depth.
  
  We have added subsections to the discussion to aid the reader navigating the now lengthier section.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.19.665701v2
www.biorxiv.org

Decoupling AMPK from fatty acid synthesis allows maintenance of fitness late in life

1
1. Public_Reviews 28 May 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  This rigorous and creative study uses an elegant combination of metabolomics, transcriptomics, and budding yeast molecular genetics to discover that (i) activating AMPK to maintain mitochondrial respiration fueled by cytosolic Acetyl CoA and (ii) increasing fatty acid synthesis independent of respiration drive independent pathways that increase the fitness of replicatively-aged budding yeast cells, albeit without increasing their lifespan. This work will be of interest to scientists in the field of aging and metabolism. Some clarifications in the text would address the following concerns, which would increase the impact of the study:
  
  (1) What does activation of AMPK (via PGDP-Sak1 expression) do to the replicative lifespan? How many bud scars, in general, do the subpopulations that are older - yet have less Tom70 (increased mitochondrial fitness) - have, after the 48 hrs time point that they are examining? How many divisions occurred in this 48hr time period - i.e. is it long enough to have all cells reach the end of their replicative lifespan? This information is important to rule out that a subset of the mutant cells just divided faster and hence had more divisions within 48 hrs (growing faster and living longer are different things). Having identical growth curves doesn't indicate per se that they all divide at the same rate, as there may be a subpopulation that divides faster and a subpopulation that doesn't grow so well.
  
  Increasing AMPK activity increases replicative lifespan [PMID: 25869125], but given our finding that AMPK activation splits the population, such replicative lifespan assays are hard to interpret. Bud scar counts have a similar issue. Hence we restricted the lifespan and bud scar analyses to wt and A2A, which are more homogeneous (Figures S2 B and E). A2A cells at 48h have ~25% more bud scars than wt cells. Yes, by 48h most of the cells have lost viability (Figure 2E). The reviewer is correct that lifespan curves cannot be properly compared if the cells divide at different rates; hence our follow-up comparison of viability between wt at 48h and A2A at 40h, after we had confirmed that these timepoints captured cells at equivalent replicative ages (Figure 2D,E). This shows that viability of A2A is slightly lower than that of wt at matched age, indicating a slightly shorter lifespan.
  
  (2) A2A cells do not have an extended replicative lifespan (RLS) but show an increase in the "low senescence" population (Figure 2). If the cells are not becoming senescent, why don't they have longer RLS? Not having a longer lifespan seems inconsistent with the statement that "bud scar counting confirmed that A2A cells reach a higher age than wild type", which comes back to how many times the cells can divide in the 48hr timepoint studied and their rate of cell division? Also, the lifespan curve shown is plotted against time, not cell division number, which does not take into account different division times of cells within the population (described above). It would be much more useful to show standard lifespan curves showing cell division numbers per lifespan per cell.
  
  Our observation that cells can reach the end of life without senescing is consistent with other studies that have followed the life course of individual cells by microscopy [PMID: 31291577, 32675375]. These studies always highlight some proportion of cells that reach the end of life with no or minimal senescence, though this fraction varies with the experimental system. Why cells lose viability without senescing remains entirely unknown in the field, and reflects a wider lack of consensus as to why yeast lose viability with replicative age.
  
  We are wary of making strong statements on lifespan for exactly the reason the reviewer picks out. In liquid culture we can only assess viability over time, and it is clear from the comparison of liquid and solid media lifespans performed by the Gottschling lab [PMID: 19652178] that the culture system has a huge effect on lifespan, with cells in classical microdissection-based lifespan assays living far longer than they do in liquid. This of course means that classical microdissection assays are not very useful for A2A, so we are left with an unsatisfactory approximation. We have therefore restricted our conclusion on lifespan to stating simply that the lifespan of A2A cells is not extended, which our data in Figures 2D,E and S2B do support (see also answer to Q1). Since the majority of A2A cells show low senescence marks and high fitness at 48h, we can conclude that lifespan and fitness loss must be separable.
  
  We will note these limitations of lifespan measurements in the manuscript.
  
  (3) Increased "fitness" of the old cells is implied from the increased size of the colonies that the old cells can make. However, this is a measure of the fitness of the daughters per se, not the old mother cells. Are the old mothers just passing on healthier mitochondria and more lipids to the daughters, such that they can divide more times? If the aged cells have an "increased fitness", why don't they divide more times themselves (i.e. live longer?).
  
  Yes, colony growth speed is defined by daughter cell replication. As long as the daughters and subsequent generations divide at the same rate irrespective of whether they come from young or old mothers, the size of the colony after 24 hours depends on the time it took the initial mother to produce a daughter. This is what the assay really measures. We note that aged wildtype mothers often do not divide at all in the first 24 hours after being put on an agar plate (hence the tiny reported colony size), even though they do eventually produce a daughter which then forms a colony, whereas A2A cells tend to produce the first daughter rapidly whether young or old. It is known that daughters of aged wildtype mothers also divide more slowly, which will also contribute to differences in colony size, and this may well result from a lipid and/or mitochondrial contribution, but the primary driver of colony size at 24 hours is the time the mother took to initially divide. We will add this detail to the manuscript.
  
  As noted above, the mechanistic basis of lifespan is unknown, but although senescence can shorten lifespan, our work and that of others shows that lifespan is still limited in the absence of senescence.
  
  (4) The statement is made that "these experiments define two classes of aging cells with distinct metabolic needs, coherent with the model of two aging trajectories previously proposed (referencing Nan Hao's work)". However, the big difference here is that in Nan Hao's work, their two aging trajectories influenced the length of lifespan, but that does not appear to be the case here. That distinction should be made clear. Perhaps the authors could also speculate as to why the A2A yeast stops dividing after presumably the same number of cell divisions, even though they have an activated AMPK and activated fatty acid synthesis pathway.
  
  We will add this distinction. As noted above, we are wary of making strong statements regarding lifespan as the assays we can do in liquid culture are limited. We are therefore similarly wary about speculating about causes for the lack of lifespan difference because in reality all we can do is rule out a big effect. We would love to speculate on why the A2A cells don't have an extended lifespan, but at this point we don't have any good ideas on this point!
  
  (5) I am a bit confused by the use of the word "senescence" by this lab here and in their previous growth on galactose studies. If yeast don't senesce, which is usually defined as an irreversible arrest of the cell cycle where cells stop dividing, shouldn't the yeast that do not senesce still be dividing and hence have a longer lifespan? Should a different term be used rather than senescence? Such as "fitness late in life". The authors giving their definition of senescence may help reduce this apparent contradiction.
  
  We completely agree that this is confusing, and we note this distinction in the Introduction. The use of the term senescence to mean a loss of fitness late in life in yeast stems from the classical definition of senescence as applied to whole organisms. However, the term senescence as applied to cells has a more specific meaning in terms of the cell cycle, as the reviewer notes. As an individual S. cerevisiae is both a cell and an organism, the terminology clashes. However, the marker we largely employ (Tom70-GFP), which in our hands is a very good proxy for fitness, was originally defined as marking the senescence entry point (SEP), so overall we feel we can't avoid the term.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this study, the authors investigate how cytosolic acetyl-CoA metabolism influences replicative aging in budding yeast. They propose that acetyl-CoA regulates aging through three major pathways: (1) mitochondrial transport to support mitochondrial function, (2) fatty acid synthesis, and (3) global protein acetylation. The data show that AMPK activation promotes mitochondrial import of acetyl-CoA and partially mitigates mitochondrial decline in a subset of aging cells.
  
  Furthermore, the engineered A2A strain, which enhances mitochondrial acetyl-CoA utilization while relieving inhibition of fatty acid synthesis, increases the proportion of cells exhibiting a "low senescence" phenotype.
  
  Overall, this is a thoughtful and potentially impactful study that advances our understanding of metabolic control of aging. Addressing the points below, particularly by refining interpretations and, where feasible, incorporating additional analyses, will further strengthen the manuscript and its conclusions.
  
  Strengths:
  
  The study has several notable strengths. It addresses an important question by shifting the focus from lifespan to preservation of late-life fitness, which is highly relevant to aging biology. The work integrates metabolic, genetic, and functional analyses to link cytosolic acetyl-CoA flux with distinct aging outcomes, and the engineering of the A2A strain provides a clear and elegant demonstration of how coordinated pathway modulation can improve cellular fitness.
  
  Weaknesses:
  
  (1) While the manuscript focuses on mitochondrial transport and fatty acid synthesis, cytosolic acetyl-CoA is also a key regulator of histone acetylation and chromatin silencing. It would strengthen the study to consider whether acetyl-CoA depletion contributes to improved fitness through enhanced rDNA silencing. Given the well-established role of rDNA instability in yeast aging, additional experiments examining rDNA silencing and stability would be valuable. For example, monitoring rDNA copy number changes (not necessarily ERCs) under AMPK activation, oleic acid supplementation, and in the A2A strain, similar to approaches used in the authors' prior work, would help clarify whether chromatin regulation contributes to the observed phenotypes.
  
  We have data addressing this point that we will add to the manuscript. In short, we see no difference in gene expression from Sir2-repressed sub-telomeric regions or MAT loci, but the genome-wide gene expression dysregulation associated with age is partially suppressed in PGPD-SAK1. However, A2A does not suppress this further, so it is not critical for the suppression of senescence in A2A though we are following this up. ERC accumulation is higher in A2A at 48h, consistent with the cells being older, meaning that ERCs are unlinked to senescence onset as we have previously reported. There is a strong upregulation of transcripts from Sir2-repressed rDNA intergenic spacers with age in all genotypes, but we attribute this simply to the copy number increase of these regions on ERCs rather than a defect in silencing. We have previously looked for heritable changes in rDNA copy number arising during ageing and found (to our surprise) absolutely nothing, so we don't expect any changes under these conditions.
  
  (2) The current data do not fully distinguish whether AMPK activation and oleic acid supplementation act on distinct subpopulations of aging cells. An alternative explanation is that oleic acid supplementation enhances mitochondrial function and acts additively with AMPK activation, thereby increasing the fraction of cells in the "low senescence" state. Since this distinction is not central to the main conclusions, I suggest softening the language around subpopulation specificity. Emphasizing instead that the A2A strain coordinately modulates multiple branches of acetyl-CoA metabolism to improve late-life fitness would maintain the strength of the central message without overinterpretation.
  
  We agree that oleic acid and the lipids produced downstream of Acc1 in A2A may improve late-life fitness via enhanced mitochondrial function, and in support of this the oxygen consumption rate is marginally (though significantly) higher in A2A than in PGPD-SAK1. We will add these data to the manuscript. However, we disagree with the interpretation of an additive effect, as we report throughout the study that AMPK activation and lipid biosynthesis/supplementation affect different sub-populations of cells. We do not observe populations of intermediate senescence cells; rather, by flow cytometry and fitness assays we observe individual cells in binary low senescence or high senescence states.
  
  (3) The manuscript proposes that lipid starvation and excess acetyl-CoA are major drivers of senescence in distinct subpopulations of wild-type aging cells. This conclusion is not yet fully supported by the presented data. Direct measurements of age-dependent divergence in acetyl-CoA and fatty acid levels at the single-cell level would be needed to substantiate this model. Based on the current evidence, a more conservative interpretation would be that aging cells exhibit differential sensitivity to perturbations in acetyl-CoA and lipid metabolism. Accordingly, I recommend revising the statement in the Abstract ("We further implicate lipid starvation and excess acetyl coenzyme A availability as major drivers of senescence...") and the corresponding discussion text to better align with the data.
  
  We agree and will adjust the abstract to make it clearer that the lipid starvation / excess acetyl-CoA interpretation is a model.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  These findings suggest that PGPD-SAK1 yeast show a subpopulation with lowered TOM70-GFP expression in high bud scar staining aged cells. Deletion of CAT2 or MLS1 reduces this effect. A PGPD-SAK1 acc1S1157A double mutant (called "A2A" here) shows an even larger effect of lowered tom70 expression in high bud scar staining aged cells. Utilization of various additional mutants involved in acetyl-CoA transport, carnitine shuttle, respiration, etc., leads the authors to conclude that these shifts in TOM70-GFP in aged cells are linked to the AMPK-fatty acid metabolic regulatory system.
  
  Strengths:
  
  These extensive and clearly described experiments reveal interesting changes in TOM70-GFP intensity in subsets of aged yeast in several mutants eventually identified as linked to the AMPK-fatty acid metabolic regulatory system.
  
  Weaknesses:
  
  (1) 3 biological replicates for mRNASeq is low.
  
  Thank you for pointing this out. We performed another replicate after posting the initial preprint but didn’t update the figure in the eLife-reviewed version. We will add this to the scatter plots and analysis in Figure 1; the findings have not changed.
  
  (2) While "Traditional conceptions of ageing implicate a progressive accumulation of damage leading to systemic degradation in performance until death, with evolutionary pressures acting to maximise early life fitness and fecundity at the expense of ageing health." is tangential perhaps to the data and conclusions of the study, both claims of this sentence are at best controversial, and the manuscript is no weaker for their omission.
  
  We actually feel that this sentence is very important to the message of the manuscript, which is that ageing does not necessarily have to involve a loss of fitness before death. Ageing is often described as the progressive wearing out of components leading to decline and death (with an old car often used as an analogy); in the ageing field this is certainly controversial, but outside the field this remains the normal understanding. We think it is important to state this widely held viewpoint with which our findings are hard to reconcile.
  
  Our interpretation, that yeast are bet-hedging as a population growth strategy and that this drives ageing in the long term, is a classic case of antagonistic pleiotropy. We will add this term (from the citation that is already in the manuscript) and clarify the discussion to make it obvious why we introduce this concept in the introduction.
  
  (3) The statement that "Here, we determine the basis of senescence and fitness loss in replicatively ageing yeast" is a bit strong as a summary of the present careful work presented here. If the authors had created yeast mutants that retained fitness indefinitely, this would be a more appropriate strength of claim to summarize the work.
  
  Indeed - we will refine this sentence.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.27.645766v2
search.brave.com

Brave Search

2
1. indyweb 28 May 2026
  
  in Public
  
  Teleshuttle ucm.teleshuttle.com › 2018 › 11 › as-we-will-think-legacy-of-ted-nelson.html Smartly Intertwingled: "As We Will Think" -- The Legacy of Ted Nelson, Original Visionary of the Web Why Nelson matters A fuller explanation of why Nelson matters is in my post from a few years ago, Digital Camelot - The Once and Future Web of Engelbart and Nelson, but here I caption its core message: If you care about modern culture and how technology is shaping it, this is worth thinking about -- A powerful eulogy for where the Web might have gone, and still may someday, and the friendship of the two people most responsible for envisioning the Web* -- Ted Nelson's eulogy for his friend Doug Engelbart, as reported by John Markoff in The Times -- with Nelson's inimitable flair.
2. indyweb 23 May 2026
  
  in Public
  
  Medium rreisman.medium.com › as-we-will-think-the-legacy-of-ted-nelson-original-visionary-of-the-web-f4f69a60bd6 “As We Will Think” — The Legacy of Ted Nelson, Original Visionary of the Web | by Richard Reisman | Medium 26 November 2018 - A fuller explanation of why Nelson ... about — A powerful eulogy for where the Web might have gone, and still may someday, and the friendship of the two people most responsible for envisioning the Web* — Ted Nelson’s ...

Annotators

indyweb

URL

search.brave.com/search
elifesciences.org

How attention simplifies mental representations for planning

1
1. Public_Reviews 28 May 2026
  
  in eLife (unscoped)
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This study investigated how visuospatial attention influences the way people build simplified mental representations to support planning and decision-making. Using computational modeling and virtual maze navigation, the authors examined whether spatial proximity and the spatial arrangement of obstacles determine which elements are included in participants' internal models of a task. The study developed and tested an extension of the value-guided construal (VGC) model that incorporates features of spatial attention for selecting simpler task mental representation.
  
  Strengths:
  
  (1) Original Perspective:
  
  The study introduces an explicit attentional component to established models of planning, offering an approach that bridges perception, attention, and decision-making.
  
  (2) Methodological Approach:
  
  The combination of computational modeling, behavioral data, and eye-tracking provides converging measures to assess the relationship between attention and planning representations.
  
  (3) Cross-validated data:
  
  The study relies on the analysis of three separate datasets, two already published and an additional novel one. This allows for cross-validation of the findings and enhances the robustness of the evidence.
  
  (4) Focus on Individual Differences:
  
  Reports of how individual variability in attentional "spillover" correlates with the sparsity of task representations and spatial proximity add depth to the analysis.
  
  We thank the Reviewer for their overall positive assessment of our work and their helpful comments. We have addressed each point below.
  
  Weaknesses:
  
  (1) Clarity of the VGC model and behavioral task:
  
  The exposition of the VGC model lacks sufficient detail for non-expert readers. It is not clear how this model infers which maze obstacles are relevant or irrelevant for planning, nor how the maze tasks specifically operationalize "planning" versus other cognitive processes.
  
  The method for classifying obstacles as relevant or irrelevant to the task and connecting metacognitive awareness (i.e., participants' reports of noticing obstacles) to attentional capture is not well justified. The rationale for why awareness serves as a valid attention proxy, as opposed to behavioral or neurophysiological markers, should be clearer.
  
  We thank the reviewer for urging further clarity here. Our work builds closely on the previous maze navigation paradigm and VGC model developed and reported by Ho et al. Nature (2022). We directly adopted variants of their maze stimuli, computational model and obstacle awareness measures, and married these with an investigation of the role of visuospatial attention. We agree that it would be useful for the reader to have a more in-depth description of the paradigm and model, and how it operationalises planning, without needing to refer back to the original Ho et al. paper. We have now added additional explanatory sections to the Introduction and Methods as follows:
  
  On page 4:
  
  “One elegant approach to forming such a simplified representation is to adaptively select the granularity of information required to complete the task, an approach known as value-guided construal (VGC; Ho et al., 2022). Unlike previous accounts, which model human planning as a search over all items (e.g., tube lines), the VGC model predicts that a cognitively limited decision-maker selects a manageable subset of information over which to plan (i.e., a task representation), balancing utility and complexity (Ho et al., 2022). In our example, the VGC algorithm would plan over a few relevant tube lines rather than over all possible stations. To select the representation that achieves the best balance between utility and complexity, the model searches across all possible combinations of tube lines, computing the value (i.e., the plan’s utility minus its cost) of each representation for planning a specific journey. The algorithm then selects the representation with the highest value, ensuring that an ideal observer selects a representation that includes only the items (i.e., tube lines) that lead to successful planning, while excluding as many items as possible to keep the plan simple. For our purposes, items included in the representation are considered task-relevant, while items that are not represented are considered task-irrelevant. This algorithm, therefore, provides a normative standard of an efficient plan to which we can compare people’s actual plans.”
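  The selection step described in this passage (score every combination of items by utility minus cost, keep the best) can be illustrated with a brute-force sketch. It is not the authors' or Ho and colleagues' implementation. It assumes, hypothetically, that the utility of planning with a given representation is supplied as a callable (in the original model this comes from a planning computation not reproduced here) and that cost grows linearly with the number of included items, weighted by an illustrative `cost_per_item` parameter.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple


def select_construal(
    items: Iterable[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    cost_per_item: float = 1.0,
) -> Tuple[FrozenSet[Hashable], float]:
    """Return the subset of items maximising utility(subset) - cost_per_item * len(subset).

    `utility` and the linear cost are illustrative assumptions, not the VGC model's
    exact definitions.
    """
    items = list(items)
    best_rep: FrozenSet[Hashable] = frozenset()
    best_value = float("-inf")
    # Enumerate subsets from smallest to largest; only a strictly higher value
    # replaces the incumbent, so ties favour the smaller representation.
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            rep = frozenset(subset)
            value = utility(rep) - cost_per_item * len(rep)
            if value > best_value:
                best_rep, best_value = rep, value
    return best_rep, best_value


# Toy usage: a hypothetical journey succeeds (utility 10) only if lines A and B are represented.
def toy_utility(rep: FrozenSet[Hashable]) -> float:
    return 10.0 if {"A", "B"} <= rep else 0.0


rep, value = select_construal(["A", "B", "C", "D"], toy_utility)
```

  In the toy example, lines C and D only add cost, so the selected representation is {A, B} with value 8.0. Exhaustive enumeration grows as 2^n in the number of items, which is workable for a handful of obstacles but not for large item sets.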
  
  On page 6:
  
  “We operationalized planning using a maze navigation paradigm, akin to our tube-related example, where participants were required to plan a route through the maze, avoiding obstacles that blocked their path. Obstacles predicted by the sVGC model to be included in the representation were considered task-relevant.”
  
  “At the end of every trial, participants reported their awareness of specific obstacles (see Methods for details). The level of awareness reported for different obstacles provides a read-out of what features of the environment individuals were subjectively representing while solving a particular maze. While other markers of attention and awareness (for instance, behavioural or neurophysiological variables) could also be used, here we focused on direct awareness reports in order to relate our findings both to those of Ho and colleagues and to the subjective awareness reports used in consciousness science (e.g. the Perceptual Awareness Scale (Barnett et al., 2024; Overgaard & Sandberg, 2021; Ramsøy & Overgaard, 2004; Samaha et al., 2015)). Participants were instructed to maintain central fixation while planning (see dataset dSC 1), in line with previous empirical work using this task (Ho et al., 2022).”
  
  To visualize our effects, we binarized the predictions of the sVGC model such that obstacles with a marginalized probability greater than 0.5 were considered task-relevant, while other obstacles were considered task-irrelevant (e.g., Figure 2b). We now clarify this point in the caption of Figure 2.
  
  (2) Attention framework:
  
  The account of attention is largely limited to the "spotlight" model. When solving a maze, participants trace the correct trail, following it mentally with their overt or covert attention. In this perspective, relevant concepts are also rooted in attention literature pertaining to object-based attention using tasks like curve tracing (e.g., Pooresmaeili & Roelfsema, 2014) and to mental maze solving (e.g., Wong & Scholl, 2024), which may be highly relevant and add nuance to the current work. This view of attention may be more pertinent to the task than models of simultaneously tracking multiple objects cited here. Prior work (notably from the Roelfsema group) indicates that attentional engagement in curve-tracing tasks may be a continuous, bottom-up process that progressively spreads along a trajectory, in time and space, rather than a "spotlight" that simply travels along the path. The spread of attention depends on the spatial proximity to distractors - a point that could also be pertinent to the findings here.
  
  Moreover, the tracing of a "solution" trail in a maze may be spontaneous and not only a top-down voluntary operation (Wong & Scholl, 2024), a finding that requires a more careful framing of the link to conscious perception discussed in the manuscript.
  
  Conceptualizing attention as a spatial spotlight may therefore oversimplify its role in navigation and planning. Perhaps the observed attentional modulation reflects a perceptual stage of building the trail in the maze rather than a filter for a later representation for more efficient decision making and planning. A fuller discussion of whether the current model and data can distinguish between these frameworks would benefit readers.
  
  We thank the reviewer for highlighting relevant findings in the attention literature that were missing from our discussion. We fully agree that a complete account of the interplay of planning, navigation, and attention is likely to recruit the kind of curve-tracing processes highlighted by the reviewer. However, we emphasise that our current focus is not on the process of navigation through a maze, but on the process of construing the maze itself. In other words, we are focused not on how people represent their path from A to B, but on how they represent the maze itself, which they then use as a basis for planning between A and B. The VGC model predicts that a subset of obstacles will be included in this construal. We think that a spotlight model is a good starting point for this work, because attention is deployed across the whole maze stimulus and then becomes attached to particular objects located in particular positions. This is a distinct process from that involved in navigating the path itself. Accordingly, our stimuli were designed such that task-relevant obstacles could be presented either proximally or distally to the optimal path (e.g., Figure 1a and Supplemental Figures S1-6). An obstacle that blocks any possible path on one side of the maze is task-relevant but located a long way from the optimal path. The results of Ho and colleagues’ (2022) third experiment demonstrate that task-relevant yet distal obstacles are better remembered than task-irrelevant proximal obstacles (see Figure 4 of Ho et al., 2022). We also observed that obstacles further away from the navigation path were often represented by participants (see Figures S1-6), which cannot be explained by curve tracing alone.
  
  While these results cannot definitively rule out the possibility that participants automatically trace the path while also construing the maze, they suggest that the value-guided construal process is an independent predictor of participants’ representations beyond proximity to the navigated path. To make this distinction clearer, we now cite the papers alluded to by the reviewer, in the Discussion on pages 28-29, while also acknowledging the potential for investigating attention during the navigation process itself:
  
  “Future work may also wish to examine the relevance of visuospatial attention for the navigation process itself in this task. While our present findings speak to how individuals perceive the maze while planning, it remains unclear how attention is deployed during navigation along a path, such as how object-based attention progressively spreads along trajectories in time and space (Pooresmaeili & Roelfsema, 2014; Wong & Scholl, 2024).”
  
  There is also one additional nuance to the current spotlight model that we were inspired to consider by the reviewer’s comment. This is the idea that attentional effects may spread within or along the obstacles themselves. We cannot explore this in the current data because we asked for awareness of the entire obstacles, not parts of obstacles, but it may be possible to explore this in future work, for instance, with eye tracking measures.
  
  More generally, the growth-cone (i.e., zoom lens) model of attention for curve tracing proposed by Roelfsema and colleagues shares considerable similarities with the spotlight model of attention. Both models argue for the attention-based grouping of spatially proximal items. While the growth-cone model posits zoom lenses of varying sizes (i.e., receptive fields of neurons) that facilitate the tracing of proximal items, both models predict that spatially proximal items are preferentially processed together because of attention. Indeed, the spotlight model could capture these varying zoom lenses by dynamically altering the width of the attentional spotlight across the visual scene based on the spatial proximity of obstacles. Following related comments by Reviewer 2, we now investigate inter-individual differences in the width of participants’ attentional spotlight and observe that these differences significantly predict participants’ mental representations (see Attentional spotlight model of task representations). We have now updated the Discussion to include consideration of these alternative model frameworks:
  
  On page 27:
  
  “Second, in the current work we were unable to distinguish whether these attentional effects are driven by a fixed spotlight of attention, or whether attention operates akin to a zoom lens, shifting the ‘width’ of the focus of attention according to the task demands (Eriksen & St. James, 1986; Müller et al., 2003; Schad & Engbert, 2012). The latter view would be consistent with growth-cone models of attention in which the focus of attention expands and contracts in accordance with task demands, mirroring the various receptive field sizes in the visual hierarchy (Pooresmaeili et al., 2014; Pooresmaeili & Roelfsema, 2014). In partial support of this idea, we found significant inter-individual differences in the width of participants’ attentional spotlight (Figure S11). It is also possible that attention is deployed within or along parts of obstacles, rather than on entire obstacles. Future work using naturalistic measures of eye movements may be able to address these questions.”
  
  (3) Lateralization of attention:
  
  The analysis considers whether relevant information is distributed bilaterally or unilaterally across the visual display, but does not sufficiently address evidence for attentional asymmetries across the left and right visual fields due to hemispheric specialization (e.g., Bartolomeo & Seidel Malkinson, 2019). Whether effects differ for left versus right hemifield arrangements is not made explicit in the presented findings.
  
  We thank the reviewer for this suggestion. To address this point, we fitted a three-way interaction model between VGC model prediction, lateralization index, and side (left vs right hemifield). We did not find evidence for the three-way effect (β = 0.01, SE = 0.02, 95% CI [-0.03, 0.04], p = 0.738; ΔBIC = 58.30 in favour of the null effect; see table below), suggesting that the side to which participants lateralized their attention did not influence their task representations. This result is now reported on page 12:
  
  “This effect did not vary significantly as a function of the specific hemifield (i.e., left vs right) in which task-relevant information was presented (β = 0.01, SE = 0.02, 95% CI [-0.03, 0.04], p = 0.738; ΔBIC = 58.30 in favour of the null effect; see Table S14).”
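  For readers less familiar with ΔBIC, the comparison can be sketched with the standard definition BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The log-likelihoods, parameter counts, and observation count below are invented for illustration and are not the values from our mixed-effects models; the sign convention (positive values favour the null) follows the way ΔBIC is reported above.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(ll_null, k_null, ll_alt, k_alt, n_obs):
    """BIC(alternative) - BIC(null); positive values favour the simpler null model."""
    return bic(ll_alt, k_alt, n_obs) - bic(ll_null, k_null, n_obs)


# Hypothetical fits: the extra interaction terms barely improve the log-likelihood,
# so the complexity penalty dominates and the difference favours the null model.
d = delta_bic(ll_null=-1000.0, k_null=5, ll_alt=-999.5, k_alt=9, n_obs=1000)
print(round(d, 2))  # 26.63
```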
  
  We also explored inter-individual differences in participants’ tendency to lateralize their attention (see also the next point). We observed that participants tended to lateralize their attention slightly more to the right-hand side for non-lateralized maze stimuli, despite the normative sVGC model predicting that participants should not lateralize their attention for these stimuli (Figure 3c). These results may speak to potential asymmetries in lateralization, but given the exploratory nature of these analyses, they should be verified and replicated in future work.
  
  (4) Individual differences:
  
  Individual differences in attentional modulation are a strength of the work, but similar analyses exploring individual variation in lateralization effects could provide further insight, and the lack of such analyses may mask important effects.
  
  Thank you for this suggestion. In new analyses, we explored whether i) participants exhibited differences in their tendency to lateralize their awareness reports, and ii) whether the degree to which they tended to lateralize their awareness predicted their performance on a separate set of maze stimuli. In short, we observed substantial variation in participants’ tendency to lateralize their awareness (Figure S11) and found that this tendency reflected an inter-individual difference which was stable across maze types. We report these new findings on pages 14-16.
  
  “Inter-individual variation in lateralization of attention
  
  Next, we investigated participants’ tendency to pay attention to obstacles within a single hemifield (left vs right) regardless of the sVGC model predictions. To do so, we computed an awareness lateralization index (ALI) based on participants’ self-reported awareness of obstacles on each trial (Figure 3a). Large positive values indicate that participants were preferentially aware of the right hemifield, whereas negative values indicate preferential awareness of the left hemifield. Values close to zero indicate that participants paid attention to both hemifields equally (see Methods for details). We observed that participants’ tendency to lateralize their awareness varied greatly across the Ho datasets 1 and 2 (Figure 3b); some participants preferentially paid attention to a single hemifield, regardless of whether the sVGC model predictions were lateralized. For the dSC1 dataset, we observed that on some trials, participants significantly lateralized their awareness (|ALI| > 0.5; Figure 3c) even though the sVGC model predictions were non-lateralized. These findings suggest that participants’ tendency to pay attention to a single hemifield may represent an observable inter-individual difference in how they allocate their awareness to form task construals.”
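  The exact ALI formula is given in the Methods and is not reproduced here. As an illustrative sketch only, the code below assumes a normalized difference between summed awareness in the right and left hemifields, after rescaling the 9-point awareness ratings onto [0, 1]. This form matches the sign conventions described above (positive for right, negative for left, near zero for balanced), but the function names and the rescaling are hypothetical.

```python
def rescale_awareness(rating, low=1, high=9):
    """Map a rating on the 9-point awareness scale onto [0, 1] (assumed rescaling)."""
    return (rating - low) / (high - low)


def awareness_lateralization_index(reports):
    """Hypothetical ALI = (right - left) / (right + left) of summed awareness.

    reports: iterable of (hemifield, rating) pairs, hemifield in {"left", "right"}.
    Returns a value in [-1, 1]: positive = right-biased, negative = left-biased.
    """
    left = right = 0.0
    for side, rating in reports:
        if side == "left":
            left += rescale_awareness(rating)
        elif side == "right":
            right += rescale_awareness(rating)
    total = left + right
    if total == 0:
        return 0.0  # nothing reported in either hemifield: treat as unlateralized
    return (right - left) / total


# Hypothetical trial: two obstacles per hemifield, ratings on the 1-9 scale.
trial = [("left", 3), ("left", 1), ("right", 9), ("right", 7)]
ali = awareness_lateralization_index(trial)
print(ali, abs(ali) > 0.5)  # 0.75 True
```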
  
  “To further explore these inter-individual differences, we tested whether participants’ tendency to lateralize their attention to a single hemifield was consistent across trials and maze stimuli. We observed that participants’ tendency to lateralize their attention to a single hemifield was similar for left and right lateralized maze stimuli (Spearman ⍴ = 0.72, Figure 3d). This suggests that participants who preferentially attended to a single hemifield did so regardless of which hemifield they should attend to. More consequentially, the tendency for participants to lateralize their awareness on maze stimuli whose model predictions were also lateralized was strongly correlated with their tendency to lateralize their attention on non-lateralized maze stimuli (Spearman ⍴ = 0.88, Figure 3d). Taken together, these findings emphasize that some individuals tend to preferentially attend to a single hemifield when planning. Importantly, this tendency represents an inter-individual difference in how participants allocate their attention across various maze types.”
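  The consistency analysis relies on Spearman's rank correlation across participants. A dependency-free sketch is shown below, assuming one summary score per participant for each maze type (for example, mean |ALI|, which is our illustrative assumption rather than the exact quantity in the Methods). Tied values receive average ranks, and ρ is the Pearson correlation of those ranks. The scores are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-participant scores on left- vs right-lateralized mazes.
left_mazes = [0.10, 0.40, 0.35, 0.80]
right_mazes = [0.20, 0.30, 0.50, 0.90]
rho = spearman_rho(left_mazes, right_mazes)
print(round(rho, 2))  # 0.8
```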
  
  (5) Distinction between overt and covert attention:
  
  The current report at times equates eye movement patterns with the locus of attention. However, attention can be covertly shifted without corresponding gaze changes (see, for example, Pooresmaeili & Roelfsema, 2014).
  
  We fully agree, and thank the reviewer for prompting further reflection on this distinction. In the online experiments run by Ho and colleagues (i.e., datasets Ho1 and Ho2), participants’ eye movements were not tracked, and it is therefore not possible to disambiguate whether participants engaged covert or overt attention to sample maze obstacles. In our third experiment (i.e., dataset dSC1), we both recorded eye movements and explicitly instructed participants to fixate centrally while viewing the maze. This ensured that participants oriented their attention only covertly during planning (see Figures S13-S14).
  
  We now elaborate on this important distinction in the Results section of the manuscript, page 12:
  
  “In addition, we monitored participants’ eye movements in dataset dSC1 to ensure that attention shifts would be covert as opposed to overt, a distinction which could not be determined in the online samples of datasets Ho 1 and 2.”
  
  On page 28:
  
  “Importantly, while the visuospatial attention effects observed in the Ho 1 and 2 datasets are likely driven by both covert and overt shifts in attention, the findings presented in experiment 3 (i.e., dSC1 dataset) rule out the contribution of overt shifts in attention through the use of eye tracking (see Figures S13-S14; Carrasco, 2011; Pooresmaeili & Roelfsema, 2014).”
  
  The implications for interpreting the relationship between eye movement, memory, and attention in this setting are not fully addressed. The potential dynamics of attention along a maze trajectory and their impact on lateralization analysis would benefit from further clarification.
  
  We thank the reviewer for urging more clarity here. The attentional dynamics we document in our study concern how people perceive / construe the maze itself, rather than how they deploy their attention to guide active navigation. We have now sought to make this distinction clear at a number of points in the paper. The core idea is that attention acts as an early filter to select which obstacles are part of a task construal, which then affects both awareness and memory.
  
  We have now clarified the focus of our study in the introduction on pages 5-7:
  
  “Our focus in this study was to examine how participants perceive and represent their environment (the maze stimulus). This is distinct from the process by which participants orient their attention during navigation itself, which is not part of our current study. To do so, we harness classical signatures of attentional selection to characterise how visuospatial attention shapes awareness of maze obstacles during planning.” … “Our focus in the present study was to examine attentional effects on participants’ perception of the maze stimulus. We did not quantify how individuals deploy their attention in the phase in which they were navigating through the maze.”
  
  We did not explicitly test for memory effects in our new experiments, but Ho and colleagues demonstrated that the sVGC model predicted not only awareness reports, but also participants’ memory of obstacles (see Ho et al., 2022). Indeed, task representations computed from memory or awareness reports were strikingly similar in their experiments (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness). In relation to eye movements, we refer the reviewer back to our previous response, which details how eye movements were measured and controlled during maze construal.
  
  Figure 1 legend (b) --> (c)
  
  We have corrected this typo in the figure caption.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Castanheira et al. investigate the role of spatial attention for planning during three maze navigation experiments (one new experiment and two existing datasets). Effective planning in complex situations requires the construction of simplified representations of the task at hand. The authors find that these mental representations (as assessed by conscious awareness) of a given stimulus are influenced by (spatially) surrounding stimuli. Individual participants varied in the degree to which attention influenced their task representations, and this attentional effect correlated with the sparsity of representations (as measured by the range of awareness reports across all stimuli). Spatially grouping task-relevant information on either the left or right side of the maze led to mental representations more similar to optimal representations predicted by the value-guided construal (VGC) model, a normative model describing a theoretical approach to simplifying complex task information. Finally, the authors propose an update to this model, incorporating an attentional spotlight component; the revised descriptive model predicts empirical task representations better than the original (normative) VGC model.
  
  Strengths:
  
  The novelty of this study lies in the proposal and investigation of a cognitive mechanism through which a normative model like value-guided construal can enable human planning. After proposing attention as this mechanism, the authors make concrete hypotheses about mismatches between the VGC predictions and real human behavior, which are experimentally validated. Thus, not only does this study describe a possible mechanism for simplification of task information for planning, but the authors also propose a descriptive model, revising VGC to incorporate this attentional component.
  
  A strength of this paper is the variety of investigative approaches: analysis of existing data, novel experiment, and a computational approach to predict experimental findings from a theoretical model. Analyzing pre-existing datasets increases the size of the participant cohort and strengthens the authors' conclusions. Meanwhile, comparing the predictions of the existing normative model and the authors' own refined model is a clever approach to substantiate their claims. In addition, the authors describe several crucial controls, which are key to the interpretability of their results. In particular, the eye tracking results were critical.
  
  In summary, this paper constitutes an important step toward a more complete understanding of the human ability to plan.
  
  We thank the Reviewer for their thoughtful and positive assessment of our findings. We also appreciate the constructive feedback on our methodology, which we believe has substantially improved our manuscript.
  
  Weaknesses:
  
  (1) There is a critical conceptual gap in the study and its interpretation, mainly due to the reliance on a self-report metric of awareness (rather than an objective measure of behavioral performance).
  
  a. Awareness is tested by a 9-point self-report scale. It is currently unclear why awareness of task-irrelevant obstacles in this task would necessarily compromise optimal planning. There is no indication of whether self-reported awareness affects performance (e.g., navigation path distance, time to complete the maze, number of errors). Such behavioral evidence of planning would be more compelling.
  
  We thank the reviewer for prompting further reflection on the connection between construal and navigation performance. We wish to emphasise that the primary focus of our study was on measuring and modeling participants’ task construals using perceptual awareness judgments, building on the methods developed by Ho and colleagues, rather than on navigation performance itself. However, as the reviewer points out, there is a natural relationship between construal and performance – if you represent the wrong obstacles, plans may be disrupted.
  
  To explore the relationship between task construals and performance on the navigation task we first regressed out the effects of the sVGC model on participants’ awareness reports and computed the mean squared residuals for each trial. We then used these values to predict participants’ navigation response times on each trial. We observed a significant negative relationship, suggesting that on trials where participants’ representations showed greater deviations from the normative model, they were in fact faster at navigating the mazes. This relationship was surprising, and at odds with the initial idea that adhering to normative VGC aids in task performance. However, we think that this direction of effect may make sense if one considers that a large part of the actual construal (rather than the normative prediction) in our data was in fact driven by effects such as lateralisation which are not accounted for by the sVGC model. If one is faster at harnessing inductive biases such as lateralisation, then one may be faster to complete the maze but also show a greater deviation from the predictions of the original model.
  
  To further explore these effects, we next focused on the distinction between lateralised and non-lateralised mazes. Here, we reasoned that the initial phase of lateralised attentional selection would lead to lateralised mazes being easier to navigate than non-lateralised ones. We conducted new analyses to determine whether participants navigated lateralized maze stimuli faster and with fewer moves than maze stimuli with non-lateralized model predictions. As detailed in Methods, we excluded trials in which participants significantly deviated from the optimal number of moves (9 or more moves) and took longer than 20 seconds to solve the maze. In line with our interpretation that attention operates as an inductive bias, participants were faster and deviated less from the optimal path on lateralized compared to non-lateralized mazes.
  
  We now report these new results on navigation performance on pages 20-21:
  
  “Maze navigation performance
  
  The previous analyses focused on participants’ task representations during planning. We next sought to explore links between participants’ task representations and maze navigation performance. Participants performed the maze navigation task near-ceiling: they solved 95% of maze stimuli in under 20 seconds, with minimal deviation from the optimal path (i.e., 9 moves or fewer). Notwithstanding this limited variance in task performance, we explored whether participants’ task construals may have impacted their navigation speed. To do so, we first regressed out the effects of the sVGC model from participants’ awareness reports and used the mean squared residuals for each trial to predict response times (see Methods for details). Surprisingly, we observed a negative relationship between mean squared residual variance and response times (β = -0.31, SE = 0.05, 95% CI [-0.41, -0.21], p < 0.001), indicating that participants were faster on trials where the sVGC model explained less variance in their awareness reports. In other words, trials in which participants deviated more from the sVGC model predictions were solved faster. We note that one reason for this may be the strong influence of the lateralisation effect on navigation performance (see paragraph below), which itself is not part of the sVGC model prediction.”
  
  “We then explored whether participants’ performance differed between lateralised and non-lateralised mazes. Here, we reasoned that the initial phase of lateralised attentional selection would lead to lateralised mazes being easier to navigate than non-lateralised ones. Consistent with this hypothesis, participants were faster (β = -0.04, SE = 5.91 × 10⁻³, 95% CI [-0.06, -0.03], p < 0.001) and followed the optimal path more closely (β = -0.59, SE = 0.09, 95% CI [-0.78, -0.40], p < 0.001) when maze stimuli were more lateralized.”
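  The residual analysis described above (regressing out the sVGC predictions, then relating per-trial mean squared residuals to response times) can be sketched as follows. This is a simplified, pooled version under stated assumptions: ordinary least squares stands in for the hierarchical models we actually fitted, awareness is assumed to be rescaled to [0, 1], and all data and function names are hypothetical.

```python
from collections import defaultdict


def ols_fit(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_squared_residual_per_trial(rows):
    """rows: (trial_id, svgc_prediction, awareness) tuples, one per obstacle.

    Regresses awareness on the sVGC prediction (pooled, no random effects),
    then averages the squared residuals within each trial.
    """
    _, preds, aware = zip(*rows)
    b0, b1 = ols_fit(preds, aware)
    squared = defaultdict(list)
    for trial, p, a in rows:
        squared[trial].append((a - (b0 + b1 * p)) ** 2)
    return {t: sum(v) / len(v) for t, v in squared.items()}


# Hypothetical data: three trials, two obstacles each.
rows = [("t1", 0.0, 0.1), ("t1", 1.0, 0.9),
        ("t2", 0.0, 0.1), ("t2", 1.0, 0.9),
        ("t3", 0.0, 0.4), ("t3", 1.0, 0.6)]
msr = mean_squared_residual_per_trial(rows)
rt = {"t1": 4.0, "t2": 4.2, "t3": 3.1}  # hypothetical navigation times (s)
trials = sorted(msr)
# Second stage: regress response time on the per-trial mean squared residual.
_, rt_slope = ols_fit([msr[t] for t in trials], [rt[t] for t in trials])
print(msr, rt_slope)
```

  In this toy example, trial t3 deviates most from the fitted sVGC line and is also the fastest, so the second-stage slope is negative; the numbers are illustrative only and do not reproduce the reported estimates.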
  
  And in the Discussion section, on page 23:
  
  “Mental representations and task performance
  
  We observed that participants were faster and deviated less from the optimal path on maze stimuli that were lateralized. This effect is not predicted by the original sVGC model but dovetails with the interpretation that early visuospatial attention operates as an inductive bias to guide the formation of simplified task representations. Surprisingly, we also observed that participants were faster to navigate mazes on trials where their simplified task representation deviated from the sVGC model prediction. We interpret this seemingly contradictory finding in the following way: there are several factors beyond the sVGC model – including, for instance, maze lateralisation – that predict both construal and performance on the maze navigation task. Further work is needed to understand how inductive biases such as lateralisation shape both construal and performance, and the real-world benefits that such strategies might afford for naturalistic stimuli.”
  
  b. Relatedly, it would have been more convincing to have an objective measure of awareness, for instance, how the presence or absence of a "task-irrelevant" obstacle affects performance (e.g., change navigation path distance or time to complete the maze), or whether participants can accurately recall the location of obstacles.
  
  We thank the reviewer for prompting further reflection on the validity and robustness of our awareness measures. We emphasise however that our focus is not (primarily) on maze navigation performance, but on task construal, which as noted in our previous response may come apart from navigation performance for a variety of reasons. Our primary goal is to measure participants’ subjective awareness of the maze as a marker of their idiosyncratic (conscious) mental representation on each trial. In doing so, we build on a rich tradition of measuring subjective awareness in consciousness and perception science (for instance, work using the Perceptual Awareness Scale, or detection judgments). In this sense, we think our awareness scale (following Ho et al.) represents a valid and straightforward way of assessing our target psychological construct. However, we also agree with the Reviewer that convergent evidence from other measures is always valuable. In Ho and colleagues’ original paper, they developed a variant of the maze task in which participants had to recall the location of obstacles as well as rate their awareness (Exp 3), and a variant in which participants could hover their mouse over hidden obstacles in the maze to reveal their location, an online metric of attentional deployment (Exp 4). These data afforded us the opportunity to validate the awareness reports against an objective measure of recall, as suggested by the Reviewer. In reanalysing these data, we observed that the obstacle awareness and memory/hover measures were strikingly correlated within two independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness). These re-analyses are now reported on page 22 of our manuscript, to highlight the convergent validity of the awareness metric:
  
  “Finally, we examined the convergent validity of participants’ awareness reports by reanalyzing the memory recall data reported in Ho and colleagues’ experiments (Ho et al., 2022). We reasoned that participants should demonstrate similar task representations regardless of the measure used to probe the construal. In line with this prediction, we observed that the obstacle awareness reports and memory/hover measures were strikingly correlated within three independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness; see Tables S18 and S19).”
  
  c. Consequently, I'm not sure that we can conclude that the spatial context does impact participants' ability to plan spatial navigation or to "incorporate task-relevant information into their construal". We know that the spatial context affects subjective (self-reported) awareness, but the authors do not present evidence that spatial context affects behavioral performance.
  
  Following the line of argument above, we think it’s important to separate out task construal (the simplified representation of the maze, measured by awareness reports), and the impact of this on navigation and other aspects of behaviour. The awareness reports (and other convergent measures) show that task-relevant information (as predicted by the VGC) is incorporated into the construal, a process which is modulated by spatial context. These are the key targets of our modeling. Whether this impacts performance is a distinct question, and one that we now address in our response to point a above.
  
  d. Another concern that may complicate interpretation is the following: Figure 3c shows improved VGC model predictions (steeper slope) for mazes with greater lateralization. However, there are notable outliers in these plots, where a high lateralization index does not correspond to good model performance. There is currently no discussion/explanation of these cases.
  
  The Reviewer astutely points out some outliers in our analysis. While lateralized maze stimuli are on average represented more closely to the sVGC model predictions, there are indeed some noticeable outlier mazes. These mazes represent stimuli in which participants tended to lateralize their attention to the ‘wrong hemifield’ (e.g., participants were more aware of obstacles in the right hemifield despite the sVGC model predicting that obstacles in the left hemifield were task-relevant). We believe this explains the poor sVGC model fits on these trials. We note, however, that participants were generally able to attend to the correct hemifield without explicit instructions (i.e., for 9 out of 12 mazes).
  
  We have now included a discussion of these outliers in the results section of the paper on page 12:
  
  “We note that for three maze stimuli whose model predictions were lateralized there was nevertheless a poor fit to the sVGC model (see Figure 2c, right panel). These outliers correspond to maze stimuli where participants, on average, lateralized their attention to the incorrect hemifield (i.e., the opposite hemifield to that predicted by the sVGC model).”
  
  (2) I noticed an issue with clarity regarding task-relevance. It is currently not fully clear which obstacles are "task irrelevant". Also, the term is used inconsistently, sometimes conflating with "awareness". For example, in the "Attentional spotlight model of task representations" section, the authors state that "task-relevant information becomes less relevant when surrounded by task-irrelevant information". But they really mean that participants become less aware of those task-relevant obstacles. I assume task-relevance is an objective characteristic related to maze organization, not to a participant's construal. Indeed, the following paragraph provides evidence of model predictions of awareness.
  
  We apologize for any confusion regarding the terminology of our manuscript. We indeed use the terms task-relevant and task-irrelevant to refer to obstacles that are objectively predicted by the normative sVGC model or the attentional spotlight model to be included in (>0.5) or excluded from (<0.5) task construals, respectively. This designation reflects the predictions of the computational model and does not reflect participants’ reported awareness. We then ran linear hierarchical models to predict participants’ awareness reports from these model predictions. The Reviewer is correct that the task-relevance of obstacles is related to the maze’s organization, and not to participants’ subjective reports of awareness. We have now clarified this point throughout the manuscript to better emphasize the difference between the model predictions of task-relevance and participants’ subjective reports.
  
  On page 17:
  
  “To achieve this, we computed the predictions of the existing VGC model for each obstacle’s task relevance in a given maze, and averaged these predictions within an attentional spotlight of 3 squares (Figures 4a and S8; see Methods for details). This process yielded novel model predictions, whereby some obstacles that the normative sVGC model predicted to be task-irrelevant are now predicted to be task-relevant by the attentional spotlight model. We depict the effects of this spatial spotlight in Figure 4a: task-irrelevant stimuli (plotted in grey; see middle left obstacle) neighbouring task-relevant obstacles (plotted in orange) become more task-relevant, whereas task-relevant information becomes less relevant when surrounded by task-irrelevant information (see bottom right orange obstacle). This deviation in model predictions from the normative sVGC model was used to predict participants’ awareness reports. We hypothesized that this spotlight-VGC model would predict participants’ reports better than the original VGC model, which does not account for spatial attention.”
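  One way to read "averaged within an attentional spotlight of 3 squares" is sketched below. We assume, for illustration only, that each obstacle is summarized by its centre coordinates on the grid and that the spotlight is a Euclidean radius of 3 squares around that centre; the Methods define the actual procedure, and the obstacle names, coordinates, and probabilities here are hypothetical.

```python
import math


def spotlight_predictions(obstacles, radius=3.0):
    """Average sVGC probabilities within a circular spotlight around each obstacle.

    obstacles: dict name -> (x, y, p), with centre coordinates in grid squares
    and p the sVGC marginal probability. Returns name -> averaged probability.
    """
    out = {}
    for name, (x, y, _) in obstacles.items():
        nearby = [p for (ox, oy, p) in obstacles.values()
                  if math.hypot(ox - x, oy - y) <= radius]
        out[name] = sum(nearby) / len(nearby)  # always includes the obstacle itself
    return out


# Hypothetical maze: A is relevant with an irrelevant neighbour B;
# C is relevant but surrounded by irrelevant obstacles D and E.
maze = {
    "A": (0, 0, 0.9), "B": (2, 0, 0.3),
    "C": (10, 10, 0.8), "D": (11, 10, 0.1), "E": (10, 11, 0.1),
}
sp = spotlight_predictions(maze)
print({k: round(v, 3) for k, v in sp.items()})
```

  Here obstacle B (0.3 under the sVGC) rises to 0.6 because it neighbours a relevant obstacle, while C (0.8) drops to about 0.33 because it is surrounded by irrelevant ones, the qualitative pattern described for Figure 4a. Treating `radius` as a per-participant free parameter is one way to express inter-individual differences in spotlight width.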
  
  (3) The behavioral paradigm has some distinct disadvantages, and the validity of the task is not backed up by behavioral data.
  
  a. I understand the need for central fixation, but it also makes the task less naturalistic.
  
  The fixation cross was required on every trial such that participants could maintain central fixation for our eye tracking experiment. While this design is less naturalistic, it allows us to examine the eye movements of participants. Requiring participants to fixate during the ‘planning’ phase of the experiment allowed us to isolate the effects of covert attention from changes in awareness due to overt shifts in attention. In other words, differences in participants’ awareness reports in the 3rd experiment cannot be explained by longer fixation times to specific obstacles.
  
  b. The task with its top-down grid view does not seem to mimic real human navigation. Though this grid may be similar to mental maps we form for navigation, the sensory stimuli corresponding to possible paths and to spatial context during real-life navigation are very different.
  
  We agree with the reviewer that while our task is engaging for participants and simple to follow, it does not mimic naturalistic navigation in humans. There is a natural tension in computational / experimental work in cognitive science in wanting to build closely on previous results and paradigms, while ensuring that results can generalise to real-world contexts. Here, our choice of paradigm and measures was closely built on previous papers using this task from Ho and colleagues (2022, 2023). While preparing this response, we learnt that the MIT group had also harnessed this same task to develop a novel dynamic variant of the VGC model (Chen et al., 2026) called the Just in Time model (JIT). The advantage of building on this prior work is that we are able to iteratively refine and expand the VGC approach, and (in our case) bring it into closer contact with work on modeling the deployment of spatial attention in human vision. The top-down aspect of the maze notably facilitated the study of the spatial deployment of attention. We now discuss the novel dynamic variant of the VGC model in our paper on page 27:
  
  “We close by reflecting on opportunities for further work in this area. First, an important next step is to explore the process by which task representations are formed, and how inductive biases might affect the process of task construal. The sVGC model is a normative model of the optimal task representation. Since its construction involves an exhaustive calculation over possible paths, it is not a plausible basis for a model of the psychological process by which participants actually construct task representations. More recently, a process model of task construal has been proposed, the Just in Time model (JIT). The hypothesis of the JIT model is that participants’ task representations are built up over time by iteratively simulating possible paths through the maze, affording insight into the construal process (Chen et al., 2026). In future work, it would be of interest to ask whether the attentional effects we observe in our experiments could be meshed with a dynamic JIT account of construal. We speculate that visuospatial attention may operate as an early filter, limiting the space of potential construals based on coarse spatial features of the environment, constraining a dynamic selection of obstacles. Brain imaging techniques with high time resolution, such as M/EEG, may be able to shed further light on how task representations are formed as participants plan.”
  
  c. Behavioral performance is not reported, so it is unknown whether participants are able to properly complete the task. The task seems pretty difficult to navigate, especially when the obstacles disappear, and in combination with the central fixation.
  
  Behavioural performance is now reported in response to point 1a above.
  
  d. There is no discussion of whether/how this navigation task generalizes to other forms of planning.
  
  We fully agree that an important next step would be to generalise our results on construal to naturalistic forms of planning – for instance, using immersive VR mazes, and/or investigating cognitive rather than perceptual construals. We have now added a line to this effect to the Discussion on page 28.
  
  “An important next step to further our understanding of task representations would be to extend the current paradigm to other forms of planning and more naturalistic tasks, such as navigating immersive virtual reality (VR) environments, planning over cognitive rather than perceptual representations (e.g. planning over an abstract space), or internally guided planning based on working memory.”
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) There are, of course, benefits to simple tasks like the ones described, but it would be interesting to compare the results to a possible experiment in which a top-down grid/map is used for planning, but then task execution is carried out in a simulated environment corresponding to the map. Also, perhaps beyond the scope of the questions addressed in this paper, but I am curious how unexpected obstacles affect representations. For instance, if participants plan based on a top-down map and then begin "real" navigation but encounter an unexpected obstacle that was not indicated on the map, does this modulate representations/awareness of future obstacles (near vs. far)?
  
  We fully agree that all of these lines of investigation would be super interesting to pursue in future studies, and we have added a line to the discussion to that effect on page 28:
  
  “An important next step to further our understanding of task representations would be to extend the current paradigm to other forms of planning and more naturalistic tasks, such as navigating immersive virtual reality (VR) environments, planning over cognitive rather than perceptual representations (e.g., planning over an abstract space), or internally guided planning based on working memory.”
  
  (2) Regarding self-reported awareness as a metric, an additional experiment could ask participants to recreate the maze (identify locations of obstacles after they disappear). This would be a more objective measure of awareness.
  
  Yes indeed, and as described above, this was a metric used by Ho and colleagues in their previous experiments. As we describe in more detail above, the task representations obtained via memory or awareness reports demonstrated striking similarity (Spearman ρ = 0.86).
  
  (3) What is meant by "all possible orientations of the maze" in this Methods sentence: "For dataset dSC 1, participants solved each of these 24 mazes four times (i.e., all possible orientations of the maze)"?
  
  We thank the Reviewer for prompting more clarity here. We vertically and horizontally reversed the mazes (i.e., flipped them up-down and left-right) such that participants could not predict the goal or start location. In this way, each maze stimulus had four potential orientations. This resulted in 96 trials of 24 unique mazes. We have clarified this point in the Methods section on page 30:
  
  “Maze stimuli were vertically and horizontally reversed (i.e., flipped up-down and left-right) such that participants could not predict the start or goal location. This resulted in four potential orientations for each of the 24 mazes, 96 trials in total.”
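  The counting above (two independent reflections give four orientations per maze, so 24 × 4 = 96 trials) can be checked with a short Python sketch. The maze representation as a list of equal-length strings, with hypothetical 'S' and 'G' markers for start and goal, is an assumption for illustration only.

```python
def maze_orientations(grid):
    """Return the four orientations produced by vertical and horizontal
    reflection: original, left-right flip, up-down flip, and both.

    grid: list of equal-length strings, one per maze row (assumed format)
    """
    left_right = [row[::-1] for row in grid]        # horizontal reflection
    up_down = grid[::-1]                            # vertical reflection
    both = [row[::-1] for row in grid[::-1]]        # both reflections (180-degree rotation)
    return [grid, left_right, up_down, both]

maze = ["S.", ".G"]  # toy 2x2 maze: start top-left, goal bottom-right
for orientation in maze_orientations(maze):
    print(orientation)
```

  Each reflection moves the start and goal to a different corner, which is what prevents participants from anticipating their locations.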
  
  (4) For lateralization, it was unclear until reading the Methods that the lateralization index was calculated using the VGC-predicted level of task-relevance. From the main text and Figure 2, I assumed you were just counting the number of task-relevant obstacles on each side, rather than also quantifying relevance. I understood after reading the Methods, but this could be clarified further.
  
  We agree with the Reviewer that this was not evident from the text. We have now updated the Results section of the manuscript to clarify this point on page 11:
  
  “To test this hypothesis, we derived a measure of task-relevant lateralization inspired by the attention literature (Ghafari et al., 2024; Keefe & Störmer, 2021; Vollebregt et al., 2015) (Figure 2a). Specifically, we separated maze stimuli across the vertical meridian and computed the ratio of task-relevant information presented on the left versus right side derived from the sVGC model. For example, the maze shown in Figure 2a has twice the amount of task-relevant information presented in the left hemifield than in the right (lateralization index = 1/3). A lateralization index of 0.0 indicates that both hemifields contain equal amounts of task-relevant information (i.e., non-lateralized). The lateralization index was computed using the continuous VGC predictions for each obstacle (see Methods).”
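  A minimal Python sketch of such an index, summing continuous sVGC predictions on each side of the vertical meridian. The form (L − R) / (L + R) is an assumption inferred from the worked example (twice as much task-relevant information on the left gives 1/3); the Methods may use a different sign convention or an absolute value. Obstacles lying exactly on the meridian are ignored here, also an assumption, and `lateralization_index` is a hypothetical name.

```python
def lateralization_index(obstacles, vgc, midline):
    """(L - R) / (L + R), where L and R are the summed sVGC predictions of
    obstacles left and right of the vertical meridian (hypothetical sketch).

    obstacles: list of (x, y) positions, one per obstacle (assumed format)
    vgc: continuous sVGC predictions, same order as `obstacles`
    midline: x coordinate of the vertical meridian
    """
    left = sum(p for (x, _y), p in zip(obstacles, vgc) if x < midline)
    right = sum(p for (x, _y), p in zip(obstacles, vgc) if x > midline)
    total = left + right
    # No task-relevant information on either side: treat as non-lateralized.
    return 0.0 if total == 0 else (left - right) / total

# Two fully relevant obstacles on the left, one on the right.
print(lateralization_index([(1, 0), (2, 3), (7, 1)], [1.0, 1.0, 1.0], midline=5))
```

  With continuous predictions, a weakly relevant obstacle contributes proportionally less than a strongly relevant one, which is the distinction the Reviewer raised between counting obstacles and quantifying relevance.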
  
  (5) The explanation in the Methods of how the width of the attentional spotlight was chosen references Figure 1b and Supplementary Figure S2, but it seems that Supplementary Figure S8 explains this more in the caption. Also, I don't see how Figure S2 supports this.
  
  We apologize for this typo. The explanation of how we selected the width of the attentional spotlight should indeed reference Supplementary Figure S15 (previously Figure S8). We have now corrected this and elaborated on this choice in the Methods section on page 35:
  
  “We fixed the ‘width’ of the attentional spotlight to a distance of 3 squares based on the observation that the two nearest neighbouring obstacles positively predicted the awareness of a probe. We observed that the mean and median distance between neighbouring obstacles of the 2nd rank (i.e., second closest) was 3 squares for all mazes (Figure S15). We therefore fixed the width of the attentional spotlight to 3 squares based on these observations. Future work utilizing this model should consider the statistics of their maze stimuli when deciding on the ‘width’ of the attentional spotlight.”
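  The statistic used to fix the width (mean and median distance from each obstacle to its second-closest neighbour) can be sketched in Python as follows. It assumes, hypothetically, one (row, col) coordinate per obstacle and Euclidean distance in grid squares; in practice the distances would be pooled across all mazes before taking the mean and median.

```python
import math
from statistics import mean, median

def second_nearest_distances(obstacles):
    """For each obstacle, the distance to its 2nd-closest other obstacle.

    obstacles: list of (row, col) coordinates (assumed format); needs at
    least three obstacles so that a second-closest neighbour exists.
    """
    out = []
    for i, (r_i, c_i) in enumerate(obstacles):
        dists = sorted(
            math.hypot(r_i - r_j, c_i - c_j)
            for j, (r_j, c_j) in enumerate(obstacles)
            if j != i
        )
        out.append(dists[1])  # index 0 = closest, index 1 = second closest
    return out

# Toy maze: four obstacles at the corners of a 3x3-square block.
d = second_nearest_distances([(0, 0), (0, 3), (3, 0), (3, 3)])
print(d, mean(d), median(d))
```

  Future work could recompute this summary on its own stimulus set and use it as the spotlight width, in the spirit of the recommendation above.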
  
  (6) The attentional spotlight width was assumed to be 3 squares, based on the linear regression predictions of the effect of neighboring obstacles on stimulus awareness. Given the individual differences across participants, it would be interesting to choose a different attentional spotlight size for each participant. Would a participant-specific attentional spotlight width improve the predictions of the spotlight-VGC model?
  
  The Reviewer highlights a very interesting question: do individuals vary in terms of their attentional spotlight? To test this hypothesis, we first estimated the size of the attentional spotlight for each individual based on lateralized maze stimuli, and then used this to generate personalized attentional spotlight model predictions for each subject based on these values (Figure S11). We restricted this analysis to the dSC1 dataset, where we had substantially more trials (96 in total).
  
  In brief, we observed that indeed the personalized spotlight model fit participants’ awareness reports better than both a normative sVGC model and a group-level attentional spotlight model. We interpret these findings with some caution as i) a subset of individuals had flat attentional slopes and therefore were excluded from these analyses, and ii) we believe we require additional trials to ensure a robust model fit at the individual level. While our results are encouraging, we hope future investigations into inter-individual differences will extend these findings.
  
  We have included these additional analyses in the main text.
  
  On page 18:
  
  “To further explore inter-individual differences in task construal, we tested whether adjusting the attentional spotlight width to each participant’s awareness reports improved the predictions of the attentional spotlight model. To do so, we first determined the width of the attentional spotlight of each individual in the dSC1 dataset based on lateralized maze stimuli. We then generated person-specific attentional spotlight model predictions for the non-lateralized maze stimuli to avoid overfitting the data (Figure S11). We note that 7 participants had either flat attentional slopes or negative beta coefficients, which prevented the selection of an appropriate attentional spotlight width (see Methods for details). We observed a significant improvement in model fit for the person-specific attentional spotlight model relative to both the group-level attentional spotlight model (ΔBIC = -1487.39) and the normative sVGC model (ΔBIC = -1655.29). While the limited number of trials per participant in our current dataset warrants caution in interpreting these results, they do encourage further research on inter-individual differences in attentional deployment during planning.”
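  The ΔBIC comparisons above follow the usual Bayesian information criterion, BIC = k ln(n) − 2 ln(L̂). A minimal Python sketch, assuming ΔBIC is taken as BIC(first model) − BIC(second model) so that negative values favour the first (consistent with the signs reported for the person-specific model). The response does not give the log-likelihoods, parameter counts, or whether per-participant widths count as parameters, so the numbers below are purely illustrative.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L_hat)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def delta_bic(ll_a, k_a, ll_b, k_b, n_obs):
    """BIC(A) - BIC(B); negative values favour model A."""
    return bic(ll_a, k_a, n_obs) - bic(ll_b, k_b, n_obs)

# Illustrative only: model A fits better with the same number of parameters.
print(delta_bic(ll_a=-10.0, k_a=2, ll_b=-20.0, k_b=2, n_obs=100))
```

  Because each extra parameter costs ln(n), a richer model is only preferred when its gain in log-likelihood outweighs that penalty.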
  
  On pages 23-24:
  
  “Inter-individual differences in attention
  
  We also observed considerable inter-individual differences in attentional effects across participants (Figure 1c). While some participants were strongly influenced by the spatial context of neighbouring stimuli, others showed more limited evidence for an attentional effect (Figure 1b). Inter-individual differences in attention predicted the sparsity of participants’ simplified representations: participants with larger attention effects exhibited sparser representations. Moreover, these inter-individual differences in effects of spatial proximity could be incorporated into the attentional spotlight model by varying the width of the spotlight, resulting in better model predictions.”
  
  “Beyond these spatial proximity effects, we also observed that participants varied in their tendency to lateralize their attention to a single hemifield (Figure 3). This tendency was observed across all three datasets, including on maze stimuli whose value-guided model predictions were not lateralized. This suggests that although allocating attention to a single hemifield is sub-optimal for these maze stimuli, some individuals preferentially attend to a single hemifield in a heuristic-like fashion. This tendency to attend to a single hemifield was a robust inter-individual difference across maze stimuli (Figure 3d), and dovetails with individual-level variation in spatial proximity effects. Taken together, these findings offer novel insights into how people vary in the ways they allocate spatial attention to solve complex problems. Future research could explore how these individual differences constrain performance on other tasks that require planning and search in high-dimensional spaces.”
  
  On page 17 of the Supplemental Materials:
  
  (7) The supplementary text about lateralization effects, above Supplementary Table S8, references Table S6, but Table S6 does not seem to display lateralization results.
  
  We thank the Reviewer for pointing out this typo: we now refer to the correct supplementary table (S9).
  
  (8) Why does it matter that "the maze stimuli were not designed to test horizontal-meridian lateralization effects"? What is the effect on power? Is it because there is not a good enough range in lateralization indices? It would be good to clarify, or just remove that explanation, since the cortical retinotopy explanation seems more convincing.
  
  We did not specifically design the maze stimuli such that there is an equal number of obstacles above and below the horizontal meridian. As such, the lateralization index derived along the horizontal meridian does not control for the number of obstacles in each hemifield, which may influence participants’ awareness reports. In contrast, we designed maze stimuli such that this would not be a concern for the vertical meridian. We have clarified this point in the discussion on page 27.
  
  “Third, while we observed clear lateralization effects along the vertical meridian (i.e., left vs right hemifield), effects along the horizontal meridian were less clear (i.e., above vs below; see Tables S15-16). One potential explanation of this asymmetry is the retinotopic organization of the cortex, in which spatially adjacent stimuli can be retinotopically distant if presented on the opposite side of the vertical (but not horizontal) meridian, facilitating distractor inhibition. Importantly, while the visuospatial attention effects observed in the Ho 1 and 2 datasets are likely driven by both covert and overt shifts in attention, the findings presented in experiment 3 (i.e., dSC1 dataset) rule out the contribution of overt shifts in attention through the use of eye tracking (see Figures S13-14; Carrasco, 2011; Pooresmaeili & Roelfsema, 2014).”
  
  (9) For Figure 2c, it would be helpful to directly state what each dot and line mean.
  
  We updated the caption of Figure 2c to clarify what we are plotting: each point represents an obstacle, and each line the linear fit for a maze stimulus.
  
  “Each point represents an obstacle in a maze, and each line represents the model fit for that specific maze stimulus.”
  
  (10) Figures and wording imply there is only a single probe obstacle per trial, but methods and model imply that participants are asked to report awareness for every obstacle. This should be clarified.
  
  We apologize for any confusion regarding the methodology of our study. The Reviewer is correct that participants reported their awareness of every obstacle presented on a given trial. We have clarified this in the Results section of the manuscript on page 7:
  
  “Note, participants reported their awareness of every obstacle presented on a given trial.”
  
  We have also updated the caption of Figure 1 to clarify this point:
  
  “Once participants finished navigating the maze, they were asked to report their awareness of every obstacle presented on a given trial in a random order.”
  
  (11) What is the reason for the exclusion of participants (33 for experiment 1 and 26 for experiment 2)?
  
  Participants were excluded from the Ho et al. datasets 1 and 2 based on their preregistered exclusion criteria, as detailed in the Methods section of their paper. In short, trials were excluded if participants took longer than 20 seconds to complete the trial, or if they spent longer than 5 seconds in the initial state. Participants were excluded if less than 80% of trials remained after reaction time exclusions or if they failed 2 out of 3 comprehension checks. We have elaborated on this point in the Methods section on page 31.
  
  “Participants were excluded from analyses based on pre-registered exclusion criteria as detailed in (Ho et al., 2022). In short, participants were excluded if 20% or more of their trials were removed based on reaction times, or if they failed 2 out of 3 comprehension checks.”
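  The exclusion rules summarised above can be written as a short Python sketch. The trial format (total time, time spent in the initial state, both in seconds) and the function name are hypothetical. One boundary detail is an assumption: the response says participants were excluded when fewer than 80% of trials remained, whereas the quoted Methods say "20% or more" removed, which would also exclude the case of exactly 80% remaining; the sketch follows the first wording.

```python
def apply_exclusions(trials, failed_checks, max_rt=20.0, max_initial=5.0, min_kept=0.8):
    """Apply the trial- and participant-level exclusion criteria.

    trials: list of (rt_seconds, initial_state_seconds) tuples (assumed format)
    failed_checks: number of the 3 comprehension checks the participant failed
    Returns (kept_trials, participant_excluded).
    """
    # Trial level: drop trials longer than 20 s or with > 5 s in the initial state.
    kept = [t for t in trials if t[0] <= max_rt and t[1] <= max_initial]
    # Participant level: too few trials left, or 2+ failed comprehension checks.
    excluded = len(kept) < min_kept * len(trials) or failed_checks >= 2
    return kept, excluded

trials = [(10.0, 1.0)] * 8 + [(25.0, 1.0), (10.0, 6.0)]
kept, excluded = apply_exclusions(trials, failed_checks=0)
print(len(kept), excluded)
```
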
  
  (12) The supplemental figures are not referenced in order, and some are not referenced at all; this should be fixed.
  
  We thank the Reviewer for pointing this out and have reorganized our Supplementary materials accordingly.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors build on a recent computational model of planning, the "value-guided construal" framework by Ho et al. (2022), which proposes that people plan by constructing simple models of a task, such as by attending to a subset of obstacles in a maze. They analyze both published experimental data and new experimental data from a task in which participants report attention to objects in mazes. The authors find that attention to objects is affected by spatial proximity to other objects (i.e., attentional overspill) as well as whether relevant objects are lateralized to the same hemifield. To account for these results, the authors propose a "spotlight-VGC" model, in which, after calculating attention scores based on the original VGC model, attention to objects is enhanced based on distance. They find that this model better explains participant responses when objects are lateralized to different hemifields. These results demonstrate complex interactions between filtering of task-relevant information and more classical signatures of attentional selection.
  
  Strengths:
  
  (1) The paper builds on existing modeling work in a novel manner and integrates classic results on attention into the computational framework.
  
  (2) The authors report new and extensive analyses of existing data that shed light on additional sources of systematic variability in responses related to attentional spillover effects
  
  (3) They collect new data using new stimuli in the original paradigm that directly test predictions related to the lateralization of task-relevant information, including eye tracking data that allows them to control for possible confounds.
  
  (4) The extended model (spotlight-VGC) provides a formal account of these new results.
  
  We thank the Reviewer for their positive assessment of our manuscript and their insightful comments, which have improved the clarity of our findings.
  
  Weaknesses:
  
  (1) The spotlight-VGC model has a free parameter - the "width" of the attentional spotlight. This seems to have been fixed to be 3 squares. It would be good if the authors could describe a more principled procedure for selecting the width so that others can use the model in other contexts.
  
  Our choice for this parameter was informed by the spatial effects reported in Figure 1b. We observed that the awareness of the two closest neighbouring obstacles positively predicted the awareness of a probe (i.e., positive beta weights). We therefore computed the mean and median distance between each probe and its second-closest neighbouring obstacle. This distance was 3 squares, as depicted in Figure S15. We fixed the width of the attentional spotlight across all studies based on this observation. We agree that future research utilizing this model may need to tune this hyperparameter depending on the mean distance between a probe and its neighbours.
  
  We have clarified this point in the methods section on page 35:
  
  “We fixed the ‘width’ of the attentional spotlight to a distance of 3 squares based on the observation that the two nearest neighbouring obstacles positively predicted the awareness of a probe. We observed that the mean and median distance between neighbouring obstacles of the 2nd rank (i.e., second closest) was 3 squares for all mazes (Figure S15). We therefore fixed the width of the attentional spotlight to 3 squares based on these observations. Future work utilizing this model should consider the statistics of their maze stimuli when deciding on the ‘width’ of the attentional spotlight.”
  
  Following the suggestion of Reviewer 2 (point 6), we also explored inter-individual differences in this parameter. To do so, we first used the lateralized mazes in the dSC1 dataset to determine the optimal width of the attentional spotlight for each individual. Then, we used this spotlight to derive model predictions for each person. We observed that these personalized attentional spotlight model predictions fit participants’ awareness reports on non-lateralized mazes better than the fixed-width spotlight model. We believe this preliminary result suggests the importance of modelling inter-individual differences in attentional deployment during planning. We report these effects on page 17.
  
  (2) Have the authors considered other ways in which factors such as attentional spillover and lateralization could be incorporated into the model? The spotlight-VGC model, as presented, involves first computing VGC predictions and only afterwards computing spillover. This seems psychologically implausible, since it supposes that the "optimal" representation is first formed and then it gets corrupted. Is there a way to integrate these biases directly into the VGC framework, perhaps as a prior on construals? The authors gesture towards this when they talk about "inductive biases", but this is not formalized.
  
  We thank the reviewer for bringing up this very important point. We think that a full computational treatment of the inductive bias would be a distinct project, but now seek to expand our discussion on the mechanisms by which representations could be formed. In this context, we specifically highlight novel computational work from the MIT group that was published as a preprint in the time since we submitted our paper, and which proposes a new process account of construal, the “Just in Time” (JIT) model. We also elaborate on a possible mechanism by which visuospatial attention may aid the dynamics of the construal process. In short, we agree with the reviewer that spatial attention may bias individuals to search over a subset of potential representations based on low-level spatial characteristics of the obstacles (e.g., their spatial spread in the visual field), prior to (or in concert with) a dynamic JIT-like selection process. We now elaborate on these possibilities on pages 27-28:
  
  “We close by reflecting on opportunities for further work in this area. First, an important next step is to explore the process by which task representations are formed, and how inductive biases might affect the process of task construal. The sVGC model is a normative model of the optimal task representation. Since its construction involves an exhaustive calculation over possible paths, it is not a plausible basis for a model of the psychological process by which participants actually construct task representations. More recently, a process model of task construal has been proposed, the Just in Time model (JIT). The hypothesis of the JIT model is that participants’ task representations are built up over time by iteratively simulating possible paths through the maze, affording insight into the construal process (Chen et al., 2026). In future work, it would be of interest to ask whether the attentional effects we observe in our experiments could be meshed with a dynamic JIT account of construal. We speculate that visuospatial attention may operate as an early filter, limiting the space of potential construals based on coarse spatial features of the environment, constraining a dynamic selection of obstacles. Brain imaging techniques with high time resolution, such as M/EEG, may be able to shed further light on how task representations are formed as participants plan.”
  
  […]
  
  “Fourth, it will also be necessary to elaborate on how bottom-up and top-down aspects of attentional selection are combined to guide complex task representations and plans. Foundational questions remain unanswered, for instance: can multiple spatial locations be preferentially selected at once, i.e. are there multiple spotlights (Awh & Pashler, 2000; McMains & Somers, 2004; Pylyshyn & Storm, 1988; Shaw & Shaw, 1977)? There is also debate about how spatial attention moves from one location to another: are the intervening visual regions between attended locations similarly selected (Dubois et al., 2009; Kr & Np, 1999; McMains & Somers, 2004, 2005)? Our findings tentatively suggest that individuals are able to attend to disparate spatial regions to form sparse task representations, yet there is substantial variability in how individuals orient their attention during the task. The present paradigm and computational modelling, in conjunction with carefully designed stimuli, may help resolve these outstanding questions.”
  
  (3) Can the authors rule out that the lateralization effects are the result of memory biases since the main measure used is a self-report of attention?
  
  We thank the reviewer for bringing up this important point. In our experiments, we sought to measure participants’ subjective awareness of the maze stimuli as a readout of their conscious task representation on each trial. This approach marries an extensive literature on measures of perceptual awareness in consciousness science (e.g., using the Perceptual Awareness Scale) with computational models of planning. Participants’ memory of (their awareness of) the obstacles is inherent to this approach, but just as with similar approaches in consciousness science (e.g. measures of iconic memory in the Sperling paradigm), we think it provides a reasonably “online” measure of awareness. It’s important of course to ensure that results obtained with awareness reports are not idiosyncratic, and generalise to other approaches to quantifying task representations.
  
  To further bolster the convergent validity of our awareness measure, we reanalyzed the data from Ho and colleagues. In their original paper, they developed a variant of the maze-navigation task where participants were asked to recall the location of obstacles as well as report their awareness (Exp 3) and a third variant of the task where participants could hover their cursors over hidden obstacles to reveal their locations (Exp 4). These data allowed us to validate the awareness reports against objective measures of recall and mouse-tracking data. We observed that the subjective awareness reports of participants were strikingly correlated with recall/hover measures across two independent samples of participants (Spearman ρ = 0.86 between memory accuracy and awareness; ρ = 0.86 between confidence in memory and awareness; ρ = 0.76 between the probability of hovering over the obstacle and awareness; ρ = 0.65 between the duration of the mouse hovering and awareness). We believe these findings validate participants’ awareness reports. These findings are now reported on page 22 of the manuscript.
  
  “Finally, we examined the convergent validity of participants’ awareness reports by reanalyzing the memory recall data reported in Ho and colleagues’ experiment (Ho et al., 2022). We reasoned that participants should demonstrate similar task representations regardless of the measure used to probe the construal. In line with this prediction, we observed that the obstacle awareness reports and memory/hover measures were strikingly correlated within three independent samples of participants (Spearman ρ = 0.86 between memory accuracy and awareness; ρ = 0.86 between confidence in memory and awareness; ρ = 0.76 between the probability of hovering over the obstacle and awareness; ρ = 0.65 between the duration of the mouse hovering and awareness; see Tables S18 and S19).”
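  For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation of rank-transformed data, with tied values assigned the mean of their ranks. The pure-Python sketch below is a generic illustration; it assumes, hypothetically, that the inputs are paired per-obstacle values (e.g. mean awareness and mean recall accuracy for each obstacle), which the response does not specify. `scipy.stats.spearmanr` computes the same quantity.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of tie-averaged ranks (undefined if x or y is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-obstacle awareness vs recall accuracy (monotonic relation).
print(spearman_rho([0.1, 0.4, 0.7, 0.9], [0.2, 0.5, 0.6, 1.0]))
```

  Because only ranks enter the calculation, ρ captures any monotonic relationship between awareness and recall, without assuming the two scales are linearly related.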
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

elifesciences.org/reviewed-preprints/108034v1
www.biorxiv.org www.biorxiv.org

Biophysically inspired mean-field model of neuronal populations driven by ion exchange mechanisms

1
1. Public_Reviews 27 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  Reviewer #1 (Public review)
  
  Summary:
  
  In this manuscript the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.
  
  The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.
  
  Strengths:
  
  The idea of deriving a mean-field model that relates the slow-timescale biophysical mechanisms of ion exchange and transport in the brain to the fast-timescale electrical activity of large neuronal ensembles.
  
  Weaknesses:
  
  The idea underlying this work is not completely implemented in practice.
  
  The derived mean-field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. The agreement with the in vitro experiment is hardly evident, for either the mean-field model or the network model. The assumptions made to derive the closed-form equations of the mean-field model have not been justified on biological grounds; they merely allow the mathematical derivation. The final form of the mean-field equations does not make clear whether microscopic variables are being used together with macroscopic variables in an inconsistent mixture.
  
  Comments on revisions:
  
  The main weaknesses I listed in the first report are still present, since the authors did not answer my questions on a solid basis. I report the list for completeness:
  
  (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.
  
  (2) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.
  
  (3) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.
  
  Therefore, my statement remains unchanged.
  
  Reviewer #2 (Public review)
  
  Summary:
  
The authors aim to develop a neural mass model characterized by a few collective variables mimicking the dynamics of a network of Hodgkin–Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then they compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and simulation of a small network containing various coupled neural masses, somehow moving towards the simulation of an entire connectome.
  
  Strengths:
  
The authors attempt to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion exchange mechanism between the cell interior and exterior.
  
  Weaknesses:
  
(1) They do not employ the reduction methodology best suited to the single-neuron model they consider.

(2) Their derivation of the neural mass model is based on several assumptions, not all of which are well justified.

(3) Their formulation of the mean-field derivation is unnecessarily complicated; it can be strongly simplified by following previously published approaches to derive biologically realistic neural masses.
  
  (4) Their model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.
  
  General Statements:
  
The authors honestly declared the many limitations of their approach; given these, the results of the mean-field are, as expected, somewhat inconsistent with the neural network simulations.

The authors suggest employing this model for simulations of the whole connectome to follow seizure propagation; however, I believe that a simpler model, such as the Epileptor, remains superior in this respect. This model does include biophysical parameters, but their correspondence with the ones employed in the network dynamics remains elusive, due to the many assumptions required to derive this mean-field model. Furthermore, it is more complicated than the Epileptor; I do not think that the present model will be widely employed by the community.
  
  Comments on revisions:
  
  The authors have corrected mistakes present in the manuscript and put a correct list of references.
  
However, they refuse:

(1) to simplify the formulation of the model. The model contains unnecessary complications, as I clearly wrote in my report; the authors agree, but they do not want to change the formulation;

(2) to derive the mean-field model in the simplest possible way, as I asked many times in my referee report. This would help readers understand the important aspects of the derivation without unnecessary and confusing formulations;

(3) to compare direct simulations of the network with neural mass results in sub-section "Bifurcation analysis: emergent network states and multistability" to show bistability, as I asked.

As a matter of fact, the modifications made do not resolve my previous doubts about the validity of the results reported in the manuscript.
  
  Therefore, my previous assessments remain valid.
  
  We thank the editors and the two reviewers for their continued engagement with our manuscript. The three weaknesses retained from the first round are essentially identical between the two public reviews:
  
  (i) The reduction methodology is not the most suitable for the single-neuron model we consider;
  
  (ii) The mean-field derivation is unnecessarily complicated;
  
  (iii) The model works only in highly synchronous regimes and does not reproduce the asynchronous evolution typical of neural circuits.
  
  Both reviewers explicitly note that their assessments remain unchanged and we have decided not to alter the formulation of the model. We use this response to state—on the public record—exactly where we agree with the reviewers, where we disagree, and why.
  
  On point (i): the reduction methodology.
  
  We fully agree with the reviewers' technical observation: the Ott–Antonsen / Lorentzian-ansatz reduction in the form introduced by Montbrió, Pazó and Roxin (2015) is exact for canonical Type I neurons (QIF), whose membrane-potential equation is quadratic, and is not directly applicable to a Type II / Hodgkin–Huxley-type neuron whose voltage dynamics is cubic-like. On this point there is no disagreement.
  
  Where we differ is in the conclusion the reviewers draw from this observation. The reviewers read our work as applying an inappropriate reduction methodology to an inappropriate neuron model. We instead positioned our work, from the outset, as an extension of that methodology: we keep the biophysically detailed Hodgkin–Huxley substrate (because it is the only level at which extracellular ion concentrations, depolarization block, bursting and seizure-like events are biophysically grounded), and we adapt the reduction by approximating the cubic voltage nullcline as a piece-wise quadratic with two parabolas of opposite curvature. This is explicitly an approximate, not exact, mean-field. The Lorentzian ansatz is then applied on each branch of the piece-wise quadratic, with the limitations of this construction analyzed in the manuscript.
  
The reviewers' alternative (starting from a Type I canonical model and grafting on biophysical features) would indeed yield an exact mean-field, but it would forfeit precisely what motivates our work: a tractable mesoscopic description in which the slow variables are physiologically interpretable ion concentrations rather than phenomenological parameters. The trade-off is that we give up exact rigour in order to construct a bridge between the Montbrió-style next-generation neural mass models on one side and the Epileptor on the other, with the additional benefit that the parameters of the resulting neural mass retain a biophysical correspondence (e.g., [K⁺]_bath, Δ[K⁺]_int, [K⁺]_g, the gating variable n) that the Epileptor does not afford.
  
  We therefore respectfully maintain our position: the methodology is not "the wrong reduction for a Type II neuron"; it is an extended reduction designed to be applicable beyond the Type I case, with explicitly characterized validity.
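The piece-wise quadratic construction described above can be illustrated with a minimal sketch. Assumptions (not the manuscript's): the cubic is a FitzHugh–Nagumo-like stand-in, f(V) = V − V³/3, rather than the Hodgkin–Huxley nullcline; the threshold V* is placed at its inflection point (V* = 0); and each branch is an unconstrained least-squares parabola, whereas the manuscript's construction may impose additional matching conditions at V*. The function names are hypothetical.

```python
import numpy as np

def cubic_nullcline(V):
    """Hypothetical cubic stand-in for an HH-type voltage nullcline."""
    return V - V**3 / 3.0

def piecewise_quadratic_fit(f, v_star, v_min, v_max, n_points=401):
    """Least-squares parabola on each side of the threshold v_star.

    Returns (left, right) coefficient arrays in numpy.polyfit order
    (highest degree first).
    """
    v_left = np.linspace(v_min, v_star, n_points)
    v_right = np.linspace(v_star, v_max, n_points)
    left = np.polyfit(v_left, f(v_left), 2)
    right = np.polyfit(v_right, f(v_right), 2)
    return left, right

def piecewise_quadratic(V, left, right, v_star):
    """Evaluate the two-branch approximation: left parabola below v_star, right one above."""
    V = np.asarray(V, dtype=float)
    return np.where(V < v_star, np.polyval(left, V), np.polyval(right, V))

left, right = piecewise_quadratic_fit(cubic_nullcline, 0.0, -2.0, 2.0)
# The leading coefficients have opposite signs: positive curvature below V*,
# negative curvature above it, mirroring the sign change of f''(V) = -2V.
```

Because f is odd, the two fitted parabolas are mirror images, so their leading coefficients are equal in magnitude and opposite in sign; this is the "two parabolas of opposite curvature" picture.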
  
  On point (ii): the formulation is unnecessarily complicated.
  
We agree with the reviewers that, given the assumptions we ultimately adopt, namely that the gating variable n and the potassium concentrations Δ[K⁺]_int and [K⁺]_g are treated as collective (mesoscopic) variables shared by the population, with n a function of the average membrane potential, the closed neural mass equations could be reached by the more direct path used by Guerreiro et al. (2022) and the related literature (R1–R7). In the revised manuscript we now state this explicitly, and we note that the same five-dimensional system arises under either derivation.
  
  Our choice to follow Chen and Campbell (2022) is motivated by the fact that it makes each approximation visible at the point where it is invoked. In particular, it exposes the moment-closure step (Eq. 19), the vanishing-flux boundary condition (Eq. 28), and the locations where microscopic and mesoscopic variables enter the description. We believe that for a reader trying to extend our framework, for instance to a setting with partial heterogeneity in the slow variables, or with stochastic gating, this is the more useful presentation. We have added a remark stating that the simpler Guerreiro-type derivation reaches the same equations under our assumptions, so that readers can take whichever route they find clearer.
  
  On point (iii): the model only works in highly synchronous regimes.
  
  Here we partially agree and partially disagree, and we would like the partial disagreement to appear on the public record.
  
We agree that the Lorentzian ansatz is, strictly, valid in regimes where the population's membrane potential distribution is unimodal, that is, when essentially all neurons sit on the same side of the threshold V*. Where we disagree is with the implication that the mean-field model fails outside the strongly synchronous regime. The supplementary analysis in Fig. S2, added in the previous round, quantifies the error introduced by the first-moment approximation of n as a collective variable across the full range of [K⁺]_bath values, spanning quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics. The fraction of neurons whose gating variable deviates from the population mean is below 2% for the parameters used throughout the manuscript, and the error becomes appreciable only during the brief transitions between sub- and supra-threshold states. These are precisely the moments at which the population is genuinely bimodal and the single-Lorentzian assumption is theoretically expected to break down. In other words, the error peaks coincide with the moments where our derivation tells us in advance that the assumption is locally invalid; the model "knows where it fails." Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3, not only in the most strongly synchronized ones.
  
  This is, in our view, the strongest argument we can make: we are not claiming exactness, and we are not unaware of the limitations. We have characterized them analytically (the construction of the piece-wise Lorentzian, and the theoretical reason a closed solution exists only when the two branches collapse onto one), and we have characterized them numerically (Fig. S2). The deviations are bounded, their location in parameter space is well identified, and they coincide with transitions where the underlying assumption is locally violated. We believe this constitutes a controlled approximation rather than an uncontrolled one, and we would like this distinction to be visible to readers of the Reviewed Preprint.
  
We note, in this connection, that the reviewers' preferred reference point, the next-generation neural mass model of Montbrió et al. (2015), which is exact and one-to-one with its underlying network, is exact precisely because the underlying network is a network of QIF neurons. The corresponding statement for a network of Hodgkin–Huxley-type neurons with explicit ion exchange does not, to our knowledge, exist in closed form, and may not exist at all. The relevant question is therefore not whether our model matches the exactness of the QIF case, but whether the controlled approximation we provide is useful. Given the qualitative agreement with neural-network simulations across the full range of [K⁺]_bath, the qualitative agreement with the in vitro recordings, and the recovery of the expected bifurcation structure with new emergent regimes, we believe the answer is yes.
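For comparison with the exact QIF case: in the Montbrió, Pazó and Roxin (2015) reduction, a globally coupled QIF population with Lorentzian-distributed excitabilities (centre η̄, half-width Δ), coupling J and membrane time constant τ reduces to two firing-rate equations, τ dr/dt = Δ/(πτ) + 2rv and τ dv/dt = v² + η̄ + Jτr − (πτr)² + I. The sketch below integrates them with forward Euler. The parameter values (η̄ = −5, J = 15, Δ = 1, τ = 1, I = 0) are illustrative choices for which the fixed-point condition has one low-activity and one high-activity stable solution; they are not taken from the manuscript.

```python
import numpy as np

def mpr_rhs(r, v, eta_bar, delta, J, tau, I=0.0):
    """Right-hand side of the Montbrio-Pazo-Roxin firing-rate equations (QIF population)."""
    dr = (delta / (np.pi * tau) + 2.0 * r * v) / tau
    dv = (v**2 + eta_bar + J * tau * r - (np.pi * tau * r) ** 2 + I) / tau
    return dr, dv

def simulate_mpr(eta_bar=-5.0, delta=1.0, J=15.0, tau=1.0, I=0.0,
                 dt=1e-3, t_max=50.0, r0=0.1, v0=-2.0):
    """Forward-Euler integration; returns arrays r(t) and v(t)."""
    n_steps = int(round(t_max / dt))
    r = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    r[0], v[0] = r0, v0
    for k in range(n_steps):
        dr, dv = mpr_rhs(r[k], v[k], eta_bar, delta, J, tau, I)
        r[k + 1] = r[k] + dt * dr
        v[k + 1] = v[k] + dt * dv
    return r, v

# Two initial conditions, one near each stable state (bistability).
r_low, v_low = simulate_mpr(r0=0.1, v0=-2.0)
r_high, v_high = simulate_mpr(r0=1.0, v0=-0.16)
```

At any fixed point the rate equation forces r·v = −Δ/(2πτ), so the mean potential and the firing rate are tied together exactly; this closed two-variable structure is what the QIF nonlinearity buys and what a cubic, Hodgkin–Huxley-type nonlinearity does not provide.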
  
  Other outstanding points in the review.
  
  Reviewer 2 reiterates the view that the Epileptor remains superior for whole-connectome seizure-propagation simulations because it is simpler and better characterized. We do not dispute that the Epileptor is more thoroughly analyzed and more parsimonious. The complementarity we propose is not a replacement but a parameter-grounding, as the Epileptor's phenomenological parameters (excitability, slow permittivity) acquire, in the present framework, an interpretation in terms of measurable biophysical quantities (extracellular potassium, intracellular potassium variation, glial buffering).
  
  We thank the reviewers and editors once again for their careful reading, and we are grateful that the points of disagreement have been sharpened to a state where readers can judge them transparently.
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.
  
  The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.
  
  We agree with the reviewer's characterization. Our manuscript describes the derivation as relying on "approximations and heuristic arguments" and states that "the derivation is not exact"; what we provide is a controlled, approximate mesoscopic description in which the slow variables are physiologically interpretable ion concentrations rather than phenomenological parameters. An exact closed-form thermodynamic limit is, to our knowledge, available only for canonical Type I (QIF) networks (Montbrió, Pazó and Roxin, 2015) and a few of their extensions; it is not currently known for a Hodgkin–Huxley-type network with explicit ion-exchange dynamics. We acknowledge that the original description of the regime of validity may have caused confusion on this point, and in the revised manuscript we have therefore replaced the looser formulation "strongly synchronous regimes" by the more accurate "regimes where the membrane-potential distribution is unimodal and can be reasonably approximated by a Lorentzian" throughout the manuscript.
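In the Montbrió, Pazó and Roxin (2015) setting, the Lorentzian membrane-potential distribution is centred at the mean potential v and has half-width πτr, so the firing rate can be read off its width. As a hedged illustration of what "reasonably approximated by a Lorentzian" involves in practice, the sketch below estimates centre and half-width from samples using quartiles (for a Lorentzian the quartiles lie exactly at centre ± half-width), which stay well defined even though a Lorentzian has no finite mean or variance. This is our illustration, not the procedure used in the manuscript, and it is only meaningful when the sampled distribution is unimodal.

```python
import numpy as np

def fit_lorentzian(V):
    """Robust Lorentzian (Cauchy) fit: centre = median, half-width = half the interquartile range."""
    V = np.asarray(V, dtype=float)
    q25, q50, q75 = np.percentile(V, [25, 50, 75])
    return q50, 0.5 * (q75 - q25)

def rate_from_halfwidth(gamma, tau):
    """In the QIF/MPR setting the half-width equals pi * tau * r, so r = gamma / (pi * tau)."""
    return gamma / (np.pi * tau)

rng = np.random.default_rng(1)
samples = -60.0 + 2.0 * rng.standard_cauchy(200_000)   # synthetic data, centre -60, half-width 2
centre, half_width = fit_lorentzian(samples)
```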
  
  Strengths:
  
  The idea of deriving a mean-field model that relates the slow-timescale biophysical mechanism of ion exchange and transportation in the brain to the fast-timescale electrical activities of large neuronal ensembles.
  
  We thank the reviewer for recognizing the motivation behind our work. This explicit coupling between slow biophysical ion dynamics and fast electrical activity is precisely the feature we tried to preserve in the reduction, even at the cost of giving up exactness.
  
  Weaknesses:
  
  The idea underlying this work is not completely implemented in practice.
  
  We address this general statement through the four specific sub-points the reviewer raises in the paragraph that follows.
  
  The derived mean field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes.
  
We partially agree and partially disagree. We agree that the Lorentzian ansatz is strictly valid where the membrane-potential distribution is unimodal, i.e. when essentially all neurons sit on the same side of the threshold V*. We disagree with the implication that the mean-field fails outside this regime. To make this claim quantitative, we added a new supplementary figure (Fig. S2) that quantifies the deviation of individual neurons' gating variables from the population mean across the full range of [K⁺]_bath values: quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics. The fraction of deviating neurons is below 2% for the parameters used in the manuscript, with localized peaks only during the brief, genuinely bimodal transitions between sub- and supra-threshold states, precisely the moments at which the theory predicts the assumption to be locally invalid. Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3, not only in the strongly synchronized ones.
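The unimodality condition (essentially all neurons on the same side of V*) can be monitored directly in network simulations. A minimal sketch, assuming the threshold V* is known and using a hypothetical tolerance `margin`; the function names are illustrative and not from the manuscript's code:

```python
import numpy as np

def side_of_threshold_fraction(V, v_star):
    """Fraction of neurons above the threshold v_star at each time step.

    V      : array of shape (n_times, n_neurons) with membrane potentials
    v_star : threshold separating the two quadratic branches (assumed known)
    """
    return (np.asarray(V, dtype=float) > v_star).mean(axis=1)

def single_branch_mask(V, v_star, margin=0.02):
    """True where essentially all neurons sit on one side of v_star.

    margin is a hypothetical tolerance: a fraction within `margin`
    of 0 or 1 is treated as one-sided (unimodal).
    """
    frac = side_of_threshold_fraction(V, v_star)
    return (frac <= margin) | (frac >= 1.0 - margin)
```

Time steps where the mask is False mark the sub-to-supra-threshold transitions, where the population is split across V* and the single-Lorentzian assumption is expected to fail.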
  
  The agreement with the in vitro experiment is hardly evident, both for the mean-field model and for the network model.
  
We acknowledge that the experimental and simulated traces in the original Fig. 4 did not match quantitatively; this was never our intention. The figure and its caption have been reorganized in the revised manuscript to frame the comparison as qualitative: we aim to demonstrate the shared structure, i.e., the slow modulation of fast population activity by extracellular potassium fluctuations, rather than to claim a quantitative fit.
  
  We also added two clarifications that account for the residual differences: (i) the network simulations were intentionally run with rescaled biophysical parameters (membrane capacitance, gating time constants) to keep the computational cost feasible, a standard practice when the goal is to validate dynamical mechanisms rather than absolute timescales; (ii) the in vitro LFP recordings were AC-coupled, so the slow DC components visible in the mean-field traces are filtered out at acquisition.
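Point (ii) can be illustrated with a hedged sketch: to first order, AC-coupled acquisition behaves like a high-pass filter, so a slow, DC-like shift such as those visible in the mean-field traces is removed while faster activity passes. The first-order RC filter and the 0.5 Hz cutoff below are assumptions for illustration, not the acquisition chain used in the experiments.

```python
import numpy as np

def ac_couple(x, fs, f_cut):
    """First-order (RC) high-pass filter, a simple stand-in for AC-coupled acquisition.

    x     : 1-D signal
    fs    : sampling rate in Hz
    f_cut : cutoff frequency in Hz (hypothetical value)
    """
    x = np.asarray(x, dtype=float)
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * np.pi * f_cut)
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        # Discrete RC high-pass: passes changes in x, lets constant offsets decay to zero.
        y[i] = alpha * (y[i - 1] + x[i] - x[i - 1])
    return y

fs = 1000.0
t = np.arange(20_000) / fs
x = 5.0 + np.sin(2.0 * np.pi * 10.0 * t)   # 10 Hz activity riding on a constant offset
y = ac_couple(x, fs, 0.5)                   # offset removed, 10 Hz component kept
```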
  
The assumptions made to derive the closed-form equations of the mean-field model have not been justified by any biological reason; they just allow for the mathematical derivation.
  
We agree that the modelling assumptions were scattered through the original derivation. In the revised manuscript, the three core assumptions are stated explicitly at the point of derivation: (i) the gating variable n is treated as a collective, population-averaged variable; (ii) the potassium concentrations Δ[K⁺]_int and [K⁺]_g are homogeneous across the population, biophysically justified by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforces near-instantaneous equilibration at the mesoscopic scale; (iii) no heterogeneity is assumed at the level of ion dynamics. The meaning of "locally homogeneous" is now defined explicitly.

On the biophysical motivation of the in vitro perturbation used in the experiment, we have added a new Methods subsection that explains how low extracellular Mg²⁺ unblocks NMDARs and abolishes the divalent-cation stabilisation of the resting membrane potential, depolarising hippocampal neurons and increasing the driving force for outward K⁺ currents. This provides a biophysical link between the experimental perturbation and the model's main control parameter, the extracellular potassium concentration. We also added a reference to the well-established model of epileptic discharges that underpins the experiment.
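The link between extracellular potassium and the outward K⁺ driving force runs through the potassium reversal potential. As standard background (not taken from the manuscript), the Nernst relation E_K = (RT/zF) ln([K⁺]_o/[K⁺]_i) shows how raising extracellular potassium depolarizes E_K, and the outward driving force on K⁺ is V − E_K. The concentrations and temperature below are illustrative values, not the model's parameters.

```python
import numpy as np

R = 8.314462618   # J / (mol K), molar gas constant
F = 96485.33212   # C / mol, Faraday constant

def nernst_mV(c_out, c_in, z=1, T=310.15):
    """Nernst reversal potential in mV for an ion of valence z at temperature T (kelvin)."""
    return 1e3 * (R * T) / (z * F) * np.log(c_out / c_in)

def driving_force_mV(V, E_rev):
    """Driving force V - E_rev; positive values drive an outward current for a cation."""
    return V - E_rev

e_k_4mM = nernst_mV(4.0, 140.0)   # illustrative [K+]_o = 4 mM, [K+]_i = 140 mM
e_k_8mM = nernst_mV(8.0, 140.0)   # doubled extracellular potassium
```

With these illustrative values, doubling [K⁺]_o from 4 to 8 mM shifts E_K from about −95 mV to about −76.5 mV at body temperature.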
  
  The final form of the mean-field equations does not clarify whether or not microscopic variables are used together with macroscopic variables in an inconsistent mixture.
  
  We now explicitly acknowledge that in the spiking-network simulations the gating variable n is microscopic (each neuron has its own n_i), whereas in the mean-field derivation it is treated as mesoscopic and shared by the population. This asymmetry between modalities is discussed both in the Results and in the Limitations sections, and is identified as a likely source of some of the discrepancy between the two modalities.
  
We have also made the notation in Eqs. (36)–(37) consistent (firing rate r used throughout, full current-based dV/dt̄ restored) and fixed the typos and broken equation/reference labels that contributed to the impression of inconsistency (Eqs. 18, 28, 29; the Fig. 2(c) [K⁺]_bath label; the lost reference at line 696).
  
  Reviewer #2 (Public review):
  
  Summary:
  
The authors aim to develop a neural mass model characterized by a few collective variables mimicking the dynamics of a network of Hodgkin–Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then they compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and simulation of a small network containing various coupled neural masses, somehow moving towards the simulation of an entire connectome.
  
  We thank the reviewer for the accurate summary of the manuscript's structure and aims.
  
  Strengths:
  
The authors attempt to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion exchange mechanism between the cell interior and exterior.
  
  We thank the reviewer for recognizing this objective. The retention of Hodgkin–Huxley dynamics with explicit ion exchange is precisely the feature that distinguishes our framework from QIF-based reductions, and it is what enables the slow variables of the resulting mean-field to retain a direct biophysical interpretation.
  
  Weaknesses:
  
  (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.
  
  We agree, on technical grounds, with the observation: the Ott–Antonsen / Lorentzian-ansatz reduction is exact for canonical Type I neurons (QIF) and is not directly applicable to a Type II Hodgkin–Huxley-type neuron with a cubic-like voltage nullcline. Where we differ is in the conclusion. We did not apply an inappropriate reduction to an inappropriate neuron; we deliberately extended the methodology by approximating the cubic nullcline as a piece-wise quadratic with two parabolas of opposite curvature, and then applying the Lorentzian ansatz on each branch. The result is an explicitly approximate, biophysically grounded mean-field, with its regime of validity stated and quantified (Fig. S2).
  
To make this positioning explicit, we have added a paragraph to the Introduction that situates our work within the next-generation neural mass literature (Byrne et al. 2020; Montbrió, Pazó & Roxin 2015; Guerreiro et al. 2022; Forrester et al. 2024; Perl et al. 2023; Gerster et al. 2021; and works on short-term plasticity, adaptation, conductance-based reductions, spike-timing-dependent plasticity, random connectivity and noise) and clarifies that we see our contribution as complementary to these approaches, not as a competitor to the exact QIF reductions.
  
  (2) The authors' derivation of the neural mass model is based on several assumptions, and not all well justified.
  
We agree that, in the original submission, the modelling assumptions were scattered through the derivation. In the revised manuscript, the three core assumptions are stated explicitly at the point of derivation: (i) the gating variable n is treated as a collective population-averaged variable; (ii) the potassium concentrations Δ[K⁺]_int and [K⁺]_g are homogeneous across the population, biophysically justified by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforces near-instantaneous equilibration at the mesoscopic scale; (iii) no heterogeneity at the level of ion dynamics is assumed. The meaning of "locally homogeneous" is now defined explicitly. In addition, we have added Fig. S2, which quantifies numerically the error introduced by the moment-closure assumption (deviation below 2% for the parameters used in the manuscript).
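Assumption (i) replaces each neuron's own n_i by a single population variable that depends on the average membrane potential. A hedged illustration of why this is a closure approximation: for a nonlinear steady-state activation n∞(V), the average of n∞ over neurons differs from n∞ evaluated at the average potential, and the gap grows with the spread of potentials (Jensen's inequality). The sketch uses the classic Hodgkin–Huxley potassium rate functions (modern mV convention) purely as a stand-in; the manuscript's gating kinetics and the exact form of its closure may differ, and the function names are hypothetical.

```python
import numpy as np

def n_inf(V):
    """Steady-state K+ activation of the classic HH model (V in mV, rest near -65 mV)."""
    x = V + 55.0
    # alpha_n = 0.01 (V+55) / (1 - exp(-(V+55)/10)); expm1 keeps it accurate near V = -55
    alpha = 0.01 * x / (-np.expm1(-x / 10.0))
    beta = 0.125 * np.exp(-(V + 65.0) / 80.0)
    return alpha / (alpha + beta)

def closure_error(V):
    """Population average of n_inf(V_i) minus n_inf evaluated at the mean potential."""
    V = np.asarray(V, dtype=float)
    return np.mean(n_inf(V)) - n_inf(V.mean())

rng = np.random.default_rng(2)
narrow = -60.0 + 1.0 * rng.standard_normal(10_000)   # tightly clustered potentials
wide = -60.0 + 15.0 * rng.standard_normal(10_000)    # broadly spread potentials
```

For the tightly clustered population the gap is negligible, while for the broadly spread one it is orders of magnitude larger, which is consistent with the error being concentrated where the potentials spread across threshold.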
  
  (3) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.
  
We agree that, under the assumptions ultimately adopted in our model (namely that n, Δ[K⁺]_int and [K⁺]_g are mesoscopic), the final five-dimensional system can be reached by the more direct path used by Guerreiro et al. (2022) and the related literature. We now state this explicitly in the revised manuscript and note that the same system arises under either derivation, so that the reader can take whichever route they find clearer. Our choice to retain the Chen and Campbell (2022) formalism is pedagogical: it exposes the moment-closure step (Eq. 19), the vanishing-flux boundary condition (Eq. 28), and the locations where microscopic versus mesoscopic variables enter the description, which is the more useful presentation for a reader wishing to extend the framework (e.g. to partial heterogeneity in the slow variables or to stochastic gating). We also made the notation in Eqs. (36)–(37) consistent (firing rate r used throughout, full current-based dV/dt̄ restored) and fixed a number of typos and broken equation/reference labels.
  
  (4) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.
  
  We partially agree and partially disagree. We agree that the Lorentzian ansatz is strictly valid where the membrane-potential distribution is unimodal; we have replaced "strongly synchronous regimes" by this more accurate formulation throughout the manuscript. We disagree, however, with the implication that the mean-field is useful only in those regimes. Fig. S2, added in this revision, explicitly quantifies the deviation across all dynamical regimes (quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics): it remains below 2% for the parameters used in the manuscript, with localized peaks only during the brief sub-to-supra-threshold transitions where the population is genuinely bimodal. Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3.
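The exact deviation criterion behind Fig. S2 is not spelled out here. A minimal sketch of one way to compute such a fraction, assuming (our assumption, not the manuscript's) an absolute tolerance `tol` around the instantaneous population mean of the gating variable:

```python
import numpy as np

def deviating_fraction(n, tol=0.05):
    """Fraction of neurons whose gating variable deviates from the population mean.

    n   : array of shape (n_times, n_neurons) with per-neuron gating variables n_i(t)
    tol : absolute deviation threshold (hypothetical; the manuscript's criterion may differ)
    Returns an array with one fraction per time step.
    """
    n = np.asarray(n, dtype=float)
    mean = n.mean(axis=1, keepdims=True)
    return (np.abs(n - mean) > tol).mean(axis=1)

rng = np.random.default_rng(0)
tight = 0.3 + 0.005 * rng.standard_normal((1, 1000))        # synthetic unimodal snapshot
bimodal = np.concatenate([np.full((1, 500), 0.1),
                          np.full((1, 500), 0.7)], axis=1)   # synthetic split snapshot
fractions = deviating_fraction(np.vstack([tight, bimodal]))
```

On the synthetic snapshots, the unimodal population gives a fraction near zero while the evenly split one gives a fraction of one, the signature expected during sub-to-supra-threshold transitions.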
  
  General Statements:
  
The authors honestly declared the many limitations of their approach. It is assumed that the mean-field results are, as expected, somewhat inconsistent with the neural network simulations.
  
  We thank the reviewer for acknowledging that the limitations are honestly declared. As detailed above and quantified in Fig. S2, the deviation from the network simulations is bounded and well characterized; it is not assumed but measured.
  
The authors suggest employing this model for simulations of the whole connectome to follow seizure propagation; however, I believe that the Epileptor remains superior in this respect. This model does include biophysical parameters, but their correspondence with the ones employed in the network dynamics remains elusive, due to the many assumptions required to derive this mean-field model. Furthermore, it is more complicated than the Epileptor; I do not think that the present model will be widely employed by the community.
  
  We do not propose our model as a direct replacement for the Epileptor and we do not dispute that the Epileptor is more thoroughly analyzed and more parsimonious. The complementarity we propose is not a replacement but a parameter-grounding: the Epileptor's phenomenological parameters (excitability, slow permittivity) acquire, in our framework, a concrete interpretation in terms of measurable biophysical variables (extracellular potassium, intracellular potassium variation, glial buffering). Retaining the Hodgkin–Huxley substrate is essential to ground these variables biophysically.
  
  To make this complementarity more visible, the Limitations and Discussion section has been expanded to discuss the choice of a purely excitatory network as a first step (with excitatory–inhibitory generalizations available via the synaptic reversal potential) and to point to additional biological ingredients (calcium and other ions, plastic synapses, random connectivity and noise, adaptation, spike-timing-dependent plasticity) that the framework can accommodate, with reference to the next-generation neural mass literature.
  
  We thank the reviewers and editors for their careful reading. We hope this public response makes our reasoning, the limits of our approach, and the concrete revisions made in this round transparent.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) In general, the writing is scattered. Every time a model is introduced, one starts from the general formulation only to find that a very simplified case is used with respect to that formulation, which is very confusing. Authors need to reduce unnecessary formulations that confuse the reader and make it clear which formulations are actually used.
  
  We thank the reviewer for this comment and understand the concern regarding the balance between general formulations and specific approximations. Our intention in including the more general equations and derivations (e.g., Eq. 7 and others) was pedagogical — to ensure completeness and transparency in the modeling steps, especially for readers less familiar with mean-field reductions of biophysically detailed models. These general forms also serve to clarify the assumptions underlying the simplifications we employ. In the latest version, we improved the clarity of core equations (e.g., Eq. 37), which form the basis of all simulations presented (see details below, in the answer to question 14).
  
(2) The Introduction would benefit from a wider view of the literature. The literature on exact mean-field models (i.e. derived from the Lorentzian ansatz) has flourished in recent years. In particular, it would be worth considering the following papers, where exact neural mass models are applied to perform whole-brain and large-scale brain simulations:
  
  Forrester, M., Petros, S., Cattell, O., Lai, Y. M., O'Dea, R. D., Sotiropoulos, S., & Coombes, S. (2024). Whole brain functional connectivity: Insights from next generation neural mass modelling incorporating electrical synapses. PLOS Computational Biology, 20(12), e1012647.
  
Perl, Y. S., Zamora-Lopez, G., Montbrio, E., Monge-Asensio, M., Vohryzek, J., Fittipaldi, S., Campo, C. G., Moguilner, S., Ibanez, A., Tagliazucchi, E., Yeo, B. T. T., Kringelbach, M. L., & Deco, G. (2023). The impact of regional heterogeneity in whole-brain dynamics in the presence of oscillations. Network Neuroscience, 7(2), 632-660.

Byrne, A., Ross, J., Nicks, R., & Coombes, S. (2022). Mean-field models for EEG/MEG: From oscillations to waves. Brain Topography, 35(1), 36-53.
  
  Gerster, M., Taher, H., Skoch, A., Hlinka, J., Guye, M., Bartolomei, F.,... & Olmi, S. (2021). Patient-specific network connectivity combined with a next generation neural mass model to test clinical hypothesis of seizure propagation. Frontiers in Systems Neuroscience, 15, 675272.
  
Byrne, A., O'Dea, R. D., Forrester, M., Ross, J., & Coombes, S. (2020). Next-generation neural mass and field modeling. Journal of Neurophysiology, 123(2), 726-742.

Benitez-Stulz, S., Castro, S., Dumont, G., Gutkin, B., & Battaglia, D. (2024). Compensating functional connectivity changes due to structural connectivity damage via modifications of local dynamics. bioRxiv, 2024-05.
  
  We have added the following paragraph:
  
“Recently, a class of these models, called next-generation neural mass models [42], has been developed based on an analytical approach introduced by [25] that allowed for the exact derivation of mean-field parameters for a population of quadratic integrate-and-fire (QIF) neurons. These can be linked to EEG/MEG oscillations [43], including epileptic seizures [43], and have been used to study various aspects of whole-brain dynamics such as the low-dimensional manifold of the resting state [45,46], aging [47] and neural signatures of consciousness [48].”
  
  We have also modified the preceding paragraph of the introduction that now reads:
  
  “At the mesoscopic level, the observable properties of a neuronal ensemble are generally explained by the statistical-physics formalism of mean-field theory [19-22]. Mean-field models have demonstrated predictive value for studying the mesoscopic dynamics of neuronal populations [23], providing statistical descriptions of neuronal networks [2, 19, 24-29], which can be used to address questions related to network-level mechanisms [12, 24, 30].
  
  In general, neural mass models have few enough parameters to be tractable and provide general intuitions regarding the mechanisms underlying complex neuronal activity [31-36]. For example, statistical population measures, such as the firing rate, can be used to assess mesoscopic dynamics [1, 7, 31, 36-41].”
  
  (3) Moreover, conductance-based models have already been implemented in neural mass models, not only in references [69, 71, 95], but also in:
  
  Guerreiro, I. C., Di Volo, M., & Gutkin, B. (2023). A new generation of reduction methods for networks of neurons with complex dynamic phenotypes.
  
  Capone, C., Di Volo, M., Romagnoni, A., Mattia, M., & Destexhe, A. (2019). State-dependent mean-field formalism to model different activity states in conductance-based networks of spiking neurons. Physical Review E, 100(6), 062413.
  
  We have added the following sentence:
  
  “Moreover, conductance-based couplings between the spiking neurons have already been implemented in neural mass models [58, 59, 91, 93, 121], but without an extracellular exchange mechanism.”
  
  (4) Sec. 1.1 As previously established in the literature, a system of all-to-all coupled neuronal equations can be solved exactly in the thermodynamic limit (i.e., the limit of infinitely many neurons) if the single-neuron membrane potential equation is quadratic and if the instantaneous distribution of membrane potentials in a population is described by a Lorentzian [Montbrió, E., Pazó, D. & Roxin, A. Physical Review X 5 (2), 021028 (2015)]. This means that the thermodynamic limit can be taken for a canonical Type I model such as the quadratic integrate-and-fire neuron.
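As background to this point, the exact reduction of Montbrió, Pazó & Roxin (2015) turns an all-to-all network of QIF neurons with Lorentzian-distributed excitabilities into two ODEs for the population firing rate r and the mean membrane potential v. The sketch below integrates them with forward Euler in dimensionless units (membrane time constant τ = 1, no synaptic delay). The parameter values (Δ = 1, η̄ = -5, J = 15), the initial condition and the step size are illustrative assumptions, not values taken from the manuscript.

```python
import math

def mpr_rhs(r, v, delta=1.0, eta_bar=-5.0, J=15.0, I=0.0):
    """Montbrio-Pazo-Roxin (2015) firing-rate equations with tau = 1.

    r: population firing rate, v: mean membrane potential,
    delta: half-width of the Lorentzian distribution of excitabilities,
    eta_bar: its centre, J: recurrent coupling, I: external input.
    """
    dr = delta / math.pi + 2.0 * r * v
    dv = v * v + eta_bar + J * r - (math.pi * r) ** 2 + I
    return dr, dv

def simulate(r0=0.1, v0=-2.0, t_end=20.0, dt=1e-3, **params):
    """Forward-Euler integration; returns time points, rates and mean potentials."""
    r, v = r0, v0
    ts, rs, vs = [0.0], [r], [v]
    for k in range(1, int(round(t_end / dt)) + 1):
        dr, dv = mpr_rhs(r, v, **params)
        r, v = r + dt * dr, v + dt * dv
        ts.append(k * dt)
        rs.append(r)
        vs.append(v)
    return ts, rs, vs

ts, rs, vs = simulate()
print(f"r(T) = {rs[-1]:.4f}, v(T) = {vs[-1]:.4f}")
```

With these parameters the trajectory should relax to the low-activity fixed point, where dr/dt = 0 implies v = -Δ/(2πr).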
  
  What is the biological justification and the reason to approximate a different neuron type (a type II neuron model), whose membrane potential equation resembles a cubic function, with a quadratic function? The fact that it can be solved in the quadratic approximation is not, in my opinion, a sufficient justification. It would be more correct to start from a type I neuron at the microscopic level with a quadratic function and then provide additional biological features.
  
  We thank the reviewer for raising this important point. We respectfully disagree with the notion that starting from a canonical Type I model (such as the quadratic integrate-and-fire neuron) would be a more biologically grounded approach. While the quadratic form is analytically convenient, it does not capture certain key features of neuronal excitability, particularly those related to bursting, seizure-like events, and depolarization block, which are closely tied to the cubic-like nullcline geometry arising in Hodgkin–Huxley-type models, especially in the presence of slow ion dynamics.
  
  Our work seeks to bridge biophysical realism with analytical tractability. The step-wise quadratic approximation we employ is specifically designed to mimic the cubic membrane potential profile that emerges from the full ion-exchange dynamics. While the Lorentzian Ansatz is not strictly justified in this case from first principles, we show that it yields a workable and biologically interpretable mean-field description, which aligns with single-neuron dynamics, population simulations, and even in vitro observations. To our knowledge, this is a novel contribution that extends mean-field modeling beyond currently available approaches, which are often restricted to simplified or phenomenological neuron models.
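To make the idea of a stepwise quadratic approximation concrete, the sketch below replaces a generic cubic, f(V) = V - V³/3 (a FitzHugh-Nagumo-type stand-in, not the Hodgkin-Huxley nullcline of the manuscript), with two parabolas joined at V = 0. Each parabola has its vertex at one of the cubic's extrema and passes through the middle root, which makes the two branches meet with equal slope. This particular construction is an assumption chosen for illustration; it is not the parametrization (R±, c±, I±) used in the paper.

```python
def cubic(v):
    """Hypothetical cubic-like intrinsic drive (FitzHugh-Nagumo type)."""
    return v - v ** 3 / 3.0

def stepwise_quadratic(v):
    """Two parabolas joined at V = 0.

    Left branch: vertex at the cubic's local minimum (-1, -2/3).
    Right branch: vertex at the cubic's local maximum (1, 2/3).
    Curvatures +2/3 and -2/3 make both branches pass through (0, 0),
    where their slopes also coincide (4/3), so the function is C^1.
    """
    if v < 0.0:
        return -2.0 / 3.0 + (2.0 / 3.0) * (v + 1.0) ** 2
    return 2.0 / 3.0 - (2.0 / 3.0) * (v - 1.0) ** 2

for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(f"V = {v:+.1f}  cubic = {cubic(v):+.3f}  stepwise = {stepwise_quadratic(v):+.3f}")
```

The outer zero crossings of the parabolas lie at V = ±2 instead of ±√3, so the approximation preserves the N-shaped profile and the location of the extrema rather than reproducing the cubic exactly.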
  
  In this context, using a quadratic approximation is not merely a mathematical convenience — it is a means to retain key dynamical features of more realistic (non-Type I) neurons within a tractable framework, enabling insights into complex behaviors like multistability and pathological bursting.
  
  (5) Sec. 1.2 As shown in Figure 3, the mean-field equations do not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. This represents a strong limitation in the model, especially because exact neural mass models (as shown in Reference [23]) perfectly fit the dynamics of the underlying network model both in the asynchronous and in the synchronized regime.
  
  We appreciate the reviewer’s observation and acknowledge that our original description may have caused confusion. The model's validity is not restricted to strongly synchronous regimes; rather, it is restricted to regimes where the distribution of membrane potentials across the neuronal population remains unimodal and can be reasonably approximated by a Lorentzian. This includes, but is not restricted to, highly synchronized states.
  
  We agree that this distinction is important and have clarified it in the revised manuscript (e.g., “in strongly synchronous regimes” —> “in regimes where the membrane potentials' distribution is unimodal and can be reasonably approximated by a Lorentzian”).
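One practical way to check whether a simulated membrane-potential distribution is close to a Lorentzian is to estimate its centre and half-width robustly: for a Lorentzian, the median equals the centre and the quartiles lie exactly one half-width on either side, even though the mean and variance do not exist. In Lorentzian-ansatz reductions the centre plays the role of the mean potential and the half-width is proportional to the firing rate. The sketch below checks these estimators on synthetic samples; the centre (-60 mV), half-width (2 mV) and sample size are hypothetical.

```python
import math
import random
import statistics

def sample_lorentzian(x0, gamma, n, rng):
    """Draw n Lorentzian (Cauchy) samples with centre x0 and half-width gamma via the inverse CDF."""
    return [x0 + gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def fit_lorentzian(samples):
    """Return (centre, half-width) estimated as the median and half the interquartile range."""
    q1, q2, q3 = statistics.quantiles(samples, n=4)
    return q2, 0.5 * (q3 - q1)

rng = random.Random(0)
v = sample_lorentzian(x0=-60.0, gamma=2.0, n=20001, rng=rng)
center, half_width = fit_lorentzian(v)
print(f"centre = {center:.2f} mV, half-width = {half_width:.2f} mV")
```

Applied to a snapshot of a network simulation, a large discrepancy between the fitted Lorentzian and the empirical histogram (for example, two separated peaks) would signal that the regime lies outside the model's range of validity.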
  
  In contrast to exact mean-field reductions based on quadratic integrate-and-fire neurons (e.g., [23]), our model originates from a biophysically grounded HH-type neuron with ion exchange dynamics, and necessarily involves heuristic approximations to achieve a closed-form mean-field description. While this results in a less exact correspondence with network simulations in more heterogeneous or bimodal states, our goal was to retain biological interpretability and account for phenomena such as ion-driven bursting and seizure-like transitions, which are not captured by standard QIF-based neural masses.
  
  We see our contribution as complementary to existing exact reductions — offering a biophysically grounded alternative that remains tractable and informative in a relevant class of unimodal, mesoscopic dynamical regimes.
  
  (6) Sec. 1.3 In this section the authors compare in vitro experiments with simulations of both the network model and the neural mass model (Figure 4, panels a, b, c). The qualitative agreement that is supposed to be shown is hardly evident. The shape of the signals is different, as is the type of bursting. The only agreement lies in the fact that there are repeated spiking events at successive times in a periodic manner. However, the time scale of the simulations is different for the neural network simulation and the mean-field experiment, making them difficult to compare. While the period of the bursting events is around 2 min for the mean-field simulation (in accordance with experiments), the time scale of the network simulation is 60 times smaller, meaning that we are considering completely different mechanisms and phenomena. The justification given by the authors, that "the parameters were modified to simulate shorter fluctuations (in the network of Hodgkin-Huxley neurons) for computational efficiency", is inappropriate.
  
  The poor agreement turns out to be even worse in the comparison between experiments and mean-field simulations shown in panels d and e of Figure 4. While the mean field simulation is characterized by a periodic behaviour both in the mean membrane potential and in the external potassium concentration, the in-vitro traces are not periodic and show an increasing irregular activity of the extracellular LFP in correspondence with increasing external potassium concentration.
  
  How is it possible to justify the implementation of this model if the working hypotheses are not supported by the results? The even worse agreement of the network simulations with the experiments reinforces the doubt raised in the previous point: what is the reasoning underlying the choice of Hodgkin-Huxley as a single neuron model?
  
  We thank the reviewer for this detailed critique. We acknowledge that the comparisons in Figure 4 involve limitations, and we now provide a clearer rationale and context in the revised manuscript. First, we emphasize that our intention is not to claim a quantitative match between the experimental and simulated traces, but rather to demonstrate that our model, grounded in biophysical mechanisms such as ion exchange, is capable of qualitatively reproducing a key feature observed experimentally: the slow modulation of neuronal activity by extracellular potassium concentration. For example, both in vitro (Fig. 4a, 4d) and in our simulations (Fig. 4b, 4e), bursts of activity ride on slower oscillations of potassium, and the interplay of fast and slow dynamics is central to both.
  
  Regarding the discrepancy in timescales between the neural network and mean-field simulations: the network simulations were intentionally run with accelerated dynamics by rescaling biophysical parameters (e.g., membrane capacitance and gating time constants) to keep the computational cost feasible. We now clarify in the manuscript that this choice is standard practice in computational modeling when the primary goal is to validate dynamical mechanisms rather than replicate absolute timescales.
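A minimal illustration of why uniform rescaling preserves the dynamics: if every rate in a model is multiplied by a factor k (equivalently, the capacitance and all time constants are divided by k), every solution x(t) maps onto x(kt). The sketch uses a FitzHugh-Nagumo oscillator with textbook parameters as a stand-in for the Hodgkin-Huxley network, and k = 60 only mirrors the ratio mentioned by the reviewer; neither is taken from the manuscript.

```python
def fhn_rhs(v, w, k=1.0, eps=0.08, a=0.7, b=0.8, I=0.5):
    """FitzHugh-Nagumo vector field with every rate multiplied by a speed-up factor k."""
    dv = k * (v - v ** 3 / 3.0 - w + I)
    dw = k * eps * (v + a - b * w)
    return dv, dw

def euler(k, dt, n_steps, v=-1.0, w=1.0):
    """Forward-Euler integration; returns the final state (v, w)."""
    for _ in range(n_steps):
        dv, dw = fhn_rhs(v, w, k=k)
        v, w = v + dt * dv, w + dt * dw
    return v, w

k = 60.0
slow = euler(k=1.0, dt=0.01, n_steps=5000)    # 50 time units of the original model
fast = euler(k=k, dt=0.01 / k, n_steps=5000)  # the same trajectory compressed into 50/60 time units
print(slow, fast)
```

The equivalence holds only when all timescales are rescaled by the same factor; rescaling a subset (for example, only the slow ion dynamics) changes the timescale separation and can alter the dynamics qualitatively.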
  
  On the shape of LFP signals: the experimental recordings were AC-coupled, so the DC components associated with slower shifts in membrane potential, such as those modeled in the mean-field simulations, are not captured in those recordings. This limits the visibility of key features like the underlying potential jumps. Additionally, no claim is made regarding a specific bursting classification in either data or simulation.
  
  We agree that the experimental trace in Fig. 4d shows more complex, non-periodic dynamics (e.g., slowing burst frequency and irregularity), which are not captured by our current deterministic model. These differences could plausibly arise from additional physiological processes (e.g., stochastic transitions between metastable regimes or variability in ion regulation) that are not modeled here. In future work, such phenomena may be captured by introducing noise or parameter variability (see, e.g., Saggio et al., “A taxonomy of seizure dynamotypes”, eLife 2020), or by allowing the parabola coefficients in the nullcline approximation to vary dynamically.
  
  Finally, regarding the choice of a Hodgkin–Huxley-type neuron: this model allows us to incorporate a biophysical description of ion exchange, which is central to the phenomena we study. While modeling the spiking mechanisms explicitly precludes certain mathematical simplifications available to very simplified neuron models with reset, it enables direct links between mesoscopic dynamics and measurable quantities such as extracellular potassium, an essential objective of our work. To summarize, we have rearranged Fig. 4 as follows:
  
  Potassium can show periodic behavior with bursts in V riding on top (Fig. 4a). The model also shows this behavior, at different timescales (Fig. 4b, c, e).
  
  The AC-coupled LFP recordings are filtered, so the jump in V during bursts may not be visible (no DC recordings are available). We make no claim about the bursting class here.
  
  Potassium can also show more complex behavior (e.g., a slowing of burst frequency, Fig. 4d) that the deterministic model does not reproduce; such behavior might be captured by exploring dynamical parameters (e.g., the parabola coefficients or K_bath) or by adding noise that allows jumps between regimes (Saggio et al., eLife 2020).
  
  (7) Sec. 1.5 Here six neural masses are coupled via long-range structural connections with random weights. Simulations of the system are shown for two different values of the global coupling parameter (G = 0 and G = 100). How many realisations of the network have been considered?
  
  We thank the reviewer for pointing this out. The presented simulation was intended as a proof-of-concept demonstration to illustrate the model’s capacity to support network-level propagation of pathological activity. For this purpose, we considered a single representative realization of the structural connectivity with random weights. Given the deterministic nature of the model and the qualitative focus of the demonstration, additional realizations do not qualitatively change the observed behavior — namely, the transition from localized to network-wide bursting as coupling strength increases. We have now clarified this in the revised manuscript.
  
  “This simulation serves as a proof of concept to illustrate how local pathological activity can propagate through a network depending on the strength of coupling. We used a single representative realization of randomly weighted structural connectivity. While we did not perform a systematic exploration of different realizations or coupling strengths, we observed that the qualitative behavior (namely, the emergence of network-wide bursting beyond a critical coupling threshold) remains robust across similar setups. The model is compatible with empirical connectome data and can be readily extended to simulations using realistic brain network architectures.”
  
  In future applications involving data-driven network architectures or variability analyses, we agree that exploring multiple realizations or empirical connectomes will be valuable.
  
  How do the results depend on the different choices of the random weights? What is the dependence of the emergent dynamics on G? What kind of dynamics can be observed varying smoothly the parameter G (e.g. from 0 to 100)?
  
  This section serves as a proof of concept to show that pathological activity in one node can propagate through the network when coupling is strong. We used a single random weight configuration and did not systematically explore variations in G or connectivity. While richer dynamics likely emerge across intermediate values of G, a full parameter sweep is beyond the scope of this study. We clarify this in the revised text (see answer above).
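A sketch of how one could repeat the proof-of-concept over several connectivity realizations and a grid of coupling strengths between the two values used in the manuscript (G = 0 and G = 100). It assumes symmetric weights drawn uniformly from [0, 1), a zero diagonal, normalization by the largest row sum, and coupling that enters each node as G·Σ_j W_ij r_j; these choices and all function names are hypothetical, and the node model itself is omitted.

```python
import random

def random_connectivity(n_nodes, seed):
    """One hypothetical realization: symmetric uniform weights, zero diagonal,
    rescaled so that the largest row sum equals 1."""
    rng = random.Random(seed)
    w = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            w[i][j] = w[j][i] = rng.random()
    max_row = max(sum(row) for row in w)
    return [[x / max_row for x in row] for row in w]

def coupling_input(G, w, rates):
    """Long-range input to each node: G * sum_j W_ij * r_j."""
    return [G * sum(wij * rj for wij, rj in zip(row, rates)) for row in w]

rates = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # illustrative: only node 0 is active
realizations = [random_connectivity(6, seed) for seed in range(10)]
g_values = [0.0, 25.0, 50.0, 75.0, 100.0]

# Mean input received by the five silent nodes, averaged over realizations.
mean_input = {}
for G in g_values:
    per_realization = [sum(coupling_input(G, w, rates)[1:]) / 5 for w in realizations]
    mean_input[G] = sum(per_realization) / len(per_realization)
print(mean_input)
```

In a full study, the per-realization inputs would feed the neural mass equations of each node, and the onset of network-wide bursting could be recorded for every (realization, G) pair.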
  
  (8) Sec. 2.1 In the description of the experiment it is mentioned that only Mg^{2+} is varied. What role does the Mg^{2+} variation play in influencing the variation of the external potassium concentration? How can the experiment be linked to the model? How is the hypothesis of introducing an equation for the potassium concentration current in the microscopic model supported by the experiment, and vice versa?
  
  We thank the reviewer for this question. We have added a new subsection to the Methods explaining magnesium removal as a means to influence the external potassium dynamics:
  
  “The membrane of hippocampal neurons is equipped with N-methyl-D-aspartate type glutamate receptors (NMDARs). These receptors have a very high affinity for glutamate and can, in principle, be activated by ambient glutamate present at low concentrations in the brain extracellular fluid (ECF). Under normal physiological conditions, this activation does not occur because extracellular magnesium ions (Mg²⁺) block the NMDAR channel at membrane potentials more negative than about –50 mV; this voltage-dependent block prevents receptor activation at rest. When extracellular magnesium is removed, the block is relieved, allowing NMDARs to be activated, leading to neuronal depolarization toward the action potential threshold [117].”
  
  “In addition, as a divalent cation, Mg²⁺ interacts with the negatively charged neuronal membrane, contributing to the stabilization of the resting membrane potential. Lowering extracellular magnesium concentration disrupts this effect, resulting in membrane depolarization [118].”
  
  “Consequently, magnesium removal not only facilitates NMDAR-dependent depolarization, but also directly depolarizes neurons. This depolarization increases the driving force for outward potassium currents through K⁺ channels, meaning that variations in Mg²⁺ can indirectly influence external potassium dynamics during neuronal activity.”
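The two mechanisms in the new Methods text can be quantified with standard formulas. The sketch uses the widely used empirical fit of Jahr and Stevens (1990) for the voltage-dependent Mg²⁺ block of NMDARs, and the Nernst equation for the potassium reversal potential E_K, so that the outward driving force is V - E_K. The concentrations (4 mM external and 140 mM internal K⁺, 1 mM Mg²⁺) and the temperature (37 °C) are illustrative assumptions, not values from the manuscript.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol

def nmda_unblocked_fraction(v_mV, mg_mM):
    """Fraction of NMDAR conductance not blocked by Mg2+ (Jahr & Stevens, 1990 empirical fit)."""
    return 1.0 / (1.0 + math.exp(-0.062 * v_mV) * mg_mM / 3.57)

def nernst_mV(c_out, c_in, z=1, temp_K=310.15):
    """Nernst reversal potential in mV for an ion of valence z."""
    return 1e3 * R * temp_K / (z * F) * math.log(c_out / c_in)

e_k = nernst_mV(c_out=4.0, c_in=140.0)  # illustrative K+ concentrations in mM
for v in (-70.0, -50.0):
    print(f"V = {v:.0f} mV: unblocked (1 mM Mg) = {nmda_unblocked_fraction(v, 1.0):.3f}, "
          f"unblocked (0 mM Mg) = {nmda_unblocked_fraction(v, 0.0):.3f}, "
          f"K+ driving force V - E_K = {v - e_k:.1f} mV")
```

With 1 mM Mg²⁺ only a few percent of the NMDAR conductance is available at -70 mV, whereas removing Mg²⁺ unblocks it fully; a 20 mV depolarization increases the K⁺ driving force by the same 20 mV.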
  
  (9) Sec. 2.6 The modified version of the continuity equation has been derived following Reference [95], where the authors consider a network of Izhikevich neurons, and each neuron is modelled by a two-dimensional system consisting of a quadratic integrate and fire equation plus an equation that implements spike frequency adaptation. In particular, in [95] the authors achieve a closed set of mean-field equations with the inclusion of the mean-field dynamics of the adaptation variable by using a Lorentzian ansatz combined with the moment closure approach. The moment closure condition is also assumed in the present manuscript (Eq. 19). Under which assumptions is the implementation of the moment closure condition justified?
  
  We thank the reviewer (and Reviewer 2) for questioning the justification of the assumptions used in our formalism. We agree that moment closure alone does not justify assuming that V depends on the mean n, which is necessary for the derivation of Eq. 20; in addition, we need the assumption that n can be treated as a collective variable, as is done in the works mentioned by Reviewer 2. We have also performed numerical simulations of the full system to quantify the error introduced by this approximation, and the results in the new Fig. S2 show that it is below 2% in each of the dynamical regimes.
  
  We have hence modified the justification for Eq. (19) reading:
  
  “Next we assume a first-order moment closure condition for the variable n [59], justified by numerical simulations of the full network (see Fig. S2), which show that for most of the neurons (close to 99% for the same value of ∆ as in the other simulations) the population mean captures the behavior of the single neurons well [122]. Finally, putting these factors together and assuming that n can be treated as a collective variable for each neuron (see the Limitations of the model section), we arrive at” and also
  
  “The validity of the first-moment closure, Eq. (19), as in [59], is supported by numerical simulations, which show that both during the silent regime and when seizure-like events occur, n_i for most neurons tracks the network-averaged ⟨n | V, η⟩. In particular, fewer than 2% of the neurons fire while the mean is low, and vice versa (Fig. S2). In less synchronized scenarios (larger ∆ or smaller J), however, this fraction would increase, but the mean would still capture the qualitative behaviour of the population.”
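A hypothetical way to compute the kind of mismatch statistic described for Fig. S2: bin the spike trains, label each bin as population-high or population-low, and count neuron-bin entries that disagree with the population state. The 0.5 threshold, the binning and the boolean input format are assumptions; the manuscript's exact procedure may differ.

```python
def mismatch_fraction(active, threshold=0.5):
    """Fraction of (time bin, neuron) entries whose state disagrees with the population.

    active[t][i] is True if neuron i fires in time bin t. The population is
    labelled 'high' in bin t when the fraction of active neurons exceeds `threshold`.
    """
    mismatches = total = 0
    for row in active:
        pop_high = sum(row) / len(row) > threshold
        mismatches += sum(1 for a in row if a != pop_high)
        total += len(row)
    return mismatches / total

# Synthetic example: 100 neurons, two silent bins with one stray spike,
# and two bursting bins with one neuron that stays silent.
n = 100
silent = [[i == 0 for i in range(n)] for _ in range(2)]
bursting = [[i != 0 for i in range(n)] for _ in range(2)]
print(mismatch_fraction(silent + bursting))
```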
  
  This is also now explicitly mentioned in the following paragraph:
  
  “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n_i) values.”
  
  (10) Considering also the comments reported above, I think that it would make more sense to start from an Izhikevich neuron model as microscopic model and add the equations for the ionic currents as mesoscopic variables (i.e. written as population average variables), instead of starting from the Hodgkin-Huxley single neuron model and trying to make hardly justifiable approximations and simplifications.
  
  We respectfully disagree. While the Izhikevich model is computationally efficient, it lacks the biophysical detail required to capture key ion-driven mechanisms such as depolarization block, slow ion accumulation, and specific burst-initiation dynamics, all of which are central to our study. The Hodgkin–Huxley framework, despite requiring approximation, provides the necessary physiological grounding to link microscopic ion exchange with emergent population behavior.
  
  (11) Sec. 2.7 What is the advantage of using six more parameters to fit, like R-,R+,c-,c+,I-,I+?
  
  This is in contradiction with the spirit of deriving a mean-field model, where the number of parameters should be reduced. What is the advantage of this mean-field derivation with respect to other mean-field derivations of Hodgkin-Huxley neurons, like the one in Reference [9]?
  
  The additional parameters (R±, c±, I±) are not arbitrary; they compactly parametrize the cubic-like nonlinearity of the membrane potential dynamics in our stepwise-quadratic approximation. This trade-off allows us to preserve essential biophysical features of HH neurons (e.g., bursting regimes, depolarization block) within a tractable analytic framework. Compared to alternative approaches such as that of Ref. [9], which focus on phenomenological reductions and do not yield an ODE system, our model offers more direct interpretability in terms of ion dynamics, providing a closer link between microscopic mechanisms and mesoscopic activity patterns.
  
  (12) Sec. 2.11 The derivation of the mean-field dynamics for the gating variable is rather heavy and difficult to follow. This section could be simplified, whilst also better explaining the underlying approximations and the validity of these approximations, which is currently missing.
  
  We agree that the derivation is technical, but we chose to retain it for transparency, as it follows the Chen and Campbell approach and makes key approximations such as moment closure explicit. We have now added a clarification that n is treated as a collective variable. We hope that the current level of detail helps readers understand the assumptions underlying the gating variable dynamics.
  
  (13) Sec. 2.12 The derivation of Eqs. (36) is quite confusing and needs to be re-written in a clearer form. Why are both the variables x and r present in these equations, since they are proportional according to Eq. (25)?
  
  We thank the reviewer for pointing this out. We have adjusted the equations to improve clarity and now consistently express the firing rate in terms of a single variable. This removes the redundancy and simplifies the presentation.
  
  (14) Sec. 2.13 The derivation of Eqs. (37) is quite confusing and needs to be rewritten in a clearer form.
  
  Both the auxiliary variable x and the firing rate r are present in this equation, the same as in Eq. (36). Therefore it is presented as a set of equations for the auxiliary variable x and for the physical variable V. Moreover in the equation for dV/dt, the quadratic term in V has disappeared and it is not clear to me which are the variables corresponding to I- and I+. In particular, in Eqs. (36) there are two different current terms I-,I+ for the two equations related to dy/dt. In Eqs. (37) there is a single term (I_{cl} +I_{Na}+I_K+I_{pump})/C_m which is identical for both equations related to dV/dt. I was expecting two different terms also in Eqs. (37).
  
  We appreciate the reviewer’s close reading. To improve clarity, we now express the dynamics in terms of the firing rate r, replacing \dot{x} with \dot{r} in both Eq. (36) and Eq. (37) to avoid confusion.
  
  As for the current terms: in Eq. (37), we reverse the stepwise quadratic approximation and reintroduce the original ionic currents from Eq. (16). This is why the expressions involving I_{\text{cl}}, I_{\text{Na}}, I_K, and I_{\text{pump}} appear as a single summed term in \dot{V}, rather than the split I_-,I_+ terms used in the stepwise approximation. We now clarify this in the text.
  
  We also write V as \bar{V} to clarify that it refers to the average membrane potential for the neuronal population. Finally, we wrote the final equation in a more compact form to improve clarity (new Eq.38).
  
  (15) Moreover, while the equation for the gating variable n can be considered as a differential equation for a mesoscopic variable since n depends on average values only, it is not clear to me if the remaining variables 𝛥[K+]_{int}, [K+]_g can be considered mesoscopic or not. Since Eqs. (37) represent a mean-field model, I expect every variable to be a mean-field variable. This could be easily achievable for the extracellular potassium concentration, but I do not understand how a site-specific microscopic variable like the intracellular potassium concentration variation can be automatically inserted in a set of mean-field equations without any averaging or intermediate steps. This is a crucial point to be clarified for the validity of the neural mass equations.
  
  We thank the reviewer for raising this important point. In our model, we assume spatial homogeneity at the mesoscopic scale, meaning that ion concentrations (both intra- and extracellular) are uniformly distributed across the population. As a result, variables such as Δ[K⁺]_int and [K⁺]_g are treated as population-level averages, consistent with the mean-field framework.
  
  Moreover, the rate of change of intracellular potassium is tightly coupled to extracellular dynamics via ion exchange mechanisms, justifying its inclusion as a slow, mesoscopic variable. We now clarify this modeling assumption explicitly in the text.
  
  “By locally homogeneous, we mean that all neurons in the population are assumed to share the same extracellular and intracellular ionic environment and are connected with identical coupling rules, allowing us to treat the population as uniform with respect to ion dynamics and connectivity.”
  
  “These slow variables are in addition considered to be mesoscopic, meaning they are identical for every neuron in the population.”
  
  Minor points:
  
  (1) Figure 2, panel d. Please detail the variable on the y-axis, which is not reported in the figure.
  
  Done
  
  (2) Eq. (15) is cited in many parts of the manuscript, while it seems to me it would be more appropriate to reference Eq. (2). Is this a mistake or is there a reason to cite Eq. (15)?
  
  The reviewer is correct: the equation label was wrong, and we have now corrected it.
  
  (3) Figure 4 Would it be possible to show enlargements of the mean membrane potential traces to directly compare the different bursting types shown by the simulation of the different models?
  
  Panel d already contains an enlarged portion of the membrane potential traces. For the other panels, and as noted in our reply to point (6), we stress again that our intention is not to claim a quantitative match between the experimental and simulated traces.
  
  (4) Figure 5 In the caption the author refers to "the generic model, single neuron model, and epileptor model". Could you please better explain the models referred to and why they are mentioned? Are the generic model and the single neuron model those that are presented in the Materials and Methods section? Or do you refer to completely different models, as for the epileptor?
  
  We have removed the reference to the generic model (we had in mind the canonical model for seizures by Saggio et al. 2017), since it is not mentioned in the paper, and we have clarified that the single neuron model and the epileptor model are the models used to simulate seizure-like events.
  
  (5) Sec 2.5 As already stated above, the authors need to reduce unnecessary formulations that confuse the reader. Here, for example, Eqs. (6) and (7) are unnecessary, in view of the fact that delta spikes are used (Eq. 8).
  
  We thank the reviewer for the suggestion, but we disagree, and we think it is better to start the derivations from the more general case, as done with Eqs. 6-7.
  
  (6) Sec. 2.6 Could you please better explain why in Eqs. (15) and (16), the variable V0 is introduced, while before and after this, the variable V is used?
  
  We thank the reviewer for the comment. In Eqs. (15) and (16), \dot{V}_0 denotes the free term of the membrane potential equation, i.e., the component driven solely by the intrinsic ionic currents and excluding the synaptic input I_syn. Only this \dot{V}_0 term (a function rather than an independent variable) is approximated by the piece-wise quadratic expression in Eq. (21). In contrast, the variable V represents the membrane potential, whose dynamics are obtained by combining \dot{V}_0 with the synaptic current contribution I_syn. In summary, there is no independent variable V_0; only the function \dot{V}_0 is introduced to represent the intrinsic (non-synaptic) component of the membrane-potential dynamics. We have now clarified this in the text.
  
  (7) In the square brackets of the r.h.s. of Eq. (18), for all the intermediate steps, it appears G^n(V,n) ϱ^V, while there should be G^n(V,n) ϱ^n.
  
  We thank the reviewer for catching this typo. We have corrected this in the revised manuscript.
  
  (8) Sec. 2.8 Here the authors affirm that "a double-Lorentzian (or a piece-wise Lorentzian) could be a suitable form for ρ^V (t, V | η). However, it is not clear under which conditions such an assumption would allow a solution to the continuity equation". What are the problems underlying the implementation of the double Lorentzian? It seems to be a more correct form than the single Lorentzian actually implemented.
  
  We thank the reviewer for this thoughtful question. In principle, a double-Lorentzian ansatz for \rho^V can indeed be implemented in several reasonable ways, for example by enforcing that the combined area of the two Lorentzian components is normalized to one (to preserve the probabilistic interpretation) and by imposing smoothness constraints at their boundaries. However, despite exploring these implementations, we were unable to obtain non-trivial solutions of the continuity equation under this parametrization. The only solvable case we found is the degenerate one in which the two Lorentzians collapse onto each other (i.e., (x_- = x_+) and (y_- = y_+)), which reduces the ansatz to the single-Lorentzian form used in the manuscript. For this reason, although the double-Lorentzian is conceptually appealing, it did not yield practically useful solutions within our framework.
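One of the implementations described in this reply, a weighted sum of two Lorentzians normalized to total area one, is easy to write down; the sketch below checks the normalization with the closed-form Lorentzian CDF and the collapse onto a single Lorentzian when x_- = x_+ and y_- = y_+. Following the convention in the reply, x denotes the half-width and y the centre; the mixture weight w and the numerical values are assumptions, and the piece-wise variant with smoothness constraints is not shown. The difficulty the authors report lies in closing the continuity equation with such a form, which this sketch does not address.

```python
import math

def lorentzian(v, x, y):
    """Lorentzian density with half-width x > 0 and centre y."""
    return x / (math.pi * ((v - y) ** 2 + x ** 2))

def lorentzian_cdf(v, x, y):
    """Cumulative distribution function of the Lorentzian above."""
    return 0.5 + math.atan((v - y) / x) / math.pi

def double_lorentzian(v, w, x_minus, y_minus, x_plus, y_plus):
    """Mixture with weights w and 1 - w, so that the total area stays 1."""
    return w * lorentzian(v, x_minus, y_minus) + (1.0 - w) * lorentzian(v, x_plus, y_plus)

def double_lorentzian_mass(a, b, w, x_minus, y_minus, x_plus, y_plus):
    """Probability mass of the mixture on [a, b], from the closed-form CDFs."""
    m_minus = lorentzian_cdf(b, x_minus, y_minus) - lorentzian_cdf(a, x_minus, y_minus)
    m_plus = lorentzian_cdf(b, x_plus, y_plus) - lorentzian_cdf(a, x_plus, y_plus)
    return w * m_minus + (1.0 - w) * m_plus

# Hypothetical bimodal example: components centred at -62 mV and -55 mV.
print(double_lorentzian_mass(-1e9, 1e9, 0.3, 1.0, -62.0, 2.0, -55.0))
```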
  
  (9) Eq. (28). The symbols used for the flux (especially those used in the second-to-last step once the inner integration is performed) are confusing and it is difficult to understand what they mean.
  
  We thank the reviewer for noting this issue. The problem was due to a LaTeX typo that prevented the vertical lines—indicating that the flux is evaluated at specific points—from rendering correctly. We have now corrected this.
  
  (10) Eq. (29) In the third step there are some misprints that impair comprehension.
  
  We thank the reviewer for noting this. We have corrected these misprints in the revised version.
  
  (11) Line 696. The reference is not displayed.
  
  Fixed.
  
  Reviewer #2 (Recommendations for the authors):
  
  As a general remark, this manuscript is written in a confusing manner: the authors present their model in a general formulation and their analysis in a complicated way that, in the end, is not needed, as I explain in detail in the following.
  
  Another general question is why the authors employ the neural mass reduction methodology developed in [23], which yields exact mean-field evolution equations for quadratic neurons (like the quadratic integrate-and-fire (QIF) neuron), for a model with a cubic dependence on the membrane potential, such as the FitzHugh-Nagumo neuron (which is indeed a 2D reduction of the Hodgkin-Huxley model), thereby obtaining an approximate neural mass model that works only qualitatively and only for synchronized dynamics. Why not use another approach better suited to deriving the neural mass model for a cubic nonlinearity, such as the one suggested in [33] and [69] by Di Volo and co-authors? What is the rationale behind the authors' choice?
  
  We appreciate the reviewer’s critical feedback and the opportunity to clarify our methodological choices. Our decision to base the mean-field model on Hodgkin–Huxley-type neurons stems from the need to retain ion channel dynamics, which are essential to capture the coupling between membrane activity and extracellular ionic concentrations. This biophysical link is central to our study and cannot be achieved using more abstract neuron models such as QIF or FitzHugh-Nagumo alone.
  
  Regarding the mean-field reduction method: while the Ott-Antonsen/Lorentzian framework is indeed exact for QIF neurons, we adopted a stepwise quadratic approximation to apply a similar formalism to the cubic-like dynamics of the HH model. This choice enables us to analytically capture a rich set of behaviors, including bursting, depolarization block, and seizure-like dynamics, in a tractable mean-field system.
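  To make the stepwise quadratic idea concrete, the sketch below fits one parabola on each side of a split point V⋆ to a generic cubic. Everything in it is hypothetical: the cubic is a stand-in for the HH-like total current (the actual currents and the fitted coefficients c_±, R_± are not reproduced), and V⋆ = -55 mV is chosen by hand. With this stand-in, the fit below V⋆ opens upward and the fit above V⋆ opens downward, i.e. a positive and a negative parabola on either side of the split.

```python
import numpy as np


def cubic_current(v):
    """Hypothetical cubic-like dV/dt (roots at -70, -55 and -20 mV), standing in for the HH-like currents."""
    return -1e-3 * (v + 70.0) * (v + 55.0) * (v + 20.0)


def fit_two_parabolas(f, v_min, v_split, v_max, n_points=200):
    """Least-squares quadratic fit of f on [v_min, v_split] and on [v_split, v_max]."""
    v_left = np.linspace(v_min, v_split, n_points)
    v_right = np.linspace(v_split, v_max, n_points)
    # np.polyfit returns [a, b, c] for a * v**2 + b * v + c (highest power first).
    return np.polyfit(v_left, f(v_left), 2), np.polyfit(v_right, f(v_right), 2)


def stepwise_quadratic(v, left, right, v_split):
    """Evaluate the left parabola below v_split and the right parabola at or above it."""
    return np.where(v < v_split, np.polyval(left, v), np.polyval(right, v))


v_split = -55.0  # hypothetical split point V*
left, right = fit_two_parabolas(cubic_current, -90.0, v_split, 0.0)
v = np.linspace(-90.0, 0.0, 500)
max_error = np.max(np.abs(stepwise_quadratic(v, left, right, v_split) - cubic_current(v)))
print("left opens upward:", left[0] > 0, "| right opens downward:", right[0] < 0)
print("max abs error of the stepwise fit:", max_error)
```

  The residual error comes entirely from the cubic term, so it grows with the width of each fitting window; narrower windows (or more pieces) would tighten the approximation at the cost of more parameters.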
  
  We considered the approach of Di Volo and colleagues [33, 69], but their methodology is tailored to asynchronous irregular regimes, whereas our model is specifically designed to capture dynamics in quasi-synchronous or bursting regimes — including epileptiform activity — which are not covered by the assumptions of the Di Volo framework.
  
  We now clarify these modeling choices more explicitly in the revised manuscript.
  
  "Unlike phenomenological or reduced models, the Hodgkin–Huxley framework allows us to retain explicit ion exchange dynamics, which are essential for linking membrane behavior to extracellular potassium fluctuations. This level of biophysical detail is crucial for modeling pathological regimes such as seizure onset and propagation."
  
  Furthermore, the derivation of the neural mass equations is unnecessarily complicated. As a matter of fact, the authors approximate all the variables (except the membrane potentials of the single neurons) as collective variables (i.e., the gating variable and the potassium concentrations) common to all the neurons. The neural network model for which they derive the neural mass model consists of cubic-like microscopic evolutions of the membrane potentials plus other global variables, equal for all neurons, that depend on collective variables such as the mean membrane potential or the mean firing rate. Once this is clarified, the derivation of the neural mass model is much simpler, and it is not necessary to follow the approach reported in Reference [95] [Chen, L. & Campbell, S. A. Exact mean-field models for spiking neural networks with adaptation. Journal of Computational Neuroscience 50 (4), 445-469 (2022)], which is unnecessarily complicated. The authors can follow a much simpler methodology, as explained by Guerreiro et al. in Reference [R6] (cited below), where the authors consider the same model studied in [95]. Such a methodology has already been applied in many cases to introduce realistic aspects into the neural mass model [23] (see References [R1-R7] below). I strongly encourage the authors to reformulate their approach in a simpler and clearer manner, following the approach reported in [R1-R7]. The manuscript will become more readable and easier to understand.
  
  We thank the reviewer for this helpful suggestion. We agree that, given the assumptions made in our derivation (i.e., shared gating and ion concentration variables across neurons), the mean-field equations could alternatively be obtained using the simpler methodology proposed by Guerreiro et al. [R6] and related works [R1–R7]. We nevertheless chose to follow the derivation presented by Chen and Campbell [95] because it makes the approximations (e.g., moment closure, flux boundary assumptions) explicit and generalizable to future extensions. We also acknowledge that this derivation requires treating n as a collective variable, and, for clarity, we have now added a remark to the manuscript indicating that the same result could be recovered more directly using the approach of Guerreiro et al.
  
  “We note that, under the assumption of globally shared gating and ion concentration variables across the neuronal population, the resulting mean-field equations can also be derived using simpler methods as proposed by Guerreiro et al. [58]. In this work, we follow the more general formalism of Chen and Campbell [59], which makes the role of key approximations (e.g., moment closure, vanishing flux at boundaries) explicit. This also facilitates potential generalizations to settings with partial heterogeneity or dynamic gating distributions.”
  
  “Finally, putting together these factors and assuming that n can be treated as a collective variable for each neuron”
  
  “Unlike the mean membrane potential ⟨V⟩ and the firing rate r, which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable n is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic n_i values.”
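  The simpler route discussed above can be illustrated on the QIF case. Once every variable other than V is collective, the Lorentzian ansatz closes the population into the firing rate r and the mean potential v, as in the QIF mean-field equations of Montbrió, Pazó and Roxin (2015), and the collective variables are appended as extra equations driven by r or v. The sketch integrates those QIF equations with forward Euler and adds one slow variable `a`, shared by all neurons, as a stand-in for n and the potassium variables. The variable `a`, its equation and all parameter values are hypothetical; this is not the HH-based model of the manuscript.

```python
import numpy as np


def qif_mean_field(t_end=200.0, dt=1e-3, delta=1.0, eta=-5.0, J=15.0,
                   tau=1.0, tau_a=50.0, beta=5.0):
    """Forward-Euler integration of the QIF mean-field equations (rate r, mean
    potential v) plus one hypothetical collective slow variable a."""
    n_steps = int(round(t_end / dt))
    r, v, a = 0.1, -2.0, 0.0
    trace = np.empty((n_steps, 3))
    for k in range(n_steps):
        # Lorentzian closure for a QIF population whose excitabilities are
        # Lorentzian-distributed with centre eta and half-width delta.
        dr = (delta / (np.pi * tau) + 2.0 * r * v) / tau
        dv = (v ** 2 + eta + J * tau * r - (np.pi * tau * r) ** 2 - a) / tau
        # The collective variable depends only on the population rate r, so it
        # needs no average over individual neurons: a is shared by all of them.
        da = (-a + beta * r) / tau_a
        r, v, a = r + dt * dr, v + dt * dv, a + dt * da
        trace[k] = (r, v, a)
    return trace


trace = qif_mean_field()
print(trace.shape)  # three state variables per time step, whatever the population size
```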
  
  Now I will examine the whole manuscript in detail and report comments, remarks and suggestions, numbered (Q#), on how to improve the present manuscript and render it easier to read and more comprehensible. These are not minor remarks, just detailed ones.
  
  Introduction
  
  (Q1) The Introduction section needs a part devoted to the reduction methodology developed in [23] for QIF neurons and a presentation of previous works dealing with the introduction of biologically realistic aspects into the neural mass model derived in [23]. Here is a non-exhaustive list of such papers concerning the introduction of the following realistic aspects into the neural mass model developed in [23]:
  
  (I) short-term synaptic plasticity:
  
  [R1] Exact neural mass model for synaptic-based working memory, H Taher, A Torcini, S Olmi, PLOS Computational Biology 16 (12), e1008533 (2020).
  
  [R2] Bursting in a next generation neural mass model with synaptic dynamics: a slow-fast approach, H Taher, D Avitabile, M Desroches, Nonlinear Dynamics 108 (4), 4261-4285 (2022).
  
  [R3] Mean-field approximations of networks of spiking neurons with short-term synaptic plasticity, R Gast, TR Knösche, H Schmidt, Physical Review E 104 (4), 044310 (2021).
  
  (II) spike frequency adaptation:
  
  [R4] A mean-field description of bursting dynamics in spiking neural networks with short-term adaptation, R Gast, H Schmidt, TR Knösche, Neural Computation 32 (9), 1615-1634 (2020).
  
  [R5] Population spiking and bursting in next-generation neural masses with spike-frequency adaptation, A Ferrara, D Angulo-Garcia, A Torcini, S Olmi, Physical Review E 107 (2), 024311 (2023).
  
  (III) conductance-based neuron with a slow current (Izhikevich model):
  
  [R6] A new generation of reduction methods for networks of neurons with complex dynamic phenotypes, IC Guerreiro, M Di Volo, B Gutkin, arXiv preprint 2206.10370 (2022).
  
  (IV) spike timing-dependent plasticity:
  
  [R7] Mean-field approximations with adaptive coupling for networks with spike-timing-dependent plasticity, B Duchet, C Bick, Á Byrne, Neural computation 35 (9), 1481-1528 (2023).
  
  (V) random connectivity and noise:
  
  [R8] Mean-field models of populations of quadratic integrate-and-fire neurons with noise on the basis of the circular cumulant approach, DS Goldobin, Chaos: An Interdisciplinary Journal of Nonlinear Science 31 (8) (2021).
  
  [R9] A reduction methodology for fluctuation-driven population dynamics, DS Goldobin, M Di Volo, A Torcini, Physical Review Letters 127, 038301 (2021).
  
  [R10] Shot noise in next-generation neural mass models for finite-size networks, VV Klinshov, SY Kirillov, Physical Review E 106 (6), L062302 (2022).
  
  I think the authors should refer in the introduction to these previous papers, where realistic biological aspects have been already introduced in the neural mass model developed in [23].
  
  We have added a whole paragraph devoted to next-generation neural mass models and, in particular, to the other works introducing biological realism into this class of models:
  
  “Recently, a class of these models, called next-generation neural mass models [42], has been developed based on an analytical approach introduced by [25] that allowed for the exact derivation of mean-field parameters for a population of quadratic integrate-and-fire (QIF) neurons. These can be linked to EEG/MEG oscillations [43], including epileptic seizures [44], and have been used to study various aspects of whole-brain dynamics such as the low-dimensional manifold of the resting state [45, 46], aging [47] and neural signatures of consciousness [48]. A number of works have dealt with the introduction of biologically realistic aspects into the mostly phenomenological neural mass model derived in [25]. These include short-term synaptic plasticity [49–51], spike frequency adaptation [52, 53], spike timing-dependent plasticity [54], synaptic delay [29], random connectivity and noise [55–57], as well as an extension to conductance-based neurons with a recovery variable [58–60].”
  
  (Q2) Line 117 - Please specify what you mean by locally homogeneous, here.
  
  Thank you for allowing us the opportunity to clarify this. We now report:
  
  "By locally homogeneous, we mean that all neurons in the population are assumed to share the same extracellular and intracellular ionic environment and are connected with identical coupling rules, allowing us to treat the population as uniform with respect to ion dynamics and connectivity."
  
  (Q3) In this subsection, the authors should clarify all the hypotheses they employ to derive the neural mass models: not only the Lorentzian approximation they made for a cubic model, but also the fact that they assume that the gating variable n is a global variable and that the potassium concentrations are the same for all neurons, i.e., that there is no heterogeneity at this level. This fundamental aspect should already be clarified at this stage.
  
  We thank the reviewer for this important observation. We agree and have revised the text in the derivation section to explicitly state all key assumptions. Specifically, we now clarify that:
  
  (1) The gating variable n is treated as a population-average (global) variable;
  
  (2) The potassium concentrations Δ[K+]int and [K+]g are assumed to be homogeneous across the neuronal population; and
  
  (3) No heterogeneity is assumed at the level of the ion dynamics.
  
  This assumption is biophysically motivated: ion concentrations — particularly extracellular potassium — tend to redistribute rapidly due to diffusion and electrochemical forces, leading to an effectively well-mixed environment at the mesoscopic scale. As such, assigning separate compartments to individual neurons is not justified in this modeling context. We now explicitly note this in the manuscript to avoid ambiguity.
  
  “3) We assume that the potassium concentrations, both intracellular (Δ[K+]int) and extracellular (through the buffering variable [K+]g), are homogeneous across the neuronal population. This is justified physiologically by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforce near-instantaneous equilibration at the mesoscopic scale. As such, assigning separate compartments to each neuron is neither practical nor biologically meaningful in this context; 4) We assume that the gating variable n, which governs potassium conductance, can be treated as a population-averaged variable. This allows us to describe the neuronal ensemble using a reduced set of collective (mean-field) variables.”
  
  Comparison with neural network simulations
  
  (Q4) The comparison the authors perform between the microscopic model and the neural mass is misleading. From what the authors wrote, it seems that they consider 4 variables for each neuron in the network model (this is unclear from how the model is written in Eq. (9)): I guess one for the membrane potential, one for the gating variable and two for the potassium concentrations. However, this is not the network model for which the neural mass has been developed: the neural mass has been obtained for a network made of N + 3 variables (N membrane potentials and 3 collective variables for the gate and the potassium concentrations). This is a sort of mesoscopic network model, analogous to what was done previously in references [R1, R3, R4] above and others. If the authors compared their neural mass with this mesoscopic model, the agreement between the two would improve.
  
  We agree with the reviewer’s observation, and we now acknowledge this issue in the Results and in the Limitations. We have modified the text to state explicitly that, for the mean-field derivations, n is treated as a collective variable, and we have added the following statements:
  
  “Also note that the gating variable n is treated as microscopic in the neural network, while in the derivations for the mean-field model it is considered mesoscopic and identical for the whole population. This is likely responsible for some of the discrepancies between the two modalities.”
  
  “Moreover, the discrepancy between the two modalities would have likely been smaller if for the neural network we also adopted a gating variable that is mesoscopic and identical across the spiking neurons, as in similar works [49–51]. However, here we demonstrate the validity of the mean-field approximation even for the more natural, microscopic representation of the gating variable in the neural network.”
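  The distinction between a microscopic and a mesoscopic slow variable can be illustrated on a toy network. The sketch below is a hypothetical QIF stand-in, not the HH-like network of the manuscript: each neuron follows a quadratic membrane equation with reset, and an adaptation-like variable `a` plays the role of n. With `shared=True` there is a single `a` driven by the population rate (N + 1 state variables); with `shared=False` each neuron carries its own `a_i` driven by its own spikes (2N state variables). The two update rules have the same population-mean increment, so any difference between the runs comes from heterogeneity across the individual `a_i`. All parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_qif_network(shared, n=500, t_end=20.0, dt=1e-3, eta=-2.0, delta=1.0,
                         J=8.0, tau_a=20.0, beta=2.0, v_peak=100.0, seed=0):
    """Toy QIF network; returns the population firing rate at every time step."""
    rng = np.random.default_rng(seed)
    # Deterministic Lorentzian quantiles for the excitabilities (centre eta, half-width delta).
    k = np.arange(1, n + 1)
    eta_i = eta + delta * np.tan(0.5 * np.pi * (2 * k - n - 1) / (n + 1))
    v = rng.uniform(-2.0, 0.0, n)
    a = 0.0 if shared else np.zeros(n)   # mesoscopic scalar vs microscopic vector
    rates = np.empty(int(round(t_end / dt)))
    for step in range(rates.size):
        spikes = v >= v_peak
        v[spikes] = -v_peak              # QIF reset from +v_peak to -v_peak
        r = spikes.sum() / (n * dt)      # instantaneous population firing rate
        v = v + dt * (v ** 2 + eta_i + J * r - a)
        if shared:
            a = a + dt * (-a + beta * r) / tau_a             # driven by the population rate
        else:
            a = a - dt * a / tau_a + beta * spikes / tau_a   # each a_i driven by its own spikes
        rates[step] = r
    return rates


rates_meso = simulate_qif_network(shared=True)
rates_micro = simulate_qif_network(shared=False)
print(rates_meso.mean(), rates_micro.mean())
```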
  
  Comparison with in vitro experiments
  
  (Q5) Experiment -- The experiment is performed in vitro on the intact hippocampus of mice between postnatal days P5 and P7. It is known [R1] that, at an early developmental stage, neuronal activity in the hippocampus is provided by a network driven primarily by synchronized GABA_A-mediated transmission, which has an excitatory action and generates giant depolarizing potentials (GDPs) [R11]. However, GDPs have frequencies in the range 0.1-1 Hz, which do not match the oscillation frequencies reported by the authors. I have several questions here:
  
  (E1) At this stage (P5-P7), are the interactions among neurons essentially excitatory? If not, please explain why. Are the oscillations reported by the authors somehow related to GDPs?
  
  The depolarizing action of GABAergic transmission and the presence of GDPs during early rodent brain development, as described by Ben-Ari and other researchers, are characteristics commonly observed in ex vivo brain preparations, but are not evident under physiological in vivo conditions (see doi: 10.3389/fphar.2012.00065).
  
  In our preparation (intact mouse hippocampus), GABAergic synaptic transmission is not depolarizing. This is evidenced by the fact that inhibition of ionotropic GABA_A receptors with bicuculline triggers interictal-like discharges, which are routinely used as a model of epileptiform activity (see doi: 10.1016/j.nbd.2014.12.013). Therefore, in our experiments at P5–P7, neuronal interactions are not purely excitatory, and the observed low-Mg2+-induced oscillations are not related to GDPs.
  
  (E2) What is the nature of the oscillations reported by the authors in Figure 4? What is their origin? Please explain this clearly in the text of the paper.
  
  The model of epileptic discharges presented in our study was first introduced over 20 years ago and has since become a well-established paradigm for screening potential antiepileptic drugs and for research on the mechanisms of epileptic seizures. A detailed description of this model can be found in doi: 10.1046/j.1460-9568.2002.02143.x, and its pharmacological properties are reviewed in doi: 10.1046/j.1528-1157.2003.19503.x. These references have now been added to the manuscript for clarity.
  
  We have added the following:
  
  “The model of epileptic discharges presented in our study was first introduced over 20 years ago [115] and has since become a well-established paradigm for screening potential antiepileptic drugs and for research on the mechanisms of epileptic seizures [116].”
  
  (E3) How exactly does the concentration of extracellular potassium ions change? This is not clear even in the Methods; please clarify.
  
  [R11] Excitatory actions of GABA during development: the nature of the nurture, Y Ben-Ari, Nature Reviews Neuroscience 3 (9), 728-739 (2002).
  
  We have now added a new subsection to the Methods explaining how we use variation of extracellular Mg2+ to influence extracellular potassium dynamics.
  
  “The membrane of hippocampal neurons is equipped with N-methyl-D-aspartate type glutamate receptors (NMDARs). These receptors have a very high affinity for glutamate and can, in principle, be activated by ambient glutamate present at low concentrations in the brain extracellular fluid (ECF). Under normal physiological conditions, this activation does not occur because extracellular magnesium ions (Mg2+) block the NMDAR channel at membrane potentials more negative than about –50 mV; this voltage-dependent block prevents receptor activation at rest. When extracellular magnesium is removed, the block is relieved, allowing NMDARs to be activated and leading to neuronal depolarization toward the action potential threshold [117]. In addition, as a divalent cation, Mg2+ interacts with the negatively charged neuronal membrane, contributing to the stabilization of the resting membrane potential. Lowering the extracellular magnesium concentration disrupts this effect, resulting in membrane depolarization [118].”
  
  “Consequently, magnesium removal not only facilitates NMDAR-dependent depolarization, but also directly depolarizes neurons. This depolarization increases the driving force for outward potassium currents through K+ channels, meaning that variations in Mg2+ can indirectly influence external potassium dynamics during neuronal activity.”
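  The link between depolarization and the potassium driving force can be made concrete with the Nernst equation. The sketch assumes the standard Hodgkin-Huxley form of the potassium current, I_K = g_K n^4 (V - E_K), and textbook-like values ([K+]o = 4 mM, [K+]i = 140 mM, 37 °C, g_K = 36 mS/cm², n = 0.3); these numbers are illustrative and are not taken from the manuscript's model or experiments.

```python
import math

GAS_CONSTANT = 8.314   # J / (mol K)
FARADAY = 96485.0      # C / mol


def nernst_mv(c_out_mm, c_in_mm, temp_k=310.0, valence=1):
    """Nernst reversal potential in mV for the given concentrations (mM)."""
    return 1e3 * GAS_CONSTANT * temp_k / (valence * FARADAY) * math.log(c_out_mm / c_in_mm)


def potassium_current(v_mv, n, e_k_mv, g_k=36.0):
    """Hodgkin-Huxley-type K+ current g_K n^4 (V - E_K); positive values are outward.
    g_k = 36 mS/cm^2 is the classic squid-axon value, used here only as an example."""
    return g_k * n ** 4 * (v_mv - e_k_mv)


e_k = nernst_mv(4.0, 140.0)          # about -95 mV
for v_mv in (-70.0, -60.0):          # resting vs. depolarized (e.g. after Mg2+ removal)
    print(f"V = {v_mv:.0f} mV  driving force V - E_K = {v_mv - e_k:.1f} mV  "
          f"I_K = {potassium_current(v_mv, 0.3, e_k):.2f}")

# Raising extracellular K+ moves E_K towards more depolarized values.
print(nernst_mv(8.0, 140.0) > e_k)   # True
```

  Depolarization thus enlarges V - E_K and the outward K+ flux, while the accumulating extracellular K+ in turn shifts E_K upward, which is the kind of feedback the slow potassium variables are meant to capture.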
  
  (Q6) Lines 187-191 and Figure 4 -- The authors wrote: "In Figure 4.c we show the membrane potential and external potassium for a simulation of N = 3000 coupled HH-like neurons showing a similar behavior, although the parameters were modified to simulate shorter fluctuations for computational efficiency." This sentence is unclear. What is clear from Figure 4 is that the network simulations gave rise to collective oscillations on a completely different time scale (seconds rather than minutes), and that the potassium concentration also follows a clearly different evolution. From Figure 4, one could conclude that the network simulations have nothing to do with the neural mass evolution or with the experiment. I think the authors should better clarify and describe the results reported in Figure 4.
  
  We thank the reviewer for the observation. We have revised the relevant section of the manuscript to clarify the interpretation of Figure 4 and avoid any implication of quantitative matching. As stated in our response to Reviewer 1 (comment 6), the comparison is intended to highlight the shared qualitative structure across experimental data, the neural mass model, and the network simulation — specifically, the modulation of fast bursting by slow extracellular potassium fluctuations. The difference in timescale in the network simulation arises from rescaled parameters used for computational efficiency. We now explicitly state this and have updated the figure caption and accompanying text accordingly to reflect these points.
  
  (Q7) Why do the authors consider a purely excitatory network to describe the experimental results? What is the reason for this choice? Why do they not consider balanced excitatory-inhibitory networks, as is usual? Please clarify this point.
  
  We thank the reviewer for raising this point. We chose to model a purely excitatory network as a first step in isolating the role of extracellular potassium dynamics in generating population-level bursting. This allows us to focus on the ion-driven modulation mechanisms without introducing additional complexity from inhibitory feedback. Similar modeling choices have been made in previous studies of bursting and seizure-like dynamics (e.g., Gutkin et al.), where inhibition is omitted to emphasize intrinsic or modulatory mechanisms. We acknowledge that incorporating inhibitory populations is an important next step for capturing a broader range of dynamics, but for the current study, the excitatory-only network provides a minimal and interpretable framework aligned with our focus.
  
  (Q8) Comparing Figures 4(a) and 4(b), it seems that the bursting activity observed in the experiment and in the mean-field simulations is quite different, originating from different mechanisms and bifurcations. Can the authors comment on this?
  
  We thank the reviewer for this important observation. We have reorganized the presentation of Figure 4 and revised the accompanying text to better clarify the nature of the comparison (see also our response to Reviewer 1, point 6). Our aim is not to claim that the experimental and simulated bursts arise from identical bifurcation mechanisms, but rather to highlight shared qualitative features — in particular, slow modulation of population activity by extracellular potassium. We now also comment on the potential role of more complex or noise-driven bifurcations (see Saggio et al. 2020) in shaping experimental bursting dynamics, which are not fully captured by the current deterministic model.
  
  Bifurcation analysis: emergent network states and multistability
  
  (Q9) This sub-section will gain interest by reporting simulations of the network and of the neural mass model presenting bistable dynamics.
  
  We agree with the reviewer that this would be an important addition, but we believe that it goes beyond the scope of this work (for computational reasons, among others) and leave it for future work. We have, however, updated the bifurcation analysis section.
  
  Limitations of the model
  
  (Q10) Lines 276-280 -- I think that the parameters c_+, c_-, R_+, R_- depend not only on the slow variables (the potassium concentrations) but also on the actual value of the gate variable n. This should be stressed.
  
  We thank the reviewer for this helpful observation. We agree and have clarified this in the revised manuscript. This dependence reflects the mean-field assumption that n is treated as a collective variable, and we now make it explicit in the text.
  
  “Furthermore, the parabola coefficients c_-, c_+, R_-, R_+ were fixed as constants; however, these coefficients could be made functions of the slow variables and the gating variable, which might unveil new dynamical regimes and extend the validity of the thermodynamic limit beyond the regimes described in this work. Also, in the case of constant values, an in-depth exploration of the parameter space is required to fully characterize the model and its bifurcation structure.”
  
  (Q11) The authors wrote: "Other limiting assumptions are the moment closure condition (19) and the assumptions that the functions (3) averaged across the neuronal population can be expressed as functions of the average membrane potential V and gating variable n (which is only true in the cases where the functions (3) can be reasonably approximated as linear functions in a range of V and n." Apart from the fact that a parenthesis is missing, I think that this last aspect has already been taken into account when fitting two parabolas to the sum of the currents, or not? If so, please specify.
  
  We thank the reviewer for catching the missing parenthesis; this has been corrected in the revised manuscript. Regarding the modeling point: the two-parabola fit applies specifically to the membrane potential dynamics and captures the nonlinear dependence of the total current on V (Eq. 16). In contrast, the moment closure assumption involves approximating averages of nonlinear functions of both V and n, such as those appearing in the gating dynamics (e.g., n∞(V)). This is not directly accounted for by the parabola approximation, but is handled separately via the mean-field approximation of G^n as a function of the average variables (Eq. 15).
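  Why this closure is a separate approximation can be seen numerically: for a nonlinear steady-state function, the population average ⟨n∞(V)⟩ differs from n∞(⟨V⟩) by an amount that grows with the spread of V. The sketch assumes a hypothetical sigmoidal n∞ with illustrative parameters and Gaussian-distributed potentials (a Lorentzian has no finite mean, so it is not used here); it is not the gating function of the manuscript.

```python
import numpy as np


def n_inf(v, v_half=-40.0, slope=5.0):
    """Hypothetical sigmoidal steady-state activation (illustrative parameters, mV)."""
    return 1.0 / (1.0 + np.exp(-(v - v_half) / slope))


def closure_gap(mean_v, sd_v, n_samples=100_000, seed=1):
    """Monte Carlo estimate of <n_inf(V)> - n_inf(<V>) for Gaussian-distributed V."""
    rng = np.random.default_rng(seed)
    v = rng.normal(mean_v, sd_v, n_samples)
    return np.mean(n_inf(v)) - n_inf(mean_v)


for sd in (1.0, 10.0):
    print(f"sd = {sd:4.1f} mV: <n_inf(V)> - n_inf(<V>) = {closure_gap(-45.0, sd):+.4f}")
```

  The gap is negligible for a narrow distribution of membrane potentials and becomes substantial for a broad one, which is the regime where approximating G^n by a function of the average variables is least accurate.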
  
  (Q12) A limitation that should be stressed is that, in the neural mass model, the authors consider the gate variable and the potassium concentrations as global variables, equal for all neurons, with n depending on the mean membrane potential. Writing that the moment closure (19) is a limiting assumption is honestly not clear enough; please be explicit here.
  
  We have now added the following two statements:
  
  “These slow variables are in addition considered to be mesoscopic, meaning they are identical for every neuron in the population.”
  
  “In our mean-field model, the gating variable n is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic n_i values.”
  
  Discussion
  
  (Q13) The authors could discuss in this section the further biological ingredients they can introduce in their neural mass based on the previous works [R1-R9] that have already shown how to include plastic synapses, random connectivity, noise, adaptation, spike-timing-dependent plasticity, etc and which of these ingredients they consider more relevant for the whole brain dynamics.
  
  In order not to repeat the statements from the Introduction, we have now added the following sentence:
  
  “This approach, taking into account key biophysical details, offers a first step in considering the role of the glia in neural tissue excitability. Following this direction, other ions, such as calcium, should be taken into consideration, as well as other effects such as plastic synapses, random connectivity, noise, adaptation, spike-timing-dependent plasticity, as already discussed in the Introduction.”
  
  (Q14) The authors should also discuss why they limited their analysis to purely excitatory networks, and what would change by including excitatory-inhibitory interactions in each single mass and across neural masses, if this makes sense or not.
  
  As stated in our response to Q7, we chose to focus on purely excitatory networks as a first step to isolate and study the core role of extracellular potassium dynamics in driving bursting behavior. This modeling choice allows for a minimal system where the interaction between intrinsic ionic mechanisms and network coupling is most transparent.
  
  We also note that excitatory and inhibitory effects can be modeled within the same formalism by adjusting the synaptic reversal potential, for example, $E_{syn} = 0$ mV for excitatory and $E_{syn} = -80$ mV for inhibitory interactions. Including inhibitory populations would introduce additional complexity and richer dynamical regimes (e.g., oscillatory instabilities, balance states), which are certainly of interest but beyond the scope of this study.
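  The sign convention can be checked with a conductance-based synaptic term. The form g·r·(E_syn - V), the conductance g, the membrane potential and the presynaptic rate below are assumptions for illustration (the manuscript's exact coupling term is not reproduced); only the two reversal potentials are the ones quoted above.

```python
def synaptic_current(v_mv, r, g=1.0, e_syn_mv=0.0):
    """Conductance-based synaptic input g * r * (E_syn - V); positive values depolarize."""
    return g * r * (e_syn_mv - v_mv)


E_EXC, E_INH = 0.0, -80.0   # reversal potentials quoted in the text (mV)
v_rest, rate = -65.0, 2.0   # hypothetical membrane potential and presynaptic rate

print(synaptic_current(v_rest, rate, e_syn_mv=E_EXC))   # 130.0: depolarizing
print(synaptic_current(v_rest, rate, e_syn_mv=E_INH))   # -30.0: hyperpolarizing
# Below E_INH the same "inhibitory" synapse pushes V back up towards -80 mV.
print(synaptic_current(-90.0, rate, e_syn_mv=E_INH))    # 20.0
```

  Because only the reversal potential changes, the same network code can host both populations; the last line shows that an inhibitory synapse acts as a clamp towards E_syn rather than as a purely negative input.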
  
  Materials and Methods
  
  (Q15) Fig. 2 - I think a plus sign is missing in panel (c), where it should read [K+bath].
  
  Thank you. We corrected the figure.
  
  (Q16) Caption of Figure 2- the authors wrote: "In the case where the derivative of the membrane potential is zero for V > V ⋆ (e.g., if the cubic function is shifted up by adding a constant current to the membrane potential derivative), the population is described by the red distribution in the steady state, and the continuity equation is governed by the negative parabola equation." This sentence is unclear, the authors mean in the case where the derivative of the membrane potential crosses zero at V > V*? Please clarify.
  
  We thank the reviewer for pointing this out. Yes, we refer to the case where the membrane potential derivative crosses zero at a point V > V⋆. We have clarified this in the revised figure caption.
  
  (Q17) Lines 558-562 -- Eqs. (6) and (7) are examples of the unnecessary complications of which this manuscript is full. Since the authors do not consider any synaptic dynamics and use homogeneous (equal) couplings, these equations are not needed. I strongly recommend removing Eqs. (6) and (7) and limiting the presentation to the expression reported in Eq. (8), which should also be corrected (see the next remark).
  
  We appreciate the reviewer’s concern regarding clarity. As mentioned in our response to Reviewer 1, the inclusion of Eqs. (6) and (7) was intentional and serves a pedagogical purpose — to present the general structure of the network interactions before introducing simplifying assumptions. While we agree that Eq. (8) suffices for the simulations considered in this manuscript, we believe that showing the more general form helps clarify the model’s extensibility, for instance to cases with heterogeneous coupling or synaptic dynamics.
  
  (Q18) Eq. (8) - line 562 - Since the authors assume no synaptic evolution, i.e., instantaneous post-synaptic potentials, they can clarify that Eq. (8) represents the population firing rate, which will later be one of the fundamental variables of the neural mass model, and call it r, as in the following. Furthermore, $s_i$ does not depend on the neuron index $i$: in a fully coupled network with homogeneous coupling, as in the present case, this quantity is the same for all neurons. Please drop the index and call it r, since it is the population firing rate.
  
  We thank the reviewer for this useful suggestion. We now clarify in the text that under the assumptions of all-to-all homogeneous coupling and no synaptic dynamics, s_i is identical for all neurons and can be interpreted as the population firing rate r. This connection is made explicit in the revised manuscript.
  
  “Under the assumption of instantaneous synaptic transmission and homogeneous all-to-all coupling, the synaptic activation variable s_i is the same for all neurons and corresponds to the population firing rate, which we denote by r.”
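  A quick numerical check of this statement: with homogeneous all-to-all weights 1/N (self-coupling included, as is usual in this mean-field setting) and instantaneous synapses, the drive received by every neuron in a time bin of width dt equals the population rate. The spike pattern, N and dt are arbitrary toy values, and the heterogeneous weights at the end are a hypothetical contrast case.

```python
import numpy as np

n, dt = 200, 0.1
rng = np.random.default_rng(0)
spikes = (rng.random(n) < 0.05).astype(float)   # toy spike pattern in one time bin
spikes[0] = 1.0                                 # make sure at least one neuron fires

weights = np.full((n, n), 1.0 / n)              # homogeneous all-to-all coupling
s = weights @ spikes / dt                       # synaptic drive s_i of every neuron
r = spikes.sum() / (n * dt)                     # population firing rate

print(np.allclose(s, r))                        # True: s_i is identical for all i and equals r

# With heterogeneous weights the drive is no longer the same across neurons.
hetero = weights * rng.uniform(0.5, 1.5, (n, n))
print(np.ptp(hetero @ spikes / dt) > 0)         # True
```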
  
  (Q19) Lines 564-567 - Here the network model is incomplete: it is not sufficient for the authors to report the evolution equation for the membrane potential, Eq. (9). They should also report the evolution equations for the gate variable n and for the potassium concentrations, as done in Eq. (1). This request is fundamental because, from the present formulation, it is unclear which variables are microscopic (associated with the single-neuron evolution) and which are global (common to all the neurons). This is a fundamental aspect and it should be clarified. I guess that n will depend on the neuron index $i$, while it is unclear whether the authors treat the potassium concentrations as global or local. Should the internal density depend on the neuron index $i$ or not? In any case, I would like to know exactly which network model has been simulated, e.g., to obtain the results reported in Figure 3.
  
  We thank the reviewer for this essential clarification request. In the revised manuscript, we now explicitly state the full network model, including the evolution equations for the gating variable n_i and the potassium variables. While in some simulations we consider the full microscopic model involving 4N variables (where each neuron has its own V_i, n_i, Δ[K+]int_i, [K+]g_i), for the mean-field reduction and mesoscopic comparisons we assume that the gating and potassium variables are shared across neurons. This assumption is consistent with prior work (e.g., Chen & Campbell) and is biophysically justified in the case of potassium due to its fast spatial equilibration in extracellular space. We also now mention this explicitly in the Limitations.
  
  (Q20) Continuity equation - Lines 568-597 - This part can be largely simplified and rewritten. As a matter of fact, the authors consider the gating variable n and the potassium concentrations as global (collective) variables depending on mean-field values such as <V>, so they can start directly from Eq (20) by stating that they assume the other variables (n, $\Delta[K^+]_{int}$, $[K^+]_g$) are collective variables, common to all the neurons, that depend only on mean-field variables such as <V> or r. This has been done in many previous cases, since the Ott-Antonsen Ansatz can be applied whenever the membrane potential evolution is driven by quadratic terms in the presence of mean-field variables; the first indication of this was reported in 1993 by Watanabe and Strogatz for phase oscillators:
  
  [R12] Watanabe, Shinya, and Steven H. Strogatz. "Integrability of a globally coupled oscillator array." Physical review letters 70.16 (1993): 2391.
  
  In any case, this approach has previously been employed to derive neural mass models for networks of QIF neurons in the presence of various additional neuronal variables (ranging from slow currents to plastic evolution of the couplings) describing more biologically realistic situations; see references [R1-R7] above. I strongly encourage the authors to reformulate their approach in a simpler and clearer manner. Of particular interest to them is the article [R6] by Guerriero et al., in which the authors examine exactly the same model as in Ref [95] [Chen, L. & Campbell, S. A. Exact mean-field models for spiking neural networks with adaptation. Journal of Computational Neuroscience 50 (4), 445-469 (2022)] but solve the problem in a much simpler way. I encourage the authors to follow this approach.
  
  We thank the reviewer for the constructive suggestion. We acknowledge that, under the assumption that n, Δ[K+]int, and [K+]g are collective variables shared across the neuronal population, one could begin directly from Eq. (20) and proceed using the simpler approaches found in Guerriero et al. [R6] or related works [R1–R7]. However, we chose to retain the Chen & Campbell formalism, with additional clarification regarding the mesoscopic nature of the gating variable, as it explicitly highlights the key approximations used in the derivation, which may be beneficial for readers seeking to extend the method. See also our general response to Reviewer 2 at the beginning.
  
  (Q21) Eq (26) -- I do not think the authors can explicitly estimate <n(t)> from Eq (26), as they do for the mean membrane potential and the firing rate. This is just a formal expression representing a collective variable, and I do not think that <n> will coincide with the average of the values of n_i over the neurons. Please discuss this point and, if this is the case, show that <n> indeed coincides with the average of all the single-neuron gating variables n_i.
  
  We thank the reviewer for raising this important point. We agree that Eq. (26) is more formal than operational, as ⟨n(t)⟩ is not directly derived from the continuity equation in the same way as ⟨V⟩ or the firing rate r. Rather, it reflects our mean-field assumption that the gating variable evolves as a collective population-averaged quantity, governed by the dynamics of the average membrane potential. In our formulation, n is treated as a global variable shared across neurons, and thus ⟨n(t)⟩ is effectively the gating variable of the neural mass model, rather than the result of averaging heterogeneous n_i values. We have clarified this distinction in the text to avoid suggesting that Eq. (26) provides an explicit estimate of microscopic gating dynamics.
  
  “Unlike the mean membrane potential ⟨V⟩ and the firing rate r, which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable n is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic n_i values.”
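  The distinction raised in Q21 can be made concrete with a toy calculation. The sketch below is purely illustrative: it assumes a hypothetical logistic steady-state activation n_∞(V) (the values of V_HALF and SLOPE are invented, not taken from the manuscript) and membrane potentials following a Lorentzian with an assumed center and half-width. Because n_∞ is nonlinear, averaging n_∞(V_i) over neurons generally gives a different value from n_∞ evaluated at the Lorentzian center, which is what a single collective gating variable driven by ⟨V⟩ would relax to.

```python
import numpy as np

# Hypothetical logistic steady-state activation; V_HALF and SLOPE are
# illustrative values only, not the manuscript's parameters.
V_HALF = -20.0  # mV
SLOPE = 10.0    # mV


def n_inf(v):
    """Logistic 1 / (1 + exp(-(v - V_HALF) / SLOPE)), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh((v - V_HALF) / (2.0 * SLOPE)))


def lorentzian_quantiles(center, half_width, n):
    """Deterministic samples: the n mid-quantiles of a Lorentzian (Cauchy) distribution."""
    u = (np.arange(n) + 0.5) / n
    return center + half_width * np.tan(np.pi * (u - 0.5))


def compare(center=-60.0, half_width=5.0, n=100_000):
    """Return (average of microscopic n_inf(V_i), collective n_inf at the Lorentzian center)."""
    v = lorentzian_quantiles(center, half_width, n)
    population_average = float(np.mean(n_inf(v)))
    collective = float(n_inf(center))
    return population_average, collective


avg, glob = compare()
print(f"mean of n_inf(V_i) = {avg:.3f}, n_inf(<V>) = {glob:.3f}")
```

  When the sigmoid is centered on the Lorentzian center, e.g. `compare(center=V_HALF)`, the two values coincide by symmetry; away from that special case they differ, which is why treating n as a global variable is a modeling assumption rather than an identity.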
  
  (Q22) Mean-field dynamics for the gating variable - In my opinion, this entire subsection is not useful if the authors assume from the beginning that <n(t)> is a global variable. Indeed, in the end they write for <n(t)> the evolution equation Eq (30), which is the same as the single-neuron gating equation (1) but with the mean values of n and <V>. I suggest removing this subsection.
  
  We thank the reviewer for this suggestion. We agree that, under the assumption that n is a global collective variable, the resulting equation for ⟨n(t)⟩ is equivalent in form to the single-neuron gating equation, driven by the average membrane potential. However, we chose to retain this subsection to explicitly demonstrate how the gating dynamics enter into the mean-field formulation, especially for readers less familiar with this type of reduction. This step also mirrors the structure of the derivation used for other state variables in the model and maintains clarity for potential extensions where n may not be strictly global.
  
  (Q23) Line 696 - An equation reference is missing here.
  
  Thank you for pointing this out. We have corrected the text and restored the missing equation reference in the revised manuscript.
  
  (Q24) Eqs (36)-(37) -- Since the variables r and x entering Eq (36) are essentially the same as in Eq (25), apart from a constant R/π, using two different names needlessly complicates an already complicated expression. Please use either r or x everywhere and proceed consistently; this also applies to Eq (37). This will also allow the equations to be rewritten in x or r in a more compact form.
  
  As noted in our response to Reviewer 1, point 14, we have revised Eq. (37) to ensure consistency in notation by replacing x with r throughout.
  
  (Q25) Eq (37) - This equation is not written carefully enough. Apart from that, the authors have now passed from (x, y) to (πr/R, V); therefore, they should replace x with r everywhere. Furthermore, the equation for the derivative of V is confusing: the authors should use the same approximate expression employed in Eq (36), which makes the quadratic dependence on V explicit; otherwise, I believe that the equation is incorrect.
  
  In the same response to Reviewer 1, point 14, we also clarified the expression for \dot{V} in Eq. (37): we reintroduced the full current-based formulation (as in Eq. (16)), reversing the quadratic approximation used earlier. This is now explicitly stated in the text, and we have improved the presentation of the equation to avoid confusion.
  
  (Q26) Eq (37) below line 708 - From this expression, it is clear that the gating variable n and the potassium variables are governed by exactly the same equations as for the single neuron, Eq (1), and that the Lorentzian Ansatz enters only in rewriting the evolution of the membrane potentials of the neurons in the network. In the end, the authors make exactly the same approximation as many other authors [R1-R7]: these variables are collective, i.e. they are the same for all neurons, and in particular n=n(V) is a function of the mean membrane potential V. The mean-field model that the authors derive corresponds to a microscopic model in which the single neurons are heterogeneous only in the intrinsic currents $\eta_i$, but are all driven by collective variables, such as n(V) and the potassium variables, that are identical for all neurons. This should be clarified.
  
  We agree with the reviewer's conclusion. As reflected in our previous responses, we now explicitly acknowledge that n and the two slow variables are treated as mesoscopic variables in the mean-field derivation, while in the spiking network n remains microscopic.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2021.10.29.466427v6
www.biorxiv.org

Large scale prospective evaluation of co-folding across 557 Mac1-ligand complexes and three virtual screens

1. Public_Reviews 27 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  The authors conducted a comprehensive benchmarking and evaluation of the co-folding platforms AlphaFold3, Boltz-2, and Chai-1 alongside the docking program DOCK3.7, which employs a physics-based scoring function that incorporates van der Waals interactions, electrostatics, and ligand desolvation energies. The system of interest was the SARS-CoV-2 NSP3 macrodomain (Mac1), an increasingly popular antiviral target, and the ligand sets comprised 557 ligand poses unseen with respect to the training data of these co-folding platforms. Additionally, the authors investigated whether the co-folding models could distinguish true ligands from non-binding small molecules. The study is thorough, with extensive statistical support and consensus across multiple metrics (chemoinformatics for quantifying ligand similarity and efficacy). The questions that the authors aim to address are whether the co-folding models rely on memorization, whether they can distinguish between true and false binders, whether they replicate experimental binding affinities and efficacy, and how they compare to the physics-based docking program DOCK3.7.
  
  We thank Reviewer 1 for this thoughtful summary of our work.
  
  Strengths:
  
  Overall, this is a scientifically solid paper. The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment.
  
  Weaknesses:
  
  My main concern is that the study's aim is a bit unclear. Modern benchmarking studies comparing physics-based docking with deep learning-based co-folding approaches (e.g., AF3, Boltz-2, Chai-1, and others) are increasingly expected to go beyond aggregate performance metrics.
  
  Indeed, we discuss several examples of failures and successes for each of these methods. As we are not developing these methods ourselves, we also think this dataset will be a valuable contribution for improving them further.
  
  In addition to rigorous dataset construction, transparent methodology, and appropriate statistical evaluation, high-impact benchmarks typically provide actionable guidance on when each method class is most appropriate, reflecting their distinct inductive biases and practical constraints. Failure-mode analyses that link performance differences to protein flexibility, ligand chemistry, or binding-site characteristics are particularly valuable, as they move comparisons beyond "scoreboard" assessments toward mechanistic understanding.
  
  Right now, we do not observe meaningful trends that separate the failure modes for any individual method. This is covered in Supplementary Figures 6 and 7.
  
  While full biological validation is not expected, qualitative interpretation grounded in physical and biological principles strengthens conclusions. Providing reproducible workflows or reference pipelines is not mandatory, but it is increasingly viewed as a best practice because it facilitates adoption and helps contextualize results for practitioners.
  
  We note that our code is available (https://github.com/jongbin99/Cofolding/) and all structural data will be publicly accessible in the PDB alongside publication (we held it back only for “blinding” during peer review, to avoid contamination with any new deep learning methods).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript by Kim et al. evaluates the performance of three modern AI-based methods in predicting complex structures and binding affinities between proteins and chemical compounds. An honest 'prospective' evaluation is achieved by studying benchmark structures and chemical compounds that did not exist in the PDB at the time the AI structure prediction models (AlphaFold3, Chai-1, Boltz-2) were trained.
  
  Strengths:
  
  (1) The study addresses an important question in modern computational biology and drug discovery, and establishes the strengths and limitations of the three tools in solving various computational chemistry tasks, including compound pose prediction, active-inactive discrimination, and potency ranking.
  
  (2) The conclusions are based on examination of four separate targets and respective compound datasets, where for one of the targets, the authors also obtained numerous X-ray structures to serve as experimental answers for the binding pose prediction task.
  
  (3) The study reports relationships between structure prediction confidence, predicted energies (DOCK3.7), and affinity predictions (Boltz-2) with the geometric accuracy of compound pose prediction as well as the experimentally measured potency.
  
  (4) One of the key findings is the limited ability of co-folding methods to predict conformational rearrangements, which does not correlate with their ability to predict binding poses of the compounds inducing these rearrangements.
  
  (5) The findings could serve as useful guidelines for computational chemists in selecting appropriate software and scoring schemes for each task.
  
  We appreciate Reviewer 2’s summary of the novelty of the dataset and analysis.
  
  Weaknesses:
  
  While I consider this a solid study, several aspects would need to be addressed to make it really strong:
  
  (1) DOCK3.7 docking and scoring experiments were performed using one experimental structure of Mac1, selected from dozens of structures based on a criterion that is not sufficiently well justified. For sigma2 receptor, dopamine D4 receptor, and AmpC β-lactamase, it is not clear which structures or models were selected for docking at all. It is well known that geometry predictions, scoring, and active-inactive ROC AUCs are all strongly influenced by the selected structure. It would be important to attempt Mac1 docking using all available experimental Mac1 structures, or at least against representative structures in various conformations; it would also be quite insightful to compare results to docking of the same compound sets to AF3, Boltz-2 and Chai-1 predicted structures of Mac1. The same goes for the docking studies of sigma2, D4, and AmpC β-lactamase.
  
  In any docking campaign, a decision has to be made as to which template will be used; we justified our choice in the Methods:
  
  “We used this structure because the inhibitor (Z5014193706) was the most potent molecule with a structure determined around the same time as the ligands in this dataset were tested.”
  
  We stand by this as a reasonable assumption. Similarly, for sigma2, D4, and AmpC β-lactamase, the template was chosen in the respective papers:
  
  a) The σ2 receptor bound to cholesterol (PDB ID: 7MFI) was used in the docking calculations.
  
  - This structure was determined in that paper; as the first structure of sigma2, it was a suitable template.
  
  b) The D4 receptor campaign used PDB 5WIU
  
  - This was one of the two available D4 structures and was chosen because it was not bound to sodium.
  
  c) For AmpC, the campaign used PDB structure 1L2S.
  
  - This maximizes comparability with other docking studies that used the same receptor template.
  
  The major goal of this study is to compare different methods under reasonable (but perhaps as the reviewer points out, not optimal) conditions, not to optimize docking score.
  
  (2) For binding affinity predictions, as a control, authors should consider compound co-folding with an unrelated protein, or even with a pseudo-peptide that consists of a few random single amino acids - this would provide an honest baseline for such predictions.
  
  This suggestion would be valuable for understanding the performance for these methods from the perspective of ligand specificity (a valuable, but separate, goal). Surely this will generate some number or some prediction - but what would this baseline mean and how would it be relevant for drug discovery? Therefore, we do not think this suggestion is relevant for the issues being investigated in this manuscript.
  
  (3) ROC curves in Figure 3 and elsewhere should be shown, and AUCs quantified and reported, on a log- or square-root-scaled x-axis to emphasize early enrichment, which is the area of practical significance for these predictions. For example, Figure 3A currently suggests that the pose prediction performance of AF3 exceeds that of Boltz-2, whereas the early enrichment is clearly better for Boltz-2.
  
  We agree with this, and added a semi-logAUC plot for Figure 3A. For Figure 5, we also generated a semi-logAUC plot to see early ligand enrichment clearly, added as Supplementary Figure 11. We added the text:
  
  “Considering its early enrichment performance, Boltz-2 Ligand ipTM was the strongest predictor of pose accuracy based on normalized logAUC (20.5% above random, Fig. 3a). In contrast, although Boltz-2 pIC50 showed poor overall discrimination, it overestimated its ability to enrich true positive poses at low false positive rates, despite having a weak early enrichment behavior”
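  For readers unfamiliar with the metric, a minimal logAUC sketch follows. It assumes the commonly used definition in which the area under the ROC curve is taken against log10(FPR) between 0.1% and 100% and divided by the three decades spanned, so that a random ranking scores about 0.145; the manuscript's exact normalization may differ, the function names are hypothetical, and tied scores are not handled.

```python
import numpy as np


def roc_points(scores, labels):
    """ROC vertices (FPR, TPR), ranking higher scores first.

    Assumes no tied scores and that both actives and inactives are present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    y = labels[np.argsort(-scores, kind="stable")]
    tpr = np.concatenate(([0.0], np.cumsum(y) / y.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~y) / (~y).sum()))
    return fpr, tpr


def log_auc(fpr, tpr, lam=1e-3):
    """Area under TPR versus log10(FPR) for FPR in [lam, 1], scaled to [0, 1].

    Without ties the ROC is a staircase, so TPR is constant between
    consecutive FPR breakpoints and the integral can be computed exactly.
    """
    xs = np.union1d(np.clip(fpr, lam, 1.0), [lam, 1.0])
    area = 0.0
    for a, b in zip(xs[:-1], xs[1:]):
        i = np.searchsorted(fpr, a, side="right") - 1  # last vertex with FPR <= a
        area += tpr[i] * (np.log10(b) - np.log10(a))
    return area / np.log10(1.0 / lam)


def random_log_auc(lam=1e-3):
    """logAUC of the diagonal TPR = FPR, i.e. a random ranking."""
    return (1.0 - lam) / (np.log(10.0) * np.log10(1.0 / lam))


# Toy example: five compounds, two true actives.
fpr, tpr = roc_points(scores=[0.9, 0.8, 0.7, 0.4, 0.2], labels=[1, 0, 1, 0, 0])
above_random = log_auc(fpr, tpr) - random_log_auc()
```

  Plotting `tpr` against `fpr` with a logarithmic x-axis gives the corresponding semi-log ROC plot, in which differences at low false positive rates are no longer compressed into the left edge of the figure.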
  
  (4) 'Trained set' in figures and text should probably be 'training set'? Or otherwise explain this new term the first time it is introduced.
  
  Thank you for pointing this out. ‘Training set’ is the correct term, and we have corrected it throughout the figures and text.
  
  (5) Figure 1 illustrates a projection onto the first two principal components of a space that apparently had only one (scalar) metric for each compound pair (% maximum common substructure or Tanimoto coefficient); the authors need to better explain the principle behind this analysis and visualization.
  
  This suggestion is valuable, since PCA is often used to reduce the dimensionality of more complex features. For clarification, we actually have a full pairwise similarity matrix for all tested Mac1 compounds based on each of Tc and MCS%. The PCA for each metric (MCS% and Tc) is a projection of the corresponding pairwise similarity matrix. We also changed the Figure 1 caption to make this point clearer:
  
  “projection of compounds represented by their full pairwise similarity vectors (by ECFP-4 Tc and MCS%)”
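  The idea of representing each compound by its full vector of pairwise similarities can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: fingerprints are toy sets of on-bit indices (in practice ECFP-4 bits, for example from an RDKit Morgan fingerprint of radius 2), only Tc is shown (an MCS% matrix would be built the same way with a maximum-common-substructure tool), and all function names are hypothetical. Projecting a second set of compounds with the same mean and components is one way to overlay tested ligands on a reference set such as all PDB ligands.

```python
import numpy as np


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_matrix(rows, cols):
    """Tc of every row fingerprint against every column fingerprint."""
    return np.array([[tanimoto(a, b) for b in cols] for a in rows])


def fit_pca(S, k=2):
    """PCA on the rows of S: each compound is its vector of similarities to the reference set."""
    mean = S.mean(axis=0)
    _, _, vt = np.linalg.svd(S - mean, full_matrices=False)
    return mean, vt[:k]


def project(S_rows, mean, components):
    """Map compounds, described by similarities to the same reference set, into PCA space."""
    return (S_rows - mean) @ components.T


# Toy reference set: two small "chemotypes".
ref = [{1, 2, 3}, {1, 2, 4}, {5, 6}, {5, 6, 7}]
S = similarity_matrix(ref, ref)
mean, comps = fit_pca(S)
coords = project(S, mean, comps)  # 4 x 2 map of the reference compounds
new = project(similarity_matrix([{1, 2}], ref), mean, comps)  # overlay one new compound
```

  Each axis of the input is "similarity to reference compound j", so the dimensionality equals the size of the reference set; that is why a two-component projection is a useful summary even though each pairwise comparison is a single scalar.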
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study's core conclusions are well-supported by data. It is shown that co-folding outperforms docking in known ligand pose/affinity prediction (validated by RMSD and IC₅₀ correlation), struggles with false-positive discrimination in virtual screens (lower AUC values), and is complementary to docking (non-correlated errors, distinct strengths in drug discovery stages).
  
  Strengths:
  
  (1) Unprecedented prospective design with 557 novel Mac1-ligand complexes ensures rigorous, independent evaluation of co-folding methods.
  
  (2) Comprehensive comparison of 3 co-folding tools (AlphaFold3, Chai-1, Boltz-2) with DOCK3.7 across diverse targets and metrics enables nuanced performance assessment.
  
  (3) The study clearly demonstrates complementary roles of co-folding (superior pose/affinity prediction for known ligands) and docking (better hit prioritization), and addresses deep learning memorization concerns via ligand similarity analysis.
  
  We thank Reviewer 3 for pointing out the unprecedented and comprehensive nature of our study.
  
  Weaknesses:
  
  (1) Limited generalization to diverse protein families (e.g., no ion channels/transporters).
  
  We agree - we have not explored the entire proteome, and these are important target classes that will surely be investigated by future studies. Here we focused on targets for which we had a large number of X-ray crystal structures (Mac1) and affinity/inhibition measurements from docking (the other three targets).
  
  (2) Ambiguity in the mechanism underlying co-folding's failure to predict rare conformational changes.
  
  Again, we agree. We are not the developers of these methods. We observe that these methods do not predict conformational changes with high fidelity and this weakness is an area that co-folding methods will surely prioritize in the future.
  
  (3) Virtual screen comparison is unbalanced (docking-prioritized hit lists bias results).
  
  We acknowledge this in the results: “An important caveat is that the hit-lists were composed of molecules prioritized by docking in the first place, giving it an advantage on these particular sets.” and discussion: “Finally, comparing co-folding to docking based on hit-lists themselves selected by docking is arguably unfair to co-folding. Counter-balancing this is the inclusion, in each of the three hit lists, of molecules that had mediocre and poor docking scores intentionally selected to test the correlation between docking score and hit-rate. Here too, the correlation between co-folding score and likelihood to bind, what we sometimes call a “dock-response-curve” was no better than docking’s, often worse (SFig.11).”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Here are suggestions for revisions:
  
  (1) The writing is at times obtuse and hard to follow.
  
  This happens sometimes when multiple authors are writing together. We apologize and are happy to respond to specific areas that can be streamlined to be easier to follow.
  
  (2) In the Results section, "A set of 557 previously unreported Mac1 ligand complexes", the authors have compared the ligand poses across different metrics such as Tc - a standard, highly effective method in chemo-informatics and MCS (maximum common substructures); these are standard metrics for quantifying the structural similarity between pairs of small molecules. This part of the analysis checks whether this is memorization; it is critical to compare the two metrics, but it is not sufficient to draw a conclusion.
  
  Thank you for raising the structural similarity of the co-folded molecules to those present in the training set (resolved as Mac1 complexes and deposited in the PDB before the training cutoff dates). We have conducted an analysis in which we make pairwise similarity comparisons, by both Tc and MCS, for all ligands present in the PDB (regardless of target), and overlay the clusters of ligands we tested (Mac1, AmpC, sigma2, D4). This shows where our benchmark datasets lie within the chemical space covered by the entire PDB. Each cluster (around 500 to 1300 compounds per target system) is overlaid on the cluster of all ligands deposited in the PDB (over 50,000 compounds), and each cluster was relatively diverse by both Tc and MCS.
  
  (3) In the "Co folding can accurately reproduce poses of ligands dissimilar to those trained." Subsection under Results, the authors' conclusions are hard to follow; they state that the co-folding models often mispredict or miss the alternative conformation, but they also predict poses that are distinct from the training set. What does that imply?
  
  Our interpretation is actually a somewhat unsettling one: co-folding gets the ligand pose right even when it gets the protein wrong, and even when the ligand is novel. This suggests the models may be anchoring on conserved pharmacophoric interactions (like the adenosine-mimicking purine scaffold) rather than truly modeling the physics of the full complex. We added to the results section:
  
  This result suggests that co-folding reliably recapitulates dominant ligand-binding interactions even in the absence of accurate protein conformational modeling, providing further support for the idea that these models learn specific interaction patterns rather than a deeper physics-based representation (Masters et al. 2025).
  
  (4) The Discussion section connects the results and conclusions, but it can be challenging to grasp the study's overall message.
  
  We think the final paragraph hits on three major points:
  
  - Co-folding accurately predicts ligand poses for known binders, but fails to capture conformational changes
  
  - Co-folding does not reliably distinguish true binders from false positives in virtual screening hit lists
  
  - Docking and co-folding are complementary rather than competing tools
  
  (5) The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment. The value of the paper would be further enhanced by explaining how it differs from seemingly similar results reported in other studies, including the one cited in this manuscript (see https://www.biorxiv.org/content/10.64898/2025.12.04.692352v1).
  
  The Mac1 results are completely unique. However, the docking datasets are exactly the same as those analyzed in the Menon et al. manuscript. We do not think our results differ from the conclusions of the Menon et al. manuscript; as we wrote: “These observations are supported by a fascinating study on some of the same ligand sets as investigated here, using AlphaFold3, reaching similar conclusions (Menon et al. 2025).”
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Expand target diversity to include ion channels, transporters, etc., beyond enzymes and GPCRs.
  
  (2) Investigate the cause of co-folding's failure in predicting rare conformational changes (e.g., adjust sampling, MSA inputs, or add experimental constraints).
  
  (3) Mitigate docking bias in virtual screens (e.g., re-analyze unbiased compound libraries).
  
  We addressed these three points in the public review above.
  
  (4) Test Boltz-2's affinity predictions without linear calibration and compare with FEP.
  
  The data without linear calibration are included in the manuscript. Comparing such a large number of compounds with FEP is currently beyond our capabilities.
  
  (5) Conduct proof-of-concept to test co-folding-docking integration for better hit rates.
  
  We think this is well beyond the scope of this manuscript - but look forward to testing this idea in the future.
  
  We also got one community review that we respond to below:
  
  Summary
  
  This manuscript evaluates the performance of co-folding models when tasked with 1) the recapitulation of a large number of experimentally determined co-crystal structures of Mac1 with a series of Mac1 ligands and 2) the rescoring of hits to identify false positives originally derived from a set of large docking-based virtual screens. The evaluation leverages a dataset of crystal structures and affinity data from high-throughput crystallographic and biophysical screens, respectively. These data uniquely enable this report to focus on the ability of co-folding models to handle ligands, resulting in an analysis that is particularly timely given the wide adoption of co-folding models and the relative scarcity of such ligand-focused benchmarks among existing evaluations, which have primarily focused on protein structure prediction or binder design.
  
  Thank you for this thoughtful summary of our work.
  
  Feedback
  
  The experiments and analyses in the manuscript are well thought-out and do not have any significant issues. There are a few high-level points that may improve the clarity and completeness of the results. Importantly, none of the suggested additional experiments will affect the conclusions of the paper, but rather help provide additional context for the results:
  
  The first section presents an exciting opportunity to frame the Mac1 ligands against ligands in the PDB more broadly. It would be informative to assess whether chemotypes that are easier or harder to predict accurately and confidently are over- or under-represented in the PDB as a whole. Note that this is not a recommendation that new scaffold similarity metrics be incorporated into the analysis, but rather that analyses similar to those already performed in the manuscript are performed using all ligands in the PDB. For example, PCA-based analyses similar to those in Fig. 1c could be used to examine Mac1 ligands in the context of all PDB ligands enabling questions such as whether similarity to a nearest PDB neighbor, cluster size in a Tc/MCS PCA space, or other frequency-based measures show any relationship with prediction vs. crystal structure RMSD. Such analyses could provide additional insight into how effectively models leverage ligand information present in the PDB overall, as opposed to biases arising specifically from scaffolds represented in Mac1 structures in the PDB, which are already well covered in the manuscript. The conclusion that Tc/MCS do not correlate with the ligand RMSDs for the ligands already associated with the Mac1 is well supported, and presumably suggests that a correlation would not exist against the backdrop of the PDB, but it would be interesting to see the data using analyses similar to those already done in the manuscript nonetheless.
  
  We are adding new figures to SFig. 1 that show how the different clusters of ligands tested in our co-folding analysis are distributed across the chemical space of the PDB. This is done by computing the similarity between every ligand in the PDB and those tested in our analysis, by Tc and MCS%, and then plotting the result in PCA space for each metric. We are excited to see that each dataset covers a wide region of PCA space, but at the same time there are areas of PDB chemical space that remain unexplored by our co-folding analysis.
  
  Similarly, even though the four proteins used in this manuscript are not themselves the primary focus of the analysis, it would be valuable to perform a high-level assessment of the precedent for each protein in the PDB (beyond the count of liganded structures in Table S6), either in protein sequence space (e.g., MSAs) or structural space (e.g., FoldSeek). An analysis like this would provide important context about whether any of the proteins in the study have close homologs with liganded structures in the PDB, or are generally overrepresented in the PDB. The fact that the AUC for L-pLDDT for AmpC is higher than σ2 and D4, for example, is notable given the relative abundance of liganded AmpC structures in the PDB (this raises potentially interesting questions related to where DOCK3.7 and AF3 actually place the ligands, given the orthosteric β-lactam binding pocket in AmpC, although this is outside of the scope of this manuscript).
  
  High-level assessment of the precedent for each protein in the PDB will definitely help to understand if proteins we used have close homologs with liganded structures in the PDB. Our Supplementary Table 6 covers the extent to which these liganded structures were available by cutoff dates for AF3, Chai-1 and Boltz-2. AmpC had more homologs than sigma2 and D4, and this may explain a better AUC for AF3 L-pLDDT specifically for this target.
  
  A discussion of the affinity probability results (`affinity_probability_binary`) from Boltz-2 is likely warranted in the second section in addition to the pIC50s that are already reported (`affinity_pred_value`). The former seems like it would be more applicable for section 2 of the manuscript, but both warrant inclusion—they should both be calculated by default when the affinity pipeline in Boltz-2 is turned on, so it wouldn't involve any more inference.
  
  Because the Boltz-2 affinity module outputs both the binary affinity probability (`affinity_probability_binary`) and the predicted affinity value (`affinity_pred_value`), we tracked both metrics and tried re-ranking the hit lists using each. Where Boltz-2 performed better (Sigma2, D4), the binary probability values were more effective at differentiating true actives from non-binders; this was clearer in semi-logarithmic ROC plots. In AmpC, however, both Boltz-2 scoring metrics performed similarly. This inconsistency made it difficult to draw general conclusions.
  
  Minor points
  
  A more detailed description of the experimental methods used to generate the ground-truth data in the introduction (even though these have been explained in prior works) would help orient the reader early on, and ground the benchmarking aspect of the story. In general, the abstract and introduction would benefit from a more cohesive through-line to tie the two complementary but orthogonal sections of the paper together.
  
  We will include a more thorough description alongside the PDB depositions. As for the two sections, we have tried to tie them together from the perspective of drug discovery workflows…
  
  The cutoffs in the "Co-folding can accurately reproduce..." section shift between 2.5 Å (from the ligand center of mass) and 2.0 Å. Is there a reason for this? Along similar lines, mentioning cutoffs for true positives/negatives when introducing the ROC analyses later on in the Mac1 section seems unnecessary since no cutoff should be necessary here.
  
  We used a 2.5 Å distance to the ligand center of mass (COM) as a fast filter for “broadly the correct binding site”, and a 2.0 Å RMSD because that is the widely accepted standard in the field for a “broadly correct binding pose”.
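The two criteria can be sketched as follows, assuming the predicted and reference ligand poses are already expressed in the same protein frame and share the same atom ordering. This is a simplified illustration: no symmetry correction is applied to the RMSD, the COM falls back to an unweighted centroid unless atomic masses are supplied, and treating the cutoffs as inclusive (`<=`) is an assumption. The helper names are hypothetical.

```python
import math


def center_of_mass(coords, masses=None):
    """Mass-weighted centre of a list of (x, y, z); unweighted centroid if masses is None."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    return tuple(sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3))


def com_distance(pred, ref, masses=None):
    """Distance (Å) between the centres of the predicted and reference poses."""
    return math.dist(center_of_mass(pred, masses), center_of_mass(ref, masses))


def pose_rmsd(pred, ref):
    """In-place RMSD (no superposition), assuming identical atom ordering."""
    if len(pred) != len(ref):
        raise ValueError("poses must have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def classify_pose(pred, ref, com_cutoff=2.5, rmsd_cutoff=2.0):
    """Return (correct binding site, correct binding pose) for the 2.5 Å / 2.0 Å cutoffs."""
    return com_distance(pred, ref) <= com_cutoff, pose_rmsd(pred, ref) <= rmsd_cutoff
```

The COM criterion is cheap and tolerant of flipped or rotated poses inside the right pocket, whereas the RMSD criterion requires atom-level agreement, which is why a pose can pass the first filter and still fail the second.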
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2025.12.25.696505v3
www.biorxiv.org www.biorxiv.org

Modelling multicellular coordination by bridging cell-cell communication and intracellular regulation through multilayer networks

1
1. EMBOpress 27 May 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  Manuscript number: RC-2026-03407
  
  Corresponding author(s): Laura Cantini, Julio Saez-Rodriguez
  
  [The "revision plan" should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.
  
  The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.
  
  If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]
  
  1. General Statements [optional]
  
  This section is optional. Insert here any general statements you wish to make about the goal of the study or about the reviews.
  
  We thank both reviewers for their thorough and constructive evaluation of our manuscript.
  
  Reviewer 1 highlighted that the manuscript would benefit from 1) a stronger positioning of ReCoN within the existing literature on multicellular modelling and network exploration, 2) a justification of our methodological choices, including the use of Random Walk with Restart (RWR), 3) the choice of input datasets for GRN inference and an assessment of the robustness of ReCoN's predictions to noise in these networks, 4) a more systematic exploration of ReCoN's parameter space (restart probability, layer transition probabilities, filtering thresholds).
  
  Reviewer 2 raised concerns about 1) the generalisability of the α parameter value (by default, 0.8) across independent datasets, 2) the expected contribution of the indirect effect to prediction performance, 3) the robustness of GRNs across datasets and systems, and 4) the need for more quantitative validation in the spatial/microenvironment showcase. They also pointed out an unsupported claim regarding gene knockout prediction in the abstract.
  
  Several clarifications on figures, methods, and writing were also requested by both reviewers.
  
  As the main addition to the manuscript, we propose a new showcase based on the recently published Human Cytokine Dictionary (Oesinghaus et al., 2025). This showcase will simultaneously address several reviewer concerns by allowing us to 1) test the robustness and performance of α = 0.8 in an independent dataset, 2) evaluate the impact of different GRN inference methods (HuMMuS, SCENIC+, CellOracle, GRNBoost2) and noise on ReCoN's predictions..
  
  We will conduct a systematic parameter exploration on the Heart Atlas showcase, covering the restart probability and inter-layer transition probabilities. We will also strengthen the validation of the microenvironment showcase through an additional comparison with matched single-cell fibroblast data.
  
  Regarding the manuscript, we will substantially expand the Discussion to better contextualise ReCoN within existing multicellular modelling approaches, and expand the Methods to justify our methodological choices (RWR/MultiXrank, dataset selection). We will remove the unsupported gene knockout claim from the abstract and reframe it as a future direction. In addition, we will clarify the distinction between ReCoN variants, rename them for clarity in Results section 1.2, and improve the figure legends. Finally, we will improve the tool's documentation, including new tutorials on using spatial data and on running ReCoN with scRNA-seq-only GRN inference.
  
  We believe these revisions will substantially strengthen the manuscript and address the reviewers' concerns regarding the method's robustness, generalisation, and contextualisation.
  
  2. Description of the planned revisions
  
  Reviewers' comments are in blue
  
  Authors' answers are in black
  
  Proposed text modifications are in green
  
  Reviewer #1
  
  R1.1. This is a very well-written paper; the methods used are adequate, and the use cases are relevant and broad, exploiting state-of-the-art datasets and tools.
  
  The authors' claims are mostly justified. The authors could make an effort to more explicitly cite other efforts in similar directions. The claim 'We envision ReCoN as an extension to prior multicellular modelling, offering an interesting compromise between prediction of cell type responses and understanding of their molecular coordination.' is very general and could be better substantiated. In fact, the authors do not really give examples of alternative approaches to study systems of interacting cells, other than mechanistic agent-based models, which are clearly very different.
  
  Response:
  
  We thank the reviewer for pointing out the lack of contextualisation for ReCoN in this closing discussion.
  
  We wish to recall that ReCoN builds notably on multicellular factor decomposition methods. We also want to emphasise its value in complementing cell-cell communication methods, which describe the big picture of multicellular interactions.
  
  We propose to state these two points explicitly with the following rephrasing:
  
  Network-based representations of multicellular systems have been an active field of research for many years, from early conceptual cytokine networks (Frankenstein, Alon, and Cohen 2006) to curated ligand-receptor cascades in hematopoietic tissue (Kirouac et al. 2010, Qiao et al. 2014). In parallel, accounting for tissue specificity in GRN inference from bulk RNA-seq has been another way to capture the importance of context when reconstructing molecular mechanisms (Sonawane et al. 2017). Single-cell analysis made it possible to decompose tissue composition and quantify gene expression, opening the way to scaling the inference of these networks, and of multicellular mechanisms in general, to large sets of molecules. Several methods have since been developed to capture multicellularity. A first direction extends ligand-receptor interaction inference into the receiver cell's response through curated signalling cascades, yielding ligand-to-target cascades (Browaeys, Saelens, and Saeys 2020, Jin et al. 2021, Zhang et al. 2021, Yan et al. 2025). A second direction leverages spatial context through explainable multi-view models that decompose marker variation into intra- and intercellular contributions (Arnol et al. 2019, Tanevski et al. 2022), without modelling the mediating cascades. Finally, the more recent family of multicellular factor decomposition methods focuses on the coordination of cellular programs rather than on their mechanisms. ReCoN proposes a network-based approach that builds on single-cell data and on the philosophy of this last group of methods. Indeed, ReCoN aims to link molecular drivers to such coordinated multicellular programs by bridging CCC inference and GRN modelling (Badia-i-Mompel et al. 2023) within a large, coherent heterogeneous multilayer network, and exploring it.
  
  Arnol D, Schapiro D, Bodenmiller B et al. Modeling Cell-Cell Interactions from Spatial Molecular Data with Spatial Variance Component Analysis. Cell Rep 2019;29(1):202-211.e6. https://doi.org/10.1016/j.celrep.2019.08.077.
  
  Badia-i-Mompel P, Casals-Franch R, Wessels L et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. Preprint, bioRxiv, 21 Dec. 2024, 2024.12.20.629764. https://doi.org/10.1101/2024.12.20.629764.
  
  Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24(11):739-54. https://doi.org/10.1038/s41576-023-00618-5.
  
  Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17(2):159-62. https://doi.org/10.1038/s41592-019-0667-5.
  
  Frankenstein Z, Alon U, Cohen IR. The immune-body cytokine network defines a social architecture of cell interactions. Biol Direct 2006;1(1):32. https://doi.org/10.1186/1745-6150-1-32.
  
  Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12(1):1088. https://doi.org/10.1038/s41467-021-21246-9.
  
  Kirouac DC, Ito C, Csaszar E et al. Dynamic interaction networks in a hierarchically organized tissue. Mol Syst Biol 2010;6(1):MSB201071. https://doi.org/10.1038/msb.2010.71.
  
  Oesinghaus L, Becker S, Vornholz L et al. A single-cell cytokine dictionary of human peripheral blood. Preprint, bioRxiv, 15 Dec. 2025, 2025.12.12.693897. https://doi.org/10.64898/2025.12.12.693897.
  
  Qiao W, Wang W, Laurenti E et al. Intercellular network structure and regulatory motifs in the human hematopoietic system. Mol Syst Biol 2014;10(7):MSB145141. https://doi.org/10.15252/msb.20145141.
  
  Radig J, Droit R, Doncevic D et al. Tracking biological hallucinations in single-cell perturbation predictions using scArchon, a comprehensive benchmarking platform. Preprint, bioRxiv, 27 June 2025, 2025.06.23.661046. https://doi.org/10.1101/2025.06.23.661046.
  
  Sonawane AR, Platig J, Fagny M et al. Understanding Tissue-Specific Gene Regulation. Cell Rep 2017;21(4):1077-88. https://doi.org/10.1016/j.celrep.2017.10.001.
  
  Tanevski J, Flores ROR, Gabor A et al. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol 2022;23(1):97. https://doi.org/10.1186/s13059-022-02663-5.
  
  Yan L, Cheng J, Nie Q et al. Dissecting multilayer cell-cell communications with signaling feedback loops from spatial transcriptomics data. Genome Res published online 12 May 2025. https://doi.org/10.1101/gr.279857.124.
  
  Zhang Y, Liu T, Hu X et al. CellCall: integrating paired ligand-receptor and transcription factor activities for cell-cell communication. Nucleic Acids Res 2021;49(15):8520-34. https://doi.org/10.1093/nar/gkab638.
  
  R1.2. Moreover, the exploration of the multilayer networks with RWR is a very reasonable choice but could there be other approaches? I think the authors could discuss this issue to briefly support their choice of this method.
  
  Response:
  
  This is a very relevant comment, as this choice was not discussed in the paper. We propose extending the Methods section on ReCoN's network exploration with a justification of this choice.
  
  There is currently a limited set of network exploration methods implemented for multilayer networks. These notably include pymnet (Nurmi et al., 2024), natively adapted to heterogeneous multilayer networks, as well as multinet (Bagavathi et al., 2019) and muxviz (De Domenico et al., 2015), initially developed for multiplex networks (e.g. social networks where the same set of nodes is present in each layer) but adaptable to more complex multilayer networks. However, to our knowledge, only MultiXrank provides a robust measure of proximity between every pair of nodes.
  
  Indeed, pymnet does not implement pairwise distances, and neither does muxViz, which focuses on community and motif detection. Multi-net does propose pairwise distances based on shortest paths, but implements them only for nodes of the same multiplex (e.g. in our network, only between two genes, or between two receptors). https://www.rdocumentation.org/packages/multinet/versions/4.3.2/topics/multinet.distance
  
  We also provide additional justification for choosing RWR and MultiXrank rather than reimplementing or extending another method.
  
  The total complexity of RWR is O(δm) when the number of nodes is negligible compared to the number of edges, where m is the number of edges and δ the number of iterations of the walk (Baptista et al., 2022 - Supp Notes 2.A; Jin W. et al, 2019). This linear scaling with the number of edges is particularly advantageous for large networks such as ReCoN's, which can contain several million edges. The number of iterations δ, and hence the computational time, increases as the restart probability decreases, which is an important reason to keep this probability high.
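The relation between δ and the restart probability can be illustrated with a plain power-iteration RWR on a single-layer graph. This is a simplified sketch, not MultiXrank's implementation: it assumes one unweighted layer (no inter-layer bipartite matrices or layer weights), a single seed, and an L1 convergence tolerance, and the function name and toy graph are hypothetical. Each iteration visits every node and edge once, i.e. O(n + m), which reduces to O(m) when edges dominate; iterating until the change falls below the tolerance is also what makes the result deterministic.

```python
def rwr(adjacency, seed, restart=0.7, tol=1e-10, max_iter=10_000):
    """Random walk with restart by power iteration.

    adjacency: dict node -> list of neighbours (undirected edges listed in both directions).
    restart: probability of jumping back to the seed at each step.
    Returns (scores, iterations); scores sum to one.
    """
    nodes = list(adjacency)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for iteration in range(1, max_iter + 1):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:  # dangling node: send its walking mass back to the seed
                nxt[seed] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        nxt[seed] += restart  # restart mass always returns to the seed
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            return p, iteration
    return p, max_iter


graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
for r in (0.9, 0.5, 0.1):
    scores, n_iter = rwr(graph, "a", restart=r)
    print(r, n_iter, round(sum(scores.values()), 6))
```

Running the loop shows the iteration count growing as r decreases, because the error shrinks by a factor of at most (1 - r) per iteration.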
  
  MultiXrank is particularly attractive for its flexibility, as it makes it easy to assign different weights to the different layers and to specify the direction of the exploration.
  
  It also produces deterministic results, as the exploration is iterated until convergence.
  
  Additionally, in ReCoN, the indirect effect of each cell is computed independently. In previous work (Trimbour et al., 2024), we extended the MultiXrank implementation to run RWRs in parallel, which makes it already suited to optimising ReCoN's explorations.
  
  For all these reasons, the MultiXrank implementation seemed the best choice for robust and efficient exploration of ReCoN's heterogeneous multilayer network (HMLN).
  
  Bagavathi A, Krishnan S. Multi-Net: A Scalable Multiplex Network Embedding Framework. In: Aiello L, Cherifi C, Cherifi H, Lambiotte R, Lió P, Rocha L (eds). Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 813. Springer, Cham; 2019. https://doi.org/10.1007/978-3-030-05414-4_10.
  
  De Domenico M, Porter MA, Arenas A. MuxViz: a tool for multilayer analysis and visualization of networks. J Complex Netw 2015;3(2):159-76. https://doi.org/10.1093/comnet/cnu038.
  
  Jin W, Jung J, Kang U. Supervised and extended restart in random walks for ranking and link prediction in networks. PLoS One 2019;14(3):e0213857.
  
  Nurmi et al. pymnet: A Python Library for Multilayer Networks. J Open Source Softw 2024;9(99):6930. https://doi.org/10.21105/joss.06930.
  
  R1.3. Generally the discussion should provide the reader the context in the existing literature in which the work can be set, detailing its impact. I think this could be improved.
  
  Response:
  
  We hope that the rephrasing proposed in response to comment R1.1 already clarifies how ReCoN is positioned within the existing literature.
  
  We also propose to extend the description of ReCoN's impact with the following sentences in the Discussion: "Unlike purely data-driven approaches, ReCoN contextualizes prior knowledge, balancing robustness through literature data with specificity through new measurements. This mechanistic approach opens new possibilities for understanding how cellular coordination shapes tissue-level responses and for designing targeted molecular interventions."
  
  R1.4. Regarding the choice of datasets, it is clear that the method is quite demanding, requiring single cell and different omics to build the model, in addition to the expression dataset that is used as a use case. This inevitably leads to using a mix of datasets.
  
  For example in the mouse experiments the gene regulatory network was inferred from both a lymph node scRNA-seq dataset and a splenic scATAC-seq dataset, presumably due to the lack of multiome data in this setting. However the cell-cell communication network was inferred from the control case of the Immune Dictionary. Why can't the authors use the control data also for inferring GRNs?
  
  Is atac-seq really necessary in the inference of the GRN? What is the impact of the fact that lymph node and spleen samples might be different?
  
  Response:
  
  This is a very interesting comment. We propose to add both 1) a Supplementary text explaining our choice of datasets for GRN inference, and 2) a new experiment on the effect of GRNs built from multi-omics data versus scRNA-seq alone.
  
  Dataset choice
  
  We decided to infer the GRN using multi-omics data, as multi-omics methods seem to perform better and are becoming the state of the art (Badia-i-Mompel et al. 2023, Trimbour, Deutschmann, and Cantini 2024, Yuan and Duren 2025).
  
  As scATAC-seq data were not produced for the Mouse Immune Dictionary, we searched for an external dataset and used HuMMuS, the method we previously developed, as it is also based on RWR and performs well on unpaired data.
  
  scATAC-seq
  
  Our first criterion was to match the mouse model used in the Immune Dictionary dataset, which substantially reduced the number of available multicellular immune cell datasets. We extended our search to a splenic dataset, as the spleen is itself classified as a highly specialised lymphatic structure (check) and notably contains the same cell types as classical lymph nodes.
  
  scRNA-seq
  
  While we could technically have combined the scRNA-seq data of the Immune Dictionary control mice with the spleen scATAC-seq data, the Immune Dictionary provides only 100 cells or fewer per cell type and stimulation, which would result in a low number of cells. As GRN quality seems to depend strongly on the number of cells used, we favoured a larger dataset.
  
  Our choice of single-cell multi-omics methods was driven by their novelty relative to scRNA-seq-based methods, the performance improvement they seemed to offer in several benchmarks, and the wish to develop a pipeline integrating the most complete data available for contextualization (Badia-i-Mompel et al. 2024).
  
  GRN impact on the Human Cytokine Dictionary
  
  While it does not relate directly to this showcase, we will also add a new dataset analysis, detailed in our response to comment R1.12. In the Human Cytokine Dictionary showcase, we propose exploring the effect of choosing different GRNs, built either from external multi-omics data or from the control scRNA-seq data of the dataset itself. We hope this can partly help users decide, in general, whether to use external datasets of higher quality or sample-specific datasets.
  
  Finally, we propose to add to the tool's documentation a section showing how to use ReCoN with scRNA-seq-only GRN inference, and to report the performance of different GRNs on the Human Cytokine Dictionary dataset directly in the paper.
  
  R1.5. The code is very clear, we were able to install and run it and it is quite well-documented. However, a few more details should be given in the text regarding how the evaluation of the performance is carried out.
  
  For example: If I understand correctly, when predicting the impact of cytokine perturbations the ReCoN predictions of genes impacted are compared to differentially expressed genes identified through traditional DEG analysis. What is compared is the ranking of these genes from ReCoN with the ranking provided by DEseq2. There is no description of how this comparison of ranking gives rise to AUROC values. Also, is it just the ranking that is predicted or can they also estimate how well they can predict the effect size?
  
  Response:
  
  We thank the reviewer for pointing out these unclear technical details. DEG results were binarised to obtain the list of differentially expressed genes, using the thresholds indicated in section 4.4.4: we considered a gene as perturbed by a cytokine treatment if the comparison of control and treated cells had a t-test p-value below 0.1 and a log fold change above 1.
  
  Regarding the reviewer's second, more general point: ReCoN scores should be interpreted as a ranking of possible regulations and cannot be considered proportional to the effect size. As they represent a likelihood rather than a magnitude, binarisation is the most appropriate transformation for the validation.
  
  Moreover, as the scores can be seen as the probability of the walk ending on each node, they always sum to one. This also prevents interpreting the scores as the amplitude of change. As an illustrative example: if a receptor regulates three genes identically, they would (hopefully) all receive a score of (1 - R)/3, R being the restart probability in ReCoN, whether their expression doubles or is multiplied by 10.
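The sum-to-one and symmetry argument can be checked on a toy graph: a receptor seed linked identically to three genes, with edges in both directions. This sketch assumes that the genes link back only to the receptor and that no edge weight encodes the fold change; under these hypothetical assumptions the three gene scores are identical and all scores sum to one, while the exact value of each gene score depends on the genes' outgoing edges (here it works out to (1 - R)/(3(2 - R)), with the receptor at 1/(2 - R)). The function name is hypothetical.

```python
def star_rwr(n_genes=3, restart=0.5, n_iter=500):
    """Toy RWR from a receptor seed linked to n_genes genes (edges in both directions).

    Returns (receptor_score, gene_scores) after n_iter power-iteration steps.
    """
    p_receptor, p_genes = 1.0, [0.0] * n_genes
    for _ in range(n_iter):
        # The receptor spreads (1 - R) of its mass equally over its genes ...
        new_genes = [(1 - restart) * p_receptor / n_genes] * n_genes
        # ... and each gene sends (1 - R) of its mass back to its only neighbour,
        # the receptor, which also receives the restart mass R.
        new_receptor = restart + (1 - restart) * sum(p_genes)
        p_receptor, p_genes = new_receptor, new_genes
    return p_receptor, p_genes


receptor, genes = star_rwr(restart=0.5)
print(round(receptor, 4), [round(g, 4) for g in genes], round(receptor + sum(genes), 6))
```

Nothing in this computation depends on how strongly each gene is induced, which is the point being made: the scores express reachability from the receptor, not effect size.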
  
  While this can legitimately be seen as a limitation, we believe it is shared in practice by most GRN inference methods, for which predicting the true amplitude of gene perturbations usually results in very low performance (Badia-i-Mompel et al. 2024).
  
  We propose the following changes in response to this comment.
  
  We would modify section 4.4.4 of the Methods with the following paragraph to make explicit that a binary selection is used: "For each cytokine-cell type pair, differentially expressed genes were binarised: genes passing the significance thresholds (FDR P-value < 0.1 and log fold change > 1) were labelled as positives, and all remaining genes as negatives. ReCoN scores were then used to rank all genes, and AUROC values were computed from this ranking against the binary labels."
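The binarisation and ranking evaluation can be sketched as follows. Genes are labelled positive when their p-value is below 0.1 and their log fold change is above 1 (applying the fold-change threshold to the signed rather than absolute value is an assumption), and the AUROC of the ReCoN ranking is computed with the rank-based Mann-Whitney formulation, averaging ranks over ties. Function names, the dictionary layout and the toy values are hypothetical.

```python
def binarise_degs(degs, pval_max=0.1, lfc_min=1.0):
    """degs: dict gene -> (p_value, log_fold_change). Returns the set of positive genes."""
    return {g for g, (p, lfc) in degs.items() if p < pval_max and lfc > lfc_min}


def auroc(scores, positives):
    """Rank-based AUROC: Mann-Whitney U / (n_pos * n_neg), with average ranks for ties.

    scores: dict gene -> ReCoN score; genes absent from `positives` are negatives.
    """
    genes = sorted(scores, key=scores.get)  # ascending score
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and scores[genes[j + 1]] == scores[genes[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based mean rank of the tied block
        for g in genes[i:j + 1]:
            ranks[g] = average_rank
        i = j + 1
    pos = [g for g in genes if g in positives]
    n_pos, n_neg = len(pos), len(genes) - len(pos)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")
    u = sum(ranks[g] for g in pos) - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Because only the order of the scores enters the AUROC, this evaluation is consistent with interpreting ReCoN scores as a ranking rather than as effect sizes.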
  
  We will also include a "ReCoN scores interpretation" section on the documentation website, as guidance on interpreting scores will be particularly useful for users.
  
  R1.6. When describing the use cases, I think a bit more detail would help.
  
  For example 'To identify the cell-type-specific genes associated with HF, we used the MOFAcell scores of the multicellular factor 1 (MCP1) reported in ReHeat236' I supposed the explanation is on the dataset but for the sake of clarity it would be good to expand this sentence to give at least an idea of the approach.
  
  Response:
  
  We completely agree that more explanation should be provided, so that readers do not have to switch between articles to understand the concepts behind this showcase. As suggested by the reviewer, we propose to add a short paragraph describing the approach, and to remove the term "loading":
  
  "In the ReHeat2 study, the first multicellular factor (MCP1) was associated with heart failure. We used the gene loadings of MCP1 as a proxy for the cell-type-specific transcriptomic changes associated with heart failure, ranking genes by their absolute loading values."
  
  We also propose to complete the Methods section: "MOFAcell is a multicellular factor analysis method that decomposes multi-sample single-cell data into latent factors representing coordinated gene expression patterns across cell types. Each factor is characterised by cell-type-specific gene scores reflecting their individual contribution to the coordinated program. In this showcase, we use the first multicellular program (MCP1), as it was associated with heart failure."
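Ranking genes by their absolute MCP1 values, as described in the proposed text, amounts to sorting each cell type's weights by magnitude, so that strongly negative and strongly positive genes both rank high. The nested-dictionary layout, cell type and gene names below are hypothetical placeholders; in practice the values would come from the ReHeat2 MOFAcell model.

```python
def rank_by_abs_loading(loadings):
    """loadings: dict cell_type -> dict gene -> MCP1 weight.

    Returns dict cell_type -> list of genes sorted by decreasing absolute weight.
    """
    return {
        cell_type: sorted(weights, key=lambda g: abs(weights[g]), reverse=True)
        for cell_type, weights in loadings.items()
    }


print(rank_by_abs_loading({"ct1": {"g1": 0.9, "g2": -1.2, "g3": 0.1}}))
# {'ct1': ['g2', 'g1', 'g3']}
```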
  
  R1.7. Regarding the calculation of the R matrix from the NicheNet matrices L and G, I gather that the R matrix is calculated once and is thus fully data-independent and available just like the L and G matrices from NicheNet. This was not very clear in the tutorials.
  
  Response:
  
  We are very thankful for the reviewers' involvement in testing the tool itself and its documentation. First, we propose a new website page explaining the pre-computed resources available for receptor-gene links, and we added a descriptive paragraph to the tutorials themselves.
  
  Second, we noticed a typo in the equation, which should read L = R * G with the current definition. We corrected it in the next version and specified that R is fully data-independent and inferred solely from prior knowledge.
  
  R1.8. Also, this might just be a typo in the tutorial: 'The default α = 0.8 gives more weight to direct effects, which has been empirically validated. You can adjust this based on your biological question." I believe the manuscript says alpha>0.5 refers to indirect effects dominating.
  
  Response:
  
  We corrected the statement in the tutorials: a high alpha indeed represents a stronger indirect effect. A similar typo was present in the first equation of the paper, which we are also correcting.
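To make the corrected reading of alpha concrete, the sketch below assumes, purely for illustration, that the combined score is a convex combination in which alpha weights the indirect effect and (1 - alpha) the direct effect; the exact form of the manuscript's first equation is not reproduced here, and the function name is hypothetical. Under this assumption, the default alpha = 0.8 gives the indirect effect four times the weight of the direct effect.

```python
def combined_effect(direct, indirect, alpha=0.8):
    """Hypothetical convex combination in which alpha weights the indirect effect.

    direct, indirect: dict gene -> score. Genes missing from one dict count as 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    genes = set(direct) | set(indirect)
    return {g: alpha * indirect.get(g, 0.0) + (1 - alpha) * direct.get(g, 0.0) for g in genes}
```

With alpha > 0.5, a gene reached only through the indirect (multicellular) path outscores a gene reached only directly with the same raw score, which matches the reviewer's reading of the manuscript.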
  
  R1.9. Same for the pre-processing of the spatial data for the third use case, a little more details on how this was done would help the users and readers.
  
  Response:
  
  We propose adding a specific section about the spatial pre-processing and analysis in the methods.
  
  We are also adding a tutorial on spatial data. Since spatial data processing is computationally intensive without GPUs, we will also provide the pre-processed data so that anyone can run this tutorial.
  
  R1.10. I don't see issues with the statistical power of the analysis.
  
  Rather, I think the authors should provide some examination of the parameter space for their model. Whereas an analysis of the impact of the Alpha parameter is provided, I believe there are several more parameters that have a crucial impact and choices for their values should be discussed.
  
  For example, 'In the GRN reconstruction only the links with a score above 1.5e-7 were retained in ReCoN's gene regulatory layer.' How was this chosen?
  
  We have identified the following parameters that are somehow justified but could be explored to have a better feel for how they impact the results
  
  Restart probability: How often the walker goes back to the starting seed/molecule
  
  Layer transition probability: How often the walker stays in the same layer - different cell? - different layers? Gamma
  
  Node transition within a layer: How often one jumps to a different layer
  
  Response:
  
  The reviewer raises a very valid point about parameter exploration.
  
  We focused on exploring the alpha (direct/indirect effect) parameter, as its value was the main uncertainty when designing the model.
  
  We would like to address this comment by adding new explorations of the restart probability and of the transition probabilities between layers. The probability of transitioning between specific nodes within a layer itself depends directly on 1) the restart probability, 2) the transition probabilities, and 3) the edge weights, which are determined beforehand and independently of ReCoN's exploration.
  
  The Heart Atlas showcase allows each set of parameters to be evaluated in around 10 min, compared with 10 h for the Immune Dictionary. We therefore propose to evaluate the restart probability and layer transition probabilities on the data of this showcase.
  
  We would explore restart probabilities of 0.1 * N, with N from 1 to 9.
  
  For transition probabilities, we propose varying the importance of the GRN, receptor and cell-cell communication layers with the following configurations:
  - probability of staying in the CCC layer (i.e. not jumping to the receptor layer): 0.1, 0.3, 0.5, 0.7, 0.9;
  - probability of staying in the receptor layer (i.e. not jumping to the GRN): 0.25, 0.5, 0.75;
  - probability of staying in the GRN layer (i.e. not jumping to the CCC layer): 0.25, 0.5, 0.75.
  This would result in 9 intracellular variations combined with 5 intercellular variations (45 configurations).
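The proposed design can be enumerated as a simple grid. This sketch lists the nine restart probabilities and the 5 × 3 × 3 = 45 layer-transition configurations described above; the dictionary keys are hypothetical labels, not ReCoN or MultiXrank parameter names. Whether the restart and transition grids are crossed (9 × 45 = 405 runs) or explored separately is not specified, so both counts are printed.

```python
from itertools import product

# Restart probabilities 0.1 * N for N = 1..9, rounded to avoid values like 0.30000000000000004
RESTART_PROBS = [round(0.1 * n, 1) for n in range(1, 10)]

# Probability of staying in each layer (i.e. not jumping to the next one)
STAY_CCC = [0.1, 0.3, 0.5, 0.7, 0.9]  # intercellular: CCC -> receptor layer
STAY_RECEPTOR = [0.25, 0.5, 0.75]     # intracellular: receptor layer -> GRN
STAY_GRN = [0.25, 0.5, 0.75]          # intracellular: GRN -> CCC layer


def transition_configs():
    """Yield the 5 x 3 x 3 = 45 layer-transition configurations."""
    for ccc, receptor, grn in product(STAY_CCC, STAY_RECEPTOR, STAY_GRN):
        yield {"stay_ccc": ccc, "stay_receptor": receptor, "stay_grn": grn}


configs = list(transition_configs())
print(len(RESTART_PROBS), len(configs), len(RESTART_PROBS) * len(configs))  # 9 45 405
```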
  
  We envision evaluating the different configurations by measuring the correlation between their results, as well as their time to convergence, which could increase drastically as the restart probability decreases. If correlations below 0.9 are observed between some results, we will compare their absolute performance.
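The comparison step can be sketched as a pairwise Spearman correlation between the gene scores obtained under each configuration, flagging pairs below the 0.9 threshold for a closer look at absolute performance. Using Spearman (rank) rather than Pearson correlation is an assumption here, motivated by ReCoN scores being interpreted as rankings; configuration names and the nested-dictionary layout are hypothetical. Constant score vectors would make the correlation undefined and are not handled.

```python
from itertools import combinations


def _ranks(values):
    """Average 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def flag_divergent(results, threshold=0.9):
    """results: dict config_name -> dict gene -> score (same genes in every config).

    Returns (config_a, config_b, rho) for every pair whose correlation is below threshold.
    """
    genes = sorted(next(iter(results.values())))
    flagged = []
    for a, b in combinations(results, 2):
        rho = spearman([results[a][g] for g in genes], [results[b][g] for g in genes])
        if rho < threshold:
            flagged.append((a, b, rho))
    return flagged
```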
  
  We would include the figures related to these explorations in the supplementary data. We would highlight the main findings in the method section dedicated to the random walk with restart. Finally, we would briefly describe the parameter exploration design in the first section of the results, for curious readers who would like to verify parameter choice before reading the showcases.
  
  R1.11. Weighting parameters: How much weight for direct or indirect effect to account for the combined effect - alpha - this is the only one that is explicitly explored.
  
  Response:
  
  We are very thankful for this comment, and we decided to modify our tutorial guidelines to make this choice more intuitive and general.
  
  Indeed, a threshold of 1.5e-7 would make little sense for most methods, which do not produce such low scores. We now propose to keep the top 2 million connections of the GRN, so that the complete network, or a large portion of it, is retained when methods other than HuMMuS are used.
  
  In our case, 1.5e-7 was determined empirically from the distribution of HuMMuS scores so as to keep the top 2 million connections: HuMMuS networks are generally almost fully connected, which is unusual among classical GRN inference methods, and keeping them entirely would make the exploration much longer.
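Keeping the top 2 million connections instead of applying a fixed score threshold can be sketched with the standard-library `heapq.nlargest`, which avoids fully sorting very large edge lists. The (regulator, target, score) tuple layout is an assumption about how the GRN is stored, and the regulator and gene names in the example are placeholders.

```python
import heapq


def top_k_edges(edges, k=2_000_000):
    """edges: iterable of (regulator, target, score). Keep the k highest-scoring edges.

    Unlike a fixed cutoff such as 1.5e-7, this adapts to each method's score scale;
    a network with fewer than k edges is kept entirely. Output is sorted by decreasing score.
    """
    return heapq.nlargest(k, edges, key=lambda edge: edge[2])


edges = [("TF1", "g1", 0.9), ("TF1", "g2", 1e-8), ("TF2", "g1", 0.3)]
print(top_k_edges(edges, k=2))  # [('TF1', 'g1', 0.9), ('TF2', 'g1', 0.3)]
```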
  
  R1.12. Finally, this might be considered OPTIONAL but would greatly improve the work in our opinion:
  
  The method crucially depends on the networks that are used in the different layers and to connect layers and cell types. As we know, biological data is noisy and incomplete (FP and FN) at each level and in each datatype. It would be really useful to estimate what is the robustness of the results to this noise. Particularly, from personal experience, we think the GRNs reconstructed from data are often almost fully connected and it is exceedingly difficult to validate them in specific contexts. This means that some 'errors' are likely to be present.
  
  Since several methods exist for inferring GRNs one could simply compare the results using different methods for this part of the network.
  
  A related point involves the characteristics of the RWR algorithm, that will be quite impacted by the presence of hubs in these networks (either in single layers or across several) that is likely to impact the exploration. If proteins that are hub are effectively important, that is not a problem, but in some layers, for example, the receptor-receptor layer that presumably will contain PPIs, there might be biases in hubs being just better studied proteins, and these hubs might have an 'unjustified' weight in the walks.
  
  One potential approach to assess the robustness of the method to these issues could be an empirical one that just randomly perturbs the networks in ReCoN to see to what extent similar predictions are achieved.
  
  Response:
  
  We are thankful for this relevant comment on GRN and prediction stability, and would like to take it as an opportunity to support the hypothesis that different GRN methods can be used in ReCoN.
  
  *
  
  When developing our previous HMLN-based tool, HuMMuS (Trimbour et al. 2024, Supplementary Figure 6), we observed that its multilayer structure provided more robust results than individual layers. We would like to reproduce such an analysis to verify that ReCoN results are less variable than the individual GRN layers.
  
  We propose to add a new showcase on the Human Cytokine Dictionary (Oesinghaus et al. 2025), aiming to predict cytokine downstream effects similarly to the Mouse Immune Dictionary showcase.
  
  This showcase would help confirm the contribution of the indirect effect and test the impact of different GRNs on the results.
  
  We would generate alternative GRNs with several other GRN inference methods: SCENIC+, CellOracle, and GRNBoost2, the latter using only the scRNA-seq data of the control samples in the Human Cytokine Dictionary.
  
  GRN inference methods generally produce outputs with very low overlap (Badia-i-Mompel et al. 2024).
  
  If we observe high correlations between the ReCoN predictions obtained with the different GRNs, this would already validate ReCoN's robustness to GRN noise.
  
  If lower correlations between ReCoN's predictions are obtained, we will add a specific permutation experiment on the HuMMuS GRN, creating different levels of artificial noise to assess more precisely the robustness of ReCoN to GRN stochasticity.
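
A minimal sketch of such a noise experiment, assuming a simplified single-layer setting rather than ReCoN's multilayer implementation: run a random walk with restart (RWR) from a seed node, perturb edge weights with multiplicative log-normal noise at increasing levels, and compare perturbed and unperturbed scores by Spearman correlation. The restart probability of 0.7, the noise model, the random toy network, and all function names are assumptions made for illustration.

```python
import numpy as np

def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted, undirected adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = adj / col_sums            # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)           # restart distribution on the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def spearman(a, b):
    """Spearman correlation as Pearson correlation of ranks (no tie correction)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def perturb(adj, sigma, rng):
    """Multiply each undirected edge weight by log-normal noise of scale sigma."""
    noise = rng.lognormal(mean=0.0, sigma=sigma, size=adj.shape)
    noise = np.triu(noise) + np.triu(noise, 1).T   # keep the matrix symmetric
    return adj * noise

rng = np.random.default_rng(0)
n = 50
adj = rng.random((n, n)) * (rng.random((n, n)) < 0.2)  # sparse random weights
adj = np.triu(adj, 1)
adj = adj + adj.T                                      # undirected, no self-loops
reference = rwr(adj, seeds=[0])

for sigma in (0.1, 0.5, 1.0):
    rho = spearman(reference, rwr(perturb(adj, sigma, rng), seeds=[0]))
    print(f"noise sigma={sigma}: Spearman rho={rho:.3f}")
```

Multiplicative noise leaves the topology unchanged; randomly adding or removing edges would be a complementary perturbation.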
  
  Regarding hubs in PPI networks, our showcases did not use a receptor PPI layer and are therefore not affected by bias at this level. Such a bias could, however, be present in the receptor-gene links, as we derive them from the ligand-gene connections of NicheNet, which are themselves partially based on prior knowledge. It is thus possible that some receptors are reached more often because of this bias rather than because of a stronger effect. This seems hard to control in the current context, as ReCoN relies on this prior knowledge. For now, we hope that combining a personalised, literature-agnostic GRN with a literature-based receptor-gene network provides an interesting trade-off. In future developments, we could imagine a receptor-gene network based solely on perturbations, but it would also require controlling for the bias in ligand-receptor binding pairs, which limits even the use of ligand-based experiments.
  
  We propose adding a short point to the discussion on hub effects in RWR-based methods.
  
  R1.13. Please add page numbers.
  
  Response:
  
  We will add page numbers.
  
  R1.14. Figures are nice and clear.
  
  Some specific minor points are listed here below.
  
  Define HMLN at its first appearance, in the Figure 1 caption (no page numbers).
  
  Second appearance: heterogeneous multilayer structure (HMLN) ...
  
  Response:
  
  We updated the figure legend to include the definition of the acronym, as the legend appears before its first occurrence in the text. (Or should it be defined at both positions?)
  
  R1.15. It is not clear what B_{i,j} refers to when first mentioned.
  
  Response:
  
  B_{i,j} represents a weight that can be assigned to favour some cell-to-cell transitions. It is usually not necessary to use it.
  
  It is notably of interest to model 1) known spatial patterns in situ, and 2) hypotheses or designs in which cell types favour some connections.
  
  For example, when modelling the skin, a user might want to increase connections between epidermal and dermal cells, and between dermal and hypodermal cells.
  
  We propose a new explanation of B_{i,j} that both explains its meaning in the model and illustrates situations in which to use it: "The coefficient B_{i,j} modulates the influence of cell type i on cell type j in the indirect effect computation. By default, all B_{i,j} are set to one, weighting each cell type's contribution equally per cell. However, they can be adjusted to encode additional biological knowledge, such as spatial proximity between cell types or known cooperation patterns. For instance, when modelling the skin, a user might increase B_{i,j} between epidermal and dermal cells, and between dermal and hypodermal cells, to reflect their spatial organisation."
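
To make the role of B_{i,j} concrete, here is a sketch using a hypothetical formulation written only for illustration, not ReCoN's exact equation: the indirect effect reaching a receiving cell type j is the B-weighted mean of the signals sent by the other cell types i. With all coefficients equal to one, every sending type contributes equally; raising the epidermal-dermal and dermal-hypodermal coefficients mimics the skin example. The signal values, the exclusion of self-contributions, and the normalisation by the sum of weights are assumptions.

```python
import numpy as np

cell_types = ["epidermal", "dermal", "hypodermal"]

# Hypothetical per-cell-type signal (e.g. a ligand-driven score) sent to other types
signal = np.array([0.2, 0.5, 0.3])

def indirect_effect(signal, B):
    """Indirect effect on each receiving type j: B-weighted mean of the
    signals of all other cell types i (self-contributions excluded)."""
    B = np.asarray(B, dtype=float).copy()
    np.fill_diagonal(B, 0.0)                 # no indirect effect of a type on itself
    return (B.T @ signal) / B.sum(axis=0)    # entry j sums B[i, j] * signal[i] over i

# Default: all coefficients equal to one
B_default = np.ones((3, 3))

# Skin example: strengthen epidermal<->dermal and dermal<->hypodermal links
B_skin = np.ones((3, 3))
for i, j in [(0, 1), (1, 0), (1, 2), (2, 1)]:
    B_skin[i, j] = 3.0

print(indirect_effect(signal, B_default).round(3).tolist())  # [0.4, 0.25, 0.35]
print(indirect_effect(signal, B_skin).round(3).tolist())     # [0.45, 0.25, 0.425]
```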
  
  R1.16. personalized interaction specificity. - maybe better word than personalised (contextualised?)
  
  Response:
  
  We agree that "contextualised" better conveys the meaning behind this model. "Personalised" might notably lead readers to expect patient-specific data, which is not the case here.
  
  We propose to rename all the models to: Receptor-matrix, ReCoN-no-CCC, ReCoN-no-context, ReCoN-complete.
  
  R1.17. ReCoN-genetic and ReCoN, ( generic?)
  
  Response:
  
  We will correct this typo.
  
  R1.18. responses. It is expected to observe common behaviors in-between cell-type, that the GRN
  
  and the generic CCC network already contribute captures.
  
  not very clear
  
  Response:
  
  Here we aimed to explain the already good performance of "ReCoN-no-context" (using the updated name proposed in response to R1.16), which could be surprising since no cell-type-specific information is used. Our proposed explanation is that the model correctly predicts several properties shared by all immune cell types despite their specific roles, such as similar metabolic pathways. When taking a quantitative view of their transcriptomes, as in this showcase, cell-type responses can be expected to be relatively well predicted from these common properties alone.
  
  As this is a very relevant comment, and several comments we received before submission also concerned this result, we would like to keep an explanatory sentence.
  
  R1.19. Figure 2b the icon of cells with double arrows might suggest phenotype shift when instead this is just communication
  
  Response:
  
  We are very thankful for the attention paid to the details of the paper and fully agree with this analysis. We propose to represent ligand emission instead of double arrows on the left side of the panel, reusing the convention of Figure 1.
  
  R1.20. eTACs explain acronym and what they are
  
  Response:
  
  We updated the first occurrence of eTACs to "extrathymic Aire-expressing cells (eTACs)".
  
  R1.21. Due to very few genes being differentially
  
  expressed, only cDC1 was conserved and evaluated for IL22,
  
  Not so clear
  
  Response:
  
  As this sentence comments on the IL22 stimulation results, we reorganised it to make it less convoluted: "For IL22 stimulation, only cDC1 presented enough differentially expressed genes."
  
  R1.22. In this showcase (not very clear, use case?)
  
  Response:
  
  We understand "use case" as describing a type of use of the method, whereas a showcase is a specific example of a use case. We thus find "showcase" more appropriate here. We will nevertheless check every use of the word to ensure it only refers to the precise examples we provide, and not to "use cases".
  
  R1.23. different fibroblast specializations - maybe phenotypes?
  
  Response:
  
  This is a very good suggestion, as specialisation would imply functional aspects (which we cannot really be sure of) and a chronological evolution.
  
  However, phenotype generally includes numerous properties, such as morphology, that we cannot validate here, so "phenotype" might be an even stronger term than "specialisation". For simplicity, "phenotype" could work; to be more precise, "transcriptomic specialisation"? I am honestly not sure of the best change here.
  
  R1.24. Figure 4b
  
  b) Schematic view of the deconvolution process and cell type-specific count inference from the spatial niches.
  
  Not so clear what the heatmap shows, rows and columns
  
  Spots heatmap: label the niches on the rectangles in the columns; each column is a spot.
  
  Rows are cell types or cells?
  
  In the cell types x spot
  
  Response:
  
  This figure can indeed benefit strongly from legend modifications. In both matrices, rows represent genes, while columns represent spots or the individual cells deconvoluted per spot, respectively.
  
  We would annotate the niche legend (here, the coloured surroundings) with a symbolic drawing instead of writing it on the matrix.
  
  We would add a "genes" label to the first matrix.
  
  We would write "deconvolution" directly on the figure.
  
  R1.25. Cell2location. Add reference, maybe explain basic functionality?
  
  Response:
  
  Cell2location was not referenced in the Results section, only in Section 4.6.2 of the Methods, as the 72nd citation. We corrected this oversight and propose adding 1) a brief explanation of deconvolution right before, and 2) a brief explanation of Cell2location's particularity of inferring individual cell profiles, which is uncommon in spatial deconvolution.
  
  R1.26. reconstructing different patients, tissues, and microenvironments to predict
  
  context-specific molecular treatments.
  
  Unclear
  
  fibrosis in different - at
  
  molecular levels
  
  Response:
  
  We will modify this section title following the reviewer's comment and the suggested reformulations.
  
  R1.27. Figure 5d: myeloid and endothelial colour codes are inverted compared to 5b,c.
  
  Response:
  
  The legends are individually correct, but there is no reason not to make them consistent across panels. We will update the legend of panel 5d.
  
  R1.28. 5d: indicating important pathways in orange should not change the colour of the nodes (purple = common, blue or green = specific). Maybe use border colour?
  
  Response:
  
  We had forgotten to specify the colour code of this panel, where orange highlighted gene sets related to molecular pathways rather than functional annotations. As the names already make the pathways explicit, we now think that the orange background is redundant and may create confusion. We would thus like to update the backgrounds of the Wnt and TNFA pathways to ___ (more enriched in a cell type) and purple (significantly enriched in all cell types).
  
  R1.29. 5e is not a venn diagram
  
  e) Venn diagram showing the overlap between transcription factors (TFs) predicted by ReCoN (green) and those previously
  
  implicated in fibrosis (orange) or cardiac diseases (violet). Only the top 10 TFs were annotated from literature
  
  sources; full sizes of fibrosis- and cardiac disease-related receptor sets can therefore not be represented.
  
  f) also not a venn diagram e/f now in supp
  
  the "NABA ECM collagens" gene set. Nodes are
  
  grouped by molecular type (e.g., transcription factors, receptors, ligands), and links represent the weighted,
  
  direct regulatory interactions present in the ReCoN-constructed
  
  Response:
  
  As the diagrams do not indicate the total number of receptors or TFs reported in the literature, they cannot be Venn diagrams. We updated the legend to: "Venn diagram showing the overlap between [...]"
  
  As we reorganised the paper, these plots are now only in the Supplementary Material; we removed the duplicate occurrence from the Figure 5 legend.
  
  R1.30. Why a Sankey plot? Normally a Sankey plot represents flow (of regions changing from one state to another), but here this is just a weighted network?
  
  No communication from fibros back to other cell types? No communication between ventricular/myeloid/lymphoid?
  
  Response:
  
  We are thankful for this useful feedback, which helped us realise that interesting details were missing from the paragraph.
  
  This plot is only intended to visualise a regulatory cascade, so users have to choose one receiving cell, a set of target genes, and the sending cells. It includes a specific subset of regulatory cells, and only their interactions with the target cells. Here, we illustrated the regulation of some ECM genes produced by fibroblasts.
  
  A Sankey diagram might indeed not be the clearest representation, as we do not model the whole diffusion, and it is not a flow per se. We propose to replace it with another representation that we hope will be more intuitive for biologists (and more aesthetic), as illustrated below:
  
  R1.31. as a extension to - an
  
  underrepresented in the current. - current framework?
  
  Response:
  
  "Framework" works perfectly to fill the missing word in the sentence.
  
  R1.32. However, it can't represent more - cannot
  
  Borrowing representation from hypergraphs, which introduces
  
  The network exploration implementation of ReCoN also present some limitations.
  
  limitations. While random walks
  
  with restarts offer a stable and fast exploration workflow for multilayer networks, it
  
  currently only considers positive weights to predict regulation strengths. It involves that the
  
  nature of the regulation, as activation or inhibition, has to be identified a posteriori.
  
  check concordance/grammar
  
  Response:
  
  We will correct the grammatical errors raised.
  
  R1.33. Only the nodes that are included in one of the layers are present in the
  
  final results, ignoring the ones present only in bipartites.
  
  Unclear
  
  Response:
  
  Layers and bipartite networks are treated differently by the algorithm, and a node must be present in a layer to appear in the results.
  
  In practice, this simply means that receptors or ligands not paired in the CCC network, and genes not regulated by any TF in the GRN, will not appear.
  
  We propose clarifying the sentence with this second explanation:
  
  "In practice, a node must have at least one connection in its layer to appear in the final results. This means that receptors or ligands absent from the CCC network, and genes not targeted by any transcription factor in the GRN, will not receive a score from the random walk exploration."
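
The rule above can be illustrated with toy data (the gene names and edge lists are hypothetical): a node is scored only if it has at least one edge within its own layer, whatever its bipartite connections.

```python
def nodes_with_layer_edges(layer_edges):
    """Return the set of nodes that have at least one edge inside a layer."""
    nodes = set()
    for a, b in layer_edges:
        nodes.update((a, b))
    return nodes

# Hypothetical GRN layer (TF -> target) and receptor-gene bipartite links
grn_layer = [("STAT3", "SOCS3"), ("STAT3", "BCL3"), ("IRF1", "CXCL10")]
receptor_gene_bipartite = [("IL6R", "STAT3"), ("IL6R", "JAK1"), ("IFNGR1", "IRF1")]

scorable = nodes_with_layer_edges(grn_layer)
bipartite_genes = {gene for _, gene in receptor_gene_bipartite}

# JAK1 is linked to a receptor but has no edge in the GRN layer,
# so it would not receive a score from the random walk exploration.
print(sorted(bipartite_genes - scorable))  # ['JAK1']
```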
  
  R1.34. a scATAC - an
  
  Barsi et al. is now published: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013188
  
  Response:
  
  We updated the reference with the published article.
  
  R1.35. effects, allowing for modulating in a second
  
  time their contribution. - word order
  
  Response:
  
  We propose the formulation "allowing their contribution to be modulated in a second step".
  
  R1.36. others. However, it is possible to adjust the Beta coefficient to
  
  represent it based on the available information for each dataset.
  
  Represent- adjust?
  
  Response:
  
  We agree with the reviewer's suggestion to use "adjust".
  
  R1.37. We use the latter to compare the different models. - what is the latter?
  
  Response:
  
  "The latter" referred to the 25 cytokines of the Immune Dictionary that had at least one connection in the cell communication network inferred with CellPhoneDB. We propose clarifying this formulation to "..."
  
  R1.38. It resulted in the scRNA-seq in 1,789 cells with 13,167
  
  genes, and for the scATAC-seq in 3,759 cells with 254,545 regions.
  
  Check english
  
  Response:
  
  We propose replacing this sentence by the following: "It resulted in a scRNAseq dataset of 1,789 cells with 13,167 genes, and a scATACseq dataset of 3,759 cells with 254,545 regions."
  
  R1.39. GRETA pipeline.- reference
  
  Response:
  
  We added the citation of the GRETA pipeline paper in Section 4.5 of the Methods: "Badia-i-Mompel et al., 2026".
  
  R1.40. We kept all the cells whose annotations through unsupervised clustering,
  
  followed by marker gene annotations, through scANVI were coherent.
  
  Word order
  
  Response:
  
  We propose the following reformulation to correct the sentence: "We kept all cells whose annotations were coherent between unsupervised clustering with marker-gene labelling and scANVI-based label transfer."
  
  R1.41. In parallel, pairs of ligands and receptors with both associated with scores above
  
  an absolute gene loading of 0.1 were considered potential driver interactions in HF.
  
  Unclear
  
  Response:
  
  In the MOFAcell results, factors correspond to linear combinations of genes that explain a large part of the data variance; the contribution of each gene is called its loading. We chose the factor that best separated patients with and without fibrosis, and kept all the top genes, i.e. all those with an absolute score above 0.1.
  
  We propose reformulating this sentence, as the word "loading" could be overcomplicated here for most readers: "To identify the ligands and receptors driving heart failure, we considered all those with an absolute contribution to the multicellular factor above 0.1."
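
The selection rule can be sketched as follows: a ligand-receptor pair is kept as a potential driver interaction when both partners have an absolute loading above 0.1 on the chosen multicellular factor. The loading values and pairs are invented for illustration, and the dictionary layout is an assumption, not the MOFAcell output format.

```python
# Hypothetical loadings of genes on the factor separating patients from controls
loadings = {"TGFB1": 0.32, "TGFBR2": -0.18, "POSTN": 0.05,
            "ITGAV": 0.21, "CCL2": -0.12, "CCR2": 0.08}

# Candidate ligand-receptor pairs, e.g. taken from a prior-knowledge resource
pairs = [("TGFB1", "TGFBR2"), ("POSTN", "ITGAV"), ("CCL2", "CCR2")]

THRESHOLD = 0.1  # absolute loading cut-off used in the text

def driver_pairs(pairs, loadings, threshold=THRESHOLD):
    """Keep pairs where BOTH ligand and receptor exceed the absolute threshold."""
    return [(lig, rec) for lig, rec in pairs
            if abs(loadings.get(lig, 0.0)) > threshold
            and abs(loadings.get(rec, 0.0)) > threshold]

print(driver_pairs(pairs, loadings))  # [('TGFB1', 'TGFBR2')]
```

Taking the absolute value keeps genes loading strongly in either direction, since the sign of a factor is arbitrary.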
  
  R1.42. gseapy Python - reference?
  
  Response:
  
  The gseapy package was indeed not cited; we now include the citation: "Zhuoqing Fang, Xinyuan Liu, Gary Peltz. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics, 2022, btac757. https://doi.org/10.1093/bioinformatics/btac757"
  
  R1.43. and to calculate average for each spatial context the average cell type expression.
  
  Unclear
  
  Response:
  
  We propose to reformulate the sentence as: "These cell-type-spot profiles were later used, for each spatial context, to build a specific cell-cell communication network and to calculate average cell-type expression."
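
As a hypothetical sketch of the second use of these profiles, the snippet averages deconvoluted cell-type-specific expression over the spots belonging to each spatial context. The array layout (spots x cell types x genes), the context labels `niche_A` and `niche_B`, and the unweighted mean are assumptions; actual Cell2location output objects are not reproduced here.

```python
import numpy as np

# Hypothetical deconvoluted expression: 4 spots x 2 cell types x 3 genes
expr = np.array([
    [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]],
    [[3.0, 1.0, 0.0], [0.5, 3.0, 2.0]],
    [[2.0, 2.0, 2.0], [1.0, 1.0, 1.0]],
    [[0.0, 4.0, 2.0], [3.0, 1.0, 3.0]],
])
niche = np.array(["niche_A", "niche_A", "niche_B", "niche_B"])  # context of each spot

def average_by_context(expr, niche):
    """Return {context: (cell types x genes) mean expression over its spots}."""
    return {ctx: expr[niche == ctx].mean(axis=0) for ctx in np.unique(niche)}

avg = average_by_context(expr, niche)
print(avg["niche_A"].tolist())  # [[2.0, 0.5, 1.0], [0.5, 2.0, 1.0]]
```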
  
  R1.44. We only used the loadings of all cell
  
  types but the fibroblasts to consider the effect of the sole environment.
  
  Unclear
  
  Response:
  
  We propose to use "APART from the fibroblasts" to clarify the sentence, and "to ONLY consider the effect of the environment".
  
  R1.45. We realised a downstream - performed
  
  Response:
  
  We fully agree with the reviewer's suggestion.
  
  R1.46. The profiles inferred by ReCoN were first very correlated in all three contexts. - unclear
  
  Response:
  
  The sentence lacked clarity and deserved to be rephrased. We propose: "When looking at the absolute scores of ReCoN in all three contexts, results were initially highly correlated. To focus on context-specific differences, enrichments were performed using the log-ratio of each context profile over the mean of the other profiles."
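
The log-ratio transformation in the proposed sentence can be sketched as follows: for each context, divide its score profile by the mean of the other contexts' profiles and take the logarithm, so that genes scored equally everywhere go to zero. The natural logarithm, the pseudocount `eps`, and the toy scores are assumptions.

```python
import numpy as np

def context_log_ratios(profiles, eps=1e-12):
    """profiles: (n_contexts, n_genes) array of non-negative scores.

    Returns log(profile_c / mean of the other profiles) for each context c.
    """
    profiles = np.asarray(profiles, dtype=float)
    n = profiles.shape[0]
    total = profiles.sum(axis=0)
    ratios = []
    for c in range(n):
        others_mean = (total - profiles[c]) / (n - 1)
        ratios.append(np.log((profiles[c] + eps) / (others_mean + eps)))
    return np.vstack(ratios)

# Three highly correlated contexts that differ on a few genes
scores = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.40, 0.30, 0.10, 0.20],
    [0.40, 0.15, 0.25, 0.20],
])
lr = context_log_ratios(scores)
print(lr.round(3))
```

The first gene, scored identically in all contexts, gets a log-ratio of zero everywhere, while context-specific deviations become positive or negative values that can then be used to rank genes for enrichment.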
  
  R1.47. Potentially the closest results are models that can predict the effect of perturbations on cell line cultures. Several approaches in the literature employ either transformers or optimal transport to predict the effect of perturbations in single cell datasets. One of the main issues is an underlying necessary assumption that the perturbation effect will be larger than the heterogeneity (in cell lines for example), which becomes increasingly difficult when considering in-vivo experiments. ReCoN obviously goes beyond this by considering explicitly the presence of different cell types but distinctions of cell types are sometimes quite arbitrary and potentially application of ReCoN to some of the in-vitro culture datasets, even on cell lines, could be a way to test its performance and benchmark it against other methods.
  
  The main bottleneck in the application of this framework to 'personalisation' of therapies, mentioned even in the abstract as a potential future goal for such an approach, will be the lack of data. This approach requires single cell level descriptions of the system at hand, plus additional datasets to build the model structure. To a certain extent, public data of related tissues/contexts can be used, but it will be necessary to test the dependence of performance on coherence of the input data to develop sufficient trust to use it for new predictions, especially in a medical field.
  
  Response:
  
  We thank the reviewer for these reflections, which raise several distinct points that we would like to add in the discussion.
  
  Cell-line perturbation modelling is indeed a closely related and active field of research, with numerous models based notably on optimal transport and VAEs, as well as relevant benchmarks (Radig et al. 2025). In our view, ReCoN takes a complementary angle, both by focusing on the effect of the environment and by using a network-driven approach that provides explainability.
  
  These perturbation methods are typically benchmarked on single-cell-line screens, where cell-cell communication is highly limited or absent by design, whereas ReCoN is specifically designed to exploit interactions between multiple cell types. Furthermore, ReCoN relies on a network that aims to provide only explainable hypotheses and molecular cascades. The two families of methods also typically learn from different data: ReCoN only uses single-cell data, whereas the best perturbation prediction methods learn from a subset of perturbation experiments.
  
  Evaluating ReCoN on perturbation prediction would require designing extensive comparisons with the state of the art that take all these nuances into account, which we believe goes beyond the scope of the present study. The comment nevertheless raises a fundamental question for the development of future methods, namely the need to assess whether the perturbation effect is actually larger than the heterogeneity, and we propose to extend the discussion to cover these aspects.
  
  Secondly, this comment raises a point about cell type definition, which can be a hard task and sometimes an inaccurate description of cell heterogeneity. We note that even if ReCoN relies on grouping cells in some way, it does not impose any particular cell type ontology: users can define their own cell types or cell states, since the CCC layer is typically inferred from single-cell RNA-seq alone and does not require canonical cell-type annotations. This flexibility allows ReCoN to accommodate finer or coarser groupings depending on the biological question. We do not propose a framework that accounts for diversity other than through homogeneous clusters of cells, but we think that this constitutes an interesting future development of ReCoN or of new multicellular modelling methods.
  
  Lastly, we fully agree that an important limitation for ReCoN's use is data availability and generation, which was also a limitation when identifying datasets for the manuscript's applications. We hope that the development of open-source atlases will make it easier to leverage tissue-specific prior knowledge and will increase potential applications, prediction performance, and trust in ReCoN results.
  
  In conclusion, we propose to state in the discussion two new points:
  
  1) Extending multicellular perturbation prediction (including gene knockouts) to conditions where cell types cannot be defined prior to the analysis, or are better considered along a spectrum, will be an interesting future direction.
  
  2) There is now a need for broad benchmarks covering both multicellular and single-cell-line tasks, to evaluate the trade-off between accounting for cell heterogeneity and overall prediction accuracy.
  
  Radig, J., Droit, R., Doncevic, D. et al. scArchon: a scalable benchmarking framework for assessing single-cell perturbation models. Genome Biol 27, 162 (2026). https://doi.org/10.1186/s13059-026-04104-z
  
  R1.48. The authors could comment on how their method compares to others that do not require single cell level information. Despite clear differences, it might be important to show the advantage of using this more complex approach that requires data that is less available. Given the ease with which bulk profiles can be constructed from single cell data, it might be possible to compare the approaches directly. For example, see
  
  Wang, S. Patkar, J.S. Lee, E.M. Gertz, W. Robinson, F. Schischlik, D.R. Crawford, A.A. Schäffer, E. Ruppin Deconvolving Clinically Relevant Cellular Immune Cross-talk from Bulk Gene Expression Using CODEFACS and LIRICS Stratifies Patients with Melanoma to Anti-PD-1 Therapy
  
  Mike van Santvoort, Óscar Lapuente-Santana, Maria Zopoglou, Constantin Zackl, Francesca Finotello, Pim van der Hoorn, Federica Eduati. Mathematically mapping the network of cells in the tumor microenvironment. Cell Reports Methods, 2025.
  
  Response:
  
  We propose to extend the discussion with additional methods, notably ones predating the development of single-cell technologies. We did not plan to include these two specific methods since, to our knowledge, their outputs are not directly comparable to ReCoN's purpose.
  
  The first work proposes to deconvolve bulk RNA-seq profiles into cell-type-specific expression profiles. It is an interesting reference, as it could allow applying ReCoN even to bulk RNA-seq, but it does not provide comparable results, as its final task is to infer ligand-receptor interactions, without providing downstream molecular mechanisms.
  
  The second method, RaCInG, builds cell-to-cell networks for individual patients. It does not explore the molecular interactions inside the cells themselves; its networks could be used to build personalised ReCoN models, but the method seems closer to CCC inference than to ReCoN itself.
  
  Reviewer #2
  
  R2.1. It is not clear how well it performs in independent validations. The authors showed that it can predict the effect of cytokine perturbations in the Immune Dictionary by selecting an optimal alpha. The authors should validate that, using the same alpha value of 0.8, it is possible to accurately predict the effect of cytokine perturbations in independent datasets. This is particularly concerning for cytokine-cell type pairs where the optimal alpha is not known. Therefore, the potential utility of ReCoN to estimate the effect of multicellular perturbations is not well established.
  
  Response:
  
  The reviewer raised a very relevant point by noting that the alpha coefficient might vary between datasets.
  
  The value of 0.8 was chosen because it produced the best results in two independent datasets, those of the Immune Dictionary and heart failure showcases, which indicates some cross-dataset reproducibility. To complete these findings, we will also verify whether 0.8 provides the best performance in a new showcase: the Human Cytokine Dictionary (Oesinghaus et al. 2025).
  
  We already tried to nuance this choice by pointing to the need to confirm the importance of the indirect effect. We propose to add a sentence explicitly commenting on the implications of these new findings for the alpha coefficient and the robustness of its value.
  
  It is also accurate that ReCoN cannot currently estimate the alpha parameter autonomously. We proposed this default value because it worked on both datasets, but it is possible that no single default value fits all datasets. Alpha is only a default, and users are completely free, in the current implementation of ReCoN, to modify its value depending on their needs.
  
  If no default value proved adequate, one option could be to fit alpha using similar prior perturbations, when such data are available. For example, after perturbing one or a few cytokines, a user could choose the value that best explains the gene expression responses.
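
The fitting strategy suggested above could look like the following sketch. For illustration only, it assumes that the final score is a convex combination of direct and indirect effect scores, (1 - alpha) * direct + alpha * indirect, so that alpha = 0 removes the indirect effect, and that the quality criterion is the Pearson correlation with observed expression changes. Neither the combination rule nor the criterion is taken from ReCoN's implementation, and the data are simulated.

```python
import numpy as np

def combine(direct, indirect, alpha):
    """Hypothetical combination: alpha weights the indirect (multicellular) effect."""
    return (1.0 - alpha) * direct + alpha * indirect

def fit_alpha(direct, indirect, observed, grid=np.linspace(0.0, 1.0, 11)):
    """Pick the alpha whose combined scores best correlate with observed responses."""
    best_alpha, best_r = None, -np.inf
    for alpha in grid:
        r = np.corrcoef(combine(direct, indirect, alpha), observed)[0, 1]
        if r > best_r:
            best_alpha, best_r = alpha, r
    return best_alpha, best_r

# Simulated pilot perturbation: responses driven mostly by the indirect effect
rng = np.random.default_rng(1)
n_genes = 500
direct = rng.random(n_genes)
indirect = rng.random(n_genes)
observed = 0.2 * direct + 0.8 * indirect + rng.normal(0, 0.05, n_genes)

alpha, r = fit_alpha(direct, indirect, observed)
print(f"best alpha = {alpha:.1f}, Pearson r = {r:.3f}")
```

A grid search is used here because it is transparent and cheap for a single parameter; a finer grid or a scalar optimiser could replace it.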
  
  R2.2. The authors claimed that the optimal alpha value of 0.8 implies the dominance of the indirect effect. In contrast to this claim, the performance across cytokine-cell type pairs only increased from 0.72 to 0.76, which seems to imply that indirect effects do not add much.
  
  Response:
  
  The magnitude of the performance improvement is an interesting point for us to discuss, as including the indirect effect roughly doubles the computational time, which creates a trade-off between resource usage and improvement.
  
  While the average improvement from combining the direct and indirect effects in the first showcase was around 5%, it exceeded 10% in some cell types. We consider this a meaningful improvement for the current task. Indeed, the indirect effect here "only" incorporates the coordinated response of immune cells to a cytokine stimulation, which need not drastically change their profiles compared to isolated exposure.
  
  R2.3. How does the prediction of cell-type-specific effects perform when considering only the intracellular layers? The authors constructed multiple variants of ReCoN to estimate unicellular and multicellular effects. How is the variant ReCoN-grn different from the full ReCoN where gamma is set to zero?
  
  Response:
  
  We are thankful for this comment, which will help us restructure Section 2.2.
  
  ReCoN-GRN differs from the full ReCoN model, even with a gamma value of 0, as the latter includes ligand-to-receptor weights. However, ReCoN-GRN corresponds to ReCoN-generic with an alpha of 0, which does not weight ligand-to-receptor links.
  
  We propose to clarify this detail in Section 2.2.2 by adding the following sentence after the introduction of the ReCoN-generic model: "Note that ReCoN-grn corresponds to the ReCoN-generic model with alpha set to zero, where no indirect effects are considered. It differs from the full ReCoN model with alpha set to zero, which still includes ligand-to-receptor weights through the receptor-gene bipartite network."
  
  R2.4. In Section 2.2, the authors assert that if matching datasets are not available, the GRN layer can be extracted from other datasets. How well does a GRN layer from one system generalize to another system in terms of perturbation prediction?
  
  Response:
  
  It is, of course, a complex question, as the answer probably depends strongly on the studied system. However, we believe that, while it is important to consider similar systems, it is not necessary to use the same samples for the cell communication and GRN layers.
  
  Our first showcase explores exactly this case: we built the GRN from two unpaired datasets and the cell communication network from a third one. It yielded convincing performance, supporting our claim. Moreover, this is common practice in most methods that contextualise prior knowledge, which usually comes from other samples and sometimes even from other organs (Browaeys, Saelens, and Saeys 2020; Jin et al. 2021; Badia-i-Mompel et al. 2023).
  
  To provide additional insights, we will run the new Human Cytokine Dictionary showcase using both 1) multiomics GRN methods on external PBMC datasets, and 2) a scRNA-seq-only GRN method applied directly to the Human Cytokine Dictionary. We will then be able to report performance for both types of data and their corresponding methods.
  
  To support our claim more clearly, following the reviewer's comment, we propose highlighting this justification in the showcase itself: "... this showcase highlights the possibility of combining networks obtained from distinct datasets ...".
  
  Regarding the combination of datasets, we propose to clarify the reasons behind our choices for the Immune Dictionary showcase with the additional supplementary text proposed in response to comment R1.4.
  
  Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24(11):739-54. https://doi.org/10.1038/s41576-023-00618-5.
  
  Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17(2):159-62. https://doi.org/10.1038/s41592-019-0667-5.
  
  Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12(1):1088. https://doi.org/10.1038/s41467-021-21246-9.
  
  R2.5. In the abstract, the authors claimed that ReCoN can predict the effect of gene knockouts, but they did not show any application or validation to support this claim.
  
  Response:
  
  Indeed, we had no showcase that explicitly measured ReCoN's performance on gene knockouts, although this possible application was introduced in the abstract.
  
  We believe that ReCoN could be used in the future to infer such perturbations, but we fully agree that this claim cannot be presented without justification.
  
  We propose to remove the mention of gene knockouts from the abstract and to introduce it in the closing part of the discussion instead, specifying that it will require dedicated experiments and constitutes a possible future extension of this work.
  
  R2.6. The communication between cells might be dependent on their spatial proximity. Is it possible to construct the CCC layer by incorporating the context-matched spatial data? How would that affect the performance of multicellular response prediction?
  
  Response:
  
  This is a very interesting comment, as numerous methods using spatial transcriptomic data have been published recently.
  
  In the current formulation, the coefficient β_ij modulates the impact of cell type i on cell type j. If spatial transcriptomic data can inform on the proximity between cell types and its overall impact on their communication, users could adjust these coefficients to enforce more communication between specific cell types.
  
  However, as ReCoN is a cell-type-centric model, spatial information can only be added at a global scale, or by modelling spatial regions independently, as presented in the heart infarction microenvironment showcase. This means that ReCoN cannot benefit from the potential of spatial transcriptomics as much as models that represent the tissue structure.
  
  R2.7. In the fibroblast application in Fig 4d, based on the cardiac cell types expression in region type, they are predicting fibroblast gene expression. Wouldn't the most direct benchmarking be comparison with observed fibroblast expression from the ST (after deconvolution perhaps)?
  
  Response:
  
  This comment was helpful in guiding the restructuring of the heart infarction microenvironment showcase, as we believe its overall objective was not formulated clearly enough.
  
  We aim to model the impact of the environment on the transcriptome. As the complete transcriptome of a cell results from numerous interacting variables, we believe that correlating ReCoN's scores with the transcriptome would not evaluate how well the impact of the environment is predicted.
  
  For this reason, we compared the results with the differences specifically attributable to the microenvironment. We focused on gene set enrichment, which seemed less noisy for such a comparison, particularly with Visium (10x Genomics) data, which have a particularly high dropout rate.
  
  We propose to strengthen the validation by providing molecular insights into the three groups of cells studied.
  
  The spatial data themselves are bulk measurements, since each Visium spot can capture several cells, which adds noise on top of the small number of genes captured by Visium. Rather than correlating with deconvolved spots, we can use equivalent single-cell RNA-seq fibroblast data annotated in the same study, which match the three modelled niches. We propose to run a differential expression analysis on these data and compute the correlation between the resulting group differences and ReCoN scores, providing a quantitative analysis.
  
  If the correlation is low because of noise in the data (notably, noise can permute the order of individual genes even when the overall biological signal and the gene set ordering are conserved), we will additionally perform a pathway enrichment analysis on these data, which will also strengthen the qualitative validation.
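  A rank-based correlation is one way to run the quantitative comparison proposed above while tolerating noise-induced reordering of individual genes. The sketch below is a minimal, dependency-free Spearman correlation between per-gene differential expression statistics (for example, log fold changes between niches) and per-gene ReCoN scores. The function names, the dict-based inputs and the toy gene values are hypothetical placeholders, not the authors' pipeline, and the authors may prefer a different statistic.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def correlate_by_gene(de_stat, recon_score):
    """Correlate two {gene: value} maps over the genes present in both."""
    shared = sorted(set(de_stat) & set(recon_score))
    return spearman([de_stat[g] for g in shared], [recon_score[g] for g in shared])


# Toy, made-up values for illustration only (not data from the study).
de_logfc = {"gene_a": 2.1, "gene_b": 0.4, "gene_c": -1.3, "gene_d": 1.0, "gene_e": 0.0}
recon = {"gene_a": 0.90, "gene_b": 0.35, "gene_c": 0.05, "gene_d": 0.60, "gene_f": 0.70}
rho = correlate_by_gene(de_logfc, recon)
print(f"Spearman rho over shared genes: {rho:.3f}")
```

  Only genes present in both inputs are compared, and tied values (common for sparse, low-count genes) receive averaged ranks, which is the usual Spearman convention. Because only ranks matter, any monotonic rescaling of ReCoN scores leaves the result unchanged.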
  
  R2.8. Section 2.6: Besides the cytokine section, it is difficult to assess the added value of this approach. There are likely many valuable findings here, but this is difficult to judge because the assessment is very qualitative.
  
  Response:
  
  One of the challenges of this work was finding relevant datasets to evaluate ReCoN. We complemented the direct quantitative evaluation on the Immune Dictionary with a second quantitative evaluation based on the heart atlas multicellular programs, although this validation is much less direct.
  
  We hope that new perturbation experiments on multicellular datasets, especially cell-type-targeted perturbations, will provide more opportunities to validate the findings and claims of our current manuscript.
  
  Similarly, we found no existing method making comparable predictions to benchmark against. This led us to use NicheNet scores and the current decomposition of the ReCoN model in Section 2.2.1 to evaluate the contribution of the model.
  
  R2.9. The article is dense and writing should be reorganized for better readability.
  
  Minor issues -
  
  No p-values in figures.
  
  Response:
  
  We agree that showing p-values directly in the panels would make the figures easier to read. We will add the p-values to panels 2d, 2e, 2f and 2g. We had also forgotten to state in the legend of panel 4d that all bold scores were associated with a p-value.
  
  R2.10. Typo - ReCoN-genetic should be - ReCoN-generic.
  
  Response:
  
  We thank the reviewer for noticing the typo, which we have corrected in the new version.
  
  R2.11. Authors may consider adding figures to describe their results on balance between direct and indirect effects in section 2.2.2.
  
  Response:
  
  Depending on the new findings on the indirect-effect iterations, we propose adding a panel on their combination or a supplementary figure.
  
  R2.12. Redundancy in the following two lines -
  
  o While these approaches effectively describe what tissue-wide programs are coordinated, they generally offer limited insight into the molecular mechanisms that establish or regulate these programs.
  
  o Despite their ability to identify coordinated tissue-wide programs, multicellular program analyses typically offer limited insight into the underlying molecular mechanisms that orchestrate these programs.
  
  Response:
  
  We propose removing the first sentence in the revised manuscript. In our opinion, opening the next paragraph with this clarification guides the reader better than having it at the end of the previous one.
  
  3. Description of the revisions that have already been incorporated in the transferred manuscript
  
  Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.
  
  4. Description of analyses that authors prefer not to carry out
  
  Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.
  
  R2.13. The direct and indirect effects are treated in two separate steps. In reality, of course, these effects operate simultaneously. I wonder if this could be better modelled by iterating through the two steps. It might be worthwhile trying to see whether that improves performance.
  
  We thank the reviewer for this interesting idea and propose adding a supplementary text presenting the outcome of this discussion to readers.
  
  The direct effect is assumed to be captured by the first iteration only, as it represents the effect of direct receptor binding. The indirect effect, in contrast, could be iterated, with successive iterations representing effects that are more distant in time.
  
  On an algorithmic note, the indirect effect already allows several "iterations", as each random walk can loop between all cell types until it restarts. However, it does not allow the weights of the successive transitions to be controlled. In practice, with a high restart probability, the first "iteration" receives far more weight than the second, as a walker must cross three layers to reach the next cell type.
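  The weighting described above can be made explicit, assuming the standard RWR formulation (Tong, Faloutsos, and Pan 2006) with seed vector s, column-stochastic transition matrix W and restart probability r. The value r = 0.7 below is an illustrative assumption, not a ReCoN parameter.

```latex
\mathbf{p} = (1-r)\,W\mathbf{p} + r\,\mathbf{s}
\;\Longrightarrow\;
\mathbf{p} = r\,\bigl(I - (1-r)W\bigr)^{-1}\mathbf{s}
           = \sum_{k=0}^{\infty} r\,(1-r)^{k}\,W^{k}\mathbf{s}

\Pr[\text{walk length} \ge k] = \sum_{j \ge k} r\,(1-r)^{j} = (1-r)^{k},
\qquad
\frac{(1-r)^{k+3}}{(1-r)^{k}} = (1-r)^{3} = 0.3^{3} = 0.027 \quad (r = 0.7)
```

  Walks of length k carry weight r(1-r)^k, so if reaching the next cell type requires three more transitions, each such hop multiplies the remaining weight by (1-r)^3; with a high restart probability the second "iteration" is therefore heavily damped relative to the first, and the series offers no separate knob for reweighting it.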
  
  First, we propose clarifying this section of the manuscript to explain the depth of the indirect-effect exploration.
  
  Biologically, these iterations may well play an important role in the complete response of the cells. However, we believe that modelling them hits a major limitation of our approach, and of RWR-based exploration in general, because repeated iteration works against the restarts on which the method relies.
  
  We aim to compute pairwise measures of the impact of one node on another. Random walks without restart are poorly suited to this problem, because they converge to a stationary distribution that does not depend on the starting node (László, Lov, and Erdos 1996). In the case of ReCoN, this means that if the exploration were pushed indefinitely, walks starting from any gene or receptor would end up on each node of the system with the same probability.
  
  The restart mitigates this effect and preserves the influence of the seeds by keeping the walkers close to them (Tong, Faloutsos, and Pan 2006). By iterating successively from each new distribution obtained from the RWR, we would counteract the restart probability and progressively converge toward the stationary distribution of classical random walks.
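  The convergence argument above can be illustrated on a toy graph. The sketch below assumes a generic random walk with restart on a single undirected graph (not ReCoN's multilayer network); the five-node "bowtie" graph and r = 0.7 are hypothetical choices. It compares a plain random walk, a single RWR from a seed, and RWR re-seeded with its own previous output, as the reviewer's iteration would do.

```python
# Toy undirected "bowtie" graph: two triangles sharing node C.
# Hypothetical example, not a ReCoN network (ReCoN uses multiple layers).
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}


def walk_step(p, graph):
    """One random-walk step: every node splits its mass equally among its neighbours."""
    nxt = {v: 0.0 for v in graph}
    for u, mass in p.items():
        share = mass / len(graph[u])
        for v in graph[u]:
            nxt[v] += share
    return nxt


def rwr(seed, graph, restart, iters=200):
    """Random walk with restart: iterate p <- (1 - r) * step(p) + r * seed to its fixed point."""
    p = dict(seed)
    for _ in range(iters):
        stepped = walk_step(p, graph)
        p = {v: (1 - restart) * stepped[v] + restart * seed[v] for v in graph}
    return p


def stationary(graph):
    """Stationary distribution of a plain walk on a connected, non-bipartite undirected graph."""
    total_degree = sum(len(nbrs) for nbrs in graph.values())  # equals 2|E|
    return {v: len(nbrs) / total_degree for v, nbrs in graph.items()}


def l1(p, q):
    """L1 distance between two distributions over the same nodes."""
    return sum(abs(p[v] - q[v]) for v in p)


pi = stationary(GRAPH)
seed = {v: 1.0 if v == "A" else 0.0 for v in GRAPH}

# 1) A plain random walk forgets its starting node.
plain = dict(seed)
for _ in range(60):
    plain = walk_step(plain, GRAPH)

# 2) A single RWR keeps most of its mass near the seed.
single = rwr(seed, GRAPH, restart=0.7)

# 3) Re-seeding each RWR with the previous RWR output drifts toward pi.
current = dict(seed)
drift = []
for _ in range(60):
    current = rwr(current, GRAPH, restart=0.7)
    drift.append(l1(current, pi))

print(f"plain walk vs pi:   {l1(plain, pi):.2e}")
print(f"single RWR vs pi:   {l1(single, pi):.3f}")
print(f"iterated RWR vs pi: {drift[0]:.3f} -> {drift[-1]:.2e}")
```

  In the third case the distance to the seed-independent stationary distribution never increases from one round to the next, because each RWR pass is itself a Markov operator that leaves the stationary distribution fixed; after enough rounds the seed specificity that the restart was meant to preserve is lost.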
  
  We therefore fully agree with the reviewer that the iterative nature of the indirect effect should also be explored, but we do not believe that ReCoN can model these iterations accurately. We hope that new exploration methods will be able to decipher the importance of these iterations once additional evidence has been gathered to justify the general interest of considering the indirect effect.
  
  Bibliography:
  
  László L, Lov L, Erdos O. Random Walks on Graphs: A Survey. 1 Jan. 1996:1-46.
  
  Tong H, Faloutsos C, Pan JY. Fast Random Walk with Restart and Its Applications. Sixth International Conference on Data Mining (ICDM'06). Dec. 2006:613-22. https://doi.org/10.1109/ICDM.2006.70.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.64898/2026.01.20.700561
acd.pressbooks.pub

1.3 Ethnocentrism and Cultural Relativism

1. amiller38 27 May 2026
  
  in Public
  
  Culture and Society – Diversity and Multi-Cultural Education in the 21st Century
  
  Some travelers pride themselves on their willingness to try unfamiliar foods, like the late celebrated food writer Anthony Bourdain (1956-2017). Often, however, people express disgust at another culture’s cuisine. They might think that it’s gross to eat raw meat from a donkey or parts of a rodent, while they don’t question their own habit of eating cows or pigs. Such attitudes are examples of ethnocentrism, which means to evaluate and judge another culture based on one’s own cultural norms. Ethnocentrism is the belief that your own group is the correct measuring standard and that other cultures are wrong if they do not measure up to it. As sociologist William Graham Sumner (1906) described the term, it is a belief or attitude that one’s own culture is better than all others. Almost everyone is a little bit ethnocentric. A high level of appreciation for one’s own culture can be healthy. A shared sense of community pride, for example, connects people in a society. But ethnocentrism can lead to disdain or dislike of other cultures and could cause misunderstanding, stereotyping, and conflict. Individuals, government, non-government, private, and religious institutions with the best intentions sometimes travel to a society to “help” its people, because they see them as uneducated, backward, or even inferior. Cultural imperialism is the deliberate imposition of one’s own cultural values on another culture.
  
  When people find themselves in a new culture, they may experience disorientation and frustration. In sociology, we call this culture shock. In addition to the traveler’s biological clock being ‘off’, a traveler from Chicago might find the nightly silence of rural Montana unsettling, not peaceful. Now, imagine that the ‘difference’ is cultural. An exchange student from China to the U.S. might be annoyed by the constant interruptions in class as other students ask questions, a practice that is considered rude in China. Perhaps the Chicago traveler was initially captivated with Montana’s quiet beauty and the Chinese student was originally excited to see a U.S.-style classroom firsthand. But as they experience unanticipated differences from their own culture, they may experience ethnocentrism as their excitement gives way to discomfort and doubts about how to behave appropriately in the new situation.
  
  According to many authors, international students studying in the U.S. report that there are personality traits and behaviors expected of them. Black African students report having to learn to ‘be Black in the U.S.’ and Chinese students report that they are naturally expected to be good at math. In African countries, people are identified by country or kin, not color. Eventually, as people learn more about a culture, they adapt to the new culture for a variety of reasons.
  
  Cultural relativism is the practice of assessing a culture by its own standards rather than viewing it through the lens of one’s own culture. Practicing cultural relativism requires an open mind and a willingness to consider, and even adapt to, new values, norms, and practices. Perhaps the greatest challenge for sociologists studying different cultures is the matter of keeping a perspective. It is impossible for anyone to overcome all cultural biases. The best we can do is strive to be aware of them. Pride in one’s own culture doesn’t have to lead to imposing its values or ideas on others. And an appreciation for another culture shouldn’t preclude individuals from studying it with a critical eye. This practice is perhaps the most difficult for all social scientists.
  
  Delete this entire section; it repeats the beginning of 1.4.

Annotators

amiller38

URL

acd.pressbooks.pub/teachingwithcompassion/chapter/ethnocentrism-and-cultural-relativism/
www.biorxiv.org

A pilot study for whole proteome tagging in C. elegans

1. Public_Reviews 26 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.
  
  We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that “significant additional feasibility studies” are required. Consider the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003): to our knowledge, it achieved ~75% coverage of ~6,000 proteins directly from an established protocol, without significant prior feasibility studies. Although C. elegans has roughly three times as many genes, we would argue that our tagging protocol may even be less labor-intensive, as it involves no cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: ‘They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.’
  
  Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.
  
  The proposed cost and timeline align quite favorably with established large-scale consortium projects: e.g., ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB) and labs already tagging entire gene classes (PMID: 40463100), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab or faster with distributed effort at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.
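  To make the 3-fold argument concrete, a back-of-envelope injection count can be derived from figures quoted elsewhere in this response (about 1,554 proteins, or 8% of the proteome, already tagged). The sketch assumes, hypothetically, that every remaining gene needs exactly one injection, that each pool of three succeeds on the first attempt and that the 8% figure is exact; real campaigns will need re-injections, so these are lower bounds, not the authors' own estimates.

```python
TAGGED = 1554           # proteins already endogenously tagged (figure quoted in the response)
TAGGED_FRACTION = 0.08  # that count as a fraction of the proteome (also quoted)


def injections_needed(genes, genes_per_injection):
    """Ceiling division: injections required if each one targets `genes_per_injection` genes."""
    return -(-genes // genes_per_injection)


proteome = round(TAGGED / TAGGED_FRACTION)  # implied proteome size
remaining = proteome - TAGGED               # genes still to tag
single = injections_needed(remaining, 1)    # one gene per injection
pooled = injections_needed(remaining, 3)    # triple-pooled injections

print(f"{remaining} genes: {single} single vs {pooled} pooled injections")
for years in (5, 6):
    # Injection rate implied by a 5-6 year timeline, under the assumptions above.
    print(f"{years} years: ~{pooled / years:.0f} pooled injections per year")
```

  Under these assumptions, pooling cuts the injection count from 17,871 to 5,957, the 3-fold saving described in the response; the per-year figures are only the rates implied by the stated timeline, not measured capacity.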
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.
  
  Strengths:
  
  This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.
  
  Weaknesses:
  
  Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.
  
  Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but its primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and, most critically, to test the following key parameters: (1) the success rate of triple pools with previously untested reagents at novel targets; (2) the utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets that we already knew could be edited quite efficiently, with the 3rd target a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking 3 random genes and replacing the 3rd highly efficient locus with another, less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.
  
  The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.
  
  Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.
  
  We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798). For instance, WGS of strains after CRISPR/Cas9 found negligible off-target effects (PMID: 25249454, PMID: 30420468, using a similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892, using other methods). Targeted sequencing studies have reported similar findings, using various CRISPR/Cas9 methods, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, backcrossing cannot reasonably be a routine part of the initial tagging.
  
  Lastly, if one really does want to backcross, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.
  
  Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.
  
  Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s comment. Why would one tag one gene at a time, rather than three, if the whole point is to scale up tagging? It is important to keep in mind that the rate-limiting step for tagging the whole genome is the number of injections that can be done per day. Since there is no cloning to generate the repair templates/guides and all other reagents are commercially available and not sample specific, these can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput 3-fold and, in our view, introduces no disadvantages, as individual tags can be isolated independently if desired.
  
  Beyond the numerous technical advantages that pooling provides (including lower cost and higher throughput for preparing injection mixes and for imaging), our results show that it yields epistemic benefits as well: we would never have noted the subcellular pattern in Fig. 6B, C, with different sets of mitochondria being marked by different mitochondrial proteins, had we imaged them separately or even aligned them to a pan-mitochondrial landmark. As we mentioned in the Discussion, grouping proteins predicted to localize to the same compartment can simultaneously test how uniform or differentiated such compartments are during the screen.
  
  The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.
  
  We do not think that the utility of current BFPs is that limiting. At least the theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which was useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat less ideal, but we have shown that many proteins are expressed highly enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissection scope become visible with long-exposure cameras of widefield microscopes or modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much longer. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.
  
  Some literature reviews, particularly in the Introduction and Abstract, rely too much on recent examples from the authors' laboratory instead of presenting the state of the field. I would have liked to know in more detail what exactly has been done with simultaneous injections targeting multiple loci, comparing what various laboratories have accomplished to date.
  
  We are not sure what the reviewer is referring to. In the Abstract, we do not refer to any literature. In the Introduction, we cite 28 papers, 6 of those from our lab (4 of which provide examples of protein tags). We do not believe that this can fairly be called an unbalanced presentation of the state of the field.
  
  This being said, we have gladly expanded our Introduction to provide more background on co-CRISPRing. Labs have routinely used co-conversion (“coCRISPR”) markers for picking out their intended edits (e.g., point mutations or insertions), as it has been shown by multiple groups that a CRISPR/Cas9 edit at one locus correlates with efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple FP-sized insertions have not been routinely attempted. We and only one other group have successfully attempted it using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale and using entirely untested reagents at novel sites – critical parameters to determine for a whole genome approach. So, we test here (1) the efficiency of triple insertions and (2) the chance of getting them with new and untested guides and reagents.
  
  In our view, since we have to use some injection/coCRISPR marker anyway for those genes which are not expressed at dissecting-scope visible levels (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes our approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can screen with higher power methods for the dimmer targets, while enabling us to co-isolate other intended targets. Insertions, being often heterozygous in F1, can be segregated independently if desired, or homozygosed together to facilitate maintenance then outcrossed individually by those interested in studying specific genes in more detail.
  
  In the revised version of this manuscript, we now discuss some of these points in the introduction section:
  
  “Currently, around 1554 proteins representing 8% of the proteome are estimated to have been endogenously tagged (Leyhr et al., 2025). However, at current rates, tagging the proteome is projected to take around 100 years and likely involve numerous duplicate attempts on a small number of commonly studied proteins (Leyhr et al., 2025). It will thus be crucial for the field to coordinate tagging efforts and scale up tagging protocols to enable coverage of the entire genome at a reasonable timescale and cost. Given the number of injections is a major time-limiting factor, pooling multiple injections into one would at minimum cut tagging time by a factor of 3. In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is already facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014). Recent methods for CRISPR/Cas9 mediated genomic insertions have pushed efficiencies to sufficient levels to simultaneously insert multiple fluorophores (e.g., mNeonGreen and mScarlet) as well as a co-CRISPR marker (dpy-10) at three independent loci in a single injection (Eroglu et al., 2023; Paix et al., 2015). These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci to yield functional tags”
  
  Reviewer #2 (Public review):
  
  The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?
  
  We consider this paper to have two purposes: (1) to motivate the community to come together to consider such a genome-wide tagging approach; (2) to provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel, interesting insights.
  
  As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.
  
  We agree that the basic principle is similar. However, given the numbers in our original paper, it was not clear that pooling three novel large edits would work, or that the approach would be scalable.
  
  The dpy-10 coCRISPR marker previously used is a highly efficient single site, with a close to 100% hit rate. We also knew in the earlier study that the two pooled insertions already worked quite efficiently and did not disrupt the function of targeted proteins. Exchanging these plus dpy-10 for three novel tags was not guaranteed to succeed, for many potential reasons, both biological and technical. For instance, such a “marker free” approach requires that a significant number of targets in the genome be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with current best fluorophores. The chance of disrupting gene function by tagging had also not been explored in detail in C. elegans, nor had it been established whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole genome tagging. We have clarified some of these points in the text.
  
  As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community.
  
  The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.
  
  Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A Table with all the observations must be included. This can replace the descriptions of the observations with the different lines, which could be somewhat laborious for the reader, and are often wrong. There are numerous mistaken expectations of protein expression here, but two examples include:
  
  We are not convinced that our expectations are mistaken. Below we respond to the reviewer’s specific examples, and we are open to hearing from the reviewer about additional cases.
  
  (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).
  
  There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.
  
  The expression of acdh-10 is annotated in multiple scRNA datasets as intestine and epidermal enriched (CeNGEN/Taylor et al. 2021, highest in epidermis; Ghaddar et al 2023 highest in intestine). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.
  
  However, this raises an important question: why would different paralogs do the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue-specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that ‘there are no published studies about this enzyme, so we really don't know for sure what it's doing’ is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information on protein function could indicate that the involved pathway requires tissue-specific regulation.
  
  (2) The expectation that HXK-1 is ubiquitously expressed.
  
  Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).
  
  The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.
  
  Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.
  
  The Ghaddar et al. and CeNGEN/Taylor et al. datasets do not show this. The scRNA paper cited (PMID: 38816550) also shows enrichment in neurons, pharynx, coelomocyte and germ cells which we did not note. In our view, these in fact further support our goals: often, transcript datasets alone (frequently used to infer tissue function) do not sufficiently predict protein expression. One can post hoc find an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.
  
  To clarify these points, we added the following to the discussion section:
  
  “We also noted unexpected cell type dependent distributions of proteins involved in broadly important metabolic processes such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell type specific differences at the protein level. Importantly, many genes encoding metabolic enzymes including acdh-10 and hxk-1 have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues have unique metabolic needs and which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”
  
  The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.
  
  We added some of this information, such as annotated expression levels in young adults from various scRNA datasets (but not larval datasets, as we did not image these). We note that each of these studies uses different pipelines and reports different metrics (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless the datasets are integrated and analyzed together.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.
  
  Strengths:
  
  The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.
  
  We appreciate the referee’s recognition that whole proteome tagging is feasible.
  
  Weaknesses:
  
  Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.
  
  (1) Have the authors investigated if the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Elektra be better than mTAGBFP2? How does mScarlet3-S2 compare to mScarlet 3?
  
  All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given Electra is not that much brighter on paper than mTagBFP2 we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is quite a bit dimmer than mScarlet3 on paper – the advantage is that it has higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.
  
  (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.
  
  mSG and mSc3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.
  
  The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Merged figures appear saturated, and use colors that won't work for red-green colorblind viewers.
  
  For all figures, we also show individual channels separately, which is common practice for making fluorescence images accessible to colorblind readers (PMID: 33788834). Figures highlighting non-overlap like 6B and C are already in accessible colors when merged (blue/green) and include a numerical quantification. 3-color RGB images preserve the greatest information for the highest number of individuals.
  
  (2) Targeting ubiquitously expressed genes as a proof of concept gives me some concern that this might underestimate the challenges that may be experienced with less widely expressed genes.
  
  While the genes were predicted to be ubiquitously expressed, many were not in practice, like HXK-1 and F54C8.1, which were also among the lower expressed genes on our list and highly cell type restricted. As discussed, the more tissue restricted a gene, the likelier that bulk RNA levels underestimate expression. Such genes are therefore more likely to be detected in a specific tissue. We routinely isolate tissue restricted endogenous tags, including those expressed in only a few neurons, with bulk FPKMs lower than the ranges tested in this manuscript.
  
  (3) Some results are not shown or referenced (autofluorescence, for example, is shown using a schematic in Figure 1C).
  
  We now provide representative images alongside what would be expected to be observed by eye during screening.
  
  (4) It would be useful to describe how to recover worms from what is shown in Figure 1A.
  
  In the revised version, we added the following in the caption for Fig. 1A:
  
  “Selected worms expressing the brighter tag can be screened for dimmer tags by higher magnification and long exposure imaging. Worms can be recovered directly from slides if immobilized by levamisole as described (Ghanta et al., 2021). Alternatively, single hermaphrodite worms can be isolated, allowed to lay eggs, then screened.”
  
  (5) A blue bar of data must be missing from Figure 3B injection pool 5.
  
  As stated in the text, “All but one tag (cox-6B::mTagBFP2) was visible in the F1 generation of injected P0 animals, and these were subsequently isolated among F2 worms positive for the other tags in the pool.”
  
  To clarify that data points are not unintentionally omitted, we added the following text to the caption of Fig. 3B:
  
  “For group 5 including cox-6B::mTagBFP2, worms with detectable levels of mTagBFP2 fluorescence were not recovered in the F1 generation but were isolated among progeny of F1s positive for mStayGold and mScarlet3; we were thus unable to quantify efficiency for this locus at F1.”
  
  (6) Some expression or localization patterns were unexpected, but complications like germline silencing and protein mislocalization, with a small fraction localizing normally and rescuing function, were not presented as possibilities. Viability is used to confirm function, but without presenting whether this means 100% viability, less, or just the ability to maintain a strain.
  
  We already do discuss mislocalization and functionality issues in the Discussion, as well as tradeoffs of alternate methods. Any existing method to observe biological molecules, be it protein, RNA or DNA, has multiple drawbacks and sources of artifacts, which are unlikely to be fully eliminated in the foreseeable future.
  
  In regard to germline silencing of endogenously tagged genes in C. elegans, there is actually very little evidence for this. Collectively, various labs have now generated over 200 reporter alleles of germline-expressed genes (WormTagDB), with robust expression throughout the germline and retention of function. Likewise, many of our tags across fluorophores showed robust germline expression, including EEF-1A.1::mTagBFP2, Y22D7AL.10::mStayGold, and HAT-1::mScarlet3. In fact, overall transcript levels generally tended to underestimate germline enrichment at the protein level. We note that single-copy transgenes driven by the eef-1A.1/eft-3 promoter by itself are frequently not expressed in the germline (PMID: 31064766); that we could detect EEF-1A.1 robustly in the germline when tagged endogenously is evidence that silencing is unlikely to be a widespread concern, and at the least less of a concern than for single-copy transgenes. We appreciate that for a transgene, presence/absence of specific sequence elements and genomic loci play a role in expression, but an endogenous tag captures all such information at a given locus.
  
  Indeed, we found only two reports of endogenous tags being silenced in the germline. The first involved a novel tag (not a fluorophore) which initially prevented expression at the tagged locus (PMID: 30109984); after modifying the sequence to avoid silencing signals, the authors could rescue expression and thereafter saw robust expression in various novel contexts with this tag. The second example (PMID: 34547227) leaves open the possibility that germline repression of that particular gene might be a part of its endogenous regulation.
  
  Nevertheless, given that it is probably rare, if it occurs at all, it will likely take a large scale tagging effort to uncover such cases in sufficient numbers to study. In our view, this further justifies tagging at large, ideally genomic, scales. If we do discover that there are numerous annotated germline proteins which we don’t observe by tagging, that would be interesting to study on its own.
  
  (7) Halotag is presented in the Discussion as a small tag, but it is bigger than GFP.
  
  Thank you for catching this. We have removed the discussion of Halotag. Given the comparable size to FPs, it would be unlikely to alleviate issues of tag functionality.
  
  (8) It would be useful to include FPKMs and viability percentages in Table 1.
  
  FPKM is included in column 6, but the title for this column is cut off. In the revised table FPKM values are now shown more clearly across stages.
  
  We did not quantify viability percentage. In our view it does not yield an informative metric when there is little information about the protein’s required dosage for function, which was the case for most proteins here. A haplosufficient gene might yield a full brood size even if 50% of protein function is lost; conversely, a highly dose sensitive protein could yield penetrant and severe inviability with mild perturbation of function. It also is not actionable information at this stage if there is no alternate tagging strategy as a baseline of comparison. The worms we picked to image all have viable embryos as adults, so in those individuals the genes were likely to be sufficiently expressed and functional.
  
  (9) Because establishing that a guide works well is a limiting step for many CRISPR experiments (once a guide works well, it's easy to inject 5 worms and get lines), I wondered if testing that for many genes is what is really needed in the field at this stage.
  
  Guide quality is rarely an issue in C. elegans, as for all the genes here we tried only one guide, all of which were previously untested. We now clarified this in the discussion section:
  
  “Notably, we find that previously untested guide RNAs and homology arms perform exceptionally well at novel loci, as we only tested one set of reagents for each locus which yielded satisfactory tagging rates.”
  
  (10) For a manuscript where the injection is so central to what was done, I was surprised to read in the Acknowledgments that all of the injections were done by someone who is not included as an author.
  
  We are likewise surprised by such a comment but gladly clarify: Chi Chen has been with us as an expert microinjection specialist for more than 25 years and her very important technical contributions have been acknowledged in many dozen papers. Multiple authorship guidelines, including COPE’s and ICMJE’s, state that technical contributions alone do not qualify for authorship.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) We would encourage the authors to provide systematic validation of the reported insertions. The manuscript reports that 24 of 30 tags were isolated and visible, but does not clearly state whether each isolated line was confirmed by sequence‑level validation to be correctly in‑frame and free of unintended mutations at the target locus.
  
  We appreciate the reviewer’s concerns on fidelity. These parameters have been assessed in prior published work (e.g., PMID: 30504364, PMID: 34748534) and in our hands are in the range of 80% whenever we sequence non-fluorescent tags of similar sizes. The efficiencies we observed are high enough that one can expect to recover numerous worms with the exact intended sequence for each target, though we would argue mutations within the FP reporter are less likely to matter if it retains high fluorescence.
  
  (2) The manuscript presents aggregated success counts (e.g., 8/10 mTagBFP2 tags, 9/10 mStayGold, 7/10 mScarlet3) and useful narrative descriptions of injection outcomes. We also suggest including per‑locus success rates.
  
  Figure 3B shows per-locus success rates, and source data are provided for this figure. Each dot is an individual injection and the Y axis is the per-locus rate. We have now worded this more clearly in the figure’s caption:
  
  “Total insertion efficiencies per locus for the indicated targets across injection pools.”
  
  (3) For pools that required re‑injection after initial failures, we would like to see a description of the specific changes that were made to the injection mixes or procedures (e.g., new repair template prep, different Cas9 reagent lot, guide redesign). This will be useful troubleshooting information for others.
  
  We re-made the exact same injection mix, but used a Nanodrop spectrophotometer after each purification step to ensure that the purity of the repair templates, as assessed by absorbance ratios (A260/230 and A260/280), was sufficient. No other changes were made. This is now specified in the methods section as follows:
  
  “For re-runs of pools 4, 6 and 10 which failed initially, we regenerated the repair templates and ensured that after each column purification, the A260/230 ratio of the purified DNA was ≥2.2 and A260/280 was 1.8 ± 0.05 when measured with a Nanodrop spectrophotometer.”
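  The stated acceptance window can be expressed as a simple pass/fail check. This is an illustration only: a minimal Python sketch in which the `AbsorbanceReading` container, the `passes_purity_check` name and the example readings are all hypothetical, not part of the authors’ methods. The thresholds themselves (A260/230 ≥ 2.2; A260/280 of 1.8 ± 0.05) are the ones quoted above.

```python
from dataclasses import dataclass

# Thresholds quoted in the revised methods text.
MIN_A260_230 = 2.2
# 1.8 ± 0.05, written as literal bounds so float rounding at the edges is not an issue.
A260_280_LOW, A260_280_HIGH = 1.75, 1.85


@dataclass
class AbsorbanceReading:
    """Hypothetical container for one spectrophotometer reading of a purified repair template."""
    a230: float
    a260: float
    a280: float

    @property
    def ratio_260_230(self) -> float:
        # Low values usually indicate carryover of salts or other contaminants absorbing at 230 nm.
        return self.a260 / self.a230

    @property
    def ratio_260_280(self) -> float:
        # Deviations from ~1.8 usually indicate protein or phenol contamination.
        return self.a260 / self.a280


def passes_purity_check(reading: AbsorbanceReading) -> bool:
    """Return True only if both ratios fall inside the stated acceptance window."""
    return (
        reading.ratio_260_230 >= MIN_A260_230
        and A260_280_LOW <= reading.ratio_260_280 <= A260_280_HIGH
    )


if __name__ == "__main__":
    clean = AbsorbanceReading(a230=0.40, a260=0.90, a280=0.50)  # ratios 2.25 and 1.8
    salty = AbsorbanceReading(a230=0.60, a260=0.90, a280=0.50)  # A260/230 = 1.5
    print(passes_purity_check(clean))  # True
    print(passes_purity_check(salty))  # False
```

  In this sketch, a template that fails either ratio would be re-purified before the injection mix is assembled.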
  
  (4) The authors state that the fluorophore sequences are codon-optimized for C. elegans. We suggest they provide the exact donor/tag sequences, specifically state whether the fluorophore sequences contain any synthetic/artificial introns, or whether other sequence modifications (e.g., silent PAM‑disrupting mutations) were included in the donor templates.
  
  This information is provided in Supplementary Table 1.
  
  (5) Page 3: Include a reference for "The C. elegans genome encodes around 20,000 genes"
  
  We added a reference to the most recent release of the genome (WS237, May 2013). Spieth et al., 2014.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.02.09.704846v2
www.biorxiv.org

Molidustat Targets a Synthetic Lethal Vulnerability in APC-Mutant Colorectal Cancer through GSTP1 and PHD2 Co-Inhibition

1. Public_Reviews 26 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors aimed to uncover novel therapeutic vulnerabilities in APC-mutant colorectal cancer (CRC), which constitutes the majority of CRC cases. They hypothesized that modulating oxygen-sensing pathways (via PHD inhibition) could disrupt adaptive stress responses in these tumours.
  
  Strengths:
  
  The study employs a powerful, two-pronged approach to identify Molidustat's targets. By using both Thermal Proteome Profiling (TPP) and an orthogonal chemical proteomic competition assay, the authors provide compelling evidence that GSTP1 is a genuine, direct off-target, effectively addressing the common limitation of indirect effects in proteomic screens.
  
  Weaknesses:
  
  (1) In Figure 1, the current data rely on a single guide RNA (sgRNA). To make the data solid, at least two independent sgRNAs targeting different regions of PHD2 should be used.
  
  We thank the reviewer for raising this. Clarity on the CRISPR strategy was missing from the original submission and we have now added the following to the Methods (Page 4). We did not use a single sgRNA. PHD2 was targeted with a pool of three chemically modified crRNAs:
  
  (IDT Alt-R; target sequences: 5'-TACAACCAGCATATGCTACA, 5'-GTGGCTGCCGAAGCCGAGCC, 5'-GATAAGATCACCTGGATCGA).
  
  These crRNAs were delivered as in vitro assembled ribonucleoprotein complexes with high-fidelity Cas9. This format has been reported to achieve high on-target efficiency while minimising off-target cutting [1,2], such that any residual stochastic off-target events are distributed across the population and are not expected to manifest as a coherent phenotype at the population level. Working with pooled, unselected knockouts rather than single-cell clones also avoids the confounds of clonal heterogeneity that normally motivate the use of multiple independent guides and rescue experiments in single-clone workflows. We have previously validated this approach for GSTP1 knockout in a separate single-cell proteomics study [3], where loss of GSTP1 protein was observed in over 90% of single cells and GSTP1 was the most significantly altered protein between sgControl and sgGSTP1 populations.
  
  (2) Figure 3E: Asn205 site should be mutated to prove that whether Molidustat inhibits GSTP1 activity via Asn205 or not.
  
  This is a good suggestion, and we explored it in silico before concluding it was not tractable. We used PyMol mutagenesis to model Molidustat binding to GSTP1 variants at the predicted contact residues: Asn205 was mutated to Ala, Gly and Ser; Trp39 (predicted to hydrogen-bond Molidustat) was mutated to Ala, Phe and Thr; and a Tyr8Phe/Asn205Ser double mutant was also modelled. In every case, Molidustat reoriented within the active site and adopted an alternative hydrogen-bonding configuration (most commonly with Tyr8), yielding a docking score equal to or better than binding to native GSTP1 (Author response images 1–4). The model therefore does not predict any single or double point mutant that would ablate Molidustat binding in a clean, interpretable way, and we could not design a rational loss-of-interaction mutant on this basis. Given this limitation, and that definitive mapping of the binding interface would require co-crystallography, which is beyond the scope of the present study, we have moved the docking model to the supplement and flagged it as predictive rather than definitive.
  
  Author response image 1.
  
  Molidustat in native GSTP1
  
  Author response image 2.
  
  Molidustat docking with mutated GSTP1, Asn205 mutated to Gln205
  
  Author response image 3.
  
  Molidustat docking with mutated GSTP1, Tyr39 mutated to Phe39
  
  Author response image 4.
  
  Molidustat docking with mutated GSTP1, Asn205 mutated to Ser205 and Tyr8 mutated to Phe8
  
  (3) Figure 5B and 5C: The metabolic imbalance phenotype observed upon dual knockout of PHD2 and GSTP1 requires rescue experiments to confirm on-target specificity.
  
  We thank the reviewer for this important point and agree that rescue experiments could represent the most direct demonstration of on-target specificity for the metabolic phenotype observed in Figures 5B and 5C. These rescue experiments are necessary when working with single clones, as they allow for comparing a knock-out clone with a reconstituted pool and sidestep the issue of clonal heterogeneity.
  
  In our case, we think that there is no advantage to doing so, as we work with pooled knockouts, so any clonal heterogeneity is diluted in the pool.
  
  One could even make the case that such a rescue experiment would introduce additional artefacts. Combined loss of PHD2 and GSTP1 leads to reduced cellular viability, with decreased proliferation and increased apoptosis, consistent with a synthetic lethal interaction. To devise a rescue experiment, we would have to isolate a single-cell clone, because the pool is not a complete knockout and residual WT cells would outgrow the knockout cells. A clone that has overcome the anti-proliferative insult of the double knockout is likely to have a phenotype distinct from that of the original pooled population, just as a rescued clone would differ from WT cells. For these reasons, we have not performed rescue experiments in the current study. We have added the absence of a rescue as a limitation of the study in the Discussion:
  
  “While genetic rescue experiments would provide definitive confirmation of on-target specificity, the pronounced loss-of-fitness and apoptotic phenotype observed upon combined PHD2 and GSTP1 loss limited the feasibility of establishing stable rescued double-knockout populations, and therefore represents a limitation of the current study.”
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors aimed to determine Molidustat targets and the potential utility of these findings. They clearly demonstrate that Molidustat interferes with GSTP1 and some other proteins on top of PHD2. They also demonstrate that PHD2 deletion is not sufficient to recapitulate Molidustat effects in cells and proteomes. Finally, they demonstrate synthetic lethality in organoids for Molidustat and APC deletion.
  
  Strengths:
  
  The data on Molidustat proteomes, GSTP1 binding, inhibition and metabolic health of organoids is really clear. All biochemical, docking and omic data are really strong. The potential impact of these findings could be the use of Molidustat in APC null tumours and awareness of potential off-target effects.
  
  Weaknesses:
  
  A main but minor weakness is that Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The use of MTT is not very good to detect viability when it measures metabolism; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.
  
  Great point; for this reason, we have assayed apoptosis throughout. In addition, we have added a clonogenicity assay with APC organoids. Organoid cells were treated with an acute dose of Molidustat. We subsequently measured the level of Lgr5 (a stem cell marker) and the ability of the cells to generate organoids (these data have been added as Figure 5F-G).
  
  Reviewer #3 (Public review):
  
  In this paper, the authors revealed that Molidustat can induce a dose-dependent increase in Caspase-3/7 activity in the HT29 cell line, which is an APC-mutant colorectal cancer cell line. More importantly, they found that targeting PHD2 alone cannot cause cell death. By using thermal proteome profiling (TPP) and orthogonal chemical proteomic competition assays, they determined GSTP1 as a previously undiscovered off-target of Molidustat. They also revealed that combined PHD2 and GSTP1 loss leads to an increase in intracellular ROS and apoptosis. Moreover, they evaluated the effects of Molidustat in colonic organoids and showed that Molidustat has a high selectivity for colonic organoids with activated WNT signaling and/or KRAS pathway alterations, and this effect is not reproduced by hydroxylase inhibition alone, providing a new potential approach to targeting both PHD2 and GSTP1 for the treatment of APC-mutant CRC.
  
  Specific comments:
  
  (1) What is the possible molecular mechanism of dual GSTP1/PHD2 loss, inducing cell death?
  
  This is an important question. Our data support a model in which combined loss of GSTP1 and PHD2 disrupts cellular redox homeostasis, leading to accumulation of reactive oxygen species, increased GSSG/GSH ratios, and depletion of antioxidant buffering capacity. This redox imbalance is accompanied by downregulation of pro-survival pathways. In this context, activation of apoptotic signalling, as evidenced by increased caspase-3/7 activity and proteomic enrichment of apoptosis-associated pathways, contributes to the observed cell death phenotype.
  
  While apoptosis is supported by our data, the magnitude of oxidative stress suggests that additional oxidative stress-associated cell death mechanisms may also contribute. We have clarified this point in the Discussion (Page 11).
  
  (2) Can the authors mutate the binding site of Molidustat on GSTP1 to verify the in silico docking results?
  
  This is a very important question. Currently, the model is of limited value. Reviewer 1 raised a similar question; please see our response to Reviewer 1, point 2.
  
  (3) Evidence for Molidustat inhibiting PHD2 activity or stabilising HIF-1α should be provided.
  
  We thank the reviewer for this suggestion. Data showing HIF-1α stabilisation and evidence of downstream signalling is now added to Supplementary Figure 1.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  I only have minor suggestions:
  
  Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The use of MTT is not very good to detect viability when it measures metabolism; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.
  
  This is correct; PHD1 is of particular interest, given the effects its inhibition/knockout has on the inflamed colon. We have added a new paragraph to the Discussion (Page 13) that addresses the isoform selectivity of Molidustat. We note that, although developed as a PHD2 inhibitor, Molidustat retains appreciable activity against PHD1 and PHD3 [4], and we discuss the non-redundant and, in some contexts, opposing roles of PHD1 and PHD2 in the colon: PHD1 loss is protective in DSS colitis [5] and restrains colitis-associated tumour growth, whereas PHD2 loss in the tumour and stroma is reported to inhibit metastasis and treatment response [6]. We further note that this pattern of isoform engagement is shared with other pan-PHD inhibitors that did not phenocopy Molidustat in our screens, indicating that PHD isoform profile alone is insufficient to explain Molidustat’s distinctive activity and pointing to GSTP1 off-target engagement as the key distinguishing feature. We argue that localised colonic delivery (as discussed earlier in the Discussion) would concentrate drug at the APC-mutant epithelium while limiting systemic exposure.
  
  We fully agree with the reviewer that MTT measures metabolic activity/NADH levels rather than viability in the strict sense, and that this is particularly relevant for a compound that perturbs redox metabolism. We have added a clonogenicity assay in APC organoids (Fig. 5F-G) to supplement the MTT and Cleaved Caspase 3 assays already present in the manuscript.
  
  (1) Lee, J. K. et al. Directed evolution of CRISPR-Cas9 to increase its specificity. Nat. Commun. 9, (2018).
  
  (2) Sakovina, L., Vokhtantsev, I., Vorobyeva, M., Vorobyev, P. & Novopashina, D. Improving Stability and Specificity of CRISPR/Cas9 System by Selective Modification of Guide RNAs with 2′-fluoro and Locked Nucleic Acid Nucleotides. Int. J. Mol. Sci. 23, (2022).
  
  (3) Makar, A. N., Holkham, J., Lilla, S., Wilkinson, S. & von Kriegsheim, A. Overcoming preservation challenges to enable single-cell proteomics of fixed cell and tissue samples with retained proteome integrity. Preprint at https://doi.org/10.1101/2025.03.10.642380 (2025).
  
  (4) Flamme, I. et al. Mimicking hypoxia to treat anemia: HIF-stabilizer BAY 85-3934 (molidustat) stimulates erythropoietin production without hypertensive effects. PLoS One 9, (2014).
  
  (5) Tambuwala, M. M. et al. Loss of prolyl hydroxylase-1 protects against colitis through reduced epithelial cell apoptosis and increased barrier function. Gastroenterology 139, (2010).
  
  (6) Leite de Oliveira, R. et al. Gene-Targeting of Phd2 Improves Tumor Response to Chemotherapy and Prevents Side-Toxicity. Cancer Cell 22, (2012).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.31.702998v2
www.biorxiv.org

Pupil size reveals the perceptual quality and effortless nature of synesthesia

1. Public_Reviews 26 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This study used pupillometry to provide an objective assessment of a form of synesthesia in which people see additional color when reading numbers. It provides convincing evidence that subjective color ratings are matched by changes in pupil size that recapitulate brightness-mediated changes when exposed to the real color. The work provides a valuable contribution to the literature on both synesthetic perception and the use of pupillometry to probe perception and related psychological processes.
  
  We were pleased to learn that our manuscript was of interest to the reviewers and the editor. We thank the reviewers for their useful feedback and have addressed all their comments in the revised version. We here give the most prominent changes as quotes.
  
  We thank all reviewers for their very helpful input.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Knowing that small pupil-size variations accompany brightness variations (even when these are illusory), the authors asked whether pupil constrictions would accompany the synesthetic perception of a brighter color (compared with a darker one), induced by the presentation of a black-and-white character. This grapheme-colour synesthesia is only experienced by a few participants, sixteen of whom were enrolled in this study. The results reliably showed that a relative pupil constriction would "betray" the perception of a brighter color in these participants, while no such effect would be observed in control participants who were asked to report a color in association with each grapheme, even though they did not perceive any.
  
  Strengths:
  
  The main strength of the study lies in its combination of psychophysics (brightness ratings) and pupillometry, which allowed for showing clear-cut results.
  
  Weaknesses:
  
  Some relatively minor weaknesses concern the ancillary analyses, which tackle secondary questions and are not entirely convincing.
  
  (1) The linear mixed model approach is a powerful way to identify important variables, but it does not clarify whether the key factors are between-subject or between-trial variations. Some variables are inherently defined at a subject level (e.g., PA scores), others are not. I would strongly recommend an alternative visualisation of the results to examine inter-individual variability.
  
  Visualizing the highly idiosyncratic effects is indeed challenging. Addressing R1’s point 4 and a point brought up by R2, we updated all figures to visualize pupil size in millimeters instead of arbitrary units. Furthermore, we added a supplementary figure (Supplementary Figure 4) that visualizes pupil size change without demeaning (please see our reply to point 4).
  
  To get a better grasp of the interaction between lightness and coupling strength, we further included Supplementary Figure 5, which splits the data by lightness and coupling strength in synesthetes.
  
  Furthermore, as this review and response will be publicly available, we provide Author response image 1, which shows participant-mean traces per lightness bin alongside the overall means. We hope this makes the stability and variability of the effects visually clearer, complementing the strip plots that show this for the average response.
  
  Author response image 1.
  
  We hope that these additional visualizations make the effects of interest more transparent. Ultimately, however, the LME figure likely conveys the information best, albeit at the cost of complexity.
  
  (2) It is not clear why taking the first derivative of pupil size in Figure 5 would isolate the effect of arousal, eliminating those of luminance and contrast changes (in fact, one could argue for the opposite, since arousal effects are generally constant for extended periods of time while contrast effects are typically more local and transient).
  
  First, please note that the results in 2.3.1 cannot be explained by task or context effects such as luminance and contrast: the exact same active color reporting task (same task and context) was presented to synesthetes and non-synesthetes.
  
  Indeed, the reviewer is correct that the first derivative does not eliminate other concurrent pupil-driving effects; this was wrongly expressed in our original text. Any stimulus-locked effect, such as the luminance and contrast effects but also the effort effect, will be reflected similarly in the derivative measure.
  
  We took the derivative because pupil responses driven by activity unrelated to the trial, such as increasing tiredness or excitement over the course of the session, differ almost by necessity between participants and thus create variability. These effects most likely unfold on a slower timescale and therefore show less in the derivative measure. Accordingly, we have previously found clearer response-locked effects when using a derivative measure (Douze et al., 2025; Ten Brink et al., 2024). For this between-participant analysis, we thus hoped to reduce such between-participant variability.
  
  Even with a baseline-corrected analysis instead, we arrive at the same conclusion. We directly compared baseline-corrected pupil sizes between groups using per-time-point LMEs, which take individual differences into account; in other words, we tested the same question without relying on the derivative. Group (active control vs. synesthete) reached significance between ~1.7 s and 3 s, in line with the derivative-based result.
  
  Author response image 2.
  
  t-values of a per-time-point LME predicting pupil response from group (synesthete vs. active control). Group reached significance.
  
  In sum, we deem the derivative more powerful and more appropriate in this context, but the interpretation of the findings does not hinge on this analysis choice (see Author response image 2).
  
  We corrected our claim that the derivative cleans out other effects, which was indeed oversimplified as it stood. We now write:
  
  “Mental effort presents in task-evoked pupil dilations, yet other factors simultaneously affect the pupil, such as luminance and contrast changes at trial onset, as well as slower trends across the session (e.g., fatigue). To reduce the influence of these slower, non-trial-locked fluctuations while retaining the trial-evoked dynamics, we calculated the first derivative of the pupil time course to assess the velocity of pupillary changes (Butterworth filter, 18 Hz, order 3, 2.5 Hz lowpass, following our previous works [60, 61]).”
  
  Douze, B. T., Ten Brink, A. F., Dijkerman, H. C., & Strauch, C. (2025). Pupil responses objectively index pharmacologically altered tactile sensitivity. Cortex, 193, 90-104.
  
  Ten Brink, A. F., Heiner, I., Dijkerman, H. C., & Strauch, C. (2024). Pupil dilation reveals the intensity of touch. Psychophysiology, 61(6), e14538.
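The corrected claim (the derivative reduces, but does not remove, slow non-trial-locked trends) can be illustrated with a toy calculation. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it computes a central-difference velocity for two hypothetical components of a pupil trace, a slow session-level drift and a 0.3 mm trial-evoked dilation. The 100 Hz sampling rate, drift size, and 0.4 s time constant are invented for illustration, and the Butterworth low-pass step from the quoted methods is omitted; in practice one might low-pass filter first, for example with `scipy.signal.butter` and `scipy.signal.filtfilt`.

```python
import math

def first_derivative(trace, fs):
    """Velocity of an evenly sampled trace in units per second.

    Interior samples use central differences; the two end samples use
    one-sided differences so the output has the same length as the input.
    """
    if len(trace) < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    interior = [(trace[i + 1] - trace[i - 1]) / (2 * dt) for i in range(1, len(trace) - 1)]
    first = (trace[1] - trace[0]) / dt
    last = (trace[-1] - trace[-2]) / dt
    return [first] + interior + [last]

FS = 100.0                        # assumed sampling rate (Hz) for the toy example
t = [i / FS for i in range(400)]  # one hypothetical 4-s trial

# Hypothetical slow drift: 0.1 mm of change spread over a 60-s stretch of the session.
drift = [0.1 / 60.0 * ti for ti in t]
# Hypothetical trial-evoked dilation: 0.3 mm rising with a 0.4-s time constant.
evoked = [0.3 * (1.0 - math.exp(-ti / 0.4)) for ti in t]

drift_vel = first_derivative(drift, FS)
evoked_vel = first_derivative(evoked, FS)

print(f"drift velocity:       {max(drift_vel):.4f} mm/s")
print(f"peak evoked velocity: {max(evoked_vel):.4f} mm/s")
```

A linear drift is not removed by differentiation: it becomes a small constant (here about 0.0017 mm/s), whereas the evoked dilation peaks several hundred times higher. Slow trends are therefore attenuated relative to trial-locked dynamics, but any stimulus-locked effect passes through.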
  
  (3) It is a pity that responses to physical brightness modulations were only measured in the synesthete group, not in controls, as this would have allowed for ruling out differences in pupil reactivity across the two populations.
  
  The reviewer is correct that this would allow additional comparisons, but we argue that light responses in healthy control samples are very well documented and stereotypical. For instance, Bergamin & Kardon (2003) provide very systematic latency estimates: about 320 ms for stimuli with a small luminance change, accelerating to about 250 ms for very strong luminance changes. Our relatively small luminance increments should thus be expected to fall in this range. Indeed, this also describes well the response latencies we observed in synesthetes when exposed to the colored disks. While Bergamin & Kardon (2003) provide no detailed information about their participants, data from previous studies show very similar pupil light response profiles in a healthy student control population that matches our synesthetes well demographically (Strauch, Romein et al., 2022, Figure 2a, same lab as the present study; Koevoet et al., 2025, Figure 3a). Furthermore, as detailed in our further responses, baseline pupil size in millimeters did not differ across groups.
  
  Together, these findings allow us to safely conclude that pupil light responses in synesthetes do not differ from those in controls. We agree with the reviewer that this point should also be made in the manuscript:
  
  “Specifically, pupil size first responded significantly to physical luminance after 330 ms (see Supplementary Figure 7 for per-timepoint LME; in line with response latencies of similar control populations, see Bergamin & Kardon [52], Koevoet et al. [40], and Strauch et al. [53]), but only responded significantly to synesthetic lightness at about 870 ms (see also Figure 3c vs e and Figure 4 for per-timepoint LME)”.
  
  Bergamin, O., & Kardon, R. H. (2003). Latency of the pupil light reflex: sample rate, stimulus intensity, and variation in normal subjects. Investigative Ophthalmology & Visual Science, 44(4), 1546-1554.
  
  Koevoet, D., Naber, M., Strauch, C., & Van der Stigchel, S. (2025). Presaccadic attention shifts up- and downwards: Evidence from the pupil light response. Psychophysiology, 62, e70047.
  
  Strauch, C., Romein, C., Naber, M., Van der Stigchel, S., & Ten Brink, A. F. (2022). The orienting response drives pseudoneglect—Evidence from an objective pupillometric method. Cortex, 151, 259-271.
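The onset latencies quoted above (330 ms for physical luminance, about 870 ms for synesthetic lightness) are read off per-time-point LME results. A generic way to extract such an onset from a series of per-time-point p-values is sketched below. It is an illustration, not the authors' procedure: the optional requirement of several consecutive significant samples (`min_run`) is an assumption, a common guard against isolated false positives that the text does not describe, and the p-values are hypothetical.

```python
def onset_latency(times_ms, p_values, alpha=0.05, min_run=1):
    """Return the first time at which p < alpha holds for at least
    `min_run` consecutive samples, or None if that never happens."""
    if len(times_ms) != len(p_values):
        raise ValueError("times_ms and p_values must have the same length")
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    run = 0
    for i, p in enumerate(p_values):
        # Extend the current run of significant samples, or reset it.
        run = run + 1 if p < alpha else 0
        if run >= min_run:
            # Report the start of the run, not the sample that completed it.
            return times_ms[i - min_run + 1]
    return None

# Hypothetical per-time-point p-values (not data from the study).
times = [0, 100, 200, 300, 400, 500]
pvals = [0.50, 0.03, 0.40, 0.01, 0.02, 0.001]
print(onset_latency(times, pvals))             # → 100
print(onset_latency(times, pvals, min_run=3))  # → 300
```

With a single-sample criterion the isolated dip at 100 ms counts as the onset; requiring three consecutive samples moves it to 300 ms. How strict the criterion is can shift reported latencies, which matters when comparing onsets across conditions.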
  
  (4) Another concern is with the visualisation of the pupil traces in Figure 3 (main results); these were heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest and generating the unrealistic expectation that perception of dark/bright colors generates a net dilation/constriction of the pupil, whereas perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size. It would be far better to see the actual profiles, preserving the unfolding of dilations and constrictions over time, especially since these are further analysed in Figures 4 and 5.
  
  Indeed, the expectation that any dark synesthetic experience would lead to pupil dilation whereas any bright synesthetic experience would lead to constriction is not warranted – it would only do that relative to the counterfactual of not having that experience.
  
  Many factors affect the pupillary signal at the same time, and often differently across individuals (think of tiredness etc.), making merely baseline-corrected traces appear noisy. Our visualization highlights that part of this variation is systematic and lies in the synesthetic brightness experience.
  
  Visualizing the effects of idiosyncratic experiences that vary within and between participants is challenging. For the theoretical insight our paper offers in Figure 4 (synesthesia being sensory in nature), demeaning is, in our opinion, favorable, as it isolates the effect of interest in the visualization. However, for methodological reasons and to better show effect sizes, there is certainly value in additional transparency. As the reviewer suggested, we now provide non-demeaned traces in the supplementary material and refer to these in the main manuscript. Furthermore, all figures are now provided in millimeters, with all pupil-related analyses rerun and updated accordingly (without qualitative changes to the results). This should further correct possibly inflated expectations about the absolute size of the effects and allows effects to be put into perspective across studies. We added:
  
  “Pupillary data were transformed from arbitrary eyelink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”
  
  Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.
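The quoted conversion uses a single factor obtained with an artificial eye. A minimal sketch of that kind of calibration follows, under stated assumptions: the artificial pupil's true diameter and its recorded value are hypothetical numbers, the camera-to-eye distance is assumed identical during calibration and testing, and no gaze-position correction is applied (Hayes & Petrov, 2016, also address that issue; this sketch does not). Because eye trackers may report either pupil area or pupil diameter, the helper handles both, taking the square root of area since diameter scales with its square root.

```python
import math

def conversion_factor(known_mm, recorded_au, is_area=False):
    """mm per arbitrary unit of diameter, from an artificial pupil of known size.

    If the tracker reported area, take the square root first, because
    diameter scales with the square root of area."""
    if recorded_au <= 0:
        raise ValueError("recorded value must be positive")
    size = math.sqrt(recorded_au) if is_area else recorded_au
    return known_mm / size

def to_mm(samples_au, factor, is_area=False):
    """Convert raw samples to pupil diameter in mm with a single linear factor."""
    return [factor * (math.sqrt(s) if is_area else s) for s in samples_au]

# Hypothetical calibration: a 4-mm artificial pupil reads 800 units (diameter mode).
f_diam = conversion_factor(4.0, 800.0)
print(to_mm([800.0, 1000.0], f_diam))

# Same idea in area mode: the 4-mm artificial pupil reads 6400 units of area.
f_area = conversion_factor(4.0, 6400.0, is_area=True)
print(to_mm([6400.0, 10000.0], f_area, is_area=True))
```

In both modes the calibration reading maps back to 4 mm and the larger reading to 5 mm. A single multiplicative factor keeps relative effects unchanged, so rerunning analyses in millimeters should not alter their qualitative outcome.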
  
  Impact:
  
  Despite these weaknesses, and especially if they are adequately addressed in the review, this work is likely to improve our understanding of synesthesia, providing a new tool to quantify the subjective sensations; an interesting potential extension would be using pupillometry for tracking changes over time of the synesthetic experiences, opening up the possibility to evaluate the importance of learning for this peculiar experience.
  
  We were happy to read our manuscript was evaluated this positively and hope that our replies can address the remaining smaller concerns and make findings more transparent to the readers.
  
  Reviewer #2 (Public review):
  
  Synesthesia is a neurological condition where stimulation of one sensory channel leads to involuntary, automatic, and consistent experience of another, unrelated percept. For example, Sir Francis Galton (1880, Nature) famously described the robust tendency of some individuals (synesthetes) to associate numerals with a distinct color. Ever since, synesthesia has continued to attract a broad interest in the cognitive neurosciences in light of its implications for the study of domains such as perception, consciousness, and brain connectivity, among others.
  
  Strauch, Leenaars, and Rouw measured pupil size in a group of 16 grapheme-color synesthetes and two matched control groups. The participants were presented with gray digits - that is, visual stimuli having identical physical properties in terms of brightness. Each participant subsequently rated the corresponding evoked color and brightness: unlike controls, synesthetes did so in a very consistent and reliable fashion. Accordingly, this was also shown in their pupils: despite the same objective luminance, digits associated with brighter percepts caused their pupils to constrict, and digits associated with darker percepts caused their pupils to dilate more than controls. These results highlight how crossmodal correspondences are deeply rooted in synesthetes, and put forward pupillometry as a particularly appealing biomarker for some phenomenological experience (at least those grounded in "brightness").
  
  Further strengths of the technique are its temporal resolution and its responsiveness to several constructs. Across several tasks, the authors show, for example, that responses to synesthetic light are somewhat slower than responses to real light (i.e., they are likely mediated), but at the same time faster than responses to mental imagery. The role of mental imagery can also be reasonably dismissed when considering the second feature of pupil size: its responsiveness to mental effort and cognitive load. The pupils tend to dilate with demanding, challenging tasks, and this was the case when control participants were asked to report the color of a digit for which they did not consistently experience a synesthetic association. The same task was, instead, seemingly effortless for synesthetes, again speaking in favor of the automaticity of number-color correspondences in their case.
  
  Overall, the findings by Strauch, Leenaars, and Rouw are highly significant for the field and likely to be impactful. The strength of their evidence, when accounting for the relatively small sample size and the inherent variability of both phenomenology (color perception and subjective reporting) and physiology (pupil size), is adequate and sufficiently convincing.
  
  We were glad to read this overall very positive assessment of our work and thank the reviewer for the additional non-public suggestions for improvements.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In the present study, the authors examined pupillary responses to uncolored stimuli (number graphemes) among number-color synesthetes and non-synesthetes. After seeing a digit, the synesthetes and active control participants were asked to indicate which color they perceived using three dimensions of hue, saturation, and lightness. The lightness values were the primary independent variable for follow-up analyses. To see how the pupil responded to psychologically "bright" and "dark" digits, the authors split the reported lightness values at the median and plotted them. The synesthetes showed a pupillary constriction to digits they perceived as bright and dilation to digits they perceived as dark. Active control participants did not show that effect. In a subsequent block, only the synesthetes were shown the colors they reported perceiving as colored discs. Their pupillary responses were similar. The authors also found that the differences in pupillary responses between light and dark perceptions (with digits) were only slightly delayed in their onset relative to the perception of a colored disc, and therefore, the color perception accompanying a digit is unlikely to be effortful or a retrieved association, but occurs rather automatically.
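The median split of reported lightness described in this summary can be sketched as follows. Whether the split was computed per participant or on pooled data, and how values exactly at the median were assigned, is not stated here; this minimal sketch assumes a per-participant split with ties assigned to "dark", and the digits and 0-100 lightness ratings are hypothetical.

```python
from statistics import median

def median_split(lightness_by_grapheme):
    """Label each grapheme 'bright' or 'dark' relative to the median lightness.

    Values exactly at the median are labelled 'dark' (an arbitrary tie rule)."""
    m = median(lightness_by_grapheme.values())
    return {g: ("bright" if v > m else "dark")
            for g, v in lightness_by_grapheme.items()}

# Hypothetical lightness ratings (0-100) for one synesthete's digits.
ratings = {"0": 20, "1": 85, "2": 40, "3": 60, "4": 30,
           "5": 75, "6": 50, "7": 90, "8": 10, "9": 55}
split = median_split(ratings)
print(sorted(g for g, label in split.items() if label == "bright"))  # → ['1', '3', '5', '7', '9']
```

With ten ratings the median (52.5) falls between two values, so the split is balanced. With an odd number of graphemes or repeated ratings, the tie rule decides which bin the middle items join, which is one reason continuous lightness is the more informative predictor in the LMEs.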
  
  Strengths:
  
  The authors employed a well-controlled and designed quasi-experiment comparing color-grapheme synesthetes to non-synesthetes and showed convincingly that the color perceptions accompanying graphemes alter the physical perception of brightness. They also made a reasoned attempt to rule out the possibility that color associations are occurring effortfully via retrieved associations.
  
  We appreciate the positive assessment and useful suggestions for revision.
  
  Weaknesses:
  
  There are some areas in which the implications of these findings could be elaborated upon. I had the following questions:
  
  (1) Are the pupillary responses among synesthetes, which objectively do not seem to match the degree of physical stimulation entering the retina, in any way maladaptive for eye functioning? I understand the constriction/dilation of the pupil to not only benefit visual acuity but also to protect the retina from damage. Are synesthetes at any risk of retinal damage due to over-dilation of the pupil to brighter stimuli? Or are these effects of a magnitude that is too small to matter? As reported in arbitrary units, it was hard to know how large these effects were in terms of measurable changes in dilation (e.g., millimeters).
  
  This is an interesting point. Some argue that mid-range changes in pupil size mildly affect optics and thereby detection performance, contrast perception, and depth of field (Eberhardt et al., 2022; Mathôt & Ivanov, 2019; Ruuskanen, Boehler, & Mathôt, 2025), rather than serving a protective role for the retina (Mathôt, 2018). Indeed, all effects reported here were quite small. We agree with the reviewer that this can be made more accessible by reporting effects in millimeters. We have therefore adjusted all figures accordingly and now write in the methods section:
  
  “Pupillary data were transformed from arbitrary eyelink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”
  
  Note that even the largest effects here (those elicited by physical luminance change in block 2 for the synesthetes) only caused differences in pupil size of about 0.3 mm. This lies below the maximal pupil dilations observable in response to maximal effort (about 0.5 mm), for instance, and substantially below the full range of pupil size changes elicited through strong luminance stimulation (several millimeters). We therefore deem the changes in pupil size obtained in our study too minor to be practically maladaptive for optics or perception.
  
  Eberhardt, L. V., Strauch, C., Hartmann, T. S., & Huckauf, A. (2022). Increasing pupil size is associated with improved detection performance in the periphery. Attention, perception, & psychophysics, 84(1), 138-149.
  
  Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.
  
  Mathôt, S., & Ivanov, Y. (2019). The effect of pupil size and peripheral brightness on detection and discrimination performance. PeerJ, 7, e8220.
  
  Mathôt, S. (2018). Pupillometry: Psychology, physiology, and function. Journal of cognition, 1(1), 16.
  
  Ruuskanen, V., Boehler, C. N., & Mathôt, S. (2025). The Interplay of Spontaneous Pupil-Size Fluctuations and EEG Power in Near-Threshold Detection. Psychophysiology, 62(3), e70035.
  
  (2) Likewise, is the automatic synesthetic merging of two percepts something that could be learned such that natural synesthetes and "artificial" synesthetes would look similar? For example, if a group of non-synesthetic participants were to learn a color-grapheme association to automaticity, would you expect their pupillary responses to the graphemes to look similar to the synesthetes'? If so (or if not), what would this tell us about the phenomenology of synesthesia?
  
  We find this question most interesting. Likely, different synesthesia researchers would not even fully agree on the most plausible answers to these questions. Training studies have shown that non-synesthetes can be trained to associate particular colors with particular graphemes, as revealed by the synesthetic Stroop effect: interference of the learned color when reporting the typeface color of the grapheme. The degree to which non-synesthetes can be trained to become similar to synesthetes is, however, still a topic of debate.
  
  We now discuss as follows:
  
  “Future studies could examine to what degree training a non-synesthete to associate specific colors with particular inducers (e.g., digits) can provide similar patterns of results as genuine synesthesia (Bor et al., 2014, Colizoli et al., 2012, Rothen & Meier, 2014). Could learning produce similar brightness-related pupil effects in non-synesthetes? Similarly, would effort-linked responses diminish with increased training duration? The perhaps most interesting question relates to response latencies: Would a trained participant ever be able to produce brightness-related pupil effects as fast as a synesthete?”
  
  Bor, D., Rothen, N., Schwartzman, D. J., Clayton, S., & Seth, A. K. (2014). Adults can be trained to acquire synesthetic experiences. Scientific reports, 4(1), 7089.
  
  Colizoli, O., Murre, J. M., & Rouw, R. (2012). Pseudo-synesthesia through reading books with colored letters. PloS one, 7(6), e39799.
  
  Rothen, N., & Meier, B. (2014). Acquiring synaesthesia: insights from training studies. Frontiers in human neuroscience, 8, 109.
  
  (3) Do the synesthetic perceptions of digit graphemes merge in a sensible way? For example, if a synesthete sees a particular color with the digit 1, and a different color with the digit 9, what do they perceive when they see 19? or 1-9, or 1 9? Is there color blending, or an altogether different color perception?
  
  This is a very interesting question indeed. While each synesthete will have their own specific expression of synesthesia, there are regularities in how a combination of digits evokes synesthetic color. First, if asked about the color of a specific digit, each digit keeps its own color, as the color of a digit is linked to the identity of the digit (Dixon et al., 2006). Context effects are however possible, in particular when context alters the interpretation of the digit (Myles et al., 2003). A particularly common context in a multi-digit number is a dominant first digit, spreading its color to the subsequent digits in the number. However, as the digit color is linked to digit identity, what does ‘not’ happen is a mixing of colors into a qualitatively new color; for example, a yellow "1" and blue "9" do not merge into a green "19".
  
  Dixon, M. J., Smilek, D., Duffy, P. L., Zanna, M. P., & Merikle, P. M. (2006). The role of meaning in grapheme-colour synaesthesia. Cortex, 42(2), 243-252.
  
  Myles, K. M., Dixon, M. J., Smilek, D., & Merikle, P. M. (2003). Seeing double: The role of meaning in alphanumeric-colour synaesthesia. Brain and Cognition, 53(2), 342-345.
  
  Many thanks for the constructive assessment of our work.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) I am not sure I'd use the term 'cross-modal' given that the case considered here (grapheme-color) is purely visual.
  
  The reviewer is absolutely right: the term 'cross-modal' has a historical background rather than being strictly accurate. The term is nevertheless still commonly used, as it readily conveys that the induced additional experience is always of a different (sub)type than the inducing experience. The cross-over between experiences can occur within the same sensory modality, or can even induce awareness of a particular concept. Key to synesthesia, however, is the cross-over experience, as inducer and concurrent are different (sub)types of experiences. For example, seeing a letter can evoke a synesthetic experience of seeing a color, or awareness of a particular gender or personality of that letter, but it does not evoke another letter. To remain consistent with the literature, we refer to 'cross-modality' when explaining the link to previous literature, but otherwise switched to using 'cross-over experience':
  
  “Therefore, synesthesia might provide a unique window into how the brain’s constructive processes can generate additional, conscious content, in cross-over experiences, often across modalities, going all the way down to the level of sensory phenomenology.”
  
  We adjusted throughout the manuscript accordingly.
  
  (2) I would not recommend focusing the introduction on the problem of qualia; this is a much more general and complex question than the one addressed in the study; the space of the introduction may be better used to present the actual object of study, giving a better picture of the synesthetic phenomenon and of previous work aimed at characterising it (behavioural, including PA scores and consistency measures, and neuroimaging). It is important to discuss how the pupillometric approach differs from the previously adopted neuroimaging techniques and what it can add to those.
  
  We agree that the problem of qualia is a very general and complex question. However, we respectfully disagree that this question is not the object of the study. What is remarkable about synesthesia is not the presence of an additional perceptual association per se, but the presence of a specific perceptual experience. As an illustration, consider a test of an unconscious color association with the word 'banana'. While a generic 'yellow' could be semantically linked and would likely emerge in the experimental results (e.g., through priming), a follow-up request to pick the exact shade of yellow for this association on a color wheel, or to describe the perceptual sensation of the color, would make no sense to participants.
  
  This sharply contrasts with the current study: synesthetes, but not non-synesthetes, report a perceptual sensation of additional colors, and the sensory properties of this percept (experienced brightness) indeed affect the objective reflection of this sensation (pupil size) in synesthetes but not in non-synesthetes. In our view, the presence of additional qualia is key to understanding what sets synesthetic associations apart from non-synesthetic ones, including so-called cross-modal correspondences (unconscious, consistent associations across modalities, common to us all). We even believe that the reported qualia are what make synesthesia so interesting in the first place. We now explain this link to qualia more clearly in the introduction.
  
  "The most remarkable aspect of synesthesia is the subjective perceptual phenomenology of the induced colors, setting these sensations apart from color memory, thought, or amodal association. The contrast between synesthetes and non-synesthetes can thus offer an interesting doorway into examining qualia, the subjective perceptual phenomenology or first person (what's-it-like) perspective."
  
  We also improved the explanation of the synesthetic phenomenon, including a more detailed characterisation of behavioural measures (including consistency scores), and added neuroimaging studies. These changes have been incorporated into the text in response to a previous comment (Reviewer 1, point 1).
  
  Please note that we have chosen not to include more detailed discussion of PA scores. Our results show a trend but do not allow for a conclusive interpretation on PA scores, and we feel that placing greater emphasis on this topic might therefore be confusing or even misleading. Still, it would be a very interesting topic for follow-up research to examine how alterations in characteristics of the synesthetic experience influence pupil responses.
  
  The different synesthesia types all share the defining characteristics of an additional conscious and consistent experience. Synesthetes can verbally report their additional experience, and synesthetic sensations can be measured in behavioral paradigms such as the ’synesthetic Stroop’ effect, or brain activation patterns in sensory cortex [15]. Furthermore, test-retest paradigms show how synesthetic, but not non-synesthetic associations are highly specific and consistent [16-18]. Thus, over the past decades, research has established synesthesia as a ’real’ condition that can reliably be identified using behavior, neurophysiology, and neuroimaging [11, 13, 15–21]. The most remarkable aspect of synesthesia is the subjective perceptual phenomenology of the induced additional sensation, i.e., color in grapheme-color synesthesia. This sets synesthetic sensations apart from (color) memory, thought, or amodal association. Synesthesia can thus offer an interesting doorway into examining qualia, the subjective perceptual phenomenology or first person (what’s-it-like) perspective.
  
  We now discuss the pupillometric approach as it differs from the previously adopted neuroimaging techniques as follows:
  
  “Compared to neuroimaging studies [12,15,51], pupillometry may offer a more direct window into synesthetic phenomenology, as the directionality between pupil light reflex and perceived brightness is straightforward. Finally, improved understanding of the underlying processes can be obtained by contrasting responses to perceived versus actual (physical) brightness, given that the pupil light reflex is a well-characterised reflex arc involving few inferential steps.”
  
  This adds to the explanation that was already present on how the current approach differs from previous techniques, and what it can add to those techniques:
  
  "Instead, current paradigms capturing synesthesia employ objective measures, but fail to capture its phenomenology [16, 17, 21, 23]."
  
  (3) There are a few typos and word repetitions.
  
  Many thanks – we identified typos and repetitions after another set of careful reads and hope to have eradicated them completely now.
  
  Reviewer #2 (Recommendations for the authors):
  
  I am overall very supportive of this work, but addressing the following points may enrich it further:
  
  (1) Paragraph 2.2.1. Here, models do not seem to compare synesthetes versus controls but rather assess the effects of interest separately in the two groups. The fact that experimental effects are significant in synesthetes, but not in controls, does not tell us much about differences between groups. Controls (e.g., Figure 3) do show a similar trend, albeit clearly smaller. There is one passage in which this issue appears to be tackled (page 10): "Critically, in an LME ran on synesthetes and controls and using only graphemes and the interaction of group and lightness as predictors, we found lightness to predict pupil size in synesthetes (t = -2.754, p = 0.006), but not controls (t = -1.134, p = 0.257)." But I am not sure that the reported statistics belong to the interaction - they seem to refer to the lightness effect within each group, not the difference.
  
  This is an important point. Power for between-group comparisons is inherently limited with n = 16 per group (such comparisons remain feasible for overall responses, but become trickier when fewer trials remain). A simple model of pupil ~ grapheme + group * lightness_scaled + (1 | participant) shows no significant interaction (even though one group shows the effect significantly and the other does not). The additional negative effect of group is in line with the effort-related effect reported later in the manuscript. Where does this leave us? Based on the lightness responses alone, the group difference can be characterized as a quantitative distinction, but the degree to which it is also a qualitative distinction cannot be clearly determined from the current data. We revised the manuscript to ensure that no such interaction is implied and to point out that this interaction is not significant.
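Reviewer 2's point, that a significant effect in one group and a non-significant effect in the other is not itself evidence of a group difference, can be made concrete with a back-of-envelope Wald test on the difference between two slope estimates. The numbers below are hypothetical, not taken from the study, and the sketch assumes independent groups and a normal approximation; in the joint LME, the lightness × group interaction term is what plays this role.

```python
import math

def slope_difference_z(b1, se1, b2, se2):
    """Wald z and two-sided p for b1 - b2, assuming independent estimates."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (b1 - b2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Hypothetical slopes: group A clears p < .05 on its own, group B does not.
z_a = 0.50 / 0.20  # 2.5
z_b = 0.15 / 0.20  # 0.75
z, p = slope_difference_z(0.50, 0.20, 0.15, 0.20)
print(f"z_A = {z_a:.2f}, z_B = {z_b:.2f}, difference z = {z:.2f}, p = {p:.3f}")
```

Here group A is significant on its own (z = 2.5) and group B is not (z = 0.75), yet the difference between the two slopes gives z ≈ 1.24, well short of significance. Pooling both groups into one model with an explicit interaction term is therefore the appropriate test, which is the analysis reported in the revised text.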
  
  The sensory nature of synesthetic color is supported by within-synesthete analyses, where coupling strength parametrically modulates the lightness-pupil relationship in a theoretically predicted manner. Importantly, the effort-related findings provide a complementary and statistically robust group comparison: synesthetes and controls performing the identical color-reporting task showed significantly different pupil dilation rates, directly demonstrating that the two groups differ in how they access color information. Together, these two independent pupillometric signatures, one tracking perceptual quality and one tracking effort, converge on the same conclusion and mutually reinforce the interpretation that synesthetic color constitutes genuine sensory phenomenology.
  
  Author response image 3.
  
  We now make this more explicit in the manuscript as follows:
  
  “We found significant modulations of pupil size by the lightness of the grapheme's synesthetic color - sustained and in the to-be-expected time window. Specifically, the pupil constricted more for brighter reported colors, and dilated more for darker reported colors, as predicted (Average pupil size 800-4000ms, t = -3.601, p < 0.001). In an LME run for synesthetes and controls and using only graphemes and lightness as predictors, we found lightness to predict pupil size in synesthetes (t = 2.844, p = 0.004), but not controls (t = 0.606, p = 0.544). However, when taking group as interacting factor in a joint LME, there was no interaction of lightness and group (t = -0.949, p = 0.342).”
  
  and
  
  “For controls a separate model was run, now without the PA score as predictor (not assessed for controls). Neither lightness (t = -0.815, p = 0.415), coupling strength (t = 0.438, p = 0.661), nor their interaction gained significance (t = -1.058, p = 0.290; all for average pupil size between 800 ms and 4000 ms). Critically, we also ran an LME with the three-way interaction of coupling strength, group, and lightness (Wilkinson notation: pupil = grapheme + group + lightness * group + coupling strength * lightness * group + (1 | participant)). This analysis revealed a significant three-way interaction between lightness, coupling strength, and group (F = 3.86, p = .021), indicating that the lightness × coupling strength effect on pupil size was not equivalent across groups. Decomposing this interaction by group, the lightness × coupling strength slope was significant in synesthetes (t = 2.59, p = .010) but not in controls (t = -1.01, p = .311), suggesting that reported lightness and its coupling strength were more consistently related to pupil size in synesthetes than in controls. Note, however, that this decomposition does not directly test whether the two slopes differ significantly from each other. Lastly, pupil size was marginally larger in controls than in synesthetes (t = 1.94, p = .062; see later sections for more in-depth analyses)”
  
  (2) The authors choose to analyze pupil size in arbitrary eye tracker units. This is fine, although I would recommend assessing and reporting whether the average pupil size (e.g., during the baseline) is roughly comparable between groups. The size of the effects may be difficult to compare between groups in the presence of very different baseline pupil size.
  
  Please see Author response image 4 for baseline pupil sizes per group, in millimeters. Baseline pupil size did not differ between groups.
  
  Author response image 4.
  
  F(2, 45) = 0.707, p = 0.499 (one-way ANOVA).
  
  We now write:
  
  “Baseline pupil sizes did not differ between groups (F(2, 45) = 0.707, p = 0.499).”
  
  We agree with the reviewer that millimeters are a more intuitive measure and updated all figures throughout the manuscript and supplementary materials accordingly. We also added a brief note to the signal-processing section stating that this conversion was applied.
  
  “Pupillary data were transformed from arbitrary eyelink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”
  
  Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.
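  To illustrate the unit conversion, the sketch below assumes a single linear scale factor (millimeters per arbitrary unit) derived from one recording of an artificial pupil of known diameter, and assumes the tracker reports pupil diameter rather than area. The helper names and the 5 mm / 2500-unit calibration values are hypothetical, and the gaze-position correction described by Hayes & Petrov (2016) is deliberately omitted.

```python
def conversion_factor(artificial_eye_mm: float, artificial_eye_au: float) -> float:
    """Millimeters per arbitrary unit from one artificial-eye reading (hypothetical helper).

    Assumes the tracker output is a diameter; an area signal would need a
    square-root step before a linear scale could be applied.
    """
    if artificial_eye_au <= 0:
        raise ValueError("artificial-eye reading must be positive")
    return artificial_eye_mm / artificial_eye_au


def to_millimeters(samples_au, factor):
    """Scale every pupil sample by the factor; None marks blinks or missing samples."""
    return [None if s is None else s * factor for s in samples_au]


# Hypothetical calibration: a 5 mm artificial pupil read as 2500 tracker units.
factor = conversion_factor(5.0, 2500.0)
print(to_millimeters([2000.0, None, 3000.0], factor))
```

  Because the scaling is linear, between-group comparisons and model effects keep their sign and significance; only the units become interpretable.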
  
  (3) If I understand correctly, the main task counted 120 trials overall (12 per digit). It seems, however, that only 3 and 4 participants remained with at least 50 trials (or 25 per median split by lightness) after preprocessing. This appears to be quite a massive data loss: is there a reason behind it? Please also clarify: the overall percentage of discarded trials; whether the median split by lightness was computed on all responses or only on those of the remaining, valid trials.
  
  This is an important point for clarification indeed. The exclusion of participants in Figure 3 applies only to that particular visualization, not to the statistical analyses. The linear mixed effects models (LMEs) used all available valid trials from all participants, with no participant-level exclusions. The figure-specific threshold (≥25 trials per median-split bin) was applied purely for display clarity, as plotting participants with very few trials per bin would produce unreliable, noisy, and therefore visually misleading traces (as we note in the figure caption, where we also point readers to Supplementary Figure 1, which shows the same visualization without any exclusions).
  
  Because the paradigm required participants to repeat discarded trials until 120 valid trials had been collected, every participant contributed exactly 120 valid trials to the analyses. There was therefore no data loss at the analysis level for the LME that is central to the claims of the manuscript (albeit the LME is more complex to grasp than the t-tests between bins).
  
  Why were there sometimes so few trials per brightness bin?
  
  First, participants differed in how dark or bright their (synesthetic or forced-report) colors were overall, so differing proportions of their reports fell above or below the 0.5 cutoff. This cutoff represented the sample well overall, but not necessarily every individual participant. Note that this median split was not performed per individual but across all color reports, to allow an apples-to-apples comparison.
  
  Second, participants often reported colors that differed in Hue and Saturation, but not Lightness. This is in line with synesthetes picking certain colors more often than others, as compared with non-synesthetes (Rouw & Root, 2019; Ward et al., 2025).
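  The pooled binning and the display-only threshold can be sketched as follows, assuming lightness scaled to 0 to 1, a single cutoff of 0.5 applied to every report, and the ≥25-trials-per-bin rule used for Figure 3. The data layout, the function names, and the choice to put values exactly at 0.5 in the "dark" bin are hypothetical.

```python
from collections import Counter

CUTOFF = 0.5              # one pooled cutoff for all color reports (not per participant)
MIN_TRIALS_PER_BIN = 25   # display threshold for the figure only, not for the LME


def bin_counts(trials):
    """Count dark/bright trials per participant using the same cutoff for everyone.

    trials: iterable of (participant_id, lightness) pairs, lightness in [0, 1].
    Values equal to the cutoff are counted as "dark" (an arbitrary choice here).
    """
    counts = Counter()
    for pid, lightness in trials:
        label = "bright" if lightness > CUTOFF else "dark"
        counts[(pid, label)] += 1
    return counts


def plottable(counts, participants):
    """Participants with enough trials in BOTH bins to be drawn in the figure."""
    return [p for p in participants
            if counts[(p, "dark")] >= MIN_TRIALS_PER_BIN
            and counts[(p, "bright")] >= MIN_TRIALS_PER_BIN]


# Two hypothetical participants with 120 valid trials each.
trials = ([("s01", 0.3)] * 60 + [("s01", 0.7)] * 60
          + [("s02", 0.8)] * 110 + [("s02", 0.2)] * 10)
print(plottable(bin_counts(trials), ["s01", "s02"]))  # ['s01']
```

  In this toy example, "s02" reports mostly bright colors and is dropped from the plot, yet it would still contribute all 120 trials to a mixed model fitted on the trial-level data.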
  
  We now include a new Supplementary Figure that visualizes responses on the Hue and Saturation dimensions of HSL space for both synesthetes and controls; fully saturated reports appear on the outer edge. We refer to the supplementary figure in the caption of Figure 2 as follows:
  
  "See Supplementary Figure 1 for color reports on the hue and saturation axes.”
  
  Rouw, R., & Root, N. B. (2019). Distinct colours in the ‘synaesthetic colour palette’. Philosophical Transactions of the Royal Society B: Biological Sciences, 374(1787).
  
  Ward, J., Maciel, S., Rouw, R., Simner, J., & Root, N. (2025). Synaesthesia is linked to differences in music preference and musical sophistication and a distinctive pattern of sound-color associations. Psychology of Music, 53(3), 453-473.
  
  Minor points:
  
  (1) "Building on this evidence, we hypothesized that the cross modal color phenomenology in synesthesia can, if truly sensory in nature, could likewise be (...)" -> may need rephrasing (can/could).
  
  Many thanks, fixed.
  
  (2) Caption of Figure 1: "Block 2 (synesthetes only): a colored disk and gray central patch, matching the average indicated color per digit, and the number and luminance of pixels of said digit were presented to assess externally triggered light responses." -> I find this sentence a bit hard to follow; perhaps consider rephrasing it.
  
  Agreed, we rephrased to:
  
  Block 2 (synesthetes only): a colored disk was presented, colored according to the synesthete's average indicated color for that digit. At its center sat a gray patch matching the luminance and pixel area of the original digit from Block 1, together allowing assessment of externally triggered light responses.
  
  (3) Figure 2 b: Consider truncating the y-axis to 1 if that improves the visualization.
  
  We adjusted the axis accordingly and added a bit more detail in the caption for the interpretation of the measure.
  
  (4) Caption of Figure 3 points to "see Supplementary Figure 1", but it should probably be SF2.
  
  Many thanks for spotting, all references to supplementary figures have been checked and are corrected now.
  
  Elvio Blini
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) As a minor comment, there are some terms that felt overused in the manuscript. For example, the words "extraordinary" and "exceptional" were used multiple times throughout. I believe I understand the authors to mean them in their descriptive sense (i.e., outside the realm of typical experience), but in context, those words make it seem like they are touting their own experiment as "exceptional" or "extraordinary," which I don't believe was their intention.
  
  We agree. Throughout the manuscript, we removed words such as "exceptional" and "extraordinary" wherever they did not directly refer to the synesthetic sensation (which is how we intended to use them). We hope that this removes unnecessary and confusing hyperbole.
  
  (2) It seemed counterintuitive to me that the color consistency score would be reverse-coded. In this case, the scores actually seem to indicate inconsistency, rather than consistency. Perhaps the raw scores can be inverted for a more intuitive interpretation that aligns with the terminology. I understand that they were following a previous publication in their method (Rothen et al., 2013).
  
  This manner of coding is counter-intuitive indeed. However, there are both logical and practical reasons for this approach. Importantly, this is the standard way of reporting color consistency in synesthesia research (Carmichael et al., 2015; Eagleman et al., 2007; Root et al., 2025; Rothen et al., 2013). The calculation follows a simple logic: a higher number reflects a larger distance in color space. An additional advantage is the clear and intuitive zero reference: a score of zero implies choosing the exact same color every time. Finally, the scale intuitively reflects the distinction between synesthetes and non-synesthetes: synesthetes by definition show little variation and cluster at the bottom of the graph; above them sits a 'cut-off line' (if consistency is used as a diagnostic tool); and above that, the spread of scores shows how variable consistency is in that particular sample of non-synesthetes. In a way, we therefore inherit a confusing definition, but changing the standard would only create new confusion. We now specifically clarify this in the caption as follows:
  
  “Note that higher consistency is reflected in lower color distance, hence lower values [17].”
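  The scoring logic behind "lower is more consistent" can be sketched as below. This assumes each grapheme was colored three times, colors are RGB triplets scaled to 0 to 1, and the score is the mean over graphemes of the summed pairwise Euclidean distances. That follows the general scheme of Eagleman et al. (2007); the exact color space and aggregation used by Rothen et al. (2013) and in this study may differ, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def grapheme_score(picks):
    """Sum of pairwise Euclidean distances between repeated color picks for one grapheme.

    picks: list of (r, g, b) tuples scaled to [0, 1]. Identical picks give 0.
    """
    return sum(dist(a, b) for a, b in combinations(picks, 2))


def consistency_score(picks_by_grapheme):
    """Mean score across graphemes; LOWER values mean MORE consistent color choices."""
    scores = [grapheme_score(p) for p in picks_by_grapheme.values()]
    return sum(scores) / len(scores)


# Hypothetical picks for the digit 7: near-identical reds vs. three unrelated colors.
synesthete_like = {"7": [(1, 0, 0), (1, 0, 0), (0.9, 0, 0)]}
control_like = {"7": [(1, 0, 0), (0, 0, 1), (0, 1, 0)]}
print(consistency_score(synesthete_like) < consistency_score(control_like))  # True
```

  The zero reference falls out directly: three identical picks contribute a distance of exactly 0.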
  
  Carmichael, D. A., Down, M. P., Shillcock, R. C., Eagleman, D. M., & Simner, J. (2015). Validating a standardised test battery for synesthesia: Does the synesthesia battery reliably detect synesthesia? Consciousness and Cognition, 33, 375–385.
  
  Eagleman, D. M., Kagan, A. D., Nelson, S. S., Sagaram, D., & Sarma, A. K. (2007). A standardized test battery for the study of synesthesia. Journal of Neuroscience Methods, 159(1), 139–145.
  
  Root, N., Chkhaidze, A., Melero, H., Sidoroff-Dorso, A., Volberg, G., Zhang, Y., & Rouw, R. (2025). How “diagnostic” criteria interact to shape synesthetic behavior: The role of self-report and test–retest consistency in synesthesia research. Consciousness and Cognition, 129, 103819.
  
  Rothen, N., Seth, A. K., Witzel, C., & Ward, J. (2013). Diagnosing synaesthesia with online colour pickers: Maximising sensitivity and specificity. Journal of Neuroscience Methods, 215(1), 156–160.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.24.690102v2
www.biorxiv.org

Cortical layer 6b mediates state-dependent changes in brain activity and effects of orexin on waking and sleep

1. Public_Reviews 26 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.
  
  We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, but indeed some expression is seen subcortically. This is an acknowledged limitation which is now explicitly addressed in the revised manuscript.
  
  (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.
  
  In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre-positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024). This has now been described in the revised manuscript.
  
  (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.
  
  We thank the reviewer for their careful attention to the statistical analyses and for noting the inconsistencies in how the results of the spectral analysis were presented: in the text we described two-way ANOVAs with corresponding post hoc tests, but in the figures the significance markers were positioned based on multiple t-tests. We have now carefully revised the spectral results and implemented a consistent approach to statistical reporting and spectral plots. We have updated Supplementary Table T1, Figure 3a and Figure S2 so that all statistics are presented consistently throughout the manuscript, i.e. as two-way ANOVAs with accompanying post hoc tests. Please note that we performed all spectral analyses in the range between 0.5 and 128 Hz (excluding 49-51.5 Hz because of electrical noise from the power grid), but plot only the range between 0.5 and 30 Hz, as the spectral bands most relevant for sleep neurophysiology lie within this range.
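  As an illustration of the frequency ranges described above, the sketch below enumerates the analysed and plotted bins, assuming 0.25 Hz bins labelled by a single frequency value and treating the 49-51.5 Hz exclusion as inclusive at both ends. Both choices are assumptions, as is the function name; the bins are built from integer quarter-Hz steps so that every frequency is represented exactly.

```python
BIN_HZ = 0.25  # spectral resolution referred to in the response (one marker per 0.25 Hz bin)


def analysed_bins(lo=0.5, hi=128.0, notch=(49.0, 51.5), plot_max=30.0):
    """Return (analysed, plotted) lists of bin frequencies in Hz.

    Bins inside the line-noise range `notch` (inclusive here, by assumption)
    are dropped from the analysis; only bins up to `plot_max` are plotted.
    """
    k_lo, k_hi = round(lo / BIN_HZ), round(hi / BIN_HZ)
    freqs = [k * BIN_HZ for k in range(k_lo, k_hi + 1)]  # multiples of 0.25 are exact floats
    analysed = [f for f in freqs if not (notch[0] <= f <= notch[1])]
    plotted = [f for f in analysed if f <= plot_max]
    return analysed, plotted


analysed, plotted = analysed_bins()
print(len(analysed), len(plotted))  # 500 119
```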
  
  Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.
  
  In line with the previous comment, we have adjusted the markers to reflect the results of post hoc tests following two-way ANOVAs. Please note that Figure 6 and the related Supplementary Figures S5 and S6 have now been removed from the manuscript, as careful re-analysis indicated that the sample size was too low to support a strong conclusion regarding the comparison of orexin effects between genotypes. We stated in the text that we would only report post hoc significance when at least two consecutive bins were significant, but our figures did not follow this rule: each marker reflected a single 0.25 Hz bin. We have now adjusted our code so that markers are plotted only when at least two consecutive bins are significant in bin-wise post hoc comparisons.
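  The plotting rule described above (mark a bin only when it belongs to a run of at least two consecutive significant 0.25 Hz bins) can be sketched as a run-length filter over per-bin post hoc results. This is an illustrative sketch, not the authors' code; it assumes the post hoc outcome for each bin is already available as a boolean, ordered by frequency.

```python
from itertools import groupby


def mask_isolated(significant, min_run=2):
    """Keep a significance flag only if it sits in a run of >= min_run consecutive True bins.

    significant: list of booleans, one per frequency bin, in ascending frequency order.
    Returns a list of the same length with isolated True values set to False.
    """
    keep = []
    for is_sig, run in groupby(significant):
        n = len(list(run))                       # length of this run of equal values
        keep.extend([bool(is_sig) and n >= min_run] * n)
    return keep


print(mask_isolated([True, False, True, True, False, True]))
# [False, False, True, True, False, False]
```

  Filtering runs this way is a plotting convention rather than a multiple-comparison correction; the per-bin tests themselves are unchanged.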
  
  (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.
  
  Thank you for pointing this out, we have adjusted the colour scale.
  
  (5) How much time elapsed between vehicle/orexin A & B infusions?
  
  There were 2-4 non-infusion days between infusions. We have added this information to the methods.
  
  (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):
  
  (a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.
  
  (b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.
  
  (c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.
  
  (d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.
  
  We agree with the reviewer, and we decided to exclude this figure from the manuscript, as the sample size for some key comparisons was too low to support any strong conclusions; presenting this analysis would therefore be potentially misleading. We explain the rationale for excluding this analysis in the revised manuscript.
  
  (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.
  
  We have adjusted the wording in the main text to reflect more precisely which comparisons are shown in the figures.
  
  (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.
  
  Please note that the previously named Supplementary Figures S5 and S6 have been removed from the manuscript, and that the Supplementary Figure S7 in this comment refers to the figure currently named Supplementary Figure S5.
  
  We have added the statistical comparisons for Figure 3e, Supplementary Figure S5a and Figure S5b to the results section. In Figure S5c, there was an overall genotype difference, but no significant time x genotype interaction, so we did not perform post hoc tests and did not plot post hoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer. We have also corrected the erroneous reference to Figure S5c; thank you for your careful attention.
  
  (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.
  
  We agree with the reviewer and the title of this sub-section has now been changed accordingly.
  
  Reviewer #2 (Public review):
  
  Weaknesses:
  
  (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.
  
  We thank the reviewer for this important comment. The ICV route of orexin administration cannot guarantee that only cortical Drd1a-Cre–expressing neurons are reached by orexin, and the Drd1a-Cre driver line is highly selective but not entirely specific for layer 6b neurons (see also response to reviewer #1, comment 1). We have therefore changed the wording of the stated effects and addressed this consideration in the Limitations section of the manuscript. Please note that, as mentioned above, Figure 6 has now been excluded from the manuscript.
  
  (2) The rationale for using only male rats is not provided.
  
  We thank the reviewer for highlighting this omission. We now provide the rationale for using only male mice in the methods section as follows: “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Better descriptions of L6b connectivity will improve clarity in the second paragraph of the Introduction (pg. 3). For example, it is not explicitly stated that L6b projects to L5 before the authors describe L5. Therefore, the L5 description seems irrelevant.
  
  We thank the reviewer for this request for clarification. We mention the connectivity between L6b and L5 because L5 pyramidal neurons have recently been found to play a key role in sleep-wake regulation (Krone et al., Nat. Neurosci. 2021; Honjo et al., 2025; Wasilczuk et al., 2025; Krone et al., 2025). We have now amended the corresponding section of the introduction to emphasise the potential functional relevance of this connection as follows:
  
  “L5, the major output layer of the cortex, is also bidirectionally communicative with higher order thalamic nuclei (Hoerder-Suabedissen et al., 2018) as well as layer 5 pyramidal neurons (Zolnik et al., 2024). Since several subtypes of L5 pyramidal neurons have recently been shown to play important roles in distinct aspects of sleep-wake regulation (Krone et al., 2021, 2025; Hong et al. 2023; Wasilczuk et al. 2025; Honjo et al., 2025; Chouafeev et al., 2025); depth of anaesthesia (Wasilczuk et al. 2025), and the influence of stress on sleep (Chouafeev et al. 2025) the projections of orexin-sensitive L6b to L5 pyramidal neurons may be a key circuitry in the top-down regulation of brain states.”
  
  (2) There are plots where the y-axis tick label appears to be offset from the tick mark (4a, S5b, S6a).
  
  Thank you for spotting this graphical issue. We have removed the y-axis tick labels from Figure 4a to avoid confusion. Please note that we decided to remove Figure S5 and Figure S6, because after careful re-analysis we concluded that the group size was too small to draw conclusions on orexin spectra and that any results could be potentially misleading.
  
  (3) The 2-h time constant, I believe, is depicted in Figure 4H (not 4G).
  
  Thank you for spotting this. We have corrected the figure legends accordingly and double-checked that Figure 4G depicts the 2-h time constant and Figure 4H the 6-h time constant.
  
  (4) "...although there was an indication of a higher absolute theta-peak power in layer 6b silenced mice (Figure S6)," pg. 10. It is not clear to me how the data lead to this conclusion.
  
  Thank you for identifying this inconsistency, which resulted from a preliminary statistical analysis subsequently corrected. We have now improved the statistical analysis of spectral data (for more details see comments to both reviewers in public response) and removed this statement, which in fact is no longer supported by the data.
  
  (5) Exclusion of female mice is not listed as a limitation.
  
  We now discuss this limitation as follows:
  
  “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”
  
  (6) A brief description of why Cplx3 and Tbr1 antibodies are being used will be helpful to include in the Methods (pg. 21) in addition to what is in the figure caption.
  
  We have added the following information to the methods section to clarify why we used these two antibodies: “rabbit α-Cplx3 to distinguish between L6a and L6b” and “mouse α-Tbr1 to identify the L5-6 boundary”.
  
  (7) Including a label/title for the Figure 2c spectral plots will be helpful. It is not immediately clear if these are light period & dark period data or frontal & occipital data.
  
  Thank you for pointing this out; we have updated the figure legend to clarify what is shown in this figure.
  
  Similar comments for S2 and S3a plots. Including a state label on the plots will be helpful in addition to the caption description.
  
  We have now added the state labels for Figure panels S2 and S3a for improved clarity.
  
  Reviewer #2 (Recommendations for the authors):
  
  This is a soundly conducted and well-written study that enhances our understanding of the cortical control of states of consciousness. I do not have any major concerns, but would like the authors to consider some alternate possibilities as suggested in my comments below:
  
  We thank the reviewer for this positive assessment of our manuscript and the helpful suggestions.
  
  (1) Given that the inactivation of layer6b neurons did not affect the time spent in sleep-wake states, to me it appears that these neurons likely have a role in creating the background neural conditions/oscillations supportive of an activated state rather than a direct role in behavioral state control.
  
  We completely agree with the reviewer and have made the wording more consistent throughout the manuscript, now using “brain state control” rather than “behavioural state control” to clarify that the main effect observed in the L6b-silenced mouse model is a change in spectral characteristics reflecting brain oscillations, rather than effects on vigilance states, which were modest.
  
  (2) Does the observed shift in REM sleep-related theta-peak frequency in the occipital derivation suggest changes in local neural processes, or could it be just a matter of better signal detection because theta is most prominent at or around the hippocampal region, which is approximately the location of occipital electrodes in this study.
  
  The source of the shift in REM sleep–related theta peak frequency in the occipital derivation cannot be established with EEG recordings alone. Additional intracortical or intrahippocampal recordings would be necessary to distinguish between the two possible explanations proposed by the reviewer. We have discussed this further in the revised manuscript.
  
  (3) Orexinergic system innervates multiple subcortical sites and widely covers the cortex too, because of which the effect of ICV orexins cannot be attributed to just layer6b neurons as described in the manuscript ("Layer 6b mediates effects of orexin on brain activity.").
  
  We agree with the reviewer that this is a limitation. We have now adjusted the subtitle of the paragraph describing the results from the ICV administration of orexin and further mention this important consideration in the ‘limitations’ section of the discussion.
  
  (4) While the current study is focused on sleep-wake mechanisms, the findings reported here have much broader implications for behavioral and/or brain state arousal and provide a mechanistic bridge between different states of consciousness, including general anesthesia. Therefore, the authors may consider tying these findings with the recent work on the role of the prefrontal cortex in arousal from general anesthesia and slow-wave sleep (PMID: 35436248, PMID: 29937348, PMID: 33328847).
  
  We thank the reviewer for this excellent recommendation. We are now citing these papers in the revised manuscript.
  
  (5) It's up to the authors, but I do not see the need for the section on Clinical Implications. It's very speculative, and it makes the entire discussion section heavy.
  
  We have considerably shortened the discussion of potential clinical implications to make the manuscript more concise.
  
  (6) Figure 1: It's difficult to compare the EEG power the way figures are set up right now. I think it would enhance clarity if the authors separate the plots based on state and show power from the control and silenced neuronal group in the same plot. Also, the colors are too similar (essentially a shade of green/blue) to provide effective visual resolution. This is especially true in panel d. Please consider changing the color scheme.
  
  This comment seems to refer to Figure 2 and subsequent figures with analysis of vigilance states and EEG spectra (Figure 1 contains histological images). We selected the colour scheme to be accessible to colour-blind individuals; the genotypes are therefore distinguished mainly by saturation rather than by hue. We have tested the visibility of the colour scheme on a high-resolution screen with the original image files and can reassure the reviewer that the genotype differences, which are slightly blurred in the reduced-resolution figures provided within the combined text file for the review process, are easily distinguishable at final figure quality.
  
  (7) I don't understand the y-axis scale in Figure 1. How can this be 500% and if it is, then 500% of what?
  
  This comment also seems to refer to the analysis of slow wave activity (SWA) in Figure 2 rather than to Figure 1 (histology figure). The percentage of SWA is normalised to the average SWA across the recording. Since NREM sleep is characterised by considerably higher SWA than wakefulness and REM sleep, the level of SWA during NREM sleep is in the range of 200-300%, and can be even higher after long wake episodes which are followed by a rebound of NREM sleep SWA. Hence, the upper limit of the y-axis in these (and subsequent) plots of SWA is 500% (of the average SWA). We have amended the figure legend to clarify that SWA is presented here as percentage of average SWA across the recording.
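  A minimal sketch of this normalisation, assuming per-epoch SWA values (for example, EEG power in a low-frequency band; the exact band and epoch length are assumptions here) and a reference equal to the mean over all epochs of the recording, regardless of vigilance state. The function name is hypothetical.

```python
def swa_percent(swa, reference=None):
    """Express per-epoch SWA as a percentage of a reference SWA value.

    swa: sequence of per-epoch SWA values from one recording.
    reference: if None, the mean of `swa` itself (i.e. across the whole recording).
    """
    if not swa:
        raise ValueError("need at least one epoch")
    ref = reference if reference is not None else sum(swa) / len(swa)
    return [100.0 * x / ref for x in swa]


# Two low-SWA epochs (e.g. wake) and one high-SWA epoch (e.g. NREM sleep).
print(swa_percent([1, 1, 4]))  # [50.0, 50.0, 200.0]
```

  Because the reference averages over low-SWA wake and REM epochs as well, NREM epochs routinely land well above 100%, which is why an upper y-axis limit of 500% is needed after long wake episodes.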
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.26.620399v3
www.biorxiv.org

SPEx: Compartment-Resolved Proteomics via Expansion Microscopy–Guided Microdissection

1. Public_Reviews 26 May 2026
  
  in eLife
  
  Author Response:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors present a novel approach to subcellular spatial proteomics by combining laser microdissection with expansion microscopy and LC-MS/MS analysis (SPEx). They implement two different workflows for LMD and LC-MS/MS quantification:
  
  (1)The standard approach, where an area of interest is cut out by LMD, subjected to proteomics analysis, and compared to the rest of the cell without the dissected ROI.
  
  (2) The subtraction approach, where ROIs are removed, and the remaining cellular material is compared to samples containing both the surrounding material and the ROI.
  
  The authors assess the technique by applying it to subcellular targets of various sizes, volumes, and protein compositions such as the nucleus, nucleoli, and Golgi. They demonstrate that SPEx can identify proteins enriched or reduced in ROIs.
  
  Strengths:
  
  The broad, relatively easy, and inexpensive applicability of this approach to potentially many cell types and subcellular areas of interest provides an exciting alternative to subcellular fractionation, native immunoprecipitation, or genetically encoded proximity labeling constructs. Moreover, by visually selecting ROIs for subsequent analysis, subcellular context or organelle morphology can be taken into account, as discussed by the authors in the discussion section.
  
  Weaknesses:
  
  While strongly supporting the sharing of this approach, we have a number of comments and questions that will improve the impact of the manuscript:
  
  We thank the reviewer for the careful evaluation of our manuscript and the generally positive assessment. We plan on improving our manuscript based on the reviewers’ comments.
  
  (1) General:
  
  a) The manuscript would benefit from restructuring and language revision. In its current form, the writing is sometimes dense and verbose (in particular, the Results section). This makes it difficult to follow the authors' arguments.
  
  We will improve readability and clarity of the results section in the revised manuscript.
  
  b) The authors mention the possibility of selecting organelles based on morphology. This is left for the discussion, but it seems like a missed opportunity - the authors could compare individual organelles in different morphological states, e.g., connected vs. fragmented mitochondria.
  
  The authors agree with the reviewers’ assessment that investigating the proteome of organelles based on morphology or cellular state is an exciting application of SPEx. While we plan experiments along these lines in the future, we think that they are beyond the scope of this manuscript, which is meant to describe the method and its general usefulness.
  
  (2) Technical:
  
  a) Why do the authors strive and optimize for a 10x expansion factor? Is SPEx compatible with a more standard 4x expansion, as e.g., used in the classic U-ExM approach (https://www.nature.com/articles/s41592-018-0238-1)? This could be added to the discussion.
  
  We aimed for 10x expansion solely because our ultimate goal is to cut out very small structures. Isolating structures as small as nucleoli would not be as reliable with a lower expansion factor (e.g., 4x). We did not assess compatibility with U-ExM. We would expect SPEx to also work with U-ExM as the expansion method, provided that protease treatment is omitted. However, we did perform pilot experiments with only 4x expansion (using TREx) in the early stages of optimization. We were able to isolate single cells and obtain protein coverage similar to that obtained with 10x expansion. We will further clarify our motivation for using 10x expansion in the discussion.
  
  We would also like to point out that whether U-ExM is the standard method is rather subjective. Even though TREx was published three years later, it is also very widely used. The original expansion microscopy method was published three years prior to U-ExM.
  
  b) The U-ExM approach shows improved ultrastructural preservation when using 3% FA with 0.1% glutaraldehyde (GA) fixation. Is SPEx compatible with the use of low amounts of GA for fixation?
  
  We tried different fixation methods in the early stages of this study (where expansion was not yet close to 10x). We saw a mild negative effect of GA on the expansion factor, so we avoided it in the later experiments since it also did not seem necessary to preserve the structure of our organelles of interest. However, the use of GA would generally be compatible with SPEx, potentially at the cost of a mild negative effect on expansion factor (see Author response image 1) and proteome coverage. We can add this information to the discussion.
  
  Author response image 1.
  
  Fixation methods mini-screen. Cells were fixed with the indicated reagents for 10 minutes at 37°C. After TREx expansion, the diameter of the nucleus was measured (A) and the resulting expansion factor compared to the non-expanded control was determined (B).
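  The expansion factor in panel B is, in essence, a ratio of sizes. Below is a minimal Python sketch. It assumes the factor is the mean nuclear diameter after expansion divided by the mean nuclear diameter of the non-expanded control. The response does not state whether means, medians or per-cell values were used, and the function name and numbers are hypothetical.

```python
from statistics import mean


def expansion_factor(expanded_diameters_um, control_diameters_um):
    """Ratio of mean expanded nuclear diameter to mean non-expanded (control) diameter."""
    if not expanded_diameters_um or not control_diameters_um:
        raise ValueError("both groups need at least one measurement")
    return mean(expanded_diameters_um) / mean(control_diameters_um)


# Hypothetical nuclear diameters in micrometres.
print(expansion_factor([100.0, 120.0], [10.0, 12.0]))  # 10.0
```

  Using diameters rather than areas keeps the result a linear expansion factor, which is how values such as 4x and 10x are usually quoted.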
  
  c) Related to the above, was the anchoring efficiency reduced only to achieve a 10x expansion factor, or does this additionally affect proteome coverage?
  
  We lowered the anchoring solely to allow for higher expansion factors. In earlier pilots, we performed proteomic analysis on samples that were expanded only 4x using standard TREx expansion. These samples also used the original anchoring strategy from the TREx publication (0.2 mg/ml AcX overnight at RT). We presented the results of this pilot in Fig S1A. We still detected over 2,000 proteins from 10 cells. This coverage is highly similar to what we found in the final experiments (Figure 2F), in which reduced anchoring yielded 10x expansion. Based on these data, we hypothesize that anchoring (and the expansion factor) has a negligible impact on protein coverage. We will clarify this in the manuscript.
  
  d) Have the authors considered using alternative anchoring approaches, such as GMA (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291506#pone.0291506.s001), which potentially increase the amount of sample retained in the hydrogel, thus allowing for better proteome coverage? This could be added to the discussion.
  
  We did not use alternative anchoring approaches. We modified the TREx protocol to fit our purposes, and since this was sufficient, we did not explore alternatives. However, anchoring approaches that retain higher amounts of sample in the gel might be beneficial for proteomic coverage. We will keep this suggestion in mind for future experiments. Thank you for the suggestion!
  
  e) The limitation of the approach to near-2D samples should be mentioned, and alternative approaches for more 3D samples could be discussed.
  
  The authors agree that SPEx is limited to near-2D samples at this point. We suggest that SPEx could be applied to 3D samples (e.g., tissues) by performing cryosectioning. TREx has been shown to be compatible with sectioned tissue (Damstra et al., 2022). We will elaborate on this in the discussion.
  
  f) How are peptides that are directly anchored to the hydrogel dealt with during LC-MS/MS analysis? Are they excluded, or can they be identified during the spectral search? The latter would allow us to get a deeper structural understanding of how proteins are actually anchored into hydrogels, which so far has not been assessed.
  
  The reviewer raises an interesting point. In general, peptides carrying the anchoring modification are analysed by LC-MS, but we did not include these specific modifications in the database search. We assumed that the labeling would be low and stochastic and hence should, if at all, only minimally affect peptide detection. Nevertheless, in response to the reviewer’s comment, we searched the MS data again for the crosslinking reagent linked to lysine residues. However, we could not obtain any confident hit for a peptide containing this modification.

  Since we cannot exclude that the modification precludes identification of the corresponding peptides, we compared the numbers of peptides generated by tryptic cleavage after arginine and after lysine. The human proteome contains similar proportions of both amino acids, so one would expect similar numbers of both peptide types to be identified. Any modification of lysine by the anchoring reagent would prevent tryptic cleavage and thus reduce the number of lysine-terminating peptides. As shown in Author response image 2, the number of lysine-terminating peptides is only slightly lower than that of arginine-terminating peptides. Notably, we also analysed a different fixed human tissue sample, extracted directly by laser capture microdissection without expansion. Its proteomics results showed a very similar lysine-to-arginine peptide ratio. This indicates that the large majority of lysine residues are neither modified nor affected by the hydrogel anchoring.
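  The lysine-versus-arginine comparison above can be sketched in a few lines of Python. The sketch assumes a list of identified peptide sequences and simply counts C-terminal K and R residues. It ignores peptides that end in another residue, such as protein C-terminal peptides. The peptide strings and function names are hypothetical. If anchoring blocked tryptic cleavage at many lysines, the K/R ratio would fall noticeably below that of an unexpanded control.

```python
from collections import Counter


def c_terminal_counts(peptides):
    """Count identified peptides whose C-terminal residue is lysine (K) or arginine (R)."""
    counts = Counter(seq[-1] for seq in peptides if seq and seq[-1] in "KR")
    return counts["K"], counts["R"]


def k_to_r_ratio(peptides):
    """Ratio of lysine-terminating to arginine-terminating peptides."""
    k, r = c_terminal_counts(peptides)
    if r == 0:
        raise ValueError("no arginine-terminating peptides")
    return k / r


# Hypothetical identifications; "GLFG" ends in neither K nor R and is ignored.
peptides = ["AEFVEVTK", "LVNELTEFAK", "YLYEIAR", "HPEYAVSVLLR", "QTALVELLK", "GLFG"]
print(c_terminal_counts(peptides))  # (3, 2)
print(k_to_r_ratio(peptides))  # 1.5
```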
  
  Author response image 2.
  
  Number of peptides identified either terminating with lysine (K) or arginine (R) across all samples shown in Figure 5F.
  
  An alternative approach to address this question would be to investigate if the peptide coverage of proteins detected by SPEx is enriched for peptides representing the folded core of proteins as opposed to the surface-exposed regions, which likely get more anchored into the hydrogel.
  
  Because of the negligible amounts of modified peptides, we did not investigate this potential bias of surface-exposed versus folded-core peptides.
  
  g) Same question regarding peptides with NHS labeling. Can they be identified, or do they just compete for ionization and thus negatively affect coverage and dynamic range of the LC-MS/MS approach?
  
  The reviewer raises a point similar to the one above, for another lysine labeling step used in the SPEx protocol. Again, we specifically looked for this modification by re-searching the raw MS data, but could not identify any peptides carrying this modification on a lysine residue. We cannot exclude that this rather large modification prevents detection. However, given the high number of lysine-terminating peptides in our dataset (see Figure 2), we expect that this labeling step is also stochastic and affects only a minor proportion of the proteins.
  
  h) How are the primary and secondary antibodies affecting the proteomics analysis identified as contaminants?
  
  We thank the reviewer for this comment. Since antibodies bind to proteins non-covalently, they will be released during the denaturing steps of the protocol. The antibodies will nevertheless remain in the sample, be digested and analyzed, and could, if very abundant, affect the analysis of the sample proteins. To check this possibility, we re-searched the MS data including the sequences of the antibodies used. To our surprise, we could not detect any peptides from these antibodies. This suggests that the antibody concentrations are much lower than those of the sample proteins and thus should not have any impact on the proteomics results. We also interpret this result as a benefit of our method compared to organellar-IP.
  
  i) Have the authors observed differences in proteomics coverage of only antibody vs NHS-labeling? Depending on the questions above, could pure antibody-based labeling increase proteomic coverage?
  
  We did not perform this comparative analysis, since we always used NHS dyes. In the experiments presented in this manuscript, NHS dyes allowed easy visualization of the whole cell without the use of antibodies. This NHS staining was essential for sample acquisition in this particular setup. We cut out entire cells, cells lacking the nucleus and cells lacking the Golgi apparatus, which served as critical controls. However, other ways of detecting cell boundaries could be used to avoid NHS staining. As shown above, both the anchor and the NHS labeling are sparse and stochastic. Moreover, we could not detect any impact of the antibody labeling on our results. Thus, we assume that both labeling procedures could be used.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study introduces a method that combines physical expansion of cells, imaging-guided isolation of defined regions, and protein identification to enable compartment-resolved analysis of protein composition at the subcellular scale. The authors aim to address a central limitation in existing approaches, namely the loss of spatial information during sample preparation or the indirect nature of proximity-based labeling methods. Using several cellular compartments as examples, they demonstrate that their approach can recover compartment-enriched protein sets and identify candidate proteins with previously unassigned localization.
  
  Strengths:
  
  A major strength of this work is the conceptual simplicity and accessibility of the approach. By combining established techniques in a modular way, the method avoids the need for genetic manipulation or specialized labeling strategies, making it broadly adaptable across experimental systems. The ability to directly select regions of interest based on imaging represents a clear advantage over indirect enrichment strategies and allows flexible targeting of both membrane-bound and non-membrane-bound compartments.
  
  The experimental design is also a strong aspect of the study. The use of complementary comparison strategies (analyzing isolated compartments alongside matched "subtracted" controls) provides an internal framework for assessing enrichment and depletion, increasing confidence in spatial assignment. The application of the method across multiple organelles of different sizes and properties demonstrates versatility, and the reported specificity for several compartments is encouraging. In particular, the ability to profile small and biochemically challenging structures highlights a potentially important niche for the approach.
  
  Weaknesses:
  
  Despite these strengths, several methodological limitations constrain the interpretation of the results. The most important relates to spatial accuracy in three dimensions. While lateral resolution is improved through physical expansion, the lack of depth resolution introduces uncertainty regarding contributions from structures above and below the selected region. Although the authors argue that this does not substantially affect specificity, the current evidence is largely indirect, and a more rigorous quantification of potential contamination would strengthen this conclusion.
  
  Quantitative interpretation also remains challenging. Because the measurements reflect total protein abundance rather than local concentration, differences in compartment size and protein density can influence enrichment values, particularly for small structures embedded within larger volumes. This issue is evident in the analysis of smaller compartments and complicates direct comparison across conditions. Additional normalization or modeling would help clarify how to interpret these measurements.
  
  Another limitation concerns variability in the expansion process and its downstream consequences. Differences in expansion factor across samples may affect the definition of regions of interest and introduce variability in sampling, yet the impact of this variability is not fully explored. Similarly, the use of a modified chemical treatment to preserve proteins for downstream analysis is central to the workflow but is not extensively validated with respect to preservation of spatial organization.
  
  While the identification of previously unannotated proteins is an appealing aspect of the study, validation is limited to a small number of examples, and broader support from independent datasets or literature context is lacking. In addition, the study primarily focuses on steady-state measurements in a single cell type, and therefore does not yet demonstrate the ability of the method to capture dynamic or condition-dependent changes in protein localization.
  
  Finally, the positioning of the method relative to existing approaches could be more clearly articulated. Although qualitative comparisons are provided, a more systematic and quantitative benchmarking against alternative strategies would help readers better understand the specific advantages and trade-offs.
  
  We thank the reviewer for the careful evaluation of the manuscript and for the constructive feedback. We think the reviewer raises valid points and will address them in the revised manuscript.
  
  Reviewer #3 (Public review):
  
  Franziscus et al. describe an elegant approach for spatially specific proteome analysis. To achieve this, they expand fixed cells and subsequently use a laser to micro-dissect a region of interest, which is then analyzed by mass spectrometry.
  
  They demonstrate the effectiveness of their approach by analyzing the nucleus, nucleolus, and the Golgi, and benchmark their hits against previous datasets for these organelles.
  
  The manuscript is very well written and nicely guides the reader through the applied methods. The presented data is convincing, and I do not see the need for additional experimental verification of the protocol. The only minor concern is the novelty of the method and the presentation. A combination of expansion, laser microdissection, and proteomics has been applied in the past (PMID: 36450705, PMID: 39477916). In the manuscript, one of these studies is cited, though it does not become clear that this approach is already described. However, Franziscus et al. describe the approach better and make it more accessible to the reader, especially since the other studies described this methodology in combination with tissue expansion and not in combination with single cell expansion as it is done here. I would ask the authors to be clearer in the introduction about what others have already done and what their contribution is here. In general, I am convinced that the community will benefit from the presented protocol to analyze organelle proteomics in detail.
  
  We thank the reviewer for the careful evaluation of our manuscript and the overwhelmingly positive assessment. We apologize for the omission of the mentioned citations. We will adjust the introduction to make clearer what has already been done and what advance our method provides.
  
  References
  
  Damstra HG, Mohar B, Eddison M, Akhmanova A, Kapitein LC, Tillberg PW. 2022. Visualizing cellular and tissue ultrastructure using Ten-fold Robust Expansion Microscopy (TREx). eLife 11:e73775. DOI: https://doi.org/10.7554/eLife.73775
  
  Gambarotto D, Hamel V, Guichard P. 2021. Ultrastructure expansion microscopy (U-ExM). Methods in Cell Biology 161:57–81. DOI: https://doi.org/10.1016/bs.mcb.2020.05.006, PMID: 33478697
  
  Liffner B, Silva TLA e., Vega-Rodriguez J, Absalon S. 2024. Mosquito Tissue Ultrastructure-Expansion Microscopy (MoTissU-ExM) enables ultrastructural and anatomical analysis of malaria parasites and their mosquito. BMC Methods 1:13. DOI: https://doi.org/10.1186/s44330-024-00013-4
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.28.714993v1
www.biorxiv.org

PDL-1+ Neutrophils mediate susceptibility during endotoxemia in Metabolically Dysfunctional-Associated Fatty Liver Disease

1
1. Public_Reviews 26 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.
  
  Duplication of control groups across experiments
  
  We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Line 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”
  
  Validation of the MASLD model
  
  To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters: liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C).”
  
  Assessment of liver injury in RagKO and anti-NK1.1 mice
  
  We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed a HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D).”
  
  Discussion of limitations
  
  We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Page 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing datasets from MAFLD patients and experimental MAFLD models, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) In Figure 4 the authors show the number of IFNγ+ CD4, CD8, and NK1.1+ cells. Could they show, of the total IFNγ production, how much comes specifically from NK cells and how much from other cell populations, since NK1.1 marks not only NK cells but also NKT and γδ T cells? Also, in Figure 2E the authors see a substantial increase in IFNγ signal in T cells.
  
  While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that NK1.1+CD3+ cells (NKT cells) were essentially absent in the liver tissue of LPS-challenged animals. This is described on Page 7, Lines 188-192 and shown in Supplementary Figures S3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.
  
  This absence was consistent across all experimental conditions, supporting our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, the data in Figure 2E show that IFNγ is present primarily in NK cells. Therefore, the observed IFNγ signal attributed to NK1.1+ cells predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.
  
  (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?
  
  The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.
  
  (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.
  
  The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.
  
  (4) In Figure 5 the authors are saying that it is neutrophils but not monocytes mediate susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i depletion and CCR2 knock out mice affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti Ly6G, b) HFCD mice challenged with LPS and treated with anti-LY6G do not rescue survival to levels of CHOW LPS and c) anti Ly6G treatment helps less than CXCR2i. Therefore, from both knock out mice and depletion experiments the authors can conclude that most likely monocytes (but potentially also other cells) together with neutrophils are substantial for the development of endotoxemic shock in choline-deficient high-fat diet model.
  
  While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, page 8, lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. 5SA and Fig. 5SB).” By contrast, CCR2 deficiency primarily affects monocyte recruitment. Yet under our experimental conditions, neither monocyte depletion nor CCR2 knockout significantly altered the severity of endotoxemic shock. This indicates that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.
  
  To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.
  
  These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.
  
  (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?
  
  To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.
  
  To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. 6SA, Fig. 6SB, and Fig. 6SD). Furthermore, PD-L1+ neutrophils showed exacerbated migration towards the liver (Fig. 6A and 6B).”
  
  (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?
  
  The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. The absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments. However, the overall phenotype and trend were consistently maintained: PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.
  
  (8) In Figure 7 do the authors have isotype control for TNFa because gating seems a bit random so an isotype control graph would help a lot as supplementary information, in order to make the figure more persuasive
  
  To address the concern regarding gating in Figure 7, we have included the FMO control for TNFα, shown as a histogram in Supplementary Figure 8G. These controls reaffirm the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig. 8SG).”
  
  (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?
  
  Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.
  
  (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?
  
  We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.
  
  We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.
  
  (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).
  
  The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.
  
  In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.
  
  (3) In your methods, I think you didn't explain something. You said LPS was administered to 5–6-week-old mice, but that the HFCD diet was started in 5–6-week-old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7–8 weeks old, right?
  
  We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.
  
  We have revised the Methods section (pages 15–16, lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436–442) as follows: “Lipopolysaccharide (LPS; Escherichia coli O111:B4, L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2-/-, IFNγ-/-, and TNFR1R2-/- mice. The HFCD was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration.”
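  For orientation, the stated i.p. dose (10 mg/kg) and body-weight range (22 to 26 g) imply the following per-mouse amount of LPS. This is simple arithmetic from the figures in the Methods text, not a value reported by the authors:

  $$
  m_{\mathrm{LPS}} = 10\ \mathrm{mg\,kg^{-1}} \times (0.022\text{ to }0.026)\ \mathrm{kg} = 0.22\text{ to }0.26\ \mathrm{mg\ per\ mouse}
  $$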
  
  (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.
  
  We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic dysfunction-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic dysfunction-associated steatohepatitis (MASH, formerly NASH).
  
  Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.
  
  (6) I am concerned about overinterpretation of the publicly available RNA-seq data in Figure 2. These data come from human NAFLD patients with unknown endotoxemia status and from mouse models using a traditional high-fat diet, so it is hard to compare these very disparate datasets to yours. Also, if these datasets show elevated IFNG, why does your model require LPS injection?
  
  We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.
  
  Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevations of IFNγ and TNF in NAFLD are primarily derived from NK cells, T cells, and myeloid cells.
  
  In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.
  
  (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.
  
  We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.
  
  (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?
  
  Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.
  
  (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?
  
  The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.
  
  For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
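  As a rough illustration of how the thresholds above partition a differential-expression table, the Python sketch below labels each gene as up, down or not significant. It is a hypothetical helper, not the authors' pipeline and not part of Seurat; it assumes each row carries an adjusted p-value and an average log2 fold change (the quantities FindMarkers reports), and the gene names are placeholders rather than study data.

  ```python
  from typing import Iterable

  # Thresholds as stated in the response; the helper names below are hypothetical.
  PADJ_CUTOFF = 0.05    # adjusted p-value must be strictly below this
  LOG2FC_CUTOFF = 0.25  # |log2 fold change| must be strictly above this


  def classify_gene(p_val_adj: float, avg_log2fc: float) -> str:
      """Label one gene as 'up', 'down' or 'ns' (not significant)."""
      if p_val_adj >= PADJ_CUTOFF:
          return "ns"
      if avg_log2fc > LOG2FC_CUTOFF:
          return "up"
      if avg_log2fc < -LOG2FC_CUTOFF:
          return "down"
      # Significant p-value but fold change within (-0.25, 0.25): not reported.
      return "ns"


  def split_degs(rows: Iterable[tuple[str, float, float]]) -> dict[str, list[str]]:
      """Group (gene, p_val_adj, avg_log2FC) rows by direction of change."""
      groups: dict[str, list[str]] = {"up": [], "down": [], "ns": []}
      for gene, p_adj, lfc in rows:
          groups[classify_gene(p_adj, lfc)].append(gene)
      return groups


  if __name__ == "__main__":
      # Placeholder rows for illustration only.
      example = [
          ("gene_a", 0.001, 1.2),
          ("gene_b", 0.010, -0.8),
          ("gene_c", 0.200, 2.0),
          ("gene_d", 0.030, 0.10),
      ]
      print(split_degs(example))
  ```

  Strict inequalities are used so that a gene sitting exactly at log2FC = 0.25 or Padj = 0.05 is not counted, matching the "> 0.25" and "< 0.05" wording.
  
  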
  
  (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?
  
  Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. IFNγ and TNF were detected by intracellular staining using the Foxp3 staining kit (eBioscience). Because both antibodies were conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490–493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments.”
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?
  
  Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1, and the following text was included in the manuscript (Page 5, Lines 116–117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C).”
  
  (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when the authors analyzed the publicly available RNA-seq data, but they are not the most significantly elevated genes. What is the reasoning behind this cherry-picking? Why are these two genes analyzed but not other highly differentially expressed genes?
  
  We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.
  
  (3) Figures 3C–3G: I understand that IFNg-/- and TNFR1R2a-/- mice are not showing elevated liver damage, but it may simply be because of non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and that the results are truly caused by NAFLD. The same goes for Figure 6: looking at Figure 6D, one may think that IFNg deficiency alters the LPS response independently of the diet condition (or NAFLD condition).
  
  We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.
  
  Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.
  
  Due to current breeding limitations and a temporary issue in colony maintenance of TNF receptor-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.
  
  These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.
  
  (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even under normal conditions. But if I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than WT mice after LPS treatment (1-day survival vs 2-day survival). How do you explain these different outcomes?
  
  We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes between these experiments, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality, and variation between independent experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.
  
  (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?
  
  We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.
  
  Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.
  
biorxiv.org/content/10.1101/2024.10.16.618651v2
www.biorxiv.org

Repurposed small molecule toxin inhibitors neutralise a diversity of venoms from the Neotropical viperid snake genus Bothrops

1. Public_Reviews 22 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the current reviews.
  
  We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of undertaking evaluation of small molecule drugs for snakebite in the neotropics, inclusive of the quality of this work and the value of the validated screening pipeline. We completely agree that the next steps for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models, though this considerable undertaking will form the basis of future work. Critically, the pipeline that we describe herein facilitates the selection of the most appropriate candidates to progress into such mouse studies, aligning with the 3Rs principles for minimising the need for animal research. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the neotropics. Venom characterisation of the diverse samples used in this project would represent an entire project and manuscript in its own right. We are pleased that the reviewers highlight the gap in research on serine protease inhibitors and the value this paper has in highlighting that more research is required in this area to identify a candidate that is more suitable for future clinical use than nafamostat.
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.
  
  Strengths:
  
  There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.
  
  Weaknesses:
  
  A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.
  
  We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus in the neotropics, which this manuscript aims to address.
  
  Our work in this manuscript included standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models) which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.
  
  Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.
  
  Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous iv infusion due to its short half-life), in the manuscript we have already stated that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.
  
  Strengths:
  
  The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.
  
  Weaknesses:
  
  Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.
  
  We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides to the field for future and parallel work. We agree with the reviewer that this is a well-organized comparative screening study applicable to different snake species or therapeutics. Regarding the comment that this manuscript is not a definitive demonstration of a broadly effective, deployable intervention, we agree and are happy to clarify that, while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.
  
  Reviewer #3 (Public review):
  
  In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.
  
  These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:
  
  We thank the reviewer for their high regard for the purpose and execution of this work. Their questions are helpful for improving the manuscript and raise useful discussion points for the field.
  
  During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?
  
  We apologise for this error on our part. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revised manuscript.
  
  I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.
  
  We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA2s. We have added this to the discussion.
  
  Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?
  
  Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (with variable potency against different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, which was hypothesised to cause severe joint pain when used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus, long-term use of matrix metalloproteinase inhibitors can raise safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.
  
  Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?
  
  It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs which target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastase. This in vitro assay combined with the coagulation assays is complementary in covering the main targets of SVMPs (ECM and clotting cascade), prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.
  
  Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?
  
  Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  The manuscript provides a useful comparative dataset across multiple Bothrops venoms and supports SVMP inhibition as a broadly effective lever in the authors' in-vitro work. However, the strength of the 'pan-Bothrops' and translational claims is currently limited by insufficient characterization of the exact venom samples tested and by experimental designs that fall short of clinically realistic rescue.
  
  Major comments:
  
  (1) The venoms used in this study are historical batches and are not formally characterized beyond SDS-PAGE and literature summaries, despite well-known intra- and inter-population venom variability; this weakens the generalization of the conclusions.
  
  To address this comment, we have clarified that our venom sources are historic. Because of their historic origin, source locality is not available beyond country of origin, with the exception of B. lanceolatus, which is endemic to Martinique. Figure 1 also makes clear that we agree with the reviewer that variation within Bothrops species is high. We discuss this variation and the limitations of our sampling for drawing broad conclusions throughout the first paragraph of the discussion, the final sentence of which states: “Future proteomic characterisations of the specific venom samples used in this study, which were all sourced from a historical collection (except for B. lanceolatus), would be informative in this regard.” Although the venom composition of our samples has not been characterised, the focus of the manuscript is the characterisation of whole-venom functional activity through a wide-ranging screening pipeline, and the generalisation of our findings is supported by the diversity of the venom samples (i.e. several species), despite them not being characterised (which is not critical for the focus of the study).
  
  (2) On a technical comment, the venom inhibition assays appear to rely on drug-first or preincubation conditions, which can easily overestimate efficacy compared with real snakebite envenomation, where toxins distribute and engage targets rapidly. Here, a translational gap is the clinical feasibility of the 'repurposed' inhibitors, as it is unclear whether the drugs central to the conclusions (especially marimastat, prinomastat and varespladib) are realistically available or stocked in hospitals or could be deployed in regions where Bothrops envenoming occurs. I think that the manuscript should clearly distinguish this from candidates with a plausible access and delivery pathway.
  
  Our work in this manuscript includes standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. None of our methods administer drug-first. Throughout the methods and figure legends we have made these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which would be the next step for this research programme.
  
  While the evidence presented in this manuscript is promising, much work remains before such molecules are ready for deployment in treating snakebite, including the completion of clinical trials, cost-benefit analyses, policy change, and assessments of manufacturing and distribution feasibility. Ultimately, this manuscript supports the growing evidence for the utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules in rescue in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation. To further support this point, we have added a section to the manuscript discussing the current preclinical and clinical progression of prinomastat and marimastat, which also addresses the public comment on the selection of marimastat over prinomastat.
  
  (3) In my opinion, the Nafamostat results and discussion need reframing, given weak SVSP inhibition and intrinsic anticoagulant behavior at 5 µM. Excluding it from certain analyses undermines interpretability, and it may be more appropriate to include it throughout as an explicit negative control condition (showing its baseline anticoagulant effect) rather than omitting it.
  
  Although we understand the reviewer's opinion here, we disagree, as we believe that including nafamostat as a 'negative control' may reflect negatively on the benefit that an efficacious serine protease inhibitor could provide. Furthermore, because the intrinsic anticoagulant effect of nafamostat cannot be decoupled from direct SVSP toxin inhibition, we were unable to interpret its activity, and including it would undermine the results. This can be seen in Figure 3B, which demonstrates that a false positive result would occur. For the serine protease assay, we clearly discuss the lack of efficacy and justify why EC<sub>50</sub> testing was not appropriate under our screening protocols.
  
  In the manuscript, we have now further justified our approach in relation to the limitations of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate, owing to its low efficacy and off-target effects. We also highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.
  
  (4) The data presentation needs consistent statistical analyses (currently absent for multiple key figures, including Figures 2, 3, 4, 6 and 7) and a clearer explanation for the dose of venom and drugs you choose. For example, Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients. Likewise, Bothrops venoms can contain both pro- and anticoagulant activities, so the authors should justify how their framework accounts for anticoagulant components and why the observed plasma phenotypes are interpreted as they are.
  
  We thank the reviewer for flagging the need for consistent statistical analysis, which we have now included for Figures 3, 4, 6 and 7. Figure 2, however, is presented to display the variation between all the venoms and was ultimately used to select the most relevant doses for the subsequent inhibition experiments; statistical analysis is therefore not relevant for this figure. The updated statistical analyses, which have been added to the relevant figure legends and Results sections, are as follows:
  
  Figure 3 - Bars indicate significant results (p < 0.05) identified through one-way ANOVA with Dunnett’s multiple comparisons test against the DMSO control
  
  Figure 4 - two-way ANOVA with Šídák's multiple comparisons test of each venom control compared to the matched venom treated with inhibitor
  
  Figure 6 – the CT and MCF data were analysed independently using one-way ANOVA with Tukey’s multiple comparisons test
  
  Figure 7 - Log-rank test (Mantel-Cox) with Holm-Šídák's multiple comparisons test of each treatment against the venom-only control
  
  We have ensured that all figure legends clearly state the venom and drug doses, to provide the clarity the reviewer requested.
  
  The comment that "Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients" is an understandable query. However, in vitro assessments such as those carried out in this manuscript are not designed to directly inform pharmacokinetic/pharmacodynamic interpretations, largely because they do not replicate real-world envenoming (i.e. preincubation of a venom and treatment would not occur). This is why, as stated, follow-on preclinical and clinical assessments are needed to progress these inhibitors and to inform dosing regimens that might achieve the exposures required for in vivo venom neutralisation. That said, PK/PD work has been initiated within Phase I trials; for example, for DMPS, Abouyannis et al. (2025) demonstrated a plasma exposure of >10 µg/mL for single doses of 1,200 mg and higher. This is equivalent to 80 µM. Although this is lower than the EC<sub>50</sub> for some venoms in the clotting assay (Figure 3J), the venom dose in our assays (50 to 250 ng/50 µL, i.e. 1,000 to 5,000 ng/mL) is estimated to be >1,000 times higher than in a natural envenoming by Bothrops atrox, where serum venom concentrations are less than 1 ng/mL (https://doi.org/10.1016/j.toxicon.2022.09.010). These extrapolations therefore indicate that the doses selected in our studies would have human clinical relevance.
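
The venom-dose arithmetic can be checked with a short sketch. It uses only the figures quoted in the response (50, 100 or 250 ng of venom per 50 µL, and a serum concentration below 1 ng/mL); the helper name `ng_per_ml` is hypothetical. Because the serum figure is an upper bound, the fold differences printed are lower bounds.

```python
# Minimal sketch of the unit arithmetic quoted above. The helper name is
# hypothetical; the numbers are those stated in the response.

def ng_per_ml(mass_ng: float, volume_ul: float) -> float:
    """Convert a venom mass (ng) in a given volume (uL) to a concentration in ng/mL."""
    return mass_ng / volume_ul * 1000.0  # 1 mL = 1000 uL


WELL_VOLUME_UL = 50.0        # volume stated alongside the venom doses
SERUM_UPPER_NG_PER_ML = 1.0  # serum venom in B. atrox envenoming is below this value

for dose_ng in (50.0, 100.0, 250.0):
    conc = ng_per_ml(dose_ng, WELL_VOLUME_UL)
    # The serum value is an upper bound, so this fold difference is a lower bound.
    fold = conc / SERUM_UPPER_NG_PER_ML
    print(f"{dose_ng:.0f} ng in {WELL_VOLUME_UL:.0f} uL = {conc:,.0f} ng/mL (>{fold:,.0f}-fold)")
```

Working in ng/mL keeps the in vitro doses directly comparable with the serum concentration reported for natural envenoming.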
  
  Finally, anticoagulant venom effects would be observed in our experimental approach either as reduced kinetic responses in the plasma clotting assay (as observed with nafamostat in Figure 3B) or as a prolonged clotting time in the thromboelastography assay (Figure 6). As stated in the Results section "Comparison of coagulation profiles", all of the venoms tested presented with a procoagulant effect. If underlying anticoagulant activity from PLA<sub>2</sub> toxins were to emerge after inhibition of the procoagulant toxins (i.e. SVMPs by marimastat), as has been seen previously for certain other snake venoms, it would appear as a percentage inhibition far greater than 100% in the plasma assay (Figure 3C to I) or as a prolonged clotting time in the thromboelastography assay. Neither of these anticoagulant profiles was observed with any venom tested in this study.
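
Why unmasked anticoagulant activity would push inhibition above 100% can be illustrated with a short sketch. It assumes a normalisation in which 0% corresponds to the venom-only signal and 100% to the no-venom plasma control; this formula and the readout values are hypothetical and are not taken from the manuscript.

```python
def percent_inhibition(treated: float, venom_only: float, no_venom: float) -> float:
    """Percentage inhibition under an ASSUMED normalisation (not from the manuscript).

    0% means the treated well matches the venom-only control; 100% means it matches
    the no-venom plasma control. Inputs are clotting readouts where a higher value
    indicates more clotting activity (e.g. a kinetic rate).
    """
    return 100.0 * (venom_only - treated) / (venom_only - no_venom)


# Hypothetical readouts, for illustration only.
VENOM_ONLY = 10.0
NO_VENOM = 2.0

full_rescue = percent_inhibition(2.0, VENOM_ONLY, NO_VENOM)    # back to normal plasma
partial = percent_inhibition(6.0, VENOM_ONLY, NO_VENOM)        # halfway to normal plasma
anticoagulant = percent_inhibition(0.0, VENOM_ONLY, NO_VENOM)  # less clotting than normal plasma
print(full_rescue, partial, anticoagulant)
```

Under this normalisation, any treated readout below the no-venom control (a net anticoagulant state) necessarily yields a value above 100%.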
  
  (5) Finally, the in vivo evidence is limited to a chicken embryo model. To support your hypothesis, a conventional mouse model with delayed post-envenomation dosing (24-36 h monitoring) is needed to address both safety/toxicity and post-exposure efficacy, and to define a realistic therapeutic window, especially because venom toxins act very quickly and the timing of administration is central to the clinical utility of any small-molecule approach.
  
  We agree with the reviewer that the next important step for this research is to use murine preclinical models to validate the in vitro and preliminary in vivo findings described in this manuscript. As stated above, this study provides the initial evidence base that the promising utility of marimastat, DMPS and varespladib as repurposed snakebite drugs extends to a range of neotropical viper venoms. Evaluating the safety, efficacy (using both preincubation and rescue approaches) and PK/PD relationships of these molecules to inform optimal dosing strategies will be crucial next steps for the field. However, these activities are far from trivial, will take several years of additional research, and therefore fall outside the scope of this initial manuscript.
  
  To address the concern that the in vivo evidence is limited to a chicken embryo model, we have added sentences discussing the wider use of the egg model in snakebite research and its translation to murine studies.
  
  Minor comments:
  
  (1) Figure 2D: How do you discuss the fact that "no venom" has SVSP activity?
  
  The data for all in vitro assays in Figure 2 are presented as the area under the curve (AUC) of the raw data (absorbance or fluorescence), for consistency across assays. Therefore, all assays (B to D) show a background signal in the absence of venom. This background signal is greater in the SVSP assay.
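
Why a no-venom control still yields a non-zero AUC can be shown with a short sketch. It assumes the AUC is computed with the trapezoidal rule over raw kinetic reads; the function name and readings are hypothetical, and the software used for the manuscript may integrate differently.

```python
def trapezoid_auc(times, values):
    """Area under a kinetic curve using the trapezoidal rule (raw signal, no baseline subtraction)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


# Hypothetical raw reads (arbitrary units) at 0, 1, 2 and 3 min.
times = [0.0, 1.0, 2.0, 3.0]
no_venom = [0.2, 0.2, 0.2, 0.2]  # flat background signal only
venom = [0.2, 0.6, 1.0, 1.4]     # signal rising above background

print(trapezoid_auc(times, no_venom))  # positive, from background alone
print(trapezoid_auc(times, venom))
```

Because the raw signal is integrated without subtracting a baseline, any constant background contributes area in proportion to the read duration, which is why an assay with a higher background (here, the SVSP assay) shows a larger no-venom value.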
  
  (2) For better understanding, I would suggest adding a dedicated column in Figure 4A with Nafamostat SVSP data reported as "N/D" where applicable.
  
  As stated in the Results, the weak inhibitory activity of nafamostat meant that EC<sub>50</sub> assessment was not justified; adding this column would therefore be redundant.
  
  (3) The introduction is too long relative to the experimental content and would benefit from tightening to sharpen the motivation and unmet need.
  
  We thank the reviewer for their opinion and we have reviewed the introductory section again. While we made minor edits throughout, we decided not to make substantial modifications to it.
  
  Reviewer #3 (Recommendations for the authors):
  
  I only have some minor comments:
  
  (1) In line 100, the word "that" is repeated.
  
  We thank the reviewer for spotting this error, which we have corrected.
  
  (2) Line 433. I believe the word "compromising" should be substituted by "comprising" here.
  
  We thank the reviewer for spotting this.
  
  (3) Figure 1 and supplementary: Bothrops asper venom has been very thoroughly studied, and using only one study from Costa Rica might underestimate the venom variation within the species. I suggest looking at the following study: https://doi.org/10.1016/j.toxicon.2022.106983. Maybe it is not necessary to change anything, but worth looking into.
  
  We appreciate the reviewer flagging this paper. It has been added to the manuscript (reference 48) and has provided additional data for Figure 1 and Supplementary Table 1.
  
  (4) Methods: Given the intraspecies variation described for some of these species, I believe it is relevant to add the locality of origin of the venoms, and not only the country. I, of course, understand this is often unknown for historical samples.
  
  We have included the following sentence in the Methods: "Due to the historic nature of the venom samples, the source locality is not available beyond country of origin, with the exception of B. lanceolatus, which is endemic to Martinique."
  
  (5) Figure 3: It is not very accurate to show an SD when the sample number is 2. I suggest, when possible, showing the mean and the two data points in the plots. This also applies to other figures where n=2. Also, in Figure 3D, does Marimastat seem to have an anticoagulant effect, or is this just within normal variation?
  
  In the statistics paragraph of the Methods, we have removed the statement "Standard deviation (SD) for all kinetic reads and standard error for AUC is reported based on Prism v10" but kept the sentence. The sample sizes for HTS assays, including the SVMP, PLA<sub>2</sub> and coagulation experiments, are the average of the means from independent assays (n > 2 within each independent assay). We understand the reviewer's point about the limited meaning of SD, as well as SE, for Figure 3A to I; we have therefore changed the error bars to show the range, as we think that displaying the individual points would reduce visual and analytical clarity.
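
The reviewer's point about n = 2 can be made concrete with a short sketch. With two replicates a and b, the sample SD is |a - b|/√2, a fixed multiple of the range, so a range error bar conveys the same information while showing the two observed values directly. The replicate values below are hypothetical.

```python
import statistics


def summarise_replicates(values):
    """Return the mean, sample SD and (min, max) range of replicate values."""
    return statistics.mean(values), statistics.stdev(values), (min(values), max(values))


# Two hypothetical replicate values (n = 2), for illustration only.
replicates = [40.0, 60.0]
mean, sd, value_range = summarise_replicates(replicates)

# With n = 2 the range is just the two observations, while the sample SD
# reduces to |a - b| / sqrt(2), i.e. a fixed multiple of the range.
print(mean, sd, value_range)
```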
  
  In relation to the query about a marimastat anticoagulant effect in Figure 3D: as shown in Figure 3B, marimastat has no direct anticoagulant effect. The >100% inhibition for marimastat is likely to be normal variation, as this is a biological assay with high variability. However, it could also be that the strong inhibition of the SVMPs in B. asper, along with limited SVSP activity, has unmasked an anticoagulant effect of the remaining PLA<sub>2</sub> toxins, which have high activity in this venom. That said, as B. atrox has a similar profile to B. asper, we would have expected to see a similar effect for B. atrox in both the plasma and TEG assays. Therefore, assay variation seems the most likely reason for this observation.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.02.697350v2

Neural correlates of perceptual consciousness from within: a narrative review of human intracranial research

1. Public_Reviews 21 May 2026
  
  in eLife (unscoped)
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary
  
  In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.
  
  Strengths
  
  The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.
  
  We appreciate the reviewer's positive feedback on our work.
  
  Weaknesses
  
  Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.
  
  (1) Distinguishing NCCs from their prerequisites or consequences
  
  This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.
  
  As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.
  
  The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.
  
  We agree that distinguishing proper NCCs from their prerequisites or consequences is primarily a matter of experimental design and theoretical framework, not merely of recording modality. We did not mean to imply that intracranial recordings inherently solve this dissociation, and we now state this explicitly at the beginning of this section. Instead, we argued that the high signal-to-noise ratio and spatiotemporal accuracy of sEEG offer a stronger "testing ground" for the null findings often relied on by no-report paradigms. This is now also further clarified in the revised section "Limits of noninvasive measures".
  
  We also explicitly acknowledge, as the reviewer noted, that even the most precise recordings require careful task dissociations to distinguish NCCs from their prerequisites and consequences.
  
  (2) Drawing misleading conclusions from certain studies
  
  There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:
  
  Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."
  
  It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.
  
  We agree that our interpretation of these studies (lines 265–271 of the previous version of the manuscript) was presented too definitively. We have modified the text (now lines 314-317) to soften this conclusion and align it with the more nuanced discussion later in the manuscript. Specifically, we now frame this as a "suggested dissociation" rather than a conclusive finding (line 730), and we explicitly acknowledge that alternative interpretations remain viable.
  
  Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."
  
  The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al. results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.
  
  This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."
  
  This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.
  
  We thank the reviewer for pointing out this inconsistency. We agree that stating ">200 ms" conflicts with the findings of Fisch et al. (2009), who observed dissociations as early as ~150 ms. Our goal was to contrast the very early, stimulus-driven responses with the later responses that reflect consciousness. However, as the reviewer correctly notes, the exact "onset" of these signals varies across studies and paradigms. To address this, we have removed the specific ">200 ms" latency stated in line 245 of the previous version of the manuscript and updated the timing in line 284 to "starting 150 ms" to better reflect the results of Fisch et al. We also clarify that, while the exact latency depends on the paradigm, a consistent finding is that activity representing conscious contents in higher-order visual cortex follows an initial wave of unconscious processes (lines 809-810).
  
  (3) Justifying single-neuron cortical correlates of consciousness
  
  The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.
  
  It is true that many prominent theories of consciousness were developed based on macroscopic observations, largely due to the prevalence of non-invasive recordings in humans. However, we argue that recording single-unit activity (SUA) is important for several reasons, and we have made this clearer in the revised version. First, signals like fMRI, EEG (or even LFP) often conflate multiple distinct neural populations. SUA allows us to dissociate neurons representing the percept from neighboring neurons involved in task-related confounds (e.g., motor preparation or arousal) that would otherwise be blurred together. Moreover, some percepts might be represented by sparse coding involving a small, specific population of "concept" or "percept" cells. Electrophysiological studies in animal models reveal that various cognitive processes are encoded within neuronal subspaces that only emerge when single-unit activity is analyzed as lower-dimensional projections of the broader neural activity manifold (Mante et al., 2013; Ebitz & Hayden, 2021; Jazayeri & Afraz, 2017). Importantly, many neural computations are only discernible through the lens of population dynamics, i.e. from the joint activity of many single neurons (Vyas et al., 2020). We believe that providing high granularity through SUA recordings prevents over-aggregation of data, ensuring that even system-level theories can build on biologically accurate foundations.
  
  Moreover, some theories are defined at the cellular level. For instance, the Dendritic Integration Theory (Bachmann et al., 2020) posits that the integration of feedforward and feedback signals occurs at the level of individual pyramidal neurons. Without SUA, these cellular mechanisms remain untestable. Beyond spatial granularity, SUA also provides excellent temporal granularity, which is crucial for testing theories that rely on the precise timing of spikes (e.g., neural synchrony). As LFPs reflect average activity across populations, only SUA can confirm whether individual neurons lock their spikes to a specific phase, a mechanism hypothesized to bind features into a conscious whole.
  
  We added these points to a new section in the revised manuscript.
  
  References:
  
  Bachmann, T., Suzuki, M., & Aru, J. (2020). Dendritic integration theory: A thalamo-cortical theory of state and content of consciousness. Philosophy and the Mind Sciences, 1(II).
  
  Ebitz, R. B., & Hayden, B. Y. (2021). The population doctrine in cognitive neuroscience. Neuron, 109(19), 3055-3068.
  
  Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code. Neuron, 93(5), 1003-1014.
  
  Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78-84.
  
  Vyas, S., Golub, M. D., Sussillo, D., & Shenoy, K. V. (2020). Computation Through Neural Population Dynamics. Annual Review of Neuroscience, 43(1), 249-275.
  
  (4) No mention of combined fMRI-EEG research
  
  A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.
  
  We thank the reviewer for this point. We have added a discussion of fMRI-EEG to the "Limits of noninvasive measures" section (lines 167-171). While we acknowledge that fMRI-EEG is a powerful non-invasive tool for bridging spatial and temporal scales, we note that it relies on merging an indirect metabolic signal with a weak electrophysiological one filtered by the skull, which is computationally complex and often noisy. In contrast, intracranial recordings provide direct measures of both local field potentials and spiking activity within the same neural population, offering interpretability and signal-to-noise ratio that non-invasive combinations cannot match. In our view, this is not just an alternative to these methods, but a unique means of accessing the underlying neuronal ground truth.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.
  
  They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.
  
  They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.
  
  They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.
  
  After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This included population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.
  
  The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.
  
  Strengths:
  
  This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.
  
  Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.
  
  We thank the reviewer for acknowledging the strength of our work.
  
  Weaknesses:
  
  The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.
  
  For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.
  
  We agree that the distinction between proper NCCs and their prerequisites or consequences is a fundamental challenge that affects all recording modalities. We did not intend to imply that intracranial recordings are a "silver bullet" for solving this conceptual problem in isolation, and we now explicitly state that at the beginning of this section (line 101).
  
  We have revised the section on "Distinguishing NCCs from their prerequisites or consequences" to clarify that intracranial recordings are a powerful tool when used in conjunction with appropriate experimental designs, rather than a standalone solution to these conceptual difficulties.
  
  For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.
  
  It is true that a null result in an intracranial study may simply reflect that the relevant neural population was not sampled by the specific electrode implantation scheme. However, we argue that interpreting null results is equally, if not more, complicated in non-invasive methods, albeit for different reasons. While M/EEG offers broader coverage, it is blind to many cortical sources because of their orientation (radial sources in MEG) or their location in deep sulci and subcortical structures. The signal-to-noise ratio of M/EEG is also much lower than that of intracranial EEG, making it more likely that null results obscure the existence of subtle effects (Parvizi & Kastner, 2018).
  
  To address this, we revised the manuscript to clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We now explicitly emphasize that drawing conclusions from null results based on intracranial recordings requires caution regarding electrode placement. We also point out that these approaches are complementary: M/EEG can identify large regions of interest, while sEEG can then provide high-resolution "ground truth" to confirm whether those regions are part of the NCC.
  
  Reference: Parvizi, J., & Kastner, S. (2018). Promises and limitations of human intracranial electroencephalography. Nature Neuroscience, 21(4), 474-483. https://doi.org/10.1038/s41593-018-0108-2
  
  The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.
  
  We agree with the reviewer that the exact spatial scale of the NCC remains a topic of ongoing debate. However, we believe that the advantage of intracranial recordings holds true whether the NCC spans millimeters or centimeters. The main spatial limitation of non-invasive electrophysiology (M/EEG) is not just its spatial resolution but also the inverse problem. Since scalp sensors detect a mixture of signals from across the brain, different cortical configurations can produce identical scalp patterns. This makes it challenging to precisely locate the NCC or distinguish it from nearby activity (e.g., motor or attentional signals). When recording intracortically, a widespread NCC could be captured across multiple adjacent channels with high accuracy. Conversely, if the NCC is focal, it can be isolated with high spatial resolution. In either case, intracranial recordings eliminate the spatial ambiguity inherent in scalp recordings. We have revised the Introduction (lines 158-164) to clarify that the "spatial advantage" of intracranial recordings also pertains to the inverse problem, not merely to the ability to record from smaller cortical areas.
  
  Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.
  
  We thank the reviewer for raising this point regarding how intracranial data are often aggregated into regions of interest. We agree that if researchers generalize findings to large anatomical regions without accounting for single-channel recordings, some of the spatial benefits of intracranial recordings are indeed diminished. We have toned down some of the original claims accordingly and now acknowledge more clearly that the clinical constraints of sEEG lead to sparse coverage (lines 245-249).
  
  However, we maintain that even when using an ROI-based approach, intracranial recordings offer a clear advantage over non-invasive methods, in that they represent a direct measure from a specific patch of tissue, rather than a statistical estimate that may be contaminated by "leakage" from distant sources. To address the reviewer’s concern, we have updated the manuscript (lines 244-245) to emphasize the importance of relying on MNI coordinates and individual anatomy rather than solely on broad ROI labels.
  
  Appraisal of whether the authors achieved their aims:
  
  In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.
  
  What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.
  
  Discussion of the likely impact of the work on the field:
  
  This work has the potential of becoming a must-read for anyone working in the field of consciousness research.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.
  
  Strengths:
  
  The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradictory evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.
  
  We thank the reviewer for stating the importance of our work and its potential contribution to the field.
  
  Weaknesses:
  
  The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.
  
  We agree that a clear definition of report is essential for the reader to interpret the empirical findings presented. We have added a definition to the Introduction (lines 108-111), specifying that we use "report" to refer to any explicit behavioral response (whether verbal, manual, or otherwise) that communicates a subject’s subjective state.
  
  Regarding the conceptual distinction between Phenomenal and Access consciousness, we refer to recent work from some of the co-authors (Mudrik et al., 2025), which suggests that P and A should not be seen as two types of consciousness, but rather as two necessary conditions for conscious experience. While a full discussion of this distinction is beyond the scope of this review, we now clearly state that our focus is on identifying neural activity that reflects the subjective experience itself, regardless of the downstream requirements of report.
  
  Reference: Mudrik, L., Faivre, N., Pitts, M., & Schurger, A. (2025). On a confusion about there being two types of consciousness. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2025.11.012
  
  In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.
  
  We agree that clarifying the distinction between contents and levels of consciousness early on provides a stronger framework for the paper.
  
  We have added a brief clarification in the Introduction (lines 63-76): "It is also helpful to distinguish between levels of consciousness, defined as a global level of arousal or wakefulness (e.g., being awake vs. under anesthesia), and the contents of consciousness, defined as the specific subjective experiences one has while conscious (e.g., perceiving a visual stimulus; Bayne et al., 2016; Laureys, 2005). While the majority of this review focuses on 'content-specific' NCCs, the two dimensions are intrinsically linked, as global states typically set the conditions for the occurrence of specific conscious contents."
  
  Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.
  
  We thank the reviewer again for this highly positive assessment of the manuscript.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  I would like to reiterate that I believe this is a very scholarly piece of writing, and I congratulate the authors on producing such a useful and timely manuscript. Below, I suggest just a few ways the authors may resolve some of the issues I raised in the public review. However, I would like to emphasise that these are merely suggestions - the authors may think of different and better ways to address these comments that are more in line with either their thinking or writing style, and I would certainly encourage the authors to follow their own preferences if they feel they are at odds with my suggestions.
  
  For the longer comment questioning whether intracranial recordings are really a way to isolate NCCs from their pre- and post-processing, there are two ways the authors could resolve this. One is that they collapse the section distinguishing NCCs from their prerequisites and consequences into the previous section regarding limits of noninvasive measures. For instance, they could make the point that null results are easier to interpret with intracranial recordings in this previous section. Then they could discuss how specific intracranial studies have been able to resolve questions of pre-/post-processing confounds when they introduce studies later in the manuscript. At the moment, the "Distinguishing NCCs from their prerequisites and consequences" section, at least to me, undermines the argument of why intracranial recordings are important because it spends too much time describing how tasks are the core component of isolating pure NCCs, and not the recording method.
  
  Alternatively, the authors could keep the structure as it is. In this case, I would urge the authors to emphasise the role of intracortical recordings here and to make the argument that this is a problem that intracortical recordings (rather than novel tasks) can solve more convincingly. Citing specific studies that combined intracortical recordings with no-report paradigms and emphasising how the invasive recording allowed the researchers to reach a conclusion that would not have been possible with noninvasive measures would also be helpful.
  
  We thank the reviewer for these useful suggestions and agree that we would not want readers to take from this paper that design issues can be fixed by using invasive recordings. Because confounding issues are crucial in research on the NCC, we believe it is important to include a section on this topic in the Introduction. However, as we explained in our response to the public review, we revised the section introducing Human intracranial electrophysiology to reflect that intracranial recordings are a complementary tool that improves the interpretability of no-report paradigms, rather than a “silver bullet” solution for confound issues. We also explicitly say now that this problem is relevant to all techniques in the study of consciousness, including intracranial recordings (line 101). Additionally, based on the reviewer’s suggestion, we have added a more detailed explanation of how studies that pair intracranial recordings with no-report paradigms provide a unique insight in the Temporal Insights section (lines 822-823).
  
  For my comment: Drawing misleading conclusions from certain studies, I think the public review speaks for itself. I would recommend that the authors make sure they are drawing correct conclusions from the studies they cite, and make clear from the outset where there is ambiguity in interpretation.
  
  We thank the reviewer for bringing these ambiguities to our attention. As explained in the response to the public review, we have modified the text accordingly.
  
  Finally, with regard to the single-cell analyses, I would imagine that most readers will share at least some scepticism around single neurons being the appropriate level of analysis for revealing the basis of perceptual experience. As such, I think it would strengthen the manuscript greatly if the authors could provide a brief argument as to how such work can either inform theories of consciousness or contribute more generally to the study of NCCs, given that the field and its theories are mostly biased towards studying system-level neural processes. I think single-cell analyses are extremely valuable to NCC research, and the authors have a good opportunity to frame these studies accordingly.
  
  We agree. As detailed in the response to the public review, we now specify (1) how a higher level of granularity in electrophysiological measurements can distinguish between awareness-related signals and confounds, (2) that these measurements provide an opportunity to study neuronal population dynamics where various cognitive processes have been shown to emerge in animals, and (3) that single-neuron measurements are necessary to test predictions of theories that are defined at the cellular level.
  
  Reviewer #2 (Recommendations for the authors):
  
  Recommendations for improving the writing and presentation:
  
  My compliments for having written an impressive review. Overall, I think that this is a beautiful piece of work that will be of great use to the community. My only concern is that the advantages of intracranial recordings over non-invasive methods in solving the difficulties faced in the study of NCCs are overstated.
  
  Here I provide more precise comments for your consideration.
  
  (1) On page 5, lines 100 to 102, you argue that "Scalp EEG and MEG have limited anatomical resolution due to the overlap of deep and superficial brain signals at the scalp level and, in the case of EEG, the scattering of the adjacent electrical signals through the scalp". It would be good to provide precise estimates of the spatial resolutions of EEG, MEG and intracranial recordings, with accompanying references. Consider also that MEG is relatively insensitive to deep sources. I recommend this paper: Piastra et al. 2020 https://onlinelibrary.wiley.com/doi/10.1002/hbm.25272
  
  We thank the reviewer once again for their positive evaluation of our work. As detailed in the response to the public reviews, we now clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We thank the reviewer for their additional suggestions and have clarified our concern about the anatomical conclusions that can be drawn from scalp EEG and MEG data (lines 158-164).
  
  (2) On page 11, you describe work showing that activity in the occipitotemporal cortex might reflect a precursor to consciousness, but not an NCC proper, except for the case of faces, in which the fusiform seems to behave like a true NCC. Could you discuss how these seemingly contradictory results could be reconciled?
  
  One possibility is that activity in some parts of the occipitotemporal cortex instantiates content-specific NCCs, i.e., correlates that are only specific to certain stimulus types (in this case: faces), while activity in other parts instantiates precursors of the NCCs. Because faces have been extensively studied, we might have uncovered the content-specific NCCs for these stimuli but not for others. This is now discussed in the text on lines 342-344. Based on reviewer 1’s suggestion, we have also toned down our claim about occipitotemporal activity being a precursor to the NCC.
  
  (3) From line 322, you start to discuss connectivity analyses. Adding a subheading might improve readability.
  
  We appreciate the suggestion; however, adding a subheading to a single paragraph would require restructuring the entire section, which could disrupt the flow. We believe the current format maintains clarity and cohesion.
  
  (4) In line 329, you write "It remains unclear to what extent these connectivity patterns reflect post-perceptual processing and how the signals associated with perceptual consciousness in the occipitotemporal cortex interact with frontoparietal regions." But it's not clear why this is the case.
  
  We meant to make two separate points: (1) these studies did not control for report-related activity using no-report paradigms and (2) there has been no investigation so far of the interaction between occipitotemporal and frontoparietal signals associated with perceptual consciousness. These two points have been clarified in the text (lines 378-381).
  
  (5) In line 692, it would be good to clarify that Pereira 2021 is a single-neuron study.
  
  This has been clarified in the text.
  
  (6) The phrase "more research/work is needed" is repeated several times.
  
  Thank you for pointing this out. To avoid redundancy, we have deleted the second mention of this phrase.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

arxiv.org/abs/2510.08736
uj-pba-workshop.netlify.app

Plant-Based Meat: A Category-Level Share Analysis

1. unjournal 21 May 2026
  
  in Public
  
  Quality at parity hasn't unlocked majority adoption. Plant-based nuggets, the format that has reached sensory parity in blinded testing, still hold only 2 to 3% of the conventional nugget category. If matching taste isn't sufficient, then taste investments alone may have lower returns than the parity-headroom argument suggests.
  
  Think about this more and state it in a more reasoned, logical way. Note that we're largely thinking about price here (as well as taste, nutrition and availability). We're largely focused on the impact of cost and price on consumption and substitution. In fact, skeptics were saying that "we don't care too much about substitution and price impacts because 1. it has such a low market share and 2. it's not taste or nutrition comparable."

Annotators

unjournal

URL

uj-pba-workshop.netlify.app/context/pb_meat_share_dashboard_claude.html
www.biorxiv.org

The co-repressor Groucho limits progression through the early transcription elongation checkpoint in vivo

1. EMBOpress 21 May 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Reply to the reviewers
  
  1. General Statements
  
  We thank the reviewers for their constructive evaluation of our manuscript. We are pleased by the overwhelmingly positive consensus regarding the quality and significance of our data. In particular, the reviewers highlighted that this is a "nice, clean study with interesting data" and noted that our in vivo functional genetic findings in the Drosophila wing are "clearly a strength" that "moves the paper beyond cell-culture correlations" to provide a "simple, straightforward take-home message".
  
  The principal critique across the reports concerns the extent of direct mechanistic evidence linking Groucho (Gro) to regulation of the early elongation checkpoint. Several reviewers suggested additional genomic experiments, including RNA-seq, PRO-seq, or Pol II ChIP approaches, to further examine transcription and pausing behaviour. However, we would like to flag up that genomic datasets addressing these questions across multiple Drosophila cell lines have already been published previously, including work from our own group and others.
  
  The primary objective of the current study is therefore not to replicate these existing genomic analyses, but rather to build directly upon them. We identify a consistent genomic association between Gro and pausing/elongation factors across cell types. Importantly, we extend these findings beyond genomic correlations through in vivo genetic analysis in the developing Drosophila wing.
  
  2. Description of the planned revisions
  
  Reviewer 1
  
  The figures and text could lay out the logic of the genetic interactions for non-Drosophila readers. For example, the comparison of single and double copies of Gro-RNAi to combinatorial knockdowns, when it is additive, and when it is interpreted as synergistic.
  
  The statistical analyses presented in Figure 5C, including Fisher’s exact tests comparing phenotype distributions between genotypes, were intended to address the distinction between additive and synergistic genetic interactions. However, we agree that the presentation of these comparisons could potentially be made clearer for readers less familiar with Drosophila genetic interaction assays. We would therefore be open to revising the presentation of Figure 5 and the accompanying explanatory text following editorial guidance and with consideration of the intended readership of the eventual journal.
  
  The statistical analysis of the phenotype distributions should be shown more clearly (Fig. 5B).
  
  Figure 5B is intended to present the distribution of observed phenotypic classes and does not include statistical comparisons. A similar analysis has been published for experiments looking at the phenotypes of moderate Groucho overexpression in the wing in the presence of HDAC inhibitors (Winkler et al., 2010; doi:10.1371/journal.pone.0010166). Statistical analyses of the genetic interaction experiments are presented separately in Figure 5C. We therefore believe the current presentation of Figure 5B is appropriate for illustrating phenotype frequencies rather than statistical inference, but we will consider moving this panel to the Supplementary material.
  
  Minor comments
  
  - Figure 5 would gain clarity if the phenotype classes/panel letters were shown more clearly on the images.
  - The legends of the wing figures should be expanded, especially for readers outside the Drosophila field.
  - "in vivo" should be italicised consistently.
  
  We agree that clearer labelling of phenotype classes, panel annotations and expanded figure legends could improve the accessibility of Figure 5, particularly for readers less familiar with Drosophila wing phenotypes and genetic interaction assays. We would therefore be open to revising the presentation of this figure and its accompanying legends in a future revised version.
  
  We thank the reviewer for noting the typographical inconsistency of italics for in vivo. This will be corrected during manuscript revision and proofing.
  
  Reviewer #2
  
  Reviewer #2 (Significance (Required)):
  
  I think this is a nice little paper providing a simple, straightforward take-home message. It does not conceptually shake the world, and the evidence consists of (nice) correlations, with no direct proof put forward for the conclusions. I am not a Drosophila geneticist but probably rather an 'expert' on basic transcription mechanisms. I think the data in the paper are of high quality, if limited in scope, and that the conclusions are supported by the results, but I do not think the results or conclusions will have a big audience. Having said that, I found it interesting to learn about this group of repressors and their likely mode of action.
  
  On the other hand, it is worth emphasizing that proteins such as NELF and CDK9 would arguably be expected to be found at very many genes, as promoter-proximal pausing occurs at a plethora of genes, including housekeeping genes, i.e., genes not regulated by cell type or stimuli. So, many genes with pausing are not regulated by modulation of pausing. Basically, the fact that the effects of Groucho knockdown and loss of pausing are additive does not, in my opinion, necessarily mean that Groucho works by stabilizing pausing. Although it is admittedly a reasonable assumption, Groucho could also work by repressing transcription initiation; the genetic outcome of 'double relief' would be the same, i.e., higher transcription levels. I think a brief comment to this effect might be appropriate, especially in the absence of (difficult to obtain) direct evidence that the transcription initiation step is not affected by Groucho.
  
  While we agree that the current study does not directly exclude possible effects of Groucho on transcription initiation, previously published work has already provided evidence arguing against repression by Groucho occurring primarily through inhibition of transcription initiation or prevention of pre-initiation complex assembly. Groucho-bound transcriptional start sites were previously shown to retain RNAP II occupancy, active chromatin features, and detectable basal transcriptional activity despite repression (Kaul et al., 2014).
  
  To acknowledge this possibility and explain why it is unlikely, we will add the sentence “While effects on transcription initiation cannot be completely excluded, previous work argues against Gro repressing transcription primarily through inhibition of transcription initiation. Gro-bound promoters remain accessible, overlap RNAP II occupancy, and retain active chromatin features and basal transcriptional activity” to the start of the third paragraph of the Discussion.
  
  Reviewer #3
  
  The methods section is lacking details on how ChIP-seq was performed in the BG3 cell line. The methods section does a good job of indicating how the data were processed. Information on the antibodies and conditions used is critical, as is whether spike-in controls were used.
  
  The generation of the ChIP-seq data from BG3 cells has already been published. We will add the line "The production of ChIP-seq datasets for Gro binding in Kc167, S2R+ and BG3 cells has been described elsewhere (Kaul, Schuster and Jennings, 2014; Bar-Cohen et al., 2023)" in the Analysis of ChIP-seq data subsection of the Methods.
  
  3. Description of analyses that authors prefer not to carry out
  
  Reviewer #1
  
  Major comments
  
  1. The main weakness is the lack of a mechanistic link between Gro and the early elongation checkpoint. This is really the main point for this reviewer. The manuscript builds an interesting model, and the data support a functional connection between Gro and pausing-related factors, but the mechanistic link is absent. At present, the paper relies on co-localisation of ChIP peaks and genetic interaction in vivo. This is interesting and supportive, but with several possible interpretations. The title and some parts of the text are thus a bit stronger than what is directly demonstrated. Two possibilities could be proposed: either tone down the mechanistic claim or strengthen it experimentally. A more direct assay of pause release or productive elongation after Gro depletion at endogenous targets would be highly valuable. For example, Gro-KD followed by Pol II Ser2-P ChIP, or promoter vs. gene body analysis on Gro-bound genes, ideally comparing genes with Gro at TSS vs. not-TSS, would greatly support the proposed model. If the assay is established, this seems feasible in about 4 months.
  
  We thank the reviewer for this thoughtful comment. We agree that the current study does not directly measure genome-wide RNAP II pause release following Gro depletion. However, several key observations linking Gro with promoter-proximal pausing have already been published and are summarised in the Introduction, and several of the experiments proposed by the reviewer have already been addressed in that work. Kaul et al. (2014; doi:10.1371/journal.pgen.1004595) demonstrated that Gro occupancy correlates with paused genes, and that Gro depletion reduces RNAP II pausing and increases elongating (Ser2-P) RNAP II at the endogenous E(spl)mbeta-HLH locus, an established target gene of Groucho-mediated repression, while total promoter-associated RNAP II occupancy remains largely unchanged. Promoter versus gene body analyses in that study further supported a role for Gro in regulating progression through the early elongation checkpoint rather than transcription initiation.
  
  The aim of the current manuscript was therefore to build upon these earlier mechanistic and genomic observations by asking whether the relationship between Gro and pausing-associated factors extends across multiple cell types and whether it has functional significance in vivo. By integrating comparative genomic analyses with sensitised developmental genetic assays in the wing, we provide evidence that Gro functionally interacts with multiple regulators of the early elongation checkpoint during development.
  
  The bioinformatic part could be strengthened on "distinct TF repertoires" between cell types. The authors interpret the cell type-specific Gro recruitment as reflecting distinct transcription factor repertoires in BG3, Kc167 and S2R+ cells. This is interesting, but not really shown. To make this point more strongly, the authors could provide a map of TF expression across the different cell types, especially for the TFs corresponding to the enriched motifs they discuss. Otherwise, this remains speculative. Along the same lines, the manuscript discusses enriched motifs in BG3 and compares them to Kc167 and S2R+ cells, but this remains a bit descriptive. A clearer side-by-side comparison would strengthen the paper. This is particularly relevant to the motifs used in interpreting cell type-specific recruitment.
  
  The interpretation that cell type-specific Gro recruitment reflects differences in transcription factor repertoires is based on several previously established observations already described in the manuscript. BG3 cells are derived from the larval CNS, whereas Kc167 and S2R+ cells are embryonic haemocyte-like lines (Cherbas et al., 2011; doi:10.1101/gr.112961.110). Transcriptomic analyses have further shown that these Drosophila cell lines maintain stable and distinct lineage-associated transcriptional identities, including differences in transcription factor expression (Cherbas et al., 2011). Given the diversity of transcription factors known to recruit Gro, the observed cell type-specific binding patterns and motif enrichments are consistent with the distinct lineage-associated transcriptional programmes previously described for these cell lines.
  
  Several overlap analyses could be discussed more in depth. A few statements feel too strong for the actual percentages. For example, the GAF overlap in BG3 is around 51% genome-wide and 56% at TSS, which is meaningful, but not especially high. The text already states that it is not universal, and this point could be discussed more clearly.
  
  We note that the manuscript already explicitly states that overlap between Gro and GAF is not universal. Given the diversity of factors known to recruit Gro and the broad genomic distribution of GAF, we consider overlap frequencies of approximately 50% to represent a substantial association, particularly at transcription start sites. Importantly, the interpretation does not rely on complete co-occupancy between these factors, but rather on the observation that Gro-bound regions show significant enrichment for multiple factors associated with promoter-proximal pausing across different cell types.
  
  Similarly, for the UpSet plot, the wording around the "most frequent" combination could be toned down, because this is not a dominant pattern.
  
  The statement that the overlap between Gro, Nelf-E, GAF, Cdk9 and RNAP II represents the “most frequent” combination refers specifically to the relative frequency of the intersection categories within the UpSet analysis. In this context, the overlap between all five factors represents the largest intersection category identified (306 of 649 Gro peaks), with the next most frequent category containing substantially fewer peaks (90 of 649). We therefore feel that the current wording accurately describes the distribution observed in the analysis.
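Expressed as proportions of the 649 Gro peaks in the UpSet analysis, the two largest intersection categories compare as follows:

```latex
\frac{306}{649} \approx 0.471 \;(47.1\%), \qquad
\frac{90}{649} \approx 0.139 \;(13.9\%), \qquad
\frac{306}{90} = 3.4
```

On these counts, the five-factor intersection is the single largest category and is about 3.4 times the size of the next, while still covering slightly less than half of all Gro peaks.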
  
  More generally, I think the manuscript needs a clearer quantitative breakdown of TSS versus non-TSS peaks for the overlap analyses with NELF, GAF, Cdk9 and CycT. Several interpretations depend on this distinction, and right now, this is not always clear enough.
  
  The overlap analyses presented in Figure 3 explicitly distinguish between TSS and non-TSS peaks, and the corresponding quantitative overlap frequencies are described in the Results section. We do not consider that additional breakdowns are required for interpretation of the current data as this distinction is already incorporated into both the analyses and figure presentation.
  
  The "enhancer chromatin" interpretation is interesting, but not fully integrated with the genomic distribution. The observation that Gro is enriched in open enhancer-type chromatin is interesting and supports the idea that Gro does not act mainly through classical repressed chromatin. However, Gro peaks are also enriched at promoters and introns, and this reviewer feels that the manuscript does not fully connect these observations. Where are these enhancer-type peaks located exactly? Are they often intronic? Can this be correlated with the distribution of Gro peaks? This would help the reader and also strengthen the discussion because intronic Gro peaks are present in the data, but are not well integrated into the model.
  
  In the current manuscript, “enhancer chromatin” refers to chromatin states defined by combinations of enhancer-associated histone modifications, including H3K4me1, H3K27ac and H3K56ac, as defined by Skalska et al., 2015 (doi:10.15252/embj.201489923), rather than exclusively to distal intergenic regulatory regions. As described in the chromatin-state analysis, these enhancer-associated chromatin signatures do occur at intronic regulatory regions, including regions classified as active intron chromatin. We therefore do not consider the enrichment of Gro peaks at promoters, enhancers and intronic regions to be mutually exclusive observations within this framework.
  
  Intronic enhancer localisation is common in Drosophila, where the compact organisation of the genome results in many developmental regulatory elements residing within introns (Arnold et al., 2013; doi:10.1126/science.1232542). We therefore consider the presence of Gro peaks within intronic regions to be fully consistent with the observed enrichment of Gro binding within enhancer-associated chromatin states.
  
  The in vivo part is a strength, but some important points need clarification. The in vivo section is a clear highlight of the manuscript. It gives functional relevance to the model and moves the paper beyond cell-culture correlations. That said, a few points need to be clearer:
  
  - RNAi efficiency is not clear for the tested genes, especially the pausing factors. This is important because the differential effects between NELF subunits could simply reflect differences in knockdown efficiency.
  
  While differences in RNAi efficiency could potentially contribute to variation in phenotype strength between individual knockdowns, multiple biological explanations could also account for the differing effects observed between NELF subunits, including differences in protein stability, residual complex activity, or subunit-specific functions. Importantly, the central conclusion of the manuscript does not depend on quantitative comparison of phenotype strength between individual NELF components, but rather on the observation that perturbation of multiple pausing-associated factors genetically interacts with Gro in vivo.
  
  If RNAi validation is possible with existing reagents, this seems realistic within 3 months.
  
  The manuscript focuses on the genetic interactions observed between Gro and pausing-associated factors in vivo rather than on quantitative comparison between individual RNAi lines. As no specific validation experiments were proposed, we are not currently planning additional RNAi validation analyses for the present study.
  
  The discussion could be expanded, especially because the mechanism is not fully shown. Since the direct mechanism is still missing, the discussion could compensate. Right now, the proposed model is interesting, but it still leaves many open questions. For example:
  
  - Is Gro affecting the recruitment or activity of elongation factors?
  - Could looping or enhancer-promoter communication contribute?
  - How should the intronic Gro peaks be interpreted in the model?
  - In the wing, could the phenotype be discussed more mechanistically, in light of what is already known about Gro and derepression of vein-promoting genes?
  
  A model figure, for example, could help here.
  
  We thank the reviewer for these thoughtful suggestions.
  
  Several of the points raised by the reviewer are already discussed in the manuscript. For example, we discuss the possibility that Gro influences the activity or recruitment of elongation-associated factors. We agree that enhancer-promoter communication and chromatin looping are a plausible component of this mechanism. As the Drosophila genome is compact and intronic enhancers are highly prevalent, topological looping provides a clear physical framework for how Gro molecules bound at non-TSS sites could regulate the promoter-proximal machinery. Indeed, we have previously published this model (Kaul, Schuster, and Jennings, 2015; see Figure 1C; doi:10.1080/21541264.2014.1000709). Our current in vivo and genomic findings build directly upon this model, suggesting that within these established looped configurations, Gro acts locally to interface with and stabilise the pausing machinery.
  
  With respect to the wing phenotypes, the Discussion focuses primarily on the interpretation of the observed genetic interactions between Gro and pausing-associated factors rather than on defining the precise downstream target genes contributing to vein phenotypes. We agree that additional mechanistic dissection of these developmental phenotypes would be interesting. However, this would require a substantial expansion of the study into the detailed developmental and signalling mechanisms underlying vein specification, which lies beyond the primary focus of the current manuscript.
  
  OPTIONAL: It would be interesting to know whether the same peak distribution / functional logic is observed in mammalian TLE orthologs. This is not essential for the current conclusions, but it would broaden the impact.
  
  Determining whether similar genomic distributions and functional relationships are conserved for mammalian TLE orthologues will be an important future project. However, relatively little comparable genome-wide TLE occupancy data are currently available, meaning that such analyses would require a substantial independent undertaking beyond the scope of the present study.
  
  Minor comments:
  
  - Please explain why promoters were defined as ±250 bp from the TSS. This seems rather narrow.
  
  Promoters were defined as ±250 bp from annotated transcription start sites. This window size is commonly used in Drosophila genomic studies, where the compact organisation of the genome means that broader windows frequently overlap adjacent genes.
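The trade-off described above can be illustrated with a short sketch. It assumes a peak is called promoter-proximal when its summit lies within ±`half_width` bp of an annotated TSS, and it uses two hypothetical neighbouring genes whose TSSs are 1.2 kb apart; neither the spacing nor the function names come from the manuscript.

```python
def is_promoter_peak(summit, tss_positions, half_width=250):
    """True if the peak summit lies within +/- half_width bp of any TSS."""
    return any(abs(summit - tss) <= half_width for tss in tss_positions)


def windows_collide(tss_a, tss_b, half_width):
    """True if the closed windows [tss - half_width, tss + half_width] of two TSSs overlap."""
    return abs(tss_a - tss_b) <= 2 * half_width


# Hypothetical neighbouring genes whose TSSs are 1.2 kb apart.
tss = [10_000, 11_200]
print(is_promoter_peak(10_180, tss))                   # True: 180 bp from the first TSS
print(is_promoter_peak(10_600, tss))                   # False: 600 bp from both TSSs
print(windows_collide(*tss, half_width=250))           # False: +/-250 bp windows stay separate
print(windows_collide(*tss, half_width=1_000))         # True: +/-1 kb windows overlap
print(is_promoter_peak(10_600, tss, half_width=1_000)) # True: within 1 kb of both TSSs
```

With ±250 bp the two promoter windows stay separate, whereas with ±1 kb they overlap, so a peak lying between the two genes (here at position 10,600) would be counted as promoter-proximal for both.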
  
  - Please clarify why S2R+ cells are included in the comparative part but are not followed in the same way in some downstream analyses.
  
  S2R+ cells were included in the comparative analyses to determine which aspects of Gro recruitment were shared across multiple cell types and which were cell-type specific. Some downstream analyses focused on BG3 and Kc167 cells because these lines had the most extensive corresponding datasets available for the chromatin and pausing-factor analyses performed in the current study.
  
  Reviewer #3:
  
  Here, Martínez Quiles and Jennings investigate the role of the Groucho repressor in BG3 cells. This extends a previous study, published by one of the authors, that used S2R+ and Kc167 cells. They find that Gro is recruited to gene promoters in a cell-type-specific manner. Gro associates with open chromatin, is mostly associated with enhancer regions, and is largely excluded from regions of the genome that are repressed by Polycomb. After studying its function in cell culture, the authors investigate the role of Gro in a wing-specific background. The findings here are mostly correlative, showing that loss of Gro results in stronger phenotypic defects when combined with loss of factors including NELF-B or NELF-D, LARP7, and bin3. They propose that Gro acts to attenuate gene expression during early gene expression. This claim would be greatly strengthened if the authors provided RNA-seq data in addition to the ChIP-seq data shown in this manuscript, especially to examine gene expression patterns among the different cell lines studied here. At present, this is a correlative study that does not illuminate the mechanism by which Gro directly regulates promoter-proximal pausing or RNA polymerase behavior.
  
  We thank the reviewer for this suggestion. However, extensive transcriptomic analyses of Drosophila cell lines, including Kc167, S2R+ and BG3-derived lines, have already been published (Cherbas et al., 2011), together with RNA-seq analyses following Gro depletion (Kaul et al., 2014). In addition, the association between Gro occupancy and paused genes has also been reported previously (Kaul et al., 2014; Chambers et al., 2017; doi:10.1186/s12864-017-3589-6).
  
  While additional RNA-seq analyses could further characterise transcriptional differences between cell lines, RNA-seq alone would not directly determine whether altered transcript levels arise specifically through changes in promoter-proximal pausing, as opposed to effects on transcription initiation, transcript stability, or indirect downstream regulatory effects. We therefore do not consider additional RNA-seq analyses necessary to support the central conclusions of the present study.
  
  Figure 2-3: For the ChIP-seq data, scale the y-axis in the same manner to better understand enrichment between the samples.
  
  Because these ChIP-seq datasets were generated independently using different antibodies and experimental conditions, direct comparison of enrichment magnitudes across datasets would not be biologically meaningful. Accordingly, our analyses focus on significant peak calls and overlap relationships rather than relative signal intensity. Applying identical y-axis scaling across all tracks would obscure significant enrichment in several datasets and could therefore be misleading.
  
  RNA-seq data from the different cell lines, or PRO-seq, would greatly enhance the authors' findings and could really show a relationship between Gro binding and promoter-proximal pausing.
  
  We note that RNA-seq datasets for Gro depletion in Kc167 and S2R+ cells have already been published previously (Kaul et al., 2014), together with evidence linking Gro occupancy to paused genes (Kaul et al., 2014; Chambers et al., 2017). We therefore do not consider that additional RNA-seq analysis would substantially strengthen the central conclusions of the current manuscript.
  
  Moreover, RNA-seq alone cannot distinguish whether altered transcript abundance reflects changes in promoter-proximal pausing or other mechanisms influencing transcript abundance. While PRO-seq approaches could provide further mechanistic information regarding RNAPII dynamics, such experiments are beyond the scope of the present study.
  
  This study helps to further clarify how Gro binds DNA in different cell types and indicates that it may intersect with factors involved in promoter-proximal pausing. The study is highly correlative and would require additional work to show a mechanistic link between Gro and transcription attenuation due to promoter-proximal pausing.
  
  While we agree that PRO-seq approaches could provide additional mechanistic information regarding RNAPII dynamics, establishing an appropriate experimental and analytical framework for these analyses would require a substantial extension beyond the scope of the present study. In addition, several aspects of the relationship between Gro occupancy, transcriptional repression, and promoter-proximal pausing that underpin these suggestions have already been addressed in previously published work, including RNA-seq analyses following Gro depletion (Kaul et al., 2014), evidence linking Gro occupancy with paused genes (Kaul et al., 2014; Chambers et al., 2017), and studies demonstrating that Gro-mediated repression does not occur through inhibition of pre-initiation complex assembly. The current manuscript is therefore intended to build upon these existing findings by integrating comparative genomic analyses with new in vivo genetic interaction data.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2022.09.17.508372
www.medrxiv.org

A natural experiment in Kenya reveals durable immunosuppressive effects of early childhood malaria: a longitudinal cohort study

1. Public_Reviews 21 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This important study sought to investigate the role that early childhood malaria exposure plays in the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. In this natural experiment, the authors compare antibody levels among children who have been exposed to different levels of malaria transmission by using protein microarray technology. Although the findings are of importance, the evidence remains incomplete, and the analysis would benefit from a more in-depth evaluation of potential confounders. With the appropriate analysis, the findings will be of great interest for global health, immunology, and vaccine development.
  
  We thank the editors for highlighting the need for a more comprehensive evaluation of potential confounding. We agree that this is a critical aspect of the study and have now undertaken additional analyses to address this directly.
  
  The original longitudinal cohort was designed to investigate the acquisition of naturally acquired immunity to malaria and did not include systematic collection of anthropometric/nutritional, environmental or socioeconomic data, precluding direct adjustment for these factors within the primary dataset. However, to assess whether there were population-level differences in these factors, we leveraged contemporaneous hospital-based surveillance data from the same geographic regions, which include measurements of anthropometry and nutritional status (mid-upper arm circumference [MUAC], weight-for-age, and height-for-age) and detailed infection diagnostics.
  
  Using this independent dataset, we fitted mixed-effects regression models adjusting for age, calendar year, and concurrent infections (RSV, parainfluenza, influenza A, human metapneumovirus, OC43). For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya. Adjusted differences were small and centred around zero (MUAC: −0.12, 95% CI −0.38 to 0.15; weight-for-age: −0.05, 95% CI −0.28 to 0.19; height-for-age: 0.08, 95% CI −0.17 to 0.33), with no consistent directional effect.
  
  As the longitudinal cohort was randomly selected from these underlying populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and that there were no differences in their exposure to the infections included in the analysis. We have incorporated these analyses into the revised manuscript (adding a new figure focussed on this analysis, Fig. 6, and updating the statistical analysis and discussion sections) and believe they substantially strengthen the evidence by addressing a key source of potential confounding.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The study shows that childhood malaria can weaken the antibody response to other vaccines and infections. This suggests that early exposure to P. falciparum may have a long-lasting effect on immunity, with implications for vaccine efficacy in endemic areas.
  
  Strengths:
  
  This study stands out for its longitudinal design, the use of robust immunological techniques, and the comparison between areas with different levels of malaria exposure. Its findings reveal that early malaria can weaken the response to childhood vaccines, with important implications for public health in endemic regions.
  
  We thank the reviewer for this comment.
  
  Weaknesses:
  
  One of the study's main limitations is the lack of functional data confirming the clinical impact of the low antibody levels. Furthermore, although multiple immune responses were measured, other important components, such as cellular immunity, were not assessed. In addition, the results may not be generalizable to other regions.
  
  We thank the reviewer for this important comment and agree that the absence of functional immunological assays is a limitation of the current study. Our analysis was designed to determine whether early-life malaria exposure is associated with durable alterations in antibody responses to unrelated pathogens and vaccine antigens, rather than to establish the downstream functional consequences of these differences. As such, the study is able to demonstrate a broad and persistent attenuation of humoral responses but cannot directly determine whether the lower antibody levels observed translate into reduced neutralising capacity or diminished protection at the individual level.
  
  We have revised the manuscript to make this distinction more explicit. In the revised discussion, we now state that although reduced antibody titres to vaccine-preventable pathogens may have implications for long-term protection, the clinical significance of these differences remains to be established in future studies incorporating functional assays and clinical outcome data.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.
  
  Strengths:
  
  (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.
  
  (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.
  
  (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.
  
  (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.
  
  (5) Validation of key findings using both serologic microarray and ELISA.
  
  (6) Important public health implications for vaccine strategies in malaria-endemic regions.
  
  We thank the reviewer for these comments.
  
  Weaknesses:
  
  (1) Lack of participants' characteristics (socio-economic, nutritional, physical).
  
  We thank the reviewer for this important comment. We have now included a detailed summary of participant characteristics in Table 1 to provide context for the study population. This includes key demographic and longitudinal variables stratified by cohort (Junju and Ngerenya), including sex distribution, age at study entry and exit, duration of follow-up, number of visits per participant, and total serum samples analysed. Detailed data on socio-economic status, nutritional status, and other environmental or physical characteristics were not consistently available across all participants and time points, and therefore could not be included. This has now been explicitly stated as a limitation in the discussion.
  
  (2) Somewhat limited sample size (longitudinal analysis of 123 children total), with further subdivision reducing statistical power for some analyses.
  
  We thank the reviewer for this important observation. The study is based on an intensively followed cohort with weekly malaria surveillance and repeated serological measurements throughout childhood, allowing detailed characterisation of early-life exposure and subsequent immune trajectories. This depth of longitudinal sampling provides resolution that is not achievable in larger cross-sectional studies. We acknowledge that subdivision of the cohort reduces statistical power for some analyses. Nevertheless, the key findings were consistent across several independent comparisons, including a reduction in antibody levels for a broad panel of antigens in the malaria-endemic setting, within-cohort analyses in Ngerenya that replicated this observation, and confirmation of the protein microarray results on the ELISA platform. The consistency of these findings across analytical approaches and measurement platforms reduces the likelihood that the observed effects are driven by small-sample variability. We have clarified this point in the revised discussion to emphasise that the strength of the study lies in the depth and longitudinal resolution of the data rather than the absolute sample size.
  
  (3) Potential confounding by unmeasured socioeconomic, nutritional, or environmental factors between communities.
  
  We thank the reviewer for this important point and agree that residual confounding between communities must be considered. As outlined in our response to the editorial assessment, we have undertaken additional analyses using contemporaneous population-level data from the same regions and found no evidence of systematic differences in anthropometric indices between children from Junju and Ngerenya after accounting for age, calendar year, and concurrent infections, with effect estimates small and crossing zero. In addition, the within-Ngerenya analysis provides an internal comparison within a shared geographic and environmental setting, reducing the likelihood that unmeasured socioeconomic or environmental differences between communities account for the observed associations. The new analyses suggest that major population-level differences in nutritional status or infection burden are unlikely to explain the observed patterns. We have clarified this point in the revised discussion and explicitly acknowledge the possibility of residual confounding.
  
  (4) Lack of ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.
  
  We agree that, as an observational study, our analysis cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. However, several features of the study design strengthen the inference that early-life malaria exposure contributes to the observed differences. First, malaria exposure was characterised prospectively through intensive weekly surveillance, allowing precise temporal definition of exposure in early childhood. Second, within the Ngerenya cohort, where children were exposed to different levels of malaria due to a rapid decline in transmission, those with even limited early-life exposure exhibited lower antibody responses at 10 years of age than malaria-naïve peers, despite residing in the same geographic and environmental context. In addition, we now show that these differences are not confined to a single timepoint but are evident across the full longitudinal follow-up after adjustment for age and repeated measurements. While we cannot exclude the possibility of residual confounding or bidirectional relationships, the convergence of evidence from the natural experiment design, within-cohort contrasts, and age-adjusted longitudinal analyses supports early-life malaria exposure as a key contributor to long-term differences in antibody responses. We have clarified this in the discussion.
  
  (5) Despite good longitudinal data, the main analysis was conducted as a cross-sectional analysis at age 10 for many comparisons, which limits the understanding of temporal dynamics.
  
  We thank the reviewer for highlighting this point. While age 10 was initially used as a standardised reference point for cross-sectional comparisons, the underlying dataset is longitudinal, with repeated antibody measurements across childhood. To address this more directly, we have now complemented these analyses with antigen-specific mixed-effects regression models incorporating all available longitudinal data, with adjustment for age and a random intercept for repeated measurements within individuals. These models demonstrate that the differences between cohorts are not confined to the age-10 cross-section but are evident in an age-adjusted longitudinal framework for multiple antigens. We have retained the age-10 comparisons for reference, but the primary inference is now based on the longitudinal mixed-effects analyses. These changes are reflected in the revised results and statistical analysis sections. We thank the reviewer for this astute point, which we think has substantially improved the manuscript.
  
  (6) Statistical analysis is limited to univariable comparisons without consideration for confounders or adjusting for multiple comparisons.
  
  We agree that the original analyses relied primarily on univariable comparisons. In the revised manuscript, we have extended the analytical framework to include mixed-effects regression models that account for age effects and repeated measurements within individuals. These models estimate the average age-adjusted difference in antibody responses between cohorts across the full follow-up period. We have also applied false discovery rate (FDR) correction to account for multiple antigen testing. For multiple antigens, the direction and magnitude of cohort differences remain consistent under this approach, strengthening the robustness of the findings beyond the original univariable comparisons. These analyses have been incorporated into the revised results and statistical analysis sections.
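The response does not state which FDR procedure was applied. The Benjamini-Hochberg step-up procedure is a common choice for this kind of multi-antigen testing, and the sketch below assumes it purely for illustration; the p-values are invented and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest rank, taking a running minimum so that
    # adjusted values never increase as the raw p-values decrease
    # (starting at 1.0 also caps every adjusted value at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-antigen p-values for a cohort difference.
p = [0.001, 0.008, 0.039, 0.041, 0.27]
q = benjamini_hochberg(p)
print([round(x, 5) for x in q])  # [0.005, 0.02, 0.05125, 0.05125, 0.27]
print([x <= 0.05 for x in q])    # [True, True, False, False, False]
```

In this toy example, two nominally significant antigens (p = 0.039 and 0.041) no longer pass an FDR threshold of 0.05 after adjustment, which is the kind of filtering the correction is meant to provide.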
  
  (7) No mechanistic understanding of how early malaria exposure creates lasting immunosuppression.
  
  We agree that this study does not directly resolve the mechanistic basis underlying the observed long-term differences in antibody responses. The primary aim of this work was to identify and characterise durable alterations in humoral immune profiles associated with early-life malaria exposure, rather than to define the cellular or molecular pathways involved. However, our findings are consistent with a growing body of experimental and clinical literature suggesting that malaria infection can induce sustained perturbations in B cell and T cell compartments, including the expansion of atypical memory B cells, altered germinal centre responses, and increased regulatory immune activity. These mechanisms have been proposed to impair the generation and maintenance of effective humoral immunity. In the revised discussion, we have clarified that the mechanistic basis of this phenomenon remains to be fully defined and have expanded the discussion of plausible pathways in light of existing literature. We now explicitly position our findings as providing population-level evidence of a durable immunological phenotype that warrants further mechanistic investigation.
  
  (8) No understanding of the clinical implications of the reduced IgG levels observed in the area with high malaria exposure.
  
  We agree that this study does not directly establish the clinical consequences of the reduced antibody levels observed in malaria-exposed children. The primary objective of this study was to characterise long-term differences in humoral immune profiles associated with early-life malaria exposure, rather than to assess downstream clinical outcomes or functional antibody activity. We have clarified this limitation in the revised discussion. Nevertheless, the breadth and consistency of the observed differences for multiple vaccine-preventable and infectious antigens raise the possibility that early-life malaria exposure may have implications for long-term immune protection. We now emphasise in the revised discussion that future studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.
  
  Assessment of Claims:
  
  The data appear to support the authors' primary claims, but the strength of the evidence is limited, and the results should be interpreted with caution. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence; however, it still fails to account for the physical, nutritional, and socio-economic factors that may have driven the observed changes. Additionally, the mechanism underlying this effect remains unclear, and the clinical significance of reduced antibody levels is not established.
  
  We thank the reviewer for this assessment and for recognising the strengths of the natural experiment design and within-cohort analyses. We agree that, as an observational study, our findings should be interpreted appropriately. In the revised manuscript, we have undertaken additional analyses and clarifications to strengthen the evidential basis of our conclusions and to address the points raised. To address potential confounding by nutritional and related factors, we analysed contemporaneous hospital-based surveillance data from the same geographic populations, since nutritional and socioeconomic variables were not consistently collected during longitudinal follow-up. For three anthropometric indices of nutritional status (MUAC, weight-for-age, and height-for-age), we found no evidence of systematic differences between children from Junju and Ngerenya after adjustment for age, calendar year, and concurrent infections. As the longitudinal cohort subjects were randomly drawn from these populations, these findings suggest that the two groups were broadly comparable with respect to early-life growth and nutritional status.
  
  We agree that the mechanistic basis of the observed differences is not resolved in this observational study. We have clarified this point in the revised discussion and expanded our consideration of plausible biological pathways based on existing literature, including perturbations in B cell and T cell compartments. Similarly, we now explicitly state that the clinical implications of reduced antibody levels remain to be determined and will require studies incorporating functional assays and clinical outcomes. We believe these revisions strengthen the manuscript by providing a more comprehensive interpretation of the data.
  
  Impact and Utility:
  
  This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to informing vaccination strategies. The findings, if strengthened, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.
  
  We thank the reviewer for this comment.
  
  Context:
  
  This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.
  
  We thank the reviewer for this comment.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  We suggest that further analyses of potential confounders such as anthropometric indices, socioeconomic status, and comorbidities would render the evidence more robust.
  
  We thank the Reviewing Editor for this important suggestion. We agree that careful consideration of potential confounding factors is critical to the interpretation of these findings, and have undertaken additional analyses to address this.
  
  Because anthropometric and related socioeconomic measurements were not collected systematically within the original longitudinal malaria cohort, we assessed potential population-level differences using hospital-based surveillance data from the same geographic regions. This dataset includes anthropometric measurements (mid-upper arm circumference, weight-for-age, and height-for-age) as well as detailed infection diagnostics in childhood. Using these data, we fitted regression models adjusting for age, calendar year, and concurrent, clinically diagnosed infections. For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya; effect estimates were small and their confidence intervals crossed zero (fig. 6). As the longitudinal cohorts were randomly selected from these populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and infection exposure. Detailed individual-level data on socioeconomic status and comorbidities were not available within the longitudinal cohort. However, the within-Ngerenya analysis, in which children with differing early-life malaria exposure were compared within the same geographic and environmental setting, provides a complementary control for these factors. We have incorporated these additional analyses and clarifications into the statistical analysis and discussion sections of the revised manuscript, and believe they strengthen the robustness of the findings by addressing key sources of potential confounding.
  
  Reviewer #1 (Recommendations for the authors):
  
  The manuscript is well written, with clear and informative figures that effectively support the findings.
  
  We thank the reviewer for this comment.
  
  Suggestions:
  
  (1) Although the study well controlled for malaria exposure, other environmental or infectious factors that influence immunity could be considered:
  
  Nutritional status in childhood (malnutrition impacts immune response), co-infections (helminths, respiratory viruses), socioeconomic differences, or differences in access to health services, even minimal, between Junju and Ngerenya.
  
  We thank the reviewer for highlighting the potential influence of environmental, infectious, and socioeconomic factors on immune responses. We agree that these are important considerations in the interpretation of observational data. To address nutritional status and concurrent infectious exposures, we analysed contemporaneous hospital-based surveillance data from the same geographic populations. This dataset includes measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed clinical diagnostics for common childhood infections. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). These findings suggest that the populations from which the longitudinal cohorts were randomly selected were comparable with regard to early-life growth and nutritional status. We agree that we cannot fully exclude the influence of unmeasured factors such as helminth infections, socioeconomic variation, or subtle differences in healthcare access. However, the within-Ngerenya analysis, in which children with differing early-life malaria exposure were compared within the same geographic, environmental, and healthcare setting, provides an internal control for many of these factors. The persistence of similar patterns within this setting supports malaria exposure as a key contributor to the observed differences. We have clarified these considerations in the revised discussion and believe that the additional analyses and within-cohort comparisons strengthen the robustness of our conclusions while acknowledging the limitations inherent to observational studies.
  
  (2) Measurement of other immunological markers:
  
  In addition to IgG, include: B cell subpopulations (naive, memory, atypical), cytokine levels (IL-10, IFN-γ) to characterize the immunological microenvironment.
  
  You could include these recommendations in the text for future studies.
  
  We thank the reviewer for this thoughtful suggestion. We agree that detailed immunophenotyping, including characterisation of B cell subpopulations and cytokine profiles, would provide important insight into the mechanisms underlying the observed differences in antibody responses. In the revised manuscript, we have expanded the discussion to highlight these important avenues for future work, including the potential roles of altered B cell subsets and of regulatory or inflammatory cytokine environments in shaping long-term humoral responses.
  
  Reviewer #2 (Recommendations for the authors):
  
  The manuscript is well-written.
  
  We thank the reviewer for this comment.
  
  (1) Methodological Clarifications:
  
  Do the authors have any information regarding the characteristics of these children that could be of use in understanding their immune responses better? (e.g., weight, height, BMI, socioeconomic status, HB level, access to health care, etc.).
  
  We thank the reviewer for highlighting the importance of participant characteristics in interpreting immune responses. Anthropometric and related clinical measures were not collected systematically within the original longitudinal malaria cohort, as the study was designed to investigate the acquisition of naturally acquired immunity to malaria.
  
  To address this, we analysed contemporaneous hospital-based surveillance data from the same geographic populations, which include measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed infection diagnostics. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). Detailed individual-level data on socioeconomic status, haemoglobin levels, and healthcare access were not available within the longitudinal cohort, which precluded direct adjustment for these variables. However, the within-Ngerenya analysis, in which children with differing early-life malaria exposure were compared within the same geographic and healthcare setting, provides an internal control for many of these factors. These considerations are now clarified in the revised discussion.
  
  Could the authors provide more detailed statistical analysis, including power calculations and multiple comparison corrections?
  
  In the revised manuscript, we have extended the statistical analysis and now include antigen-specific mixed-effects regression models incorporating all available longitudinal measurements; these are described in full in the statistical analysis section. We have also applied false discovery rate (FDR) correction to account for multiple testing across antigens, and report both unadjusted and FDR-adjusted significance in the revised results. With respect to power, the sample size was determined by the number of children within the long-term surveillance cohorts who met the inclusion criteria, namely the availability of a sufficient number of longitudinal samples. We have clarified this in the revised manuscript.
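  The response does not state which FDR procedure was used. Purely as an illustration, and assuming the widely used Benjamini-Hochberg step-up method, per-antigen p-values could be adjusted as in the sketch below. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only: assumes BH was the FDR method, which the
    response does not specify.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-antigen p-values (not study data)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

  For these hypothetical inputs the sketch returns adjusted values of 0.04, about 0.053, about 0.053, and 0.2. Processing ranks from the largest p-value downwards is what guarantees that a smaller raw p-value never receives a larger adjusted value.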
  
  Clarify the criteria for selecting the 123-child subset from the larger surveillance cohorts.
  
  We thank the reviewer for this comment. The 123 children included in this analysis were selected from the larger surveillance cohorts based on the availability of sufficiently dense longitudinal serum sampling as described above. Specifically, children were required to have at least eight longitudinal samples available in the archive, enabling robust assessment of within-individual antibody trends over time. This criterion was applied to ensure adequate temporal resolution to examine the long-term stability of malaria-associated effects on antibody responses. Children with fewer available samples were therefore excluded, as limited sampling would not allow reliable characterisation of longitudinal patterns. We have clarified these inclusion criteria in the revised manuscript.
  
  (2) Additional Analyses and Data Presentation:
  
  The authors could consider dose-response analyses relating malaria episode frequency/timing to degree of immunosuppression or even AMA-1 IgG levels and degree of immunosuppression. How do they associate over time?
  
  We thank the reviewer for this suggestion. To address this, we examined the relationship between malaria exposure (using cumulative febrile malaria episode count derived from longitudinal surveillance data) and the magnitude of heterologous antibody responses. In mixed-effects models adjusting for age and repeated antibody measurements, higher malaria episode burden was associated with lower antibody responses against multiple antigens (fig. 7).
  
  Analyze whether the effects vary by specific age at malaria exposure.
  
  We agree that age at exposure is an important consideration. We have now assessed how the relationship between malaria burden and antibody responses varies with age by including age as a non-linear term and modelling interactions between malaria exposure and age as described above. These analyses did not suggest substantial heterogeneity in the association over age, and therefore we have retained the simpler presentation for clarity.
  
  Provide correlation analyses between different antibody responses to assess whether suppression is generalized.
  
  We have addressed this by modelling responses jointly across a panel of heterologous antigens and by examining antigen-specific associations. The direction of effect was consistent for the majority of antigens, with no evidence of opposing trends, supporting a broad rather than antigen-specific effect.
  
  The authors could consider moving Figures 2a and b to the supplementary material.
  
  We thank the reviewer for this suggestion. We carefully considered whether panels 2a and 2b could be moved to the supplementary material. However, we have retained them in the main text because they provide a simple, intuitive illustration of how AMA1 antibody responses track with malaria exposure at the individual level, complementing the population-level analysis shown in fig. 2c. We feel that this helps establish the biological validity of the microarray platform in a way that is immediately interpretable to the reader, and therefore supports the interpretation of subsequent analyses.
  
  The authors could consider replacing Figures 3a and b with IgG levels from ALL vaccinated children and ALL non-vaccinated children.
  
  We thank the reviewer for this suggestion. We would like to retain these figures for the same reasons that have been articulated above for figures 2a and b.
  
  (3) Discussion Enhancements:
  
  The authors should consider expanding the discussion to address the limitations of the data more thoroughly, particularly regarding the potential differences between cohorts that could have contributed to the results.
  
  We have expanded the discussion to more explicitly address potential differences between cohorts that could contribute to the observed findings, including nutritional, socioeconomic, and environmental factors.
  
  The discussion needs to acknowledge the lack of directionality for the associations observed. As stated above, although I agree in general terms with the observations that the authors have made, it is not possible to distinguish between a suppressive effect of malaria on immune responses to infection-derived pathogens or a protective effect of malaria that leads to less exposure to infection-derived pathogens (and consequently lower IgG levels). The mechanisms behind these could include things like different health-seeking behaviors or social interactions from kids who have malaria versus those who don't, for example.
  
  We agree that, as an observational study, we cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. We have now clarified this limitation explicitly in the discussion. We acknowledge the alternative interpretations raised by the reviewer, including the possibility that differences in exposure to other pathogens, potentially driven by behavioural, environmental or healthcare-related factors, could contribute to the observed patterns. At the same time, we note that the natural experiment design, prospective malaria exposure classification, and within-Ngerenya comparisons support early-life malaria exposure as a key contributing factor. We have revised the discussion to reflect this balance.
  
  Extend the discussion of potential biological mechanisms underlying durable immunosuppression.
  
  We thank the reviewer for this suggestion. We have expanded the discussion to more fully consider potential biological mechanisms that could underlie the observed long-term differences in antibody responses. Specifically, we now discuss evidence from prior studies indicating that malaria infection can induce sustained alterations in B cell and T cell compartments, including expansion of atypical memory B cells, disruption of germinal centre responses, and increased regulatory immune activity. We position our findings as providing population-level evidence of a durable immunological phenotype, while noting that targeted mechanistic studies will be required to define the underlying pathways.
  
  Extend the discussion around the clinical implications of the observed antibody level differences.
  
  In the revised discussion, we highlight that studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.
  
  (4) Technical Issues:
  
  Could the authors please:
  
  (1) Clarify microarray data processing and quality control procedures.
  
  We thank the reviewer for this request. We have expanded the methods section to provide additional detail on microarray data processing and quality control procedures.
  
  (2) Provide information on inter-assay variability and batch effects.
  
  We have expanded the methods section to clarify how these were evaluated and addressed. Inter-assay variability was monitored using pooled adult serum included on every slide as a consistent positive control. This allowed us to assess slide-to-slide consistency in signal detection across the full antigen panel. In addition, fluorophore-conjugated IgG and IgA controls were printed directly onto each miniarray to confirm scanner performance independently of antigen–antibody interactions. At the sample level, each specimen was assayed on two independent miniarrays per slide, generating four spatially separated replicate measurements per antigen. Technical variability was quantified using the coefficient of variation (CV), and measurements with CV >20% were excluded from downstream analyses.
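  The replicate-level filter described above can be sketched as follows. The response specifies four replicate measurements per antigen and a 20% CV cutoff; it does not say whether the sample or population standard deviation was used, or how replicates were summarised after filtering. This sketch assumes the sample standard deviation and a simple mean, and the function names and sample/antigen keys are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD = 0.20  # measurements with CV > 20% are excluded


def coefficient_of_variation(replicates):
    """CV of replicate spot intensities (sample SD divided by mean).

    Assumption: sample SD; the response does not specify which SD was used.
    """
    mu = mean(replicates)
    if mu == 0:
        return float("inf")  # CV undefined: treat as failing QC
    return stdev(replicates) / mu


def passes_qc(replicates, threshold=CV_THRESHOLD):
    """Keep a measurement only if its CV does not exceed the threshold."""
    return coefficient_of_variation(replicates) <= threshold


def summarise(measurements):
    """Map (sample, antigen) -> mean intensity, dropping high-CV measurements."""
    return {key: mean(reps) for key, reps in measurements.items() if passes_qc(reps)}


# Hypothetical intensities: four replicate spots per antigen
data = {
    ("child_01", "AMA1"): [1000, 1050, 980, 1020],     # tight replicates
    ("child_01", "antigen_X"): [200, 400, 150, 600],   # CV well above 20%
}
print(summarise(data))
```

  Here the tightly clustered replicates pass with a mean of 1012.5, while the dispersed set (CV of roughly 0.61) is dropped before any downstream modelling.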
  
  (3) Include details on how missing data were handled in longitudinal analyses.
  
  We thank the reviewer for highlighting this point. We have added clarification in the statistical analysis section describing how missing data were handled. Specifically, mixed-effects models were used, which accommodate unbalanced longitudinal data without requiring imputation, allowing all available observations to contribute to the analysis.
  
  (4) Include details of the parameters of the LOWESS analysis shown in Figure 1.
  
  We have expanded the Figure 1 legend to include the parameters of the loess smoothing shown, including the smoothing span.
  
  (5) Include details of the samples used for Figure 3d (Negative and Pooled Adult Serum).
  
  We have clarified in the methods the nature and purpose of the samples used in Figure 3d. The negative control consisted of phosphate-buffered saline applied to a full miniarray in place of serum, allowing assessment of background and non-specific signal in the absence of antibody binding. The pooled adult serum comprised a composite of sera from multiple healthy adults from the same setting and was included as a positive reference sample, expected to contain a broad repertoire of antigen-specific antibodies. These controls were included on each slide to enable interpretation of assay performance, with the negative control defining baseline signal and the pooled adult serum providing a consistent reference for antigen recognition across the microarray.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2025.05.26.25328345v3
www.biorxiv.org

Genome reorganization and its functional impact during breast cancer progression

1
1. Public_Reviews 21 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Strengths:
  
  This work sets a benchmark for integrative 3D genomics in oncology. Its methodological sophistication and conceptual advances establish a new paradigm for studying nuclear architecture in disease.
  
  We appreciate the very kind words.
  
  Weaknesses:
  
  Major Issues
  
  (1) Functional tests would strengthen the observed links between structure and gene changes. For example, the COL12A1 gene loop formation correlates with its increased expression. Disrupting this loop using CRISPR-dCas9 at chr6 position 75280 kb could prove whether the loop causes COL12A1 activation. Such experiments would turn strong correlations into clear mechanisms.
  
  We agree that targeted disruption of specific loops such as COL12A1 will be important for functional validation of the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than to explore specific loop interactions. The current findings are a foundation for more targeted functional follow-up studies.
  
  (2) The H3K27ac looping idea needs deeper validation. The data suggest that H3K27ac loss weakens loops without affecting CTCF. Testing how cohesin proteins interact with H3K27ac-modified sites would clarify this process. Degron systems could rapidly remove H3K27ac to observe real-time effects. Also, the AP-1 motifs found at dynamic loop sites deserve functional tests. Knocking down AP-1 factors might show if they control loop formation.
  
  We agree that modulating histone modifications or transcription factors would provide insights into the underlying mechanisms driving the changes we observed. However, studies utilizing degrons or small molecule inhibitors that globally knock down either H3K27ac or specific transcription factors are often difficult to interpret. For example, assessing the role of AP-1 factors, as suggested, would be complicated by the variety of AP-1 proteins. In addition, H3K27ac reduction could inhibit loop strength either directly (e.g., by reducing cohesin recruitment) or indirectly (e.g., by reducing gene expression, which could in turn affect loop strength). Parsing out the exact relationships between these features will require extensive follow-up work and falls outside the scope of the current study.
  
  (3) Connecting findings to patient data would boost clinical relevance. The MCF10 model is excellent for controlled studies. Checking if TAD boundary weakening occurs in actual patient metastases would show real-world importance. Comparing primary and metastatic tumor samples from the same patients could reveal new structural biomarkers. If tissue is scarce, testing cancer cells with added stroma cells might mimic tumor environment effects.
  
  We have leveraged publicly available datasets to link the observations from the progression model to clinical samples. Specifically, we have compared our datasets to chromatin organization data in non-cancerous mammary epithelial cells (HMEC), five cell lines representing distinct cancer subtypes ranging from less (luminal) to more aggressive (triple negative, TNBC), as well as tissue samples from TNBC patients with contralateral normal controls. We explored the conservation of both loops and TADs identified in the MCF10 progression system in each of these maps, paying particular attention to how features that are differential between MCF10 cells differ across other cancer cell types. We observe a high degree of conservation of static loops and TAD boundaries among the cancer samples, as well as some degree of cell-specific changes among loops and boundaries that change during MCF10 progression. These findings are included in Supplemental Figures 3 and 4 and are discussed on page 7.
  
  Minor Issues
  
  (1) Adding a clear definition for static loops would help readers. For example, state that static loops show less than 10 percent contact change across replicates.
  
  Differential loops are defined as loops with a fold-change of 1.5 or more between any two MCF10 cell lines and an adjusted p-value of less than 0.025, considering change across biological and technical replicates; static loops are those that do not meet these criteria. This definition is stated on page 6.
  
  (2) In the ABC model analysis, removing promoter regions from the enhancer list would focus results on true long-range interactions.
  
  The ABC model already excludes the promoter of each gene. Only self-promoters are excluded, whereas the model allows promoters of other genes to act as potential long-range enhancers of the target gene. We have added text to make this clear (see page 11).
  
  (3) Briefly noting why this study sees TAD weakening while other cancer types show different patterns would provide useful context.
  
  The biological basis of TAD weakening in the MCF10 model is not known; neither the mechanism of boundary weakening nor the reason for the apparently different behavior among cancers has been established. We have expanded this discussion slightly, but we refrain from making any definitive claims. We note that differences in the cancer types studied or in the methods used to detect changes in TADs (i.e., different sensitivities and thresholds for detecting change) could be responsible (see page 15). We also note that the loss of insulation we detect at many TAD boundaries consists of subtle changes in intensity that could be missed by methods tailored to find more drastic changes in TAD architecture.
  
  Reviewer #2 (Public review):
  
  While the conclusions are broadly supported, methodological and analytical refinements are required.
  
  We appreciate these comments.
  
  (1) Model representativeness. The long-term culture-adapted MCF10 genome harbours extensive aneuploidies and translocations. Validation of key COL12A1/WNT5A loop dynamics in an independent breast-cancer line (e.g., MDA-MB-231, T47D) or in patient-derived organoids/PDX models would strengthen generalizability.
  
  Although the generation of Micro-C datasets in additional cell lines is outside of the scope of this study, we used publicly available Hi-C data from triple negative breast cancer (TNBC) progression and patient samples (Kim, Han & Chun et al. 2022) to assess generalizability of the MCF10 model findings. While these maps are lower resolution than the Micro-C maps used in our study, they are of sufficient depth to detect loops at a similar resolution (10 kb). We report these findings in Supplemental Figures 3 and 4 and discuss them on page 7.
  
  We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in TNBC. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.
  
  It is worth noting that direct comparison at individual loci is complicated by variations in gene expression profiles between the MCF10 model and the TNBC progression model; for example, COL12A1 is not significantly upregulated between normal and TNBC tissues in this study (unlike in the TCGA-BRCA data) and is downregulated between HMEC and TNBC cell lines. Regardless, our analysis provides some indication of conserved and divergent features in the various model systems.
  
  (2) The study remains purely correlative; no perturbation experiments are conducted to demonstrate causal roles of chromatin loops on gene expression. CRISPR interference (CRISPR-Cas9-KRAB/HDAC) or enhancer deletion/inversion should be applied to 3-5 pivotal loops (e.g., COL12A1, WNT5A) to test their impact on target-gene expression and cellular phenotypes (e.g., proliferation, migration).
  
  We agree that targeted disruption of specific loops such as COL12A1 will be important for understanding the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than exploring specific loop interactions. The current findings are a foundation for more targeted follow-up functional studies.
  
  (3) The manuscript lacks integration with clinical datasets. Integrate TCGA-BRCA data to assess whether elevated COL12A1/WNT5A expression associates with overall survival (OS) or distant metastasis-free survival (DMFS)
  
  To assess clinical significance of specific loci, we have queried expression of all differentially expressed genes in the MCF10 progression system among TCGA-BRCA expression data. We summarize our findings in Supp. Fig. 5E and discuss them on page 8.
  
  We found that roughly 25% of genes that change in our model also change significantly in breast cancer, but only roughly half of those genes change in the same direction (i.e. up-regulated in MCF10CA1a vs MCF10A, and up-regulated in tumor vs normal samples). Interestingly, there was a higher degree of directional agreement for late-changing genes (i.e. genes that change in MCF10CA1a compared to MCF10A and MCF10AT1) than for early-changing genes (i.e. genes that change in MCF10AT1 and MCF10CA1a compared to MCF10A).
  
  We have also explored the impact of select highlighted genes on overall survival (OS). We present these data in Supp. Fig. 6 and discuss them on page 8. While not all genes showcased in this study have a significant impact on overall survival, most trend in the direction their differential expression would suggest (i.e. genes more highly expressed in tumor vs normal samples also have a hazard ratio above 1).
  
  Reviewer #3 (Public review):
  
  The differential topology analysis and its integration with transcription is very well done- one of the best versions of this I have read in the 3D genome field!
  
  We appreciate the reviewer’s endorsement.
  
  However, the paper is framed largely as a cancer biology study, and it teaches us much less about this. I am worried that some of the trends for each topologic feature are not going to be consistent across the pre-malignant-malignant-metastatic spectrum and would like the authors to soften some of their claims a bit regarding how this clarifies our understanding of cancer evolution.
  
  We agree that the strength of the study lies in its deep mapping of chromatin architecture and the landscape of enhancers and differentially expressed genes, which we hope to use to better understand the relationship between chromatin structure and gene expression, regardless of their cancer relevance. To better relate the findings in the progression system to cancer, we have added new data from direct comparisons of the MCF10 progression system with multiple patient-derived cancer cell lines and cancer tissues. These data are shown in Supp. Fig. 3 and 4 and discussed on p. 7. In addition, we have softened the claims regarding cancer progression throughout the manuscript.
  
  Weaknesses:
  
  Major Concerns:
  
  (1) The integration of gene expression and chromatin loops is intriguing. The authors' differential analysis, however, omits consideration of genes that are on and simply further upregulated versus genes that transition on/off or off/on. It would be nice to see the authors break out looping patterns for these two different patterns of regulation, as it may be instructive regarding the rules for how EP loops govern transcription.
  
  To address different types of gene expression patterns, we analyzed 108 genes that went from an unexpressed, or “off”, state (2 or fewer read counts) in one cell line to an expressed, or “on”, state (100 or more read counts) in another, and 111 genes that went from “on” to “high” (1000 or more read counts). We present these data in Supp. Fig. 8 and discuss the findings on page 9. While neither of these gene sets was enriched for differential loops, a large number of these genes overlap with loop anchors. We found a relationship between loop strength and gene expression levels: genes that are more strongly expressed are more likely to overlap with the anchor of a chromatin loop. All gene sets show similar strong trends at distal regulatory regions.
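  The read-count thresholds above lend themselves to a simple classification rule. The sketch below is one possible reading, under stated assumptions: it takes a single count per gene per cell line (for example, replicate-averaged), and treats a gene as "off to on" if any line has 2 or fewer counts while another has 100 or more, and as "on to high" if its lowest count is between 100 and 999 while another line reaches 1000 or more. The function name is hypothetical, and the counts in the example are invented.

```python
OFF_MAX = 2      # "off": 2 or fewer read counts
ON_MIN = 100     # "on": 100 or more read counts
HIGH_MIN = 1000  # "high": 1000 or more read counts


def classify_transition(counts_by_line):
    """Classify one gene's expression change across cell lines.

    counts_by_line: dict of cell line -> read count (assumed one value per line).
    Returns "off_to_on", "on_to_high", or None if neither pattern applies.
    Comparing the minimum and maximum covers every pair of cell lines.
    """
    low = min(counts_by_line.values())
    high = max(counts_by_line.values())
    if low <= OFF_MAX and high >= ON_MIN:
        return "off_to_on"
    if ON_MIN <= low < HIGH_MIN and high >= HIGH_MIN:
        return "on_to_high"
    return None


# Invented counts for illustration only
print(classify_transition({"MCF10A": 1, "MCF10AT1": 40, "MCF10CA1a": 250}))
print(classify_transition({"MCF10A": 300, "MCF10AT1": 1500, "MCF10CA1a": 900}))
```

  Under this reading a gene that goes from off directly to high is counted as "off to on", which keeps the two sets disjoint; the actual analysis may have handled such genes differently.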
  
  (2) Given the paucity of differential loops at the majority of genes whose expression changes, the authors should examine chromatin subcompartments, as these may associate more with differential transcription.
  
  We present subcompartment analysis in Supp. Fig. 9 and discuss the findings on pages 10-11. Because our CALDER subcompartment calls are qualitative rather than quantitative, we examined how subcompartment assignments change genome-wide and at specific promoters. Between any two cell types, most changes occur between closely related subcompartments, e.g. from A.2.2 to A.2.1 (one step more A-like) or to B.1.1 (one step more B-like). The promoters of differentially expressed genes show minimal subcompartment changes, but genes that switch from on to off show larger changes. Differentially expressed genes whose promoters shift by multiple subcompartments show significantly larger fold-changes, whereas smaller shifts have minimal impact on gene expression. In summary, small subcompartment changes are very common and have little impact on gene expression, while larger changes are infrequent and correlate more strongly with changes in gene expression.
  
  (3) The authors could push their TAD analysis further by integrating it with transcription. Can they look at genes and their enhancers that span these altered boundaries to see if these shifts impact transcription?
  
  We provide this analysis in Supp. Fig. 9. We started, as suggested, by looking at genes with distal enhancers (as determined by the ABC model) that span a single TAD boundary. However, the number of genes that fit this definition was relatively small, so we expanded the analysis to any gene whose promoter lies within 50 kb of a differential insulation-score boundary, which showed the same trends with a more robust signal. These findings are discussed on page 10. We found that genes near weakened boundaries are not enriched for differentially expressed genes, while those near strengthened boundaries are. Comparing the fold-change of genes near strengthened, weakened, and static boundaries showed a significant inverse correlation between boundary strength and gene expression, although effect sizes were small. These results show that changes in TAD boundary insulation have small but noticeable impacts on gene expression.
  
  (4) The progression of cancer critically goes from a benign -> pre-malignant -> malignant -> metastatic series of steps. The AT1 line is described as 'premalignant' and thus the authors' series omits a malignant line. While I think adding such a sample is an unreasonable request at this point (as it would have had to have been studied in 'batch' with these other samples), the authors should acknowledge that they omit this step and spend some time discussing the genetic, morphologic, and phenotypic features of their 3 conditions. The images in Figure 1S aren't particularly useful; they don't tell the reader that these cells are malignant/benign. The karyotypic data are intriguing but not fully analyzed, so it is hard to know what true phenotype these cells represent. For example, malignant means DCIS/invasive carcinoma, so then what does this pre-malignant cell model represent? The described alteration in the AT1 line is a Ras oncogene, so in some sense, the transition to this line really is just +/- Ras. The authors could spend some time thinking about the effects of Ras specifically on the 3D genome.
  
  We have expanded our discussion of the relevance of the MCF10 model on page 4, and of its limitations on page 17. The MCF10 progression model has been extensively used by many laboratories, and its properties have been discussed in detail (e.g., Polizzotti et al. 2012). Critically, the MCF10AT1 cell line is not only the product of Ras oncogene expression; it was derived from a 100-day-old precancerous lesion that formed a squamous carcinoma in a mouse, and over this time it accumulated additional changes. The MCF10AT1 line is considered pre-malignant because it has accrued critical changes that prepare it for the metastatic transition, but it does not immediately form tumors when injected back into mice. Unlike the MCF10DCIS cell line, which is malignant but not metastatic, the more aggressive MCF10CA1a is classified as both malignant and highly metastatic, forming tumors that quickly metastasize to the lungs in mouse xenografts. While both MCF10AT1 and MCF10CA1a are tumorigenic, we acknowledge the lack of a nonmetastatic malignant cell line in the discussion on page 17. We have also provided updated karyotype characterization of the cell lines used in this study in Supp. Fig. 1B and now include full composite karyotypes in the Methods section (page 18).
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  The reviewer’s recommendations are the same as their public review comments. See our response to the review comments above.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) If conditions permit, we recommend including primary human mammary epithelial cells (HMECs) to distinguish immortalisation-specific from malignancy-specific 3D changes.
  
  Micro-C data of equal resolution is not available for HMECs. We have, however, incorporated analysis of publicly available deeply sequenced Hi-C data of HMECs into several figures that explore the conservation of loops and TADs in these cells (Supp. Fig. 3 and 4).
  
  We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in the TNBC system. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.
  
  (2) The relationship between loop alterations and copy-number variations (CNVs) is not explored. If conditions permit, we recommend overlaying differential loops with SNP/Indel/CNV data to exclude spurious differences arising from structural alterations.
  
  While we have not conducted an in-depth SNP analysis, we have clarified our discussion of the karyotype analysis on pages 21 and 23 and how we mitigated these effects when identifying differential loops between cell lines.
  
  (3) The horizontal and vertical axis labels of the diagrams are difficult to read; we recommend increasing the text size in the figures so that it is clearly legible. Some of the axis labels are in gray; we recommend making them black.
  
  The clarity of the figures has been improved.
  
  Reviewer #3 (Recommendations for the authors):
  
  I really like this paper. I think if the cancer focus can be down-emphasized (because I'm not fully clear what we've really learned about cancer), then it represents a nice dataset and a thoughtful, comprehensive analysis.
  
  We greatly appreciate the kind words and helpful feedback. The cancer focus has been toned down throughout the manuscript, as suggested.
  
  Minor Concerns:
  
  (1) The authors present a nice summary of the topological changes across samples. However, summary statistics can mask noise/bias and also don't fully convey the effect size of the reported changes. Highlighting individual loci and visualizing these would strengthen the paper and help maintain a high standard for our genomic studies of topology, in which we summarize but also provide representative examples. I would appreciate seeing more example plots at distinct loci (even if in the supplemental information).
  
  We have included several more example regions in Supp. Fig. 7 and 12, including four looped genes that change similarly between the MCF10 series and TCGA-BRCA data (2 stably looped genes and 2 differentially looped genes; 2 up-regulated and 2 down-regulated), and six differentially looped and differentially expressed genes (3 that change in the same direction as the loops, and 3 that change in the opposite direction).
  
  (2) "To identify loops that changed significantly during cancer progression, we assessed changes in contact frequency among every loop in each cell type, correcting for karyotypic differences that result in differences in coverage between cell lines (see Methods)." The Methods section is not adequately explained. Also, could you go a bit deeper to define if these large-scale changes shift the 3D genome specifically? This is hard, but there may be some low-hanging fruit given the otherwise fairly isogenic features in your model.
  
  We have added more detail to the Methods section on pages 21 and 23 on how karyotypic abnormalities were included in our analysis and differential loop calling. A deeper analysis of how large-scale karyotypic changes affect chromatin organization (e.g., through the formation of neoloops and TADs via translocations) is indeed an attractive subject, but due to its complexity it requires a separate dedicated study.
  
  (3) "Approximately half of chromatin loops featured some combination of active gene promoters and enhancers within 10kb of loop anchors". The authors have high-resolution topology data and should be more stringent; these features should have to overlap loop anchors or at least use a distance less than 10kb, which, in some sense, forfeits the advantages of high-resolution topology data.
  
  The threshold of 10kb was chosen for several specific reasons: First, the loop sizes detected here are large enough that this relatively large region still represents a small fraction of the loop span, and these regions are reasonably considered anchor-proximal. Second, the loops we detect are non-punctate, both in aggregate analysis (Figure 1H, bottom) and at individual loci (see example regions), showing increased contact frequency among several 5kb or 10kb bins. Therefore, adding 10kb to either side (2 pixels on 5kb maps and 1 pixel on 10kb maps) ensures that the full region of increased contact frequency is included. Finally, ultra-resolution Hi-C data has also shown that loops remain diffuse even with 1kb resolution maps (albeit they do get smaller than the 30kb used here) (Harris & Gu 2023). We have added a brief justification of this overlap size to the text on page 24.
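  The anchor-padding arithmetic can be checked with a small sketch. It assumes half-open genomic intervals and a hypothetical 10 kb anchor bin; with 10 kb of padding on each side, such an anchor yields a 30 kb window, which appears to be the 30 kb figure mentioned above, and the padding corresponds to 2 pixels at 5 kb and 1 pixel at 10 kb resolution. All function names are illustrative.

```python
from typing import Tuple

PAD_BP = 10_000  # 10 kb added to each side of a loop anchor


def padded_anchor(start: int, end: int, pad: int = PAD_BP) -> Tuple[int, int]:
    """Return the anchor interval [start, end) widened by `pad` on both sides."""
    return max(0, start - pad), end + pad


def near_anchor(feat_start: int, feat_end: int, anchor_start: int, anchor_end: int,
                pad: int = PAD_BP) -> bool:
    """True if a feature (e.g. a promoter or enhancer) overlaps the padded anchor."""
    lo, hi = padded_anchor(anchor_start, anchor_end, pad)
    return feat_start < hi and feat_end > lo  # half-open interval overlap


def pad_in_pixels(resolution_bp: int, pad: int = PAD_BP) -> int:
    """Number of map bins the padding spans at a given resolution."""
    return pad // resolution_bp


lo, hi = padded_anchor(100_000, 110_000)  # hypothetical 10 kb anchor bin
print(hi - lo)                             # window width in bp
print(pad_in_pixels(5_000), pad_in_pixels(10_000))
print(near_anchor(115_000, 116_000, 100_000, 110_000))
```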
  
  (4) "These results show that not only changes in either contact frequency and enhancer activity correlate with increased gene expression, but they also correlate with each other, suggesting a potentially linked functional role during enhancer-promoter communication." The authors could use this opportunity to disentangle the contributions of loops and chromatin modifications a bit more. The exceptions are of interest - e.g., loop is stable, gene expression changes or loop changes, gene expression does not. Highlighting exemplar cases for these exceptions (rather than just a genomics summary) would be nice to see.
  
  The additional example regions we have included in Supp. Fig. 7 and 12 now showcase a wider variety of scenarios; in addition to more examples of static loops with gene expression changes (Fig. 2, Supp. Fig. 7E-F) and differential loops with matching gene expression changes (Fig. 4, Supp. Fig. 7C-D, Supp. Fig. 12A-C), we now also feature examples of differential loops where gene expression changes in the opposite direction (i.e. a strengthened loop at a down-regulated gene, Supp. Fig. 12D-F).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.05.14.654144v3

Untitled document

1
1. RealDidacticus 20 May 2026
  
  in Public
  
  I started undoing the 'participatory' design plans I unilaterally made to reconceive a collective methodology with more uncertain, voluntary, and relational dynamics. Surprisingly, this 'ineffective' ongoing turn became a strength rather than a limitation...
  
  No plan is unilateral: we are a conduit of past relationships, of people who have influenced us, and we acknowledge this so much that we allow the transference of autonomy through guardians of "differently abled" people, stewards of nature, animal caretakers, and political representatives.
  
  What Volpi is looking for, without stating it, is a sustainable, equalised economy where there are no power monopolies or notable hierarchies that may lead to oppression. But to say that "inefficiency" is a "strength" unknowingly perpetuates those oppressive structures, because this "inefficiency" is available almost exclusively to wealthy people. Volpi's language is colonised, though they probably don't realise it.
  
  Say they get involved in an actually slow process: one where they don't propose but wait for others to ask; one where, as in an ethnography, they learn and listen and don't try to impose themselves and their ideas (because that is the productivist system that academia perpetuates). Then the group they end up in will either be a marginally small group of outcast people, or privileged (or both), with minimal potential for change; or they will end up in a bigger, already existing association that they in a way "infiltrate", and only over multiple years start to build enough trust capital to push their ideas (having also taken on some of others' ideas in order to claim epistemic humility and a certain representativeness).
  
  Let's see... how do I spell this? We must not condone infinite growth, but when it comes to things like ending poverty, I think our stance should be unambiguously clear that this is progress, that this is positive.
Annotators

RealDidacticus
www.medrxiv.org

Determining fragility and robustness to missing data in binary outcome meta-analyses, illustrated with conflicting associations between vitamin D and cancer mortality

1
1. Public_Reviews 20 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This manuscript addresses an important methodological issue: the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong.
  
  Strengths:
  
  (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.
  
  (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.
  
  (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.
  
  Reviewer #3 (Public review):
  
  (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.
  
  I was remiss not to elucidate this point more clearly. The text has been modified to state explicitly that generalisability in this context means that the metric does not depend on specific studies; it gives only the net number of subjects required to flip a result. The text reads:
  
  “Atal's method is highly useful, but one possible objection is that it has the downside of non-generalisability, as it finds very specific combinations of trials and patients that would have to be re-coded (events classified as non-events and vice-versa) for results to become insignificant. For example, an Atal meta-analytic fragility of 4 pertains to a specific and often unique circumstance when 4 patients could be recoded from a specific study or combinations thereof to change outputs, but this does not generalise to any 4 patients in that meta-analysis. This makes this definition of meta-analytic fragility useful but not general, and perhaps less intuitive to interpret than a typical RCT fragility metric. In this work, we establish a generalizable meta-analytic fragility metric, based upon Ellipse of Insignificance (EOI) analysis for dichotomous outcome trials. This method creates a pool of events and non-events in both arms, adjusted for weighting, and answers the general question of how many patients would have to be effectively recoded in a meta-analysis for results to flip, without requiring specific study identification.”
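  The general question EOIMETA asks (how many patients, from any studies, would need to be recoded for the pooled result to flip) can be illustrated with a brute-force search. This is a simplified sketch, not EOIMETA itself: it uses a crude, unweighted pooled 2×2 table and Pearson's chi-squared test without continuity correction, and it does not implement the Ellipse of Insignificance geometry or the weighting adjustment described in the quoted text. The function names and the counts in the example are hypothetical.

```python
import math
from typing import Optional, Tuple


def chi2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared p-value (1 df, no continuity correction).

    a/b = experimental events/non-events, c/d = control events/non-events.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def min_recodings_to_flip(a: int, b: int, c: int, d: int,
                          alpha: float = 0.05) -> Optional[Tuple[int, int, int]]:
    """Smallest total k = |x| + |y| of recodings that flips significance.

    x = net non-events recoded as events in the experimental arm (negative = reverse),
    y = the same for the control arm. Returns (k, x, y), or None if nothing flips.
    """
    significant = chi2_pvalue(a, b, c, d) < alpha
    for k in range(1, a + b + c + d + 1):
        for x in range(-k, k + 1):
            y_abs = k - abs(x)
            for y in ((y_abs, -y_abs) if y_abs else (0,)):
                a2, b2, c2, d2 = a + x, b - x, c + y, d - y
                if min(a2, b2, c2, d2) < 0:
                    continue  # cannot recode more patients than exist in a cell
                if (chi2_pvalue(a2, b2, c2, d2) < alpha) != significant:
                    return k, x, y
    return None


print(chi2_pvalue(30, 70, 50, 50))
print(min_recodings_to_flip(30, 70, 50, 50))
```

  For a hypothetical pooled table with 30/100 events in the experimental arm and 50/100 in the control arm (p ≈ 0.004), the search returns `(7, 0, -7)`: recoding seven control-arm events as non-events is the smallest change it finds that moves p above 0.05. Unlike Atal's approach, the answer refers only to pooled arms, not to particular studies.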
  
  (2) The authors mentioned the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al's 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.
  
  This is a very fair observation, and I need to explain myself better here. There are effectively two measures of heterogeneity considered in this work: the typical value from a meta-analysis, and the divergence between the crude and the inverse-variance-weighted adjusted estimates. When these differ by small amounts, one could conceivably use either measure. I’ve changed the text to better reflect this, including:
  
  “This modification is akin to pooling in a meta-analysis, and adjusts for study-level heterogeneity. After this modification, a standard EOI analysis can then be applied to the vector . In addition, we can also employ ROAR analysis on the same vector, yielding the raw number of patients in either or both arms who could be added in a given direction to change the result, and the exact combination of control and experimental group redactions required to change the result from a significant finding to a null one. Caveats for implementation and interpretation are outlined in the discussion section.”
  
  (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.
  
  This is an excellent suggestion. I’ve tried to address it with percentages, as in Table 2, but these are minute in the case of the vitamin D trials, partly, I suspect, because the effects are extraordinarily weak. Cohen’s h for these meta-analyses yields tiny values, which I think may be tied to the virtually negligible percentages we obtain for the number needed to flip. With stronger data, it might be worth expanding this into a useful heuristic measure of robustness, though I don’t think vitamin D data such as those in this work will help much here.
  
  In light of the reviewer’s excellent comment, I added lines 230-240 in the revised manuscript.
  
  (4) Comments on revisions:
  
  I am unable to find the author's responses to my previous round comments (Reviewer #3) in the revision package, though replies to the other reviewers are present. I will provide my updated feedback once these responses are available for review.
  
  My sincere apologies; I omitted responses to these specific comments in error. This document should now address them. Thank you again for giving this your time and consideration!
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

medrxiv.org/content/10.1101/2025.08.15.25333793v3
www.biorxiv.org

Analysis of dendritic input currents during place field dynamics

1
1. Public_Reviews 20 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding for how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Their manuscript makes two additional contributions of note: (1) a clever algorithmic approach to model the axial propagation of ionic currents (recursively traversing acyclic graph subsections) and (2) interesting, albeit not easily testable, insights into important neurophysiological phenomena such as complex spike generation and place field dynamics. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.
  
  Strengths:
  
  The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of "extended currentscape" is boosted by its integration with preexisting resources in the field.
  
  The code is well-documented and freely available via GitHub.
  
  The authors' clever partitioning algorithm relating dendritic/synaptic currents to somatic currents yielded multiple intriguing observations regarding when and why CA1 pyramidal neurons fire complex spikes versus single action potentials. This topic carries major implications for how the hippocampus represents and stores information about an animal's environment.
  
  Weaknesses:
  
  While extended currentscape is clearly a valuable contribution to the neuroscience community, this reviewer would argue that it is framed in a way that oversells its capabilities. The Abstract, Introduction, Results, and Methods all contain phrases implying that extended currentscape infers the dendritic/synaptic currents contributing to somatic output, i.e., backwards inference of unknown inputs from a known output. This is not the case; inputs are simulated and then propagated through the model neuron using a clever partitioning algorithm that essentially traverses a biologically undirected graph structure by treating it like a time series of tiny directed graphs. This is an impressive solution, but it does not infer a neuron's input structure.
  
  We are sorry if our text could be interpreted as if we were inferring unobserved inputs from the known outputs. This was not intentional and we were unaware of the possibility of such interpretation.
  
  In fact, at the beginning of the Results, we started the description of the extended currentscape method by explicitly stating that we need to measure the input currents: “Our method … requires measuring the membrane and axial currents throughout the dendritic tree of a neuron (in every node of the circuit)”.
  
  To further clarify that our method starts with measuring the input currents, we made this information explicit already in the abstract (“Our approach relies on the iterative decomposition of the axial current flowing between neighbouring compartments in proportion to the underlying membrane currents measured in the model.”), and in the Introduction (“Even if the membrane currents are known, studying the impact of particular ion channels on the neuronal response in such a dynamical system under in vivo conditions is hindered by two major obstacles”). We also rewrote several parts of the text to remove any phrases that could imply the inference of the inputs (line 568). We believe that after clarifying this at the beginning of the paper, the readers will not misinterpret our descriptions later in the text.
  
  Because a directed acyclic graph architecture is shown in Figure 2, it is unintuitive that the authors can infer bidirectional current flow, e.g. Figure 3 showing current flowing from basal dendrites and axon to soma, and further towards the apical dendrites. This is explained in Methods, but difficult to parse from Results amidst lots of rather abstract jargon (target, reference, collision, compartment). Figure 2 would have presented an opportunity to clearly illustrate the authors' partitioning algorithm by (1) rooting it in the exact morphology of one of their multicompartmental model neurons and (2) illustrating that "target" and "reference" have arbitrary morphological meanings; they describe the direction of current flow, which is reevaluated at each time step.
  
  We thank the Reviewer for this comment. We agree that the concepts introduced here to explain our method are rather abstract and could be difficult to understand. To help the reader, we followed the Reviewer's suggestions and redesigned Fig. 2 to provide a step-by-step explanation of the extended currentscape method. In particular:
  
  We used a simpler model where the structure of the graph can be directly related to the morphology of the model.
  
  We show that the target node can connect multiple subtrees with axial currents flowing in different directions. We explain that in this case the inward and the outward subtrees are pruned and partitioned separately.
  
  We provide a glossary in Table 1 to ensure that the readers can follow our description and do not get lost amidst lots of rather abstract jargon.
  
  We also clarified that although the target compartment is chosen arbitrarily by the user, it remains the same for all time points throughout the analysis.
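  The proportional partitioning idea can be illustrated on a toy tree. This is a deliberately simplified, hypothetical sketch for a single time step: it assumes every axial current in the subtree flows toward the target (a single inward subtree), treats only positive membrane currents as sources, and splits each compartment's measured axial current among its sources in proportion to their size. It does not reproduce the paper's handling of outward subtrees, reference nodes, or capacitive currents, and the class and function names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Compartment:
    name: str
    membrane: dict[str, float]  # membrane current per type; positive = source (assumed sign convention)
    axial_out: float = 0.0      # measured axial current flowing toward the parent / target
    children: list[Compartment] = field(default_factory=list)


def decompose(node: Compartment) -> dict[tuple[str, str], float]:
    """Split node.axial_out among (origin compartment, current type) pairs
    in proportion to the sources feeding this compartment."""
    sources: dict[tuple[str, str], float] = {}
    for ctype, value in node.membrane.items():
        if value > 0:  # only source currents are attributed
            sources[(node.name, ctype)] = value
    for child in node.children:  # recurse into the part of the tree further from the target
        for key, value in decompose(child).items():
            sources[key] = sources.get(key, 0.0) + value
    total = sum(sources.values())
    if total <= 0 or node.axial_out == 0:
        return {}
    scale = node.axial_out / total
    return {key: value * scale for key, value in sources.items()}


# Toy tree: two dendritic leaves feed a trunk, which passes current to the target.
dend1 = Compartment("dend1", {"Na": 2.0}, axial_out=2.0)
dend2 = Compartment("dend2", {"NMDA": 1.0, "AMPA": 1.0}, axial_out=1.0)
trunk = Compartment("trunk", {"K": -0.5}, axial_out=1.5, children=[dend1, dend2])
print(decompose(trunk))
```

  Here the trunk's 1.5 units of axial current split into 1.0 attributed to Na in dend1 and 0.25 each to NMDA and AMPA in dend2, so the per-origin contributions always sum to the measured axial current.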
  
  Analyses in Figure 7, C and D, are insightfully devised and illuminating. However, they could use some reconciliation with Figure 5 regarding initiation of individual APs versus CSBs within place fields.
  
  We thank the reviewer for the positive comments and for pointing out this potential source of misunderstanding. We slightly changed the text describing Fig 5 to emphasize that it shows a single example trial, and we added the following sentence to the paragraph describing Fig 7CD: “Consequently, the somatic current dynamics before the iAP and the CSB presented in Fig 5Cc-Dd can be regarded as illustrative samples from a broad distribution, but the differences observed between them are not representative.”
  
  The intriguing observations generated by extended currentscape also point to its main weakness, which the authors openly acknowledge: as of now, no experimental methods exist to conclusively test its predictions.
  
  We agree with the Reviewer that not being able to apply our extended currentscape method to reveal the current types driving real neurons recorded in vivo is currently a weakness of our approach. However, we would like to emphasize that it may be feasible to use it to estimate the spatial distribution of the membrane currents driving the cell based on in vivo voltage imaging data, as we briefly outline in the discussion.
  
  Reviewer #2 (Public review):
  
  Summary
  
  The electrical activity of neurons and neuronal circuits is dictated by the concerted activity of multiple ionic currents. Because directly investigating these currents experimentally isn't possible with current methods, researchers rely on biophysical models to develop hypotheses and intuitions about their dynamics. Models of neural activity produce large amounts of data that is hard to visualize and interpret. The currentscape technique helps visualize the contributions of currents to membrane potential activity, but it's limited to model neurons without spatial properties. The extended currentscape technique overcomes this limitation by tracking the contributions of the different currents from distant locations. This extension allows tracking not only the types of currents that contribute to the activity in a given location, but also visualizing the spatial region where the currents originate. The method is applied to study the initiation of complex spike bursts in a model hippocampal place cell.
  
  Strengths
  
  The visualization method introduced in this work represents a significant improvement over the original currentscape technique. The extended currentscape method enables investigation of the contributions of currents in spatially extended models of neurons and circuits.
  
  Weaknesses
  
  The case study is interesting and highlights the usefulness of the visualization method. A simpler case study may have been sufficient to exemplify the method, while also allowing readers to compare the visualizations against their own intuitions of how currents should flow in a simpler setting.
  
  We thank the reviewer for this comment. In fact, we had also considered including a simpler case study to illustrate the extended currentscape method in the original submission. In accordance with the comments from Reviewer 1, we now use a simple model to introduce the concepts in Figure 2 and provide a few examples where readers can compare the results with their own intuition in simpler cases.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  (1) Model complexity vs. intuition/validation. The case study relies on a very complex CA1 model, making it difficult to build intuition about current flow and to validate the visualization. Inclusion of a simpler benchmark (e.g., soma plus a dendrite with two branches, fewer compartments) is recommended to demonstrate how the extended currentscape behaves in a more tractable setting.
  
  Inspired by the suggestions of the Reviewers, we modified Figure 2 and now first use a simple model with a soma and a dendrite with two branches to introduce the concepts of our analysis. We start with a few examples where the reader can compare the results with their own intuition in simpler cases.
  
  (2) Rationale and citations for input structure. The in vivo-like input design (untuned inhibition; 12 co-tuned excitatory clusters with large conductances; the goal of generating place fields) would benefit from a more explicit rationale and substantially more literature support. Alternative plausible scenarios (e.g., distributed co-tuned inputs and homosynaptic plasticity) should be articulated, and choices situated within the experimental literature on CA1 excitation/inhibition, including tuning and anti-tuning results.
  
  We extended the paragraph in the Results describing the input structure and added the most important references there. We added further references to the Methods section, where we argue that “Reliable place cell tuning can be achieved by functional synaptic clustering without increased excitatory drive in the place field (Ujfalussy and Makara 2020) or via strong excitatory drive without input clustering (Grienberger et al., 2017, Ujfalussy and Makara, 2020). However, experimental data indicates that both of these mechanisms are present and contribute to the activity of place cells (Adoff et al., 2021, Tasciotti et al., 2025)” and “although interneurons can display spatial tuning, they typically have a broad tuning with low selectivity (Ego-Stengel et al., 2007, Dupret et al., 2013, Geiller et al., 2020). A weak disinhibition within the place field can also contribute to the selective firing of place cells (Geiller et al., 2022, Valero et al., 2022), but this was not necessary for place cell activity in novel environments (Geiller et al., 2022) and the overall inhibitory input to place cells is largely untuned (Grienberger et al., 2017).”
  
  (3) Scope of PCA-based claims. The interpretations derived from the PCA analysis appear broader than warranted, given subcellular heterogeneity and the dominance of somatic action potential variance. These claims should be tempered with more explicit statements about what PCA can and cannot resolve in this context.
  
  We thank the Reviewer for the opportunity and encouragement to clarify this part of the text. We agree with the Editor and the Reviewers that the results of the PCA analysis cannot be used to support claims regarding the presence or absence of independent dendritic events. Rather, we aimed to use it to illustrate that global activity tends to dominate PCA even when the “neuron is mainly driven by strong, functionally clustered synaptic inputs to a few dendritic branches”. We acknowledge that we did not formulate this point clearly in the original submission. Therefore, we substantially rewrote this part of the Results and performed additional analyses to show that our model contains a substantial amount of soma-independent dendritic activity that remains invisible to a PCA-based analysis.
  
  Reviewer #1 (Recommendations for the authors):
  
  Major concerns:
  
  (1) Depolarization-inactivated K+ may be an important consideration to model burst-firing.
  
  Our current model includes two kinds of transient K+ channels that inactivate after depolarization, a proximal and a distal type, as in the original model of Jarsky et al., 2005. We have now made this explicit in the main text (line 178).
  
  (2) Description of the in vivo-like model's excitatory and inhibitory input structure needs many more citations of biological studies to communicate the rationale for the authors' decisions, e.g. untuned inhibitory neurons, organization of a subset of excitatory inputs into 12 functional synaptic clusters with co-tuned presynaptic neurons and outsized synaptic conductances. The goal is clearly to create CA1 pyramidal neurons with place fields, which would be helpful to state upfront. But additionally, (a) place fields could arise from homosynaptic potentiation of distributed co-tuned excitatory inputs (e.g., the Bittner et al. 2017 study describing BTSP made no assumptions) and (b) CA1 inhibitory interneurons can be spatially tuned (Ego-Stengel & Wilson, 2006; Wilent & Nitz, 2007; Geiller, et al. 2020) and even anti-tuned (Geiller, et al. 2021).
  
  We thank the Reviewer for pointing out the lack of appropriate references in this section. We made the following changes in the manuscript:
  
  (1) Stated explicitly that the goal was to create place cell activity.
  
  (2) Added references to the main text to justify our choices of the inputs (lines 234-241).
  
  (3) We included a longer rationale for the choice of synaptic clusters and the lack of inhibitory (anti-)tuning in the Methods section describing the neuron model. In brief, Adoff et al., 2021 reported more clustering of excitatory inputs within the place field. In our model, the degree of clustering is somewhat larger than that reported. Although inhibitory neurons can be tuned, their tuning is much weaker than that of place cells and seems to play only a minor role in the generation of place fields (Grienberger et al., 2017). The presence of inhibitory anti-tuning is controversial: although Geiller et al., 2021 reported weak (~10%) anti-tuning, they did not find it in novel environments, indicating that it is not needed for spatially selective activity (lines 628-646).
  
  (3) Interpretation of principal component-based analyses shown in Figure 4 could be toned down. As written in section "CSBs in the CA1 pyramidal neuron", it sounds like CA1 pyramidal neuron dendrites display minimal autonomous activity. However, PCA does not seem well-suited to address the heterogeneity of subcellular voltage dynamics over physiologically relevant timescales. Somatic action potentials, and their backpropagation/modulation of dendritic voltage, would of course explain a very large fraction of variance. However, if local dendritic events summate over fine timescales to initiate somatic firing, it is hard to imagine this important nuance being detected. On the other hand, it is hard to imagine single dendritic branches driving robust somatic firing except in the relatively extreme situation in which large numbers of synapses synchronously drive the same branch to initiate a local Ca2+ spike (Figure 3, A-C).
  
  We agree with the reviewer that PCA cannot reveal the potential dendritic origin of somatic APs, and thus is not suitable to assess the role of local dendritic spikes in shaping the output of the cell. We wanted to highlight here that even in cells with excitable dendrites driven by strong, local input clusters and exhibiting frequent local dendritic spikes, the dendritic membrane potential dynamics will be dominated by global fluctuations, with surprisingly little sign of local dynamics in the PCA components. As the reviewer also pointed out, this may not be surprising: local events either remain spatially restricted, and thus contribute little to the overall variability of the dendritic Vm, or they initiate somatic APs and are thus counted as global events.
  
  To demonstrate the high propensity of local dendritic events, we analysed local Vm peaks in dendritic branches and found that ~7.6% of the peaks were not coupled to somatic APs.
  
  Although this number may seem low, we emphasize that most of the 92.4% of dendritic peaks coupled to APs likely reflect the backpropagation of the same somatic events to multiple dendritic sites. To confirm this, we performed an additional analysis measuring the spatial extent (number of branches involved) of the individual dendritic events. We found that 90% of the events remained local, restricted to a few dendritic branches, while 10% of the events were global, associated with bAPs and involving the majority of the dendritic tree. These global events dominate the PCA analysis and are responsible for >90% of the dendritic Vm peaks. These results are included in a new panel in Figure 4H.
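The peak and event classification described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that dendritic Vm peaks are available as hypothetical `(branch_id, time_ms)` pairs, that a peak counts as AP-coupled if a somatic AP falls within a hypothetical ±`window_ms`, and that peaks separated by at most `window_ms` belong to the same event, whose spatial extent is the number of distinct branches it involves.

```python
from bisect import bisect_left


def near_any(t, sorted_times, window_ms):
    """True if some time in sorted_times lies within +/- window_ms of t."""
    i = bisect_left(sorted_times, t - window_ms)
    return i < len(sorted_times) and sorted_times[i] <= t + window_ms


def group_events(peaks, window_ms):
    """Group (branch_id, time_ms) peaks into events: a new event starts when
    the gap to the previous peak exceeds window_ms (simple chaining rule)."""
    events = []
    for branch, t in sorted(peaks, key=lambda p: p[1]):
        if events and t - events[-1][-1][1] <= window_ms:
            events[-1].append((branch, t))
        else:
            events.append([(branch, t)])
    return events


def summarize(peaks, ap_times, window_ms=2.0):
    """Peak-level AP coupling and event-level spatial extent (assumes >= 1 peak)."""
    aps = sorted(ap_times)
    coupled = sum(near_any(t, aps, window_ms) for _, t in peaks)
    events = group_events(peaks, window_ms)
    extents = [len({b for b, _ in ev}) for ev in events]  # distinct branches
    with_ap = [ev for ev in events if any(near_any(t, aps, window_ms) for _, t in ev)]
    return {
        "frac_peaks_coupled": coupled / len(peaks),
        "n_events": len(events),
        "mean_extent": sum(extents) / len(extents),
        "frac_events_with_ap": len(with_ap) / len(events),
    }
```

For example, three peaks on branches 0-2 around a somatic AP at 10.2 ms plus one isolated peak at 50 ms give two events (extents 3 and 1), with 75% of peaks but only 50% of events coupled to the AP, which is the same kind of asymmetry between peak counts and event counts discussed here.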
  
  We conclude that, “this way, although only 10% of the dendritic Vm events were associated with bAPs, they were ~60-times larger than local events and they dominated the PCA analysis even in the presence of local regenerative dendritic events driven by strong, functionally clustered synaptic inputs.” We believe that this model and analysis could serve as an important benchmark for future experimental studies investigating the structure of membrane potential correlations in in vivo voltage imaging data (Lee et al., 2026).
  
  (4) One suggestion would be to display more data as shown in Figure 4F, with a longer X axis to clarify the temporal relationship between local dendritic spikes and the first somatic action potential.
  
  We added a few more examples, including the CSBs presented in Fig. 8G-I, as a new supplementary Figure S4. We also slightly extended the x-axis in this supplementary figure, as the reviewer requested.
  
  If the models indicate that passively filtered EPSPs drive most somatic action potentials, as seems to be the case in Figure 5, then this would also be helpful to show as in Figure 4F.
  
  In Fig 5 we showed two examples of isolated APs. The first AP was indeed driven by passively filtered EPSPs. The second one was preceded and possibly caused by a dendritic spike, as highlighted by the black arrowhead labelled c in Fig. 5Cc. We further analysed the currents driving iAPs in Fig 7B and C, and found that there is considerable heterogeneity in the magnitude of the dendritic Na currents driving the soma before action potentials. Figure 8 and Figure S3 (now Fig. S5) show further examples for iAPs driven either by passively filtered EPSPs or dendritic spikes. We also included these examples in the new supplementary Figure S4.
  
  (5) Another suggestion would be to use one-hot vectors containing onset times of different event types, since this would divorce the amplitude/duration of events from their influence over total variance.
  
  In this paper our goal was to illustrate the ability of the extended currentscape method to reveal the origin of the axial currents driving neuronal activity. In Fig. 4, our primary intention was to characterize the membrane potential response of the model in a way that is easily comparable with experimental data. To further quantify the frequency of local events, we added a new panel showing the spatial extent of dendritic events (Fig. 4H). To make our model more comparable with recent publications, we also calculated two additional metrics used to evaluate the relationship between somatic and dendritic activity (Fig 4I-J). We hope that these additional analyses help the reader to characterize the prevalence and impact of local dendritic events on somatic activity.
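The reviewer's suggestion of one-hot onset vectors can be sketched as below. This is a hypothetical illustration (the function names, binning and event-type labels are assumptions, not part of the manuscript): each event type or branch becomes a binary row with a 1 in the time bin where an event starts, so amplitude and duration no longer influence the variance, and the resulting rows could be used as PCA input in place of continuous Vm traces.

```python
def onset_raster(onsets_by_type, n_bins, bin_ms):
    """One binary row per event type (sorted by name), with a 1 in each time
    bin where an event of that type starts; amplitudes and durations are dropped."""
    types = sorted(onsets_by_type)
    raster = []
    for etype in types:
        row = [0] * n_bins
        for t in onsets_by_type[etype]:
            k = int(t // bin_ms)
            if 0 <= k < n_bins:  # ignore onsets outside the analysed window
                row[k] = 1
        raster.append(row)
    return types, raster


def coincidences(row_a, row_b):
    """Number of time bins in which both event types have an onset."""
    return sum(a & b for a, b in zip(row_a, row_b))
```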
  
  (6) From section "Input conditions for complex spike burst generation", paragraph 2: "Note that synapse density, the ion channel mechanisms and the input statistics are identical for tuft and oblique branches,...". The authors should justify this parameterization given the numerous known differences between tuft and oblique branches in both of these regards and acknowledge accompanying interpretational caveats.
  
  We agree with the reviewer that experimental data demonstrated several significant differences between the tuft and oblique branches regarding both the inputs they receive and the way they process it. However, in the present paper we chose not to include these differences for several reasons:
  
  Here we aimed to focus on the abilities of the dendritic currentscape methods and use CSBs as a case study to illustrate how dendritic currentscape can reveal the membrane currents underlying complex neuronal responses.
  
  Currently, no CA1PN model can reproduce all the data on tuft and oblique integration while also firing calcium spikes. We only wanted to make minimal modifications to the existing CA1PN model to make it capable of generating Ca-spikes and CSBs. We are currently developing and extensively testing a new model to examine the role of these regional differences in CSB generation.
  
  Although there is information regarding input statistics and dendritic physiology in the literature, many of the relevant parameters are underconstrained. We wanted to avoid overfitting by keeping the model simple.
  
  By maintaining identical inputs and ion channel distribution we can distinctly highlight the special role of tuft morphology in CSB generation. Altering the inputs or the ion channel density for the tuft would make the interpretation more ambiguous, and elucidating the specific role of the different factors in CSB generation is the subject of future investigations.
  
  In sum, although we acknowledge that our model does not reflect the full complexity of CA1 PNs and their inputs, we regard this simplicity as a useful feature of the model. We added a section to the Discussion discussing potential future extensions of the model and highlighting interpretational caveats (lines 482-490).
  
  (7) Given the debate in the field regarding the level of functional autonomy present in dendrites, the authors' finding that dendritic voltage largely tracks that of the soma (though see concern above re: PCA), and their access to specific currents, the authors have an important opportunity to investigate the divergence between Ca2+ and voltage sensors as reporters of dendritic activity.
  
  For instance, why have some studies reported relatively common isolated dendritic Ca2+ transients in CA1 pyramidal neurons while other studies, including voltage imaging studies, have reported the opposite?
  
  We thank the Reviewer for the opportunity to highlight a few important points regarding functional autonomy of dendrites based on the analysis of our model. We would like to first note that only parallel calcium and voltage imaging studies will be able to ultimately resolve this debate. Nevertheless, below we briefly summarize our take on this issue.
  
  (1) In general, most Ca2+ imaging studies found that soma-independent dendritic events are rare. "Isolated dendritic transients (no coincident somatic event; see fig. S6, C and D, for example) were overall rare. Isolated apical dendritic Ca2+ transients, which have not previously been reported in CA1PNs, were larger and more frequent than those observed in basal dendrites." (O’Hare et al., 2022). “Activity in the ... basal dendrites ... along the track but outside of the place field was rarely observed” (Sheffield and Dombeck, 2014) and “overall, isolated dendritic transients were similar in size but occurred far less frequently than coincident dendrite-soma transients”, or “data indicate that spatially reliable dendritic firing was almost exclusively yoked to somatic tuning, likely reflecting strong backpropagation of burst firing during traversals of the somatic PF” (Rolotti et al., 2022). Consistent with these observations, a dendritic Vm peak chosen randomly from any branch has a ~93% probability of being related to a bAP in our model. However, ~90% of events in the model are local; they contribute few peaks simply because isolated events involve ~60 times fewer branches (1.8 on average) than events associated with bAPs (114 branches). If typical local events in real neurons are as spatially restricted as in the model, then even rare dendritic events may reveal substantial dendritic independence. We added a section quantifying the functional autonomy of dendrites in the model to the main text, around Fig 4H.
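The argument that a minority of spatially extensive events can account for most dendritic peaks can be made explicit with a short weighting calculation. The sketch below is illustrative only and assumes, as a simplification not stated in the text, that every event produces exactly one Vm peak in each branch it involves; the function name is hypothetical.

```python
def share_of_peaks_from_global(frac_global_events, branches_global, branches_local):
    """Fraction of branch-level Vm peaks that belong to global (bAP-associated)
    events, assuming each event yields one peak in every branch it involves."""
    g = frac_global_events * branches_global
    l = (1.0 - frac_global_events) * branches_local
    return g / (g + l)


# Ratio of the mean spatial extents quoted in the text (114 vs 1.8 branches).
extent_ratio = 114 / 1.8  # about 63, i.e. the "~60-times" difference
```

Because each global event is weighted by its much larger branch count, its share of peaks is far higher than its share of events; with equal extents the two shares coincide.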
  
  (2) Ca2+ indicators are slow and nonlinear, and are therefore somewhat unreliable reporters of dendritic voltage events, especially in distal dendrites (Wu et al., 2026; Gonzalez et al., 2026). To illustrate this, we calculated three metrics in our model that were also reported in recent dendritic Ca2+ imaging studies (Rolotti et al., 2022, Sheffield et al., 2014, 2017). First, we calculated the fraction of bAPs detected in a branch (called dendrite-soma coupling in Rolotti et al., 2022, see their Fig. 2C) as a function of the distance of the branch from the soma (our new Fig. 4I). In the Ca2+ imaging data, this was essentially constant at ~30% between 5 and 100 µm from the soma. In contrast, the fraction of bAPs detected in the model was 100% in this range, as bAP propagation failures did not occur within 100 µm of the soma. This is also consistent with a recent voltage imaging study showing that even low-transmission bAPs reliably propagate to the proximal dendrites (Lee et al., 2026, Fig 3G). The low and distance-independent dendrite-soma coupling reported by Rolotti et al. can only be reconciled with the known biophysics of neurons if the recorded calcium signal is an unreliable reporter of the underlying voltage. Indeed, it has been reported that Ca signals associated with bAPs can be absent in some dendritic branches (Landau et al., 2022), that local, nonlinear Ca signals can appear in the absence of a local regenerative voltage response (Weber et al., 2016, Tran-Van-Minh et al., 2016), and that Ca signals are highly variable across cells (Eltes et al., 2019).
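The dendrite-soma coupling metric described above (fraction of somatic APs detected in a branch, as a function of distance from the soma) can be sketched as follows. This is a hypothetical illustration: it assumes each branch is summarized as a `(distance_um, n_aps_detected)` pair and that branches are pooled into fixed-width distance bins; it is not the authors' or Rolotti et al.'s code.

```python
def coupling_by_distance(branches, n_somatic_aps, bin_um):
    """Mean fraction of somatic APs detected per branch, pooled into
    distance bins of width bin_um. Returns {bin_start_um: mean_fraction}."""
    sums, counts = {}, {}
    for dist, detected in branches:
        b = int(dist // bin_um) * bin_um  # start of the distance bin
        sums[b] = sums.get(b, 0.0) + detected / n_somatic_aps
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

In a model without proximal propagation failures, every branch in the first bin would detect all somatic APs, giving a value of 1.0 there, in contrast to the ~30% reported from Ca2+ imaging.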
  
  Second, we calculated the fraction of local events as a function of the distance from the soma (our Fig 4J; see also Fig. 2F in Rolotti et al.). When averaged across all branches, this was somewhat lower in the model (18%) than in the data (38%) which, again, could be explained by the low reliability of detecting global voltage events in all compartments based on the calcium signal.
  
  Third, the range of branch-spike-prevalence (BSP) values in our model (0.5-0.9; Fig. 4H) at first glance seems consistent with that reported (0.4-0.8; Fig 4C of Sheffield et al., 2014; Fig 2 of Sheffield et al., 2017). However, we note several important differences: for technical reasons, Sheffield et al. reported BSP for place field traversals and not for individual events, and they measured Ca2+ dynamics in the basal dendrites. Since bAPs are almost always present in all basal dendrites in the model (basal BSP > 0.9 for all events with somatic spikes) and place field traversals were always accompanied by somatic APs, BSP for basal dendrites would be nearly 1 in the model. Thus, the lower BSP values reported by Sheffield et al. could be explained by the limited reliability of Ca2+ indicators in reporting regenerative voltage events in neuronal processes.
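The difference between per-event and per-traversal BSP can be illustrated with a small sketch. It assumes, as a simplification, that BSP is the fraction of branches active in an event, and that a branch counts as active during a traversal if it was active in any event of that traversal; the function names and input format (lists of active branch IDs) are hypothetical.

```python
def bsp_per_event(events, n_branches):
    """Branch-spike prevalence of each event: fraction of branches active in it."""
    return [len(set(ev)) / n_branches for ev in events]


def bsp_per_traversal(events, n_branches):
    """BSP pooled over all events of one place-field traversal (union of
    active branches), so it is never smaller than any per-event value."""
    active = set()
    for ev in events:
        active.update(ev)
    return len(active) / n_branches
```

Because pooling takes the union of active branches, traversal-level BSP is at least as large as each per-event BSP, which is one reason the two measures are not directly comparable.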
  
  We briefly discussed these differences in the Discussion (lines 474-478).
  
  (3) Finally, to our knowledge, there are three relevant in vivo voltage imaging studies in CA1 PNs. Liao et al., 2024 found that in induced place cells the tuning of dendritic events (presumably local or back-propagating Na-spikes) was similar to the somatic tuning, which is consistent with our model, where dendritic activity and tuning are dominated by bAPs. However, they did not acquire simultaneous signals from the dendrites and the soma, so they could not study the independence of the dendritic events. Lee et al. (2026) found that only 10% of the dendritic events are not associated with a somatic spike, which is lower than the number of independent events in the model. However, the events they found were generated in the distal apical trunk (their Fig 3D), and they could not record from the most distal branches, where most of the isolated events were generated in our model. Gonzalez et al., 2026 measured voltage and calcium at selected locations within the dendritic tree and could not reliably estimate the fraction of isolated events throughout the cell. (Gonzalez et al., 2024 measured voltage only in single spines and the soma, but did not quantify independent dendritic events; Wong-Campos et al., 2023 measured dendritic integration and bAPs in L23 branches; Wu et al., 2026 recorded in CA2 neurons.)
  
  We added a paragraph to the Discussion comparing the level of functional autonomy present in the model dendrites to recent Ca- and voltage-imaging studies (lines 467-474).
  
  Minor concerns:
  
  (1) Abstract:
  
  There is a need to explain what currentscape is - even at the cost of not invoking its name. To a reader not familiar with currentscape, the abstract is extremely difficult to understand.
  
  We reworded the title and the abstract to make them more accessible to readers not familiar with the term currentscape.
  
  (2) "Currentscape analysis of place field dynamics" section:
  
  It would be helpful to emphasize upfront that dendritic determinants of individual somatic APs versus CSBs will be discussed separately. Since somatic action potentials are discussed before CSBs, I found this section initially confusing as I attributed those findings to CSBs until reading the next paragraph.
  
  We added a sentence to clarify that we analysed subthreshold responses, APs and CSBs separately.
  
  (3) Bottom of p2 discussing mixed literature on what drives CSBs in CA1 PCs:
  
  Overall an accurate and useful point, but an important nuance is glossed over, which misportrays the state of the field. It references ex vivo studies that fail to drive CSBs with somatic current injection and an in vivo study that succeeds in doing so. These aren't really conflicting results. In vivo current injection co-occurs with spontaneous synaptic input, which is high in CA1 and results in PCs that are significantly depolarized at rest relative to those in acute slices. Bittner 2017 ex vivo results are consistent with this: CSBs were driven using a Cs+-based internal solution to block K+ channels (partially, using the strategy of purposefully high series resistance). The situation is similar in vivo, given that A-type K+ channels are inactivated by depolarization. The resulting increase in input resistance lowers the input threshold for CSBs. This is clarified in Results, p.5: "Under in vivo-like synaptic input conditions (see below and Methods), dendritic Ca2+-spikes could also be evoked by somatic current injection (Fig. S1E), as in Bittner et al. (2015).", which makes p. 2 feel especially awkward.
  
  We agree with the Reviewer that these are not necessarily conflicting results. We rephrased this section, emphasizing that the role of the different input pathways in the initiation of CSBs is not clear.
  
  (4) Abbreviating "pyramidal neuron" with PC is confusing:
  
  PC often means place cell. The authors could change this, such that PC refers to "pyramidal cell", or else use PN as an abbreviation. It is important to avoid confusion, especially because place cell dynamics feature prominently in the manuscript.
  
  Thanks for the suggestion. We replaced PC with PN throughout the manuscript.
  
  (5) Only apical dendritic parameters are described in section 2 of Results, but the full morphology is shown in Figure 3B with basal currents shown in panels C and F. Some clarification is needed - either what currents were considered for basal dendrites and why, or else why basal dendritic current parameters were not considered for this simulation using apical dendritic current injection but nonetheless examining basal dendritic currents.
  
  We clarified in the text that the original model contained a standard set of Na and K channels (line 178).
  
  (6) Clarify "i" and "s" in the Figure 3C legend - "intrinsic" and "synaptic" white letterings are small/hard to see in the bottom subpanels.
  
  We now spell out intrinsic and synaptic in the Figure and increased the contrast of the letterings.
  
  (7) Regarding the computational benefit of recursively decomposing axial currents along an adaptively truncated acyclic graph, it would be useful to (a) include a supplemental figure benchmarking this approach to standard approaches to quantify the described gain in computational efficiency and (b) describe computing hardware in the Methods.
  
  We included an estimated benefit of the pruning process (line 758) as well as the utilised computing hardware and the simulation times in the Methods (line 776).
  
  Reviewer #2 (Recommendations for the authors):
  
  The manuscript is in great shape, it is well organized, and the figures are gorgeous. I believe that the extended currentscape is a great extension of the original currentscape method. In particular, the possibility of partitioning currents by the spatial location of their sources is a great addition.
  
  Recommendations:
  
  (1) The method is applied in the context of an interesting case study that highlights its usefulness. However, the model in the study is so complex that it is difficult to develop an intuition of how currents should be flowing, and this makes it hard to intuitively validate the visualization method. I think that applying the extended currentscape in a simpler model - maybe a soma with a dendrite with two branches, fewer compartments - would be instrumental in developing this intuition.
  
  We now first use a simple model with a soma and a dendrite with two branches to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases. We also added the currentscape analysis of a standard, two-compartmental model from Pinsky and Rinzel, 1994 as Supplementary Figure 1.
  
  (2) I found a number of typos and minor stylistic details you may want to fix in a revised version of the manuscript.
  
  (a) Abstract, line 12. I believe the word "recursive" is a bit technical at this point. Its meaning in this context becomes clear only after one goes through the details of the algorithm (Figure 2).
  
  We replaced the word “recursive” with “iterative”. We hope that this will make the abstract clearer for the readers. In fact, we realized that “iterative” is a better description of the algorithm, so we replaced “recursive” with “iterative” consistently throughout the manuscript.
  
  (b) Figure 1, caption. "Since we included the capacitive current, the magnitude of the inward and the outward currents is identical (Kirchhoff's law)." This sentence can be confusing. If the inward and outward currents are the same, the membrane potential doesn't change. I believe that you are including the capacitive current in the inward (or outward) currents.
  
  Indeed, we included the capacitive current in the inward or outward currents. We changed the text to clarify this.
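The point of the clarified caption can be checked with a minimal single-compartment sketch. It assumes an isolated compartment with no injected or axial current and the convention that outward currents are positive, so that C dV/dt + Σ I_ion = 0 and the capacitive current equals −Σ I_ion. The current values are arbitrary illustrations, not model output.

```python
def capacitive_current(ionic_currents):
    """Isolated compartment, no injected or axial current:
    C dV/dt + sum(I_ion) = 0, hence I_C = -sum(I_ion). Outward is positive."""
    return -sum(ionic_currents)


def inward_outward_totals(currents):
    """Total inward and outward current magnitudes."""
    inward = -sum(c for c in currents if c < 0)
    outward = sum(c for c in currents if c > 0)
    return inward, outward


ionic = [-3.0, 1.0, 0.5]  # e.g. Na (inward), K and leak (outward); arbitrary units
i_c = capacitive_current(ionic)
without_ic = inward_outward_totals(ionic)          # ionic currents alone: unequal
with_ic = inward_outward_totals(ionic + [i_c])     # capacitive current included: equal
```

Here the ionic currents alone are unbalanced (3.0 inward vs 1.5 outward); this net imbalance charges the membrane and changes V. Only once the capacitive current (1.5, outward) is included do the inward and outward totals match.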
  
  (c) Lines 92-93. I do not fully understand this sentence. Are you making an assumption? What does 'continuous flow of axial current' mean?
  
  By ‘continuous flow of axial current’ we meant a spatially continuous stream of axial currents flowing from the reference to the target. To clarify this, we added the explanatory sentence: “i.e., if the axial current is not blocked or reversed between the reference and the target.”
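The condition stated above (axial current neither blocked nor reversed between reference and target) can be expressed as a check along the unique path between two compartments of a tree. The sketch below is hypothetical: the tree is given as a child-to-parent map, and axial currents are stored once per edge in one direction. It illustrates the definition and is not the manuscript's implementation.

```python
def path_between(parent, a, b):
    """Compartments on the unique tree path from a to b.
    parent maps each compartment to its parent (the root maps to None)."""
    def to_root(node):
        chain = []
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain

    up_a, up_b = to_root(a), to_root(b)
    on_b = set(up_b)
    lca = next(n for n in up_a if n in on_b)  # lowest common ancestor
    return up_a[: up_a.index(lca) + 1] + up_b[: up_b.index(lca)][::-1]


def current_from(currents, u, v):
    """Axial current flowing from u to v; each edge is stored in one direction."""
    if (u, v) in currents:
        return currents[(u, v)]
    return -currents.get((v, u), 0.0)  # missing edge -> no current (blocked)


def has_continuous_flow(parent, currents, reference, target):
    """True if current flows from reference toward target on every edge of the
    path, i.e. it is neither blocked (zero) nor reversed (negative) anywhere."""
    path = path_between(parent, reference, target)
    return all(current_from(currents, u, v) > 0 for u, v in zip(path, path[1:]))
```

For a soma with a trunk and two tuft branches, current flowing tuft → trunk → soma is continuous from the tuft to the soma, but not in the reverse direction, and not from one tuft branch to the other if no current flows from the trunk into the second branch.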
  
  (d) Equation (1.) Why summing axial currents over j? Is this for the case of a branching point?
  
  A compartment can be (1) part of a continuous segment of a dendritic branch, where axial currents can flow from both the distal and the proximal direction (a sum over two terms); (2) a branch point with three axial currents; or (3) a leaf compartment with only one axial current, in which case the summation is trivial. We clarified this in the text.
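The role of the sum over neighbours j can be illustrated with the standard compartmental form, in which the axial current into compartment i from neighbour j is (V_j − V_i)/R_ij. This is a generic sketch with hypothetical names and values, not a reproduction of the manuscript's Equation (1): a branch point has three neighbours, a leaf has one, and a compartment inside a continuous segment would have two.

```python
def axial_current_into(i, voltages, neighbours, resistance):
    """Net axial current into compartment i (positive = flowing in), summed
    over its neighbours j. Voltages in mV and resistances in MOhm give nA."""
    return sum((voltages[j] - voltages[i]) / resistance[frozenset((i, j))]
               for j in neighbours[i])


# "b" is a branch point (3 neighbours); "a", "c" and "d" are leaves (1 neighbour)
neighbours = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
voltages = {"a": -65.0, "b": -60.0, "c": -70.0, "d": -55.0}
resistance = {frozenset(edge): 10.0 for edge in [("a", "b"), ("b", "c"), ("b", "d")]}
```

Because each edge contributes equal and opposite terms to its two end compartments, the axial currents summed over the whole tree cancel.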
  
  (e) Figure 2, caption. Typo. "When the axial currents flows…" Should it be 'current'? - Figure 3, caption. Typo in (C) "Extended currentscape"
  
  Corrected.
  
  (f) Figure 4. I cannot see the grey lines or the dotted lines mentioned in the caption.
  
  We added an arrow highlighting the gray and the dotted lines in the figure.
  
  (g) Figure 5, caption. "Red boxes highlight regions analyzed in panels B-D." Because this is a spatially extended model, region may be confused with spatial location, but you are highlighting a temporal interval.
  
  We rephrased the caption referring to temporal intervals now.
  
  (h) Line 341. This is a numerical experiment, correct?
  
  We clarified in the text and added that it was indeed a simulation experiment.
  
  (i) Line 349. Should it be 'distributions'?
  
  Corrected
  
  (j) Line 422. Typo. Missing space 'in vivousing'
  
  Corrected
  
  (k) Line 537. "Preprocessing membrane…" I found this entire subsection a bit confusing and hard to read.
  
  We rephrased this subsection to clarify it and facilitate reading.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.28.667243v2
www.biorxiv.org

Functional Separation of mRNA Domains Coordinates Pluripotent Cell Behavior

1. Public_Reviews 20 May 2026
  
  in eLife
  
  Author response:
  
  eLife Assessment
  
  This study provides fundamental insights by demonstrating that the Nanog mRNA coding sequence (CDS) and 3′UTR domains are spatially segregated and functionally distinct in pluripotent stem cells and blastocysts, with 3′UTR-enriched border cells primarily influencing morphogenesis and CDS-enriched inner cells largely regulating transcription and epigenetic programs. The work opens a novel conceptual avenue for understanding how separable mRNA domains can differentially control cell behavior and differentiation. However, the evidence is incomplete, as key aspects of the molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS RNA species, as well as causal links between their perturbation and the observed phenotypes (e.g., via rescue and deeper characterization of 3′UTR elements), remain to be fully established.
  
  We thank the editors and the three reviewers for their careful and constructive engagement with our manuscript. We greatly appreciate the reviewers’ recognition of the conceptual significance of the study and their thoughtful suggestions for strengthening the mechanistic and molecular characterization of the work. We have carefully considered all points raised and outline below the revisions planned for the revised manuscript.
  
  The phenomenon of differential CDS and 3’UTR expression is not unique to Nanog. Independent 3’UTR and CDS expression and differential CDS/3’UTR usage have been observed across multiple genes, tissues, and developmental contexts, including genome-wide (Mercer et al., 2011) and transcriptome-scale studies (Kocabas et al., 2025, Ji et al., 2021). Prior studies have proposed that isolated 3’UTRs may arise through regulated RNA processing pathways coupled to exonucleolytic degradation and, in some cases, recapping mechanisms (Malka et al, 2017, Haberman et al., 2024). While the precise molecular mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species remain unresolved, the observations presented here support regulated RNA processing models. Our original submission included a brief discussion of this topic; however, the revised manuscript will include substantially expanded analyses and discussion of the generation of isolated Nanog CDS and 3’UTR species.
  
  The revised manuscript will address the major concerns regarding:
  
  (1) The molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS mRNA species
  
  (2) The causal relationship between perturbation of these RNA species and the observed phenotypes, including additional rescue experiments and deeper computational characterization of putative, functional 3′UTR elements.
  
  Specifically:
  
  (A) New supplementary analyses and schematics designed to further clarify the conceptual and mechanistic framework of the study, including:
  
  (i) Computational examination of the Nanog 3’UTR across all reading frames for open reading frames (ORFs).
  
  (ii) As suggested by Reviewers 1 and 3, single cell traces of Nanog mRNA expression from the full-length mESC dataset used in this study, illustrating distinct transcript isoforms and CDS/3’UTR expression patterns across individual cells, complementing the color-coded tSNE analyses currently presented in Fig. 2.
  
  (iii) Expanded schematic model and analyses addressing possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR enriched RNA species, including transcript architecture, predicted RNA structural barriers, and exonucleolytic processing models.
  
  (iv) Expanded discussion of the predominantly nuclear localization of the Nanog 3’UTR signal and its implications for transcript biogenesis, processing, and potential noncoding functions.
  
  (B) Correction of all minor labeling errors.
  
  (C) Additional experimental analyses, including:
  
  - Expansion of Nanog 3’UTR overexpression and rescue experiments to include cell spreading assays.
  
  - Expanded analysis of the effects of ROCK pathway inhibitors on colony morphology and cytoskeletal organization.
  
  - Examination of the ability of ROCK inhibition to restore normal embryoid body formation.
  
  Collectively, these planned revisions are intended to strengthen the mechanistic framing, molecular characterization, and broader significance of the study while clarifying the interpretation and scope of the conclusions.
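The planned computational examination of the Nanog 3’UTR for open reading frames (item A(i)) could take the form of a six-frame scan. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it treats ATG as the only start codon, uses the standard stop codons TAA, TAG and TGA, reports the ORF from the first in-frame ATG to the next stop, and applies a hypothetical minimum length `min_codons`. No Nanog sequence is included here.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """ATG-to-stop ORFs in all six reading frames.

    Returns (strand, frame, start, end, n_codons) tuples; coordinates are
    0-based on the scanned strand (the reverse complement for '-'), end is
    exclusive and includes the stop codon, n_codons excludes the stop.
    Nested ATGs inside an open ORF are not reported separately, and ORFs
    without an in-frame stop codon are ignored.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOPS:
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        hits.append((strand, frame, start, pos + 3, n_codons))
                    start = None
    return hits
```

Applied to a 3’UTR sequence, an empty or short list of hits with a realistic `min_codons` cutoff would argue against a protein-coding role for the isolated 3’UTR species, although the cutoff itself is a judgment call.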
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  There is evidence that some genes encode mRNAs from which separate processed transcripts may arise, separating the coding sequence (CDS) from the 3'-UTR, and with both mRNA elements remaining stable in the cell. However, the functional consequences of these mRNA fragments have not been firmly established. In the manuscript by Yang et al., the authors probe the mRNA domain architecture of Nanog in the context of embryonic stem cell colonies and blastocysts. The authors detect spatial separation of Nanog CDS-containing mRNA from abundant Nanog 3'-UTR RNAs depending on the cell position in 2D embryonic stem cell colonies or in blastocysts.
  
  Strengths:
  
  The phenotypic analyses of the Nanog mRNA hold promise for revealing distinct roles for the Nanog encoded protein and a separate RNA encompassing the Nanog 3'-UTR.
  
  Weaknesses:
  
  There are a number of questions about the molecular nature of the mRNA species that the authors should address in order for the results to be firmly established, as noted below.
  
  (1) It is not clear how the authors verified that their probes are specific for Nanog CDS or 3'-UTR regions. Especially for the 3'-UTR probe, it is confusing why colonies show green only regions, suggesting only the CDS is present. I would expect the CDS and 3'-UTR probes to colocalize in the interior cells. Is it possible that the 3'-UTR probe is targeting another RNA?
  
  We thank the reviewer for raising the important question of probe specificity. We understand that the observation underlying this concern is the absence of colocalization between the CDS and 3’UTR probes in colony border cells.
  
  The absence of CDS/3’UTR colocalization in colony border cells is not due to probe failure but instead reflects the principal observation underlying the study. If Nanog CDS and 3’UTR sequences were present exclusively as intact full-length transcripts in a strict stoichiometric ratio, Nanog positive cells would be expected to be positive for both probes (appearing yellow). Instead, border cells exhibit strong 3’UTR signal with minimal or absent CDS signal, while adjacent interior cells show the opposite pattern.
  
  The fact that both probes robustly detect signal within the same sample, but in spatially distinct cell populations, argues that both probes are functional and that the observed differential localization reflects genuine biological differences in the levels of transcript components.
  
  The CDS probe targets ~300 bp within the coding region, while the 3’UTR probe targets ~300 bp within the proximal region of the Nanog 3’UTR. Hybridization specificity was validated as described in the Methods and in our previous studies (Kocabas et al 2015; Ji et al 2021), including negative controls. We additionally now provide a supplemental figure (New Figure 1-figure supplement 2A), highlighting that the Nanog 3’UTR and CDS probes label cell populations distinct from each other, further indicating their specificity.
  
  In addition, full-length scRNA-seq datasets from both mouse and human ESCs show differential CDS/3’UTR expression patterns for Nanog and many other genes. To further clarify this point, the revised manuscript will include single-cell transcript traces from mESCs illustrating the distinct Nanog isoforms detected across individual cells (New Figure 2-figure supplement 1A).
  
  (2) It would help for the authors to include a graphic similar to Figure 3, Figure Supplement 1A, that diagrams the location of the CDS and 3'-UTR probes (this should also be done for Oct4 and Sox2). This graphic could also show all potential polyadenylation signals.
  
  We agree that additional schematic clarification would improve readability. The revised manuscript will include schematics showing the locations of the CDS and 3’UTR probes for Nanog, Sox2 and Oct4 (New Figure 1-figure supplement 1A).
  
  (3) I think, based on the fluorescence patterns, there is evidence that the signal for the Nanog 3'-UTR probe is nuclear (images with DAPI staining), but this is not commented on that I could find. This should be discussed, as nuclear retention has implications for the noncoding function of the 3'-UTR fragment.
  
  The reviewer is correct that the Nanog 3’UTR signal is mostly nuclear. While this was noted in the original Figure 1-figure supplement 2A, we agree that its mechanistic and functional implications may not have been sufficiently discussed in the original manuscript. The revised manuscript will include an expanded discussion of the relationship between nuclear localization, transcript processing, and potential noncoding functions of isolated Nanog 3’UTR species.
  
  (4) Figure 2, Figure Supplement 1A needs a better explanation. It's not clear how the reads map to the different regions of the Nanog mature mRNA. The authors should show examples at different ratios of CDS to 3'-UTR. Do the reads have a sharp boundary at the junction of where the isolated 3'-UTR is thought to occur?
  
  We thank the reviewer for this suggestion. The revised manuscript will include new single-cell read maps across the Nanog locus from full-length mESC scRNA-seq datasets (New Figure 2-figure supplement 1A), illustrating distinct CDS-enriched and 3’UTR-enriched transcript isoforms across individual cells.
  
  These analyses indicate that some CDS-dominant transcripts contain 3’UTR sequence, while many appear to contain little or no detectable 3’UTR sequence. Conversely, many 3’UTR-enriched transcripts contain only minimal or truncated CDS sequence. Importantly, full CDS and 3’UTR mRNA components are frequently not present in a strict 1:1 ratio, either within individual cells or across cell populations.
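  To illustrate what a departure from a strict 1:1 CDS:3’UTR ratio looks like at the level of read coverage, here is a minimal Python sketch that computes a per-cell log2 ratio of length-normalised read counts over the two regions. The region coordinates, the pseudocount, the function names, and the choice to count junction-spanning reads in both regions are all hypothetical assumptions for illustration. This is not the analysis behind New Figure 2-figure supplement 1A.

```python
import math


def overlaps(read, region):
    """True if half-open intervals read=(start, end) and region=(start, end) overlap."""
    return read[0] < region[1] and region[0] < read[1]


def cds_utr_log2_ratio(reads, cds, utr3, pseudocount=1.0):
    """log2 ratio of length-normalised read counts over CDS vs 3'UTR for one cell.

    Assumption: a read spanning the CDS/3'UTR junction is counted in both regions.
    0 means equal coverage density; positive means CDS-enriched, negative 3'UTR-enriched.
    """
    n_cds = sum(overlaps(r, cds) for r in reads)
    n_utr = sum(overlaps(r, utr3) for r in reads)
    cds_density = (n_cds + pseudocount) / (cds[1] - cds[0])
    utr_density = (n_utr + pseudocount) / (utr3[1] - utr3[0])
    return math.log2(cds_density / utr_density)


# Hypothetical transcript-relative coordinates and reads, not the Nanog annotation.
CDS, UTR3 = (0, 100), (100, 200)
cells = {
    "cell_1": [(10, 60), (20, 70), (40, 90)],        # CDS-only reads
    "cell_2": [(10, 60), (120, 170)],                 # balanced coverage
    "cell_3": [(110, 160), (130, 180), (150, 200)],   # 3'UTR-only reads
}
for name, reads in cells.items():
    print(name, round(cds_utr_log2_ratio(reads, CDS, UTR3), 2))
```

  Length normalisation keeps a longer region from appearing enriched simply because it collects more reads; the pseudocount avoids division by zero in cells with no reads in one region.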
  
  The revised manuscript will also include expanded supplementary analyses integrating transcript architecture, predicted RNA structural barriers, polyadenylation analysis, and single cell coverage patterns to further examine possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species (New Figure 2-figure supplement 1B,C).
  
  (5) I looked in the Zenbu browser at human NANOG CAGE mapping in the FANTOM5 dataset. I could not see evidence for substantial capping of a 3'-UTR fragment when filtering for embryonic cell types. Given the strong signal for the 3'-UTR in border cells, I would expect to see evidence for capping if the RNA were indeed capped. This suggests that if it exists, it is likely uncapped and (as noted in point 3) is likely nuclear retained.
  
  Prior studies have reported isolated uncapped and recapped 3’UTR species in multiple systems (Malka et al, 2017; Haberman et al, 2024). We agree that the predominantly nuclear localization and lack of a strong CAGE signal for Nanog are important observations and will expand discussion of these points in the revised manuscript.
  
  (6) Are there predicted polyadenylation signals near the end of the CDS that would generate a short 3'-UTR, and are these signals conserved across mammals?
  
  Computational analysis of the mouse Nanog 3'UTR identifies a single canonical PAS (AATAAA) at position 1074, located at the 3’ end of the annotated 3’UTR and this terminal PAS is conserved across mammals. These analyses will be included as a supplementary figure and discussed further in the revised manuscript section addressing Nanog transcript biogenesis.
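  The PAS identification described here is pattern matching on hexamers. The minimal Python sketch below scans a sequence for the canonical AATAAA and a few variant hexamers. The variant list, the function name `find_pas`, and the toy sequence are illustrative assumptions, not the pipeline or the Nanog sequence used in the study.

```python
import re

# Canonical PAS followed by a few commonly reported variant hexamers.
# This variant list is an illustrative assumption.
PAS_HEXAMERS = ("AATAAA", "ATTAAA", "AGTAAA", "TATAAA")


def find_pas(seq, hexamers=PAS_HEXAMERS):
    """Return sorted (1-based position, hexamer) pairs for every PAS match."""
    # Accept RNA or DNA input by converting U to T.
    seq = seq.upper().replace("U", "T")
    hits = []
    for hexamer in hexamers:
        # A zero-width lookahead also reports overlapping matches.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start() + 1, hexamer))
    return sorted(hits)


# Hypothetical toy sequence, not the Nanog 3'UTR.
print(find_pas("GGAATAAAGGAUUAAAC"))  # [(3, 'AATAAA'), (11, 'ATTAAA')]
```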
  
  (7) It would help to see a zoomed-in view of the region targeted by one of the guide RNAs in the 3'-UTR, and where that site is relative to the polyadenylation signal. Is the polyadenylation signal upstream, i.e., CDS proximal?
  
  This will be provided in the revised manuscript (New Figure 2-figure supplement 1C,i). Two guide RNAs were used to generate the Nanog 3’UTR deletions. The downstream guide targets a site upstream of the terminal polyadenylation signal at nt 1074, so that polyadenylation of the remaining Nanog CDS-containing transcript is preserved.
  
  Consistent with this, all Nanog 3’UTR knockout lines retain normal Nanog protein levels. The revised manuscript will include supplementary schematics showing the guide RNA positions relative to the CDS and 3’UTR probes and the terminal PAS.
  
  (8) A final note, the use of green and red together will be challenging for those who are colorblind. Providing a different false color palette would be helpful.
  
  We appreciate this attention to accessibility. The red/green color combination was chosen to provide the highest contrast between CDS and 3’UTR signals in the in situ hybridization experiments, which is important for visualizing their differential spatial localization. We will ensure that figure legends clearly indicate channel assignments throughout the manuscript.
  
  I am refraining from comments on the cell biology and morphological insights, as they are remote from my core expertise.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This manuscript shows that the coding sequence (CDS) and 3' untranslated region (3'UTR) of mRNA transcripts from the Nanog gene have distinct expression patterns and functions. In both human and mouse embryonic stem cell colonies and blastocysts, these domains are spatially segregated, with 3'UTR-enriched cells occupying the borders and CDS-enriched cells residing in the interior. CDS mRNA expression is correlated with the expected regulation of transcription and epigenetics associated with the Nanog protein. Interestingly, expression of the 3'UTR appears to play an independent role in cell behavior and colony morphogenesis. Indeed, deletion of the 3'UTR causes specific defects in cell spreading and protrusive activity, with alterations in the localization of adhesion and cytoskeleton-associated proteins. Remarkably, a large proportion of those defects are rescued upon ROCK inhibition. Deletion of either the Nanog CDS or the 3'UTR leads to distinct modifications in differentiation competence.
  
  Strengths:
  
  The independent role of 3'UTR mRNA domains, although identified in neuroscience a couple of years ago, is a novel and exciting field that remains relatively unexplored in early development.
  
  The manuscript offers a multilayered series of experiments in ES cell colonies, blastocysts, and embryoid bodies, including imaging, -omics, genetic and pharmacological challenges, and differentiation experiments, thereby very convincingly unveiling the role of the Nanog 3'UTR in morphogenesis.
  
  Weaknesses:
  
  The pathways leading to the generation of these distinct transcript domains are unknown. Although the differential functional roles are well demonstrated, whether the expression patterns are a cause or a consequence of the cells' localization in the embryo remains to be explored.
  
  We thank the reviewer for these thoughtful comments and for recognizing the potential significance of independent 3’UTR functions in early developmental systems.
  
  Regarding the mechanisms underlying generation of distinct CDS and 3’UTR transcript domains, the revised manuscript will include new supplementary analyses and schematic models addressing possible Nanog transcript processing pathways, as outlined above.
  
  We agree that the relation between spatial location and Nanog 3’UTR expression is an important question. Specifically, it remains unclear whether cells first acquire high Nanog 3’UTR expression and subsequently localize to the colony border or whether border position itself promotes high Nanog 3’UTR expression.
  
  Our current data suggest that both processes may contribute. Deletion of the Nanog 3’UTR does not prevent colonies from establishing a border/interior pattern, indicating that high Nanog 3’UTR expression is not strictly required for the border pattern itself. At the same time, Nanog 3’UTR overexpression and rescue experiments increased the likelihood of border localization, suggesting that elevated Nanog 3’UTR expression promotes behaviors associated with border occupancy.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  In this manuscript, Yang et al reported distinct functions of the protein-coding sequence (CDS) and the 3' untranslated region (UTR) in the Nanog mRNA in pluripotent stem cells. They first observed different localization patterns for the CDS and 3' UTR in embryonic stem cells and in blastocyst embryos, and this pattern correlates with cell populations in different pluripotent states based on single-cell sequencing data. To characterize the potentially distinct functions of these regions, the authors generated knockout (KO) cell lines in which either the CDS or the 3' UTR was genetically ablated. These deletions led to different phenotypes in multiple assays. These results provided evidence that the CDS and 3' UTR of an mRNA could have distinct functions. Although these results are potentially interesting, several questions need to be addressed before the validity of their conclusion can be confirmed.
  
  Strengths:
  
  This study provides evidence for distinct functions of the protein-coding sequence and 3' untranslated region of an mRNA in pluripotent stem cells. The concept could be more broadly applied.
  
  Weaknesses:
  
  The initial observation (distinct localization of CDS and 3' UTRs) and the causal relationship between the KO and phenotype need further validation.
  
  Major points:
  
  (1) The authors showed distinct localization patterns of the CDS and 3' UTRs in human and mouse ESCs and blastocysts, and the overlap between their signals was minimal (Figure 1). Does this mean that the CDS and 3' UTR RNAs exist separately? For example, in cells that only showed signals for 3' UTRs, do these RNAs only contain 3' UTRs and lack CDS? Was this confirmed by RNA-seq experiments? If so, how are they generated (i.e., by transcription from a novel promoter or partial degradation of the full-length mRNAs)? This is a key question. Without a clear characterization of these RNAs, the rest of the study cannot be substantiated.
  
  We thank the reviewer for raising this important question, which overlaps substantially with several key points raised by Reviewer #1 concerning the molecular nature and characterization of the Nanog CDS and 3’UTR species.
  
  Colony border cells exhibit strong Nanog 3’UTR signal with minimal detectable CDS signal, while adjacent interior cells show the reciprocal pattern. These observations strongly suggest the existence of distinct Nanog transcript species rather than exclusively full-length transcripts containing stoichiometric amounts of both CDS and 3’UTR sequence.
  
  This conclusion is independently supported by full-length Smart-seq2 scRNA-seq datasets from both mouse and human ESCs, which provide transcript coverage across both CDS and 3’UTR regions.
  
  (2) To confirm that the phenotypes of CDS or 3' UTR KO cells were caused by the deleted regions instead of other artifacts, rescue experiments should be performed.
  
  Rescue experiments were included in the original submission (Fig. 4). The revised manuscript will expand these analyses to include cell spreading. We will also include additional ROCK pathway modulation experiments.
  
  (3) As over-expression of the 3' UTR showed a phenotype, important regions within it should be identified, and also the possibility that the 3' UTR contains open reading frame(s) and is translated should be tested.
  
  The revised manuscript will also include supplementary computational analyses of the Nanog 3’UTR, including open reading frame prediction, Kozak scoring, and evolutionary conservation analysis (New Figure 2-figure supplement 1B). These analyses provide no strong evidence for coding potential within the 3’UTR. Further, isolated Nanog 3’UTR transcripts are largely confined to the nucleus, making active translation unlikely.
  
  The revised manuscript will include new supplementary analyses addressing Nanog transcript structure and possible biogenesis mechanisms (New Figure 2-figure supplement 1C).
  
  References:
  
  Lorenz et al. (2011) Algorithms Mol Biol 6:26. ViennaRNA/RNAfold: RNA secondary structure (stem-loop) and minimum free energy (MFE) prediction.
  
  Altschul et al. (1990) J Mol Biol 215:403. NCBI BLASTP: ORF conservation, protein sequence similarity search.
  
  Cock et al. (2009) Bioinformatics 25:1422. NCBI Entrez/Biopython: sequence retrieval.
  
  Siepel et al. (2005) Genome Res 15:1034. PhastCons/UCSC multiz alignments: evolutionary conservation scoring.
  
  Kent et al. (2002) Genome Res 12:996-1006. UCSC Genome Browser: conservation track access.
  
  Eaton et al. (2020) Mol Cell 78:439. Stall model.
  
  Brannan et al. (2012) Genes Dev 26:2621. Stall model.
  
  Addition to Methods.
  
  ORFs (≥10 amino acids) were identified in all three forward frames, and their start-codon contexts were scored according to Kozak (1987). Evolutionary conservation was assessed by BLASTP (Altschul et al., 1990) against RefSeq proteins. Poly(A) signals were identified by pattern matching for canonical and non-canonical hexamers. Conserved sequence blocks were obtained from UCSC PhastCons tracks (Siepel et al., 2005). RNA secondary structures were predicted using ViennaRNA RNAfold (Lorenz et al., 2011) with a sliding 80-nt window. The stall model for isolated transcript generation follows Eaton et al. (2020).
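  The ORF step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: an ORF runs from the first ATG to the next in-frame stop, ORFs without an in-frame stop are ignored, and "Kozak scoring" is reduced to a simplified two-position rule (purine at -3, G at +4). The function names and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kozak_score(seq, atg):
    """Simplified Kozak context score (0-2) for an ATG starting at index atg.

    One point for a purine (A/G) at -3 and one for G at +4, counting the A of
    ATG as +1. This two-position rule is a simplification, not a full model.
    """
    score = 0
    if atg >= 3 and seq[atg - 3] in "AG":
        score += 1
    if atg + 3 < len(seq) and seq[atg + 3] == "G":
        score += 1
    return score


def find_orfs(seq, min_aa=10):
    """Scan the three forward frames for ATG...stop ORFs of >= min_aa codons.

    Returns (frame, start, end, n_aa, kozak) tuples with 0-based, end-exclusive
    coordinates; n_aa excludes the stop codon. ORFs lacking an in-frame stop
    before the sequence ends are not reported.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # the first ATG opens the ORF; nested ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((frame, start, i + 3, n_aa, kozak_score(seq, start)))
                start = None
    return orfs


# Hypothetical test sequence: GCCACC + ATG + 9 x GCT + TAA.
demo = "GCCACC" + "ATG" + "GCT" * 9 + "TAA"
print(find_orfs(demo))  # [(0, 6, 39, 10, 2)]
```

  Taking the first ATG in each frame reports the longest ORF per stop codon. A real analysis might also report internal ATGs or use a position weight matrix for Kozak context.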
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.03.15.484504v2
www.biorxiv.org

Integrating computational protein structure predictions and genetic dependencies to discover functional multi-protein complexes

2
1. EMBOpress 20 May 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  Reply to the reviewers
  
  We thank the reviewers and are glad that they acknowledge this work to be a timely contribution to a quickly moving field and a valuable tool to generate testable hypotheses. We are pleased that reviewer #2 highlights that “a major strength is the combination of orthogonal evidence types” and that the tool serves to generate novel hypotheses. The revised manuscript will sharpen the positioning of the study within this context. Additional experimental evidence will be provided to address the points raised by reviewers #1 and #3.
  
  Reviewer #1
  
  1. The authors do not co-IP ARF1. This does not surprise me, as small GTPases often hydrolyse their GTP during lysis.
  
  We agree that this is likely due to transient association and GTP hydrolysis during lysis and will add a section to the manuscript.
  
  2. There have been a number of ARF1 BioID screens done; have the authors checked whether their complex has turned up there?
  
  We will include this in the revised manuscript.
  
  3. I am a bit confused by some of the interpretation about KO and loss of JTB staining. They interpret: "The SYS1 acts as a Golgi recruitment factor for both ARFRP1 and JTB". ARFRP1 has been published and is a cytosolic protein, so that makes sense. However, JTB is not cytosolic but a membrane protein, so it cannot be "recruited". Maybe it is retained in the Golgi by this interaction, but if that is the case you would still expect signal on another organelle or the plasma membrane (and the western blot shows it isn't degraded in the lysosome). I am confused by the authors' model here.
  
  We will clarify the phrasing and will provide a clearer interpretation, also considering the other improved imaging experiments that will be included in the revised manuscript.
  
  4. The authors validate their JTB antibody and confirm that SYS1 levels are not reduced in the JTB KO; this is very clear (albeit unquantified). What I do not see validated is the SYS1 KO. I think this is quite important.
  
  We will validate SYS1 KO using TIDE and/or western blotting.
  
  5. The colocalisation in panel 3D is weak and unclear to me. It is not quantified. It is not clear whether there have been 3 repeats.
  
  The revised manuscript will include improved imaging data. We will repeat relevant experiments, include appropriate controls and quantify where necessary.
  
  6. The imaging in Figure 3 is not clear in places, and it stands out in an otherwise very clear manuscript. I cannot see the JTB in panel F. There are no scale bars. The dynamic range of the image is not utilised. I do not see the JTB stain in either of the SYS1 KOs, I do not see the SYS1-FLAG staining in the complement, and it is not quantified at all. It may all seem trivial, but (to me) this is an absolutely critical bit of biology data to support the informatics.
  
  The revised manuscript will include improved imaging data. We will repeat relevant experiments, include appropriate controls and quantify where necessary.
  
  7. I am a bit unconvinced by the interpretation of it being a retrograde trafficking complex, for 2 key reasons: 1) VSV-G trafficking is anterograde (yet, unusually, they interpret a "severe defect in retrograde transport"); 2) even if it was only having an effect in the retrograde direction, I would still remain a little open-minded about it, as you can easily mistake trafficking of a protein in one direction for another if an unknown protein (a SNARE, for example) has defective trafficking.
  
  We used VSVG-KDEL in this assay, and this setup specifically measures retrograde trafficking. We will clarify this in the revised manuscript. We will also clarify in the Discussion that we confirmed a role in retrograde trafficking but cannot exclude a role in anterograde trafficking.
  
  Reviewer #2
  
  Major comment: scope and interpretation of DepMap-derived functional evidence
  
  The manuscript could benefit from more clearly defining the scope of the functional evidence used to nominate complexes. The central co-dependency signal is derived from DepMap 24Q2 CRISPR gene-effect profiles, which are primarily cancer cell-line fitness/proliferation data. This is an important limitation because the resulting correlations may preferentially capture complexes or pathways that influence viability in proliferating cancer cells, while missing complexes active in differentiated, tissue-specific, stimulus-dependent, or non-proliferative contexts. Conversely, some correlations may reflect shared cancer-lineage or fitness dependencies rather than direct participation in a stable complex. The authors are appropriately cautious in stating that DepCom is not a complete inventory of human protein complexes, but the title, framing, and resource description could still be read as implying a more general catalogue of functional protein complexes. The authors might consider adding a clearer introduction to DepMap and explicitly discussing how the cancer-cell-line origin of the data affects interpretation of the 518 predicted complexes. This could be addressed without new experiments, for example by adding text early in the Results section explaining what the CRISPR gene-effect scores measure, and by expanding the Discussion to clarify that DepCom represents structurally plausible complexes prioritized by co-dependency across cancer cell lines, rather than an unbiased or context-independent map of human protein complexes. The selection of highlighted examples would also benefit from clearer justification. The peroxisome, actin, WNK/TSC22D2, and Golgi/JASS examples are biologically interesting, but the rationale for choosing them is not always explicit. Were they selected because they were novel, high-confidence, disease-associated, experimentally tractable, or representative of different resource categories? Briefly stating the selection criteria would help readers understand whether these examples are illustrative case studies or representative outcomes of the pipeline.
  
  We agree with the reviewers' assessment that this resource should be viewed as hypothesis-generating and that the overall framing should be improved. We will revise the manuscript at the appropriate sections, according to the more detailed comments of all reviewers.
  
  Minor comments
  
  Clarify post-clustering removal of large/problematic protein families and complexes. In the Methods, the authors state that "clusters of histones and keratin clusters, as well as the mito-ribosome, complexes of the electron transport chain and the mediator complex" were removed because of their large sizes. This filtering step would benefit from additional detail. Please specify the criteria used to define these removed clusters, how many clusters/proteins were removed at this stage, and whether removal was based only on size or also on biological/manual curation. It would also be helpful to explain why these proteins or clusters were removed after clustering rather than excluded before graph construction and clustering, since highly connected or compositionally biased protein families could potentially influence neighboring cluster assignments. If available, a brief robustness check showing that pre-removal of these proteins gives similar candidate complexes would strengthen confidence in the clustering procedure.
  
  We will add the requested information to the relevant section. Alongside the manuscript, we will also provide lists of the complexes before and after every filtering step.
  
  Clarify the rationale for excluding complexes larger than 5000 residues. The 5000-residue cutoff is understandable for AF3 computational cost, but the manuscript should briefly state how many candidate complexes were excluded by this cutoff and whether this preferentially removes known large assemblies. This would help readers understand the scope of complexes that DepCom is expected to miss.
  
  Alongside the manuscript we will now also provide lists of the complexes before and after every filtering step.
  
  Improve wording in the CAP1/CFL1/WDR1/ACTB example. The sentence "Additionally, CAP1 works in concert with CFL1 to accelerate depolymerisation, though if a four-protein complex consisting of actin, WDR1, CAP1 and CFL1 is relevant is not clear" is difficult to parse. Possible revision might be something like: "Additionally, CAP1 works in concert with CFL1 to accelerate depolymerisation, although it remains unclear whether actin, WDR1, CAP1 and CFL1 form a stable four-protein complex in cells." This more clearly separates known biology from the speculative interpretation of the DepCom prediction.
  
  Wording will be improved.
  
  Improve reproducibility details for AF3 predictions. The Methods state that predictions were run using a local AF3 installation, but reproducibility would be improved by reporting relevant AF3 settings, number of seeds/models per complex, whether templates were used, how disordered regions were handled, and whether predictions were repeated for all complexes or only selected examples. This is especially important because the manuscript notes that multiple predictions can yield different subunit arrangements.
  
  We will provide detailed settings in the Methods section. Regarding disordered regions: all predictions used full-length sequences (canonical UniProt ID) for each protein, so disordered residues are included. If disordered regions have low pLDDT and poor PAE, they will simply not score as interfaces in AlphaBridge. The one exception where we did crop structures is Figure 2D, purely for visualization purposes; the full-length (uncropped) complex was scored in the pipeline.
  
  Reviewer #3
  
  Co-essentiality is not the same as physical complex membership. This is the biggest conceptual concern. Genes in the same pathway are co-essential whether or not their products bind. The authors lean on the structural prediction step to filter this out, but that means the entire pipeline rests on AF3+AlphaBridge being correct about who interacts with whom. There is no independent benchmarking shown of how often AlphaBridge calls a true positive vs a false positive at the chosen 0.5 cutoff. Why 0.5? Where does that number come from? A short benchmarking section using known complexes (CORUM 5.0, hu.MAP 2.0, the PDB) would make the choice defensible. Right now it reads as arbitrary.
  
  We thank the reviewer for bringing up the need for such an important clarification. We fully agree that co-essentiality does not equal physical interaction and structure predictions are imperfect. This is precisely the logic underlying our pipeline design, not a limitation we overlooked. The two data sources are used sequentially and serve distinct roles: first, we construct protein sets that are connected through networks of predicted binary physical interactions; then we cluster these based on DepMap correlations, selecting likely physical complexes that display co-essentiality between their components.
  
  In other words, clustering on DepMap data alone would certainly return many spurious correlations: as the referee points out “Co-essentiality is not the same as physical complex membership”. Anchoring the search space with structural predictions substantially reduces this noise. Neither data source alone is sufficient, nor do we claim either is definitively "correct": the value lies in their combination. We hope improved phrasing in the revised manuscript will highlight this better.
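  The two-stage logic described above (structural anchoring first, co-dependency second) can be sketched in Python as follows. This is a deliberately simplified stand-in, not the DepCom pipeline. The protein names, interface scores, gene-effect profiles, and the `min_corr` threshold are hypothetical. The 0.5 cutoff mirrors the AlphaBridge cutoff discussed by the reviewer. Connected components over co-dependent edges replace the authors' actual clustering step.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connected_components(edges):
    """Group the proteins touched by `edges` into sorted connected components."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(tuple(sorted(comp)))
    return comps


def candidate_complexes(interface_scores, gene_effects, score_cutoff=0.5, min_corr=0.3):
    # Stage 1: structural anchor; keep only confidently predicted binary interfaces.
    physical = [pair for pair, s in interface_scores.items() if s >= score_cutoff]
    # Stage 2: within that space, keep pairs whose gene-effect profiles co-vary.
    codependent = [
        (a, b) for a, b in physical
        if pearson(gene_effects[a], gene_effects[b]) >= min_corr
    ]
    return connected_components(codependent)


# Hypothetical inputs: C-D binds but is anti-correlated; E-F co-varies but scores low.
scores = {("A", "B"): 0.8, ("B", "C"): 0.7, ("C", "D"): 0.9, ("E", "F"): 0.2}
effects = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5], "C": [2, 4, 6, 8],
           "D": [4, 3, 2, 1], "E": [1, 2, 3, 4], "F": [1, 2, 3, 4]}
print(candidate_complexes(scores, effects))  # [('A', 'B', 'C')]
```

  The toy inputs show why neither source alone suffices. The E-F pair is co-dependent but lacks a confident interface, and the C-D pair has an interface but no co-dependency; only pairs that pass both filters survive.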
  
  Regarding benchmarking AlphaBridge score: we have benchmarked AlphaBridge, in response to reviewer feedback on the original AlphaBridge paper (Structure, Cell Press). In the figure here it is clear that in our benchmark of PDB structures (with
  
  Comparison to existing resources is incomplete. I can't help but wonder what was found here that would not have been possible by analysing existing resources. CORUM 5.0 (7,193 mammalian complexes, ~71% human-derived; Tsitsiridis et al. 2024 NAR), hu.MAP 2.0 (Drew et al. 2021, ~6,965 complexes from >15,000 MS experiments), BioPlex 3.0 (Huttlin et al. 2021, 118,162 interactions in HEK293T), and the Complex Portal already cover a large fraction of the human complexome. The authors compare to PDB, the original interactome paper, and Complex Portal, but they explicitly skip CORUM and hu.MAP, both of which are central reference resources in this space. Without including these, the "60 complexes unique to DepCom" number is not really meaningful. This needs to be redone properly.
  
  We will add the comparison with CORUM and hu.MAP in the revision.
  
  Validation rate is one out of 518. The JASS work is solid, but a single experimentally validated complex out of 518 gives the reader essentially no estimate of how often the rest of the predictions are correct. Even a smaller systematic effort, say IP-MS on five to ten predicted novel complexes in the same cell line, would do an enormous amount to establish how trustworthy the resource is. The authors already have the V5/IP-MS pipeline running. Right now the manuscript implicitly asks the reader to trust 517 predictions on the strength of one validation.
  
  In this paper we validated one of the 60 complexes we claim are new. Notably, we provide new biological data and demonstrate how consulting our resource, or following the same logic of combining functional and structural information, can lead to new and exciting discoveries. We note that of the 518 complexes we list, 69 are exactly mirrored in the PDB and/or Complex Portal, while for another 389 there is partial evidence. Thus, our dataset is amply validated and at the same time contains data to enable new discoveries. We also note that, following the release of our resource eight months ago, a new high-impact publication “validated” a complex we had independently picked in DepMap (Oosterheert et al, Choreography of rapid actin filament by coronin, cofilin and AIP1, Cell, 2025). We will rephrase relevant sections (also in response to reviewer 2) to increase clarity about validation.
  
  The functional and disease clustering is potentially circular. GO terms and STRING associations are themselves derived in large part from the published literature on protein function, including text mining channels in STRING, much of which is downstream of complex membership. Of course complexes cluster into "DNA repair" and "vesicle trafficking" if you cluster on GO and STRING. The same applies to Open Targets, which integrates GWAS Catalog, ClinVar, literature mining, and other sources. The clustering is fine as a navigation aid for the website, but it is not, as currently presented, an independent validation of anything. I would tone the discussion down accordingly.
  
  We did not mean to present the clustering as an independent validation. We will tone down the discussion accordingly.
  
  AF3 limitations on this class of problem. AF3 itself acknowledges limitations (Abramson et al. 2024, including the December 2024 addendum), and subsequent benchmarking has flagged disordered regions, dynamic/large assemblies, and certain transmembrane systems as known weak points. The JASS complex is largely transmembrane, the WNK1-TSC22D2 example involves disorder-to-order transitions, and several flagship examples involve large multi-domain proteins. The authors acknowledge some of this in passing but should state explicitly which complexes were trimmed, how the trimming choices were made, and whether predictions were repeated with different seeds to check stability. Figure S4 is a good start, but for a resource paper a more systematic seed-stability analysis is warranted.
  
  No complexes were trimmed for the initial AF3 predictions. The WNK1-TSC22D2 example was trimmed and re-predicted only for visualization purposes. We apologize for the misunderstanding and will state this more clearly.
  
  AF3 certainly has limitations. Disordered regions will almost always be assigned a poor pLDDT (even if AF3 wrongly folds them into helices), and AlphaBridge will not pick up these low-pLDDT regions as interfaces. Dynamic assemblies might likewise lead to poor confidence scores, and consequently these will not be picked up as interfaces by AlphaBridge either. If AF3 confidence metrics are analyzed properly, the main concern for both disordered regions and dynamic assemblies is missing true positive interactions rather than finding false positives. As we did not aim to identify all possible human complexes, we consider focusing on the most confidently predicted interactions to be a fair trade-off.
  
  While the JASS complex is indeed a membrane protein complex, the predictions are exceptionally confident across multiple seeds (we can provide predictions from multiple seeds for the revision), and the complex is validated experimentally. Of course, structure predictions are no substitute for experimental structures, as cautioned multiple times throughout the manuscript.
  
  Figure S4 shows that despite the complex overall geometry being flexible, the interaction sites are predicted with high confidence across different poses. Since the aim of this study was to identify proteins interacting with each other, not accurate structures (which need to be solved experimentally), we argue that recomputing all structures with multiple seeds is disproportionately expensive computationally and would delay publication of a timely study while adding little.
  
  Statistics are thin in several places. On the Fisher exact test for Golgi/ER enrichment in V5-JTB IP-MS (Supplemental Table 1), an odds ratio of 2.77 is modest, and there is no comparison to a matched control IP. Is this more than expected by chance against an appropriate background? The IP-MS volcano plots show many significant proteins, but how was the background controlled? On the LLM section, no quantitative evaluation is presented at all and the assessment is admitted to be subjective.
  
  We will qualify the conclusions drawn from the IP-MS experiments. We maintain that together with the additional cell biology data, we build a compelling and convincing picture for this JASS complex.
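
  The reviewer's question, whether an odds ratio of 2.77 is more than expected by chance, can be framed as a one-sided Fisher exact test on a 2×2 table (enriched vs. not enriched in the IP, Golgi/ER-annotated vs. not). The pure-Python sketch below only illustrates that test. The table layout and the counts in the usage line are hypothetical and are not taken from Supplemental Table 1. Using a matched control IP as the background, as the reviewer suggests, would change only the counts fed in, not the test.

```python
from __future__ import annotations

from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: enriched in the IP / not enriched (hypothetical layout).
    Columns: Golgi/ER-annotated / not annotated.
    Returns (sample odds ratio a*d / (b*c), P(X >= a)), where X follows the
    hypergeometric distribution obtained by fixing all row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    total = comb(n, col1)
    # Probability mass of every table whose top-left cell is at least a.
    upper_tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return odds_ratio, upper_tail / total


# Hypothetical counts, for illustration only.
print(fisher_greater(12, 38, 100, 850))
```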
  
  Experimentally, the background is controlled by measuring enrichment over WT cell lines that have undergone the same IP procedure as the V5-SYS1/JTB-expressing cells (lysis, incubation with the anti-V5-conjugated beads, same wash procedure and sample processing), as is standard in the field. We will clarify this in the Methods section. Regarding identification, the FDR was set to 1% at the protein and peptide level, and peptide spectrum matches (PSMs) were additionally filtered for SequestHT XCorr score >1.
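
  For readers outside proteomics, a 1% FDR at the peptide or protein level is commonly estimated with a target-decoy strategy: the number of decoy hits above a score threshold serves as an estimate of the number of false target hits. The sketch below illustrates that generic idea together with the XCorr > 1 pre-filter mentioned above. The `(score, is_decoy)` input format and the function name are hypothetical, and the sketch is not meant to reproduce the exact procedure used by the authors' search software.

```python
def score_threshold_at_fdr(psms, fdr=0.01, min_xcorr=1.0):
    """Lowest score threshold whose estimated FDR is <= fdr, or None.

    psms: iterable of (score, is_decoy) pairs (hypothetical format).
    PSMs with score <= min_xcorr are removed first. At each threshold s,
    FDR is estimated as (#decoys with score >= s) / (#targets with score >= s).
    Tied scores are not treated specially in this sketch.
    """
    kept = sorted((p for p in psms if p[0] > min_xcorr), key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in kept:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the lowest threshold that still satisfies the FDR bound.
        if targets and decoys / targets <= fdr:
            threshold = score
    return threshold


# Hypothetical PSMs: 99 high-scoring targets, one decoy, 10 more targets, then decoys.
psms = (
    [(5.0, False)] * 99
    + [(4.0, True)]
    + [(3.0, False)] * 10
    + [(2.0, True)] * 5
    + [(0.5, True)]
)
print(score_threshold_at_fdr(psms))  # 3.0
```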
  
  We agree with the referee that the LLM interpretation is subjective and cannot be benchmarked. We suggest revising the resource and the paper, only providing structured LLM prompts to facilitate users asking the right questions, but we will not provide the LLM answers as part of the resource.
  
  The 4×ACTB speculation. The authors themselves note the AlphaBridge score declines from 0.9 (1×ACTB) to 0.78 (4×ACTB), yet they speculate about functional implications. This is exactly the kind of post-hoc rationalisation around weak evidence that should either be supported with experiment or removed. Either remove or qualify as speculative.
  
  We will qualify this as speculative.
  
  The LLM-assisted analysis. I am genuinely uncomfortable with releasing 76 LLM-generated complex annotations as part of a published resource when the authors openly state these have "not been systematically validated". Putting these summaries on a website with the imprimatur of a peer-reviewed paper will lead to them being cited and reused. At minimum, the website needs prominent warnings on every page where an LLM summary appears, the prompts must be fully reproducible (not just downloadable as JSON), and a small validation table, say 10 complexes scored by a domain expert for accuracy of each claim, should be included as a supplemental figure. As it stands this section reads like an enthusiastic add-on that has not been thought through with the same care as the rest of the work.
  
  We thank the referee for bringing forward this consideration. We agree to remove the LLM answers for the 78 complexes from the manuscript and from the website, to ensure that the outputs cannot be cited. We will provide two different objective structure prompts for download to encourage variety in responses for curious users who want to explore. We will add a prominent disclaimer noting that responses resulting from these prompts cannot be interpreted as facts without validation.
  
  We cannot guarantee reproducibility with modern LLM inference architectures. Even if seeds are kept the same and temperature=0, floating-point non-determinism in GPU operations, distributed inference, and batch effects may lead to different results. Furthermore, models go through many different iterations rapidly. As a consequence, it is impossible for us to guarantee reproducibility.
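
  A minimal illustration of the floating-point point above: floating-point addition is not associative, so summing the same values in a different order, as can happen when a parallel reduction is split differently across threads or batches, can change the last bits of the result. This says nothing specific about any particular LLM serving stack; it only shows why bit-identical outputs are hard to guarantee even with fixed seeds and temperature 0.

```python
# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right, right_to_left, left_to_right == right_to_left)
# 0.6000000000000001 0.6 False
```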
  
  Cutoffs and cluster numbers need stability analysis. The cutoff for the 75th-percentile DepMap correlation (mean of random + 3 SD = 0.147) is reasonable but should be accompanied by an FDR or precision/recall estimate against a labelled reference set. The choice of 20 final clusters in functional clustering (because that gave a peak in silhouette score) and 14 for disease clustering should also be supported by stability analysis, e.g. resampling.
  
  The 75th percentile cutoff is, in our opinion, well justified and sufficient for our purposes. FDR and precision recall need a set of true and false positives. The DepMap correlation clusters are an intermediate step in our pipeline and do not necessarily hold the final complexes. How can intermediate reference DepMap clusters be constructed and defined as true or false positives? Even if we would score clusters that contain a known complex as true positives, how to define false positives? If clusters do not contain a known complex, that does not necessarily mean that these proteins don’t interact, just that they have not been shown to interact yet.
  
  We will run resampling to improve confidence in the choice of cluster number.
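
  The cutoff rule quoted by the reviewer (mean of a random background plus 3 SD, giving 0.147 in the manuscript) is simple arithmetic, sketched below. The sketch assumes the background is a list of correlation coefficients for randomly drawn protein pairs. The function names, toy values and pair tuples are hypothetical and do not come from DepMap.

```python
import statistics


def null_cutoff(random_correlations, n_sd=3.0):
    """Cutoff = mean + n_sd * sample standard deviation of a background set."""
    mu = statistics.mean(random_correlations)
    sd = statistics.stdev(random_correlations)
    return mu + n_sd * sd


def pairs_above(pair_correlations, cutoff):
    """Keep (protein_a, protein_b, r) tuples whose correlation r exceeds the cutoff."""
    return [pair for pair in pair_correlations if pair[2] > cutoff]


# Toy background and candidate pairs (hypothetical values and names).
background = [0.0, 0.1, -0.1, 0.05, -0.05]
cutoff = null_cutoff(background)
print(round(cutoff, 3))  # 0.237
print(pairs_above([("P1", "P2", 0.31), ("P3", "P4", 0.12)], cutoff))
```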
  
  Internal numerical consistency. The bioRxiv preprint abstract refers to 354 high-confidence multi-protein complexes, while the body of the manuscript discusses 518 (224 dimers + 294 multimers). The relationship between these numbers should be stated explicitly. Likewise, the breakdown of "60 unique to DepCom" into 41 heterodimers + 19 multimeric should be reconcilable in the figures and tables. The number "9,764 unique seed proteins" should also be clarified to confirm it is the DepCom-internal seed set and not inherited from the Zhang et al. coverage or hu.MAP 2.0 (9,963 proteins). These are easy fixes but matter for a resource paper.
  
  bioRxiv preprint: the preprint that the reviewer read is an older version, which will be updated.
  
  The 9,764 unique seed proteins are from the Zhang et al. paper, and are the human proteins identified to confidently interact with at least one other human protein. We will make this clearer.
  
  Manders' overlap coefficient. The VSV-G(ts045)-KDELR retrograde-transport assay is well established and the experiment is clean, but MOC has been increasingly criticised in the colocalisation literature (Adler & Parmryd 2010, 2021). Best practice is to also report Manders' M1/M2 coefficients or Pearson's correlation alongside MOC. Adding these would be straightforward and would strengthen Fig 4B.
  
  We will improve co-localization measures where appropriate.
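
  For reference, the metrics the reviewer lists have standard definitions that can be written down directly. The sketch below computes MOC, Manders' M1/M2 and Pearson's correlation coefficient from two equal-length lists of pixel intensities. The thresholds that decide which pixels count as colocalised for M1/M2 are an analyst's choice (often set from background, e.g. by the Costes method) and are assumptions here, as is the list-based input; this is not the authors' image-analysis pipeline.

```python
import math


def coloc_metrics(red, green, red_thr=0.0, green_thr=0.0):
    """Colocalisation metrics for two channels given as pixel-intensity lists.

    MOC = sum(R*G) / sqrt(sum(R^2) * sum(G^2))
    M1  = fraction of total red intensity found in pixels where green > green_thr
    M2  = fraction of total green intensity found in pixels where red > red_thr
    PCC = Pearson correlation coefficient of R and G
    """
    if len(red) != len(green):
        raise ValueError("channels must have the same number of pixels")
    n = len(red)
    pairs = list(zip(red, green))
    moc = sum(r * g for r, g in pairs) / math.sqrt(
        sum(r * r for r in red) * sum(g * g for g in green)
    )
    m1 = sum(r for r, g in pairs if g > green_thr) / sum(red)
    m2 = sum(g for r, g in pairs if r > red_thr) / sum(green)
    mean_r, mean_g = sum(red) / n, sum(green) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in pairs)
    pcc = cov / math.sqrt(
        sum((r - mean_r) ** 2 for r in red) * sum((g - mean_g) ** 2 for g in green)
    )
    return {"MOC": moc, "M1": m1, "M2": m2, "PCC": pcc}


# Hypothetical 4-pixel images in which green is exactly twice red.
print(coloc_metrics([1, 2, 0, 4], [2, 4, 0, 8]))
```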
  
  Minor comments:

  1. Page 4: "candidate sets of potential multi-protein complex members". Pick one, they are either candidates or potential, not both.
  
  Will be addressed.
  
  Page 7: "Complex 294... mechanistic basis for CFL1 and WDR1 cooperation has only recently been described". Please update the reference list and language given how recent this is.
  
  Will be addressed.
  
  Page 7: JTB is described as "poorly characterised". This is a bit too strong. JTB has been studied in the context of TGF-β-induced mitochondrial regulation (Kanome et al. 2007), cytokinesis and chromosomal passenger complex association (Platica et al. 2011), the structural characterisation of its extracellular domain (Rousseau et al. 2012), and breast cancer biomarker work (Jayathirtha et al. 2022). A more accurate framing would be "incompletely characterised, with previously reported but functionally unresolved roles". The novelty here is the Golgi connection, which is genuine.
  
  We will rephrase.
  
  Page 8: the citation of Blomen et al. 2015 Science for "Golgi-related synthetic lethality" should be checked against the actual supplementary data of that paper to confirm the JTB attribution is correct.
  
  Will be checked.
  
  Figure 1: as in many omics papers, please think of us colourblind readers. The pink-green DepMap correlation scale will be hard for some of us.
  
  The color scheme in use, alongside others, was tested with two colleagues that have different variants of colour blindness and was judged to be the best compromise.
  
  Figure 5A and 5B: 21 and 14 colour-coded clusters respectively in a single UMAP is too much. Consider splitting into separate panels by broad theme or providing an interactive version only.
  
  We will focus on a subsection and provide the full interactive version on the homepage.
  
  Page 11: "manually evaluated the quality of outputs". By whom, blinded to which model produced which output? Methods are silent on this.
  
  As stated above, we will remove the LLM part.
  
  Some figures show "hairballs" with very limited informative content. Fig. 1B left panel and the AlphaBridge wheel plots in particular convey relatively little at the size shown.
  
  We will try to find a way to draw the AlphaBridge circular plots at better resolution; we do note, however, that the reviewer’s observation might be an artefact of the PDF file distributed to reviewers.
  
  The reference list looks a bit thin on prior systematic complexome efforts. BioPlex 3.0 (Huttlin et al. 2021 Cell), hu.MAP 2.0 (Drew et al. 2021 MSB) and CORUM 5.0 (Tsitsiridis et al. 2024 NAR) should all be cited and discussed.
  
  We will include the additional references where appropriate.
  
  The discussion section drifts into general comments about AI in science that don't add much. I would cut about a third of it and use the space for a more careful framing of the actual contribution.
  
  We will shorten the discussion section and phrase more carefully.
  
  General assessment Reviewer #3: The strongest aspect of this study is the JASS complex story. The IP-MS, the SYS1-KO rescue experiment, the VSV-G(ts045)-KDELR transport assay, and the orthogonal CRISPR screens with diphtheria and Pseudomonas exotoxins together build a convincing case for JTB as a regulator of Golgi-to-ER retrograde trafficking. This part of the paper is genuinely nice work and would stand on its own. The pipeline itself, combining structural predictions with functional dependency data and filtering with AlphaBridge, is sensible and timely. It is a reasonable demonstration of how confidence filtering should be done at this kind of scale. The main limitations concern the resource framing. After reading the manuscript several times I am still trying to identify the central novel contribution beyond the JASS validation. The interactome predictions are taken from Zhang et al., DepMap is public, AF3 is public, AlphaBridge is the authors' own previously published tool, and GO/STRING/Open Targets/dbPTM are all public. The manuscript is essentially an integrative pipeline plus a website plus one experimentally followed-up complex. The framing oversells what is genuinely new. The authors' own comparison (Fig. S3) shows 60 complexes "unique to DepCom" out of 518, of which 41 are heterodimers and only 19 are multimeric. Nineteen genuinely novel multi-protein complexes is still a contribution but it is a long way from the 354/518 that the abstract and discussion implicitly emphasise. The validation rate (one of 518) and the missing comparisons to CORUM 5.0 and hu.MAP 2.0 are the two issues that most need addressing.
  
  We will rephrase to address these issues and adjust the framing. We would put forward that the main contribution of this manuscript is to present an integrative framework that combines data from orthogonal sources to highlight the potential of structure prediction models to serve as a discovery tool. The reviewer correctly identifies (albeit derogatorily) that this is “essentially” an integrative pipeline. But it is an integrative pipeline that combines genetics and computational structure predictions in a novel (to the best of our knowledge) way and surfaces interesting new biology. The biology of the JASS complex goes well beyond simple validation experiments, and we believe its discovery (based on our data) carries more value than the reviewer attributes to it.
  
  PeerReviewed
2. EMBOpress 20 May 2026
  
  in Review Commons
  
  Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
  
  Learn more at Review Commons
  
  Referee #1
  
  Evidence, reproducibility and clarity
  
  Characterising protein complexes is a fundamental goal in modern molecular cell biology. Here, Uckelmann and colleagues have presented a solution to part of this problem. By combining functional clustering with AlphaFold modelling, they present a high-throughput bioinformatic solution. The paper and figures are exceptionally clear and well presented. The conclusions are reasonable, and the data interesting. I am a cell biologist with expertise in the molecular machinery of trafficking, so the focus of my review will be on the identification of a new complex that is proposed to have a role in retrograde trafficking. On the whole I find this an interesting and convincing finding. However, I have some comments and questions that I hope may help the authors. I will naturally focus my comments on the cell biology.
  
  1. The authors do not co-IP ARF1. This does not surprise me, as small GTPases often hydrolyse their GTP during lysis.
  2. There have been a number of ARF1 BioID screens done. Have the authors checked whether their complex has turned up there?
  3. I am a bit confused by some of the interpretation about KO and loss of JTB staining. They interpret: "The SYS1 acts as a Golgi recruitment factor for both ARFRP1 and JTB". ARFRP1 has been published and is a cytosolic protein, so that makes sense. However, JTB is not cytosolic but a membrane protein, so it cannot be "recruited". Maybe it is retained in the Golgi by this interaction, but if that is the case you would still expect signal on another organelle or the plasma membrane (and the western blot shows it isn't degraded in the lysosome). I am confused by the authors' model here.
  4. The authors validate their JTB antibody and confirm that SYS1 levels are not reduced in the JTB KO; this is very clear (albeit unquantified). What I do not see validated is the SYS1 KO. I think this is quite important.
  5. The colocalisation in panel 3D is weak and unclear to me. It is not quantified, and it is not clear whether there have been 3 repeats.
  6. The imaging in Figure 3 is not clear in places, and it stands out in a very clear manuscript. I cannot see the JTB in panel F. There are no scale bars. The dynamic range of the image is not utilised. I do not see the JTB stain in either of the SYS1 KOs, I do not see the SYS1-FLAG staining in the complemented cells, and none of it is quantified. It may all seem trivial, but (to me) this is an absolutely critical bit of biology data to support the informatics.
  7. I am a bit unconvinced by the interpretation of it being a retrograde trafficking complex, for 2 key reasons: 1) VSV-G is anterograde (yet, unusually, they interpret a "severe defect in retrograde transport"); 2) even if it were only having an effect in the retrograde direction, I would still remain a little open-minded about it, as you can easily mistake trafficking of a protein in one direction for another if an unknown protein (a SNARE, for example) has defective trafficking.
  
  Significance
  
  Characterising protein complexes is a fundamental goal in modern molecular cell biology. Here, Uckelmann and colleagues have presented a solution to part of this problem. By combining functional clustering with AlphaFold modelling, they present a high-throughput bioinformatic solution. The paper and figures are exceptionally clear and well presented. The conclusions are reasonable, and the data interesting. I am a cell biologist with expertise in the molecular machinery of trafficking, so the focus of my review will be on the identification of a new complex that is proposed to have a role in retrograde trafficking. On the whole I find this an interesting and convincing finding.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2025.09.09.675133
www.biorxiv.org

Retrosplenial cortex enables context-dependent goal-directed sensorimotor transformation

1
1. Public_Reviews 19 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary
  
  The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule and switch rapidly at block transitions and take a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W- context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected, with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say "somewhat" because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data. I would still recommend the authors of the current manuscript review that work to see whether there is a relevant framework or concept (Castiello, Zhang, Delamater, 'The retrosplenial cortex as a possible 'sensory integration' area: a neural network modeling approach of the differential outcomes effect of negative patterning', 2021, Neurobiology of Learning and Memory).
  
  The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating the continuous background is actually used to set the rule (an important addition!). Pre-stimulus functional-connectivity analyses suggest that there is some activity correlation that maps to the context, presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.
  
  In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong. I have a few suggestions for clarity that may require a bit of data analysis. I also outline one key limitation that should be discussed, but is likely beyond the scope of this manuscript.
  
  Major strengths
  
  (1) The task is a major strength. It asks the animal to generate differential motor output to the same sensory stimulus, does so in a block-based manner, and the Context-Off condition convincingly shows that the continuous contextual cue is necessary. The auditory tone control ensures this is more than a 'motivational' context but is decision-related. In fact, the slightly higher bias to lick on the catch trials in the W+ context is further evidence for this.
  
  (2) The dorsal-cortex optogenetic grid avoids a 'look-where-we-expect' approach and lets RSC fall out as a key node. The authors then follow this up with pharmacology and latency analyses to rule out simple motor confounds. Overall, this is rigorous and thoughtfully done.
  
  (3) While the mesoscale imaging doesn't allow for cellular resolution, it allows for mapping of the flow of information. It places RSC early in the context-specific divergence after whisker onset, a valuable piece that complements prior work.
  
  (4) The baseline (pre-stim) functional connectivity and the opto-perturbation projections into a task subspace increase the significance of the work by moving beyond local correlates.
  
  Key limitation
  
  The current optogenetic window begins ~10 ms before the sensory cue and extends 1 s after, which is ideal for perturbing within-trial dynamics but cannot isolate whether RSC is required to maintain the context-specific rule during the baseline. Because context is continuously available, it makes me wonder whether RSC is the locus maintaining or, instead, gating the context signal. The paper's results are fully consistent with that possibility, but causality in the pre-stimulus window remains an open question. (As a pointer for future work, pre-stimulus-only inactivation, silencing around block switches, or context-omission probe trials (e.g., removing the background noise unexpectedly within a W+ or W- context block) could help separate 'holding' from 'gating' of the rule. I'm not suggesting these are needed for this manuscript, but they would be interesting for future studies.)
  
  We thank the reviewer for the comprehensive summary of our work.
  
  We also thank the reviewer for highlighting the work from the Delamater group (Castiello et al., 2021), and we now briefly discuss this paper on P. 14 Lines 434-437 writing: “RSC was shown to contribute to negative patterning in behavioral tasks requiring rats to learn that the simultaneous presentation of two stimuli lead to an opposite outcome than each individual stimulus (Castiello et al., 2021).”
  
  We also agree with the reviewer’s noted ‘Key limitation’ regarding the role of RSC as either maintaining context representation or serving a gating function. The reviewer proposes an exciting set of further experiments inactivating RSC at different time points to investigate when RSC activity is needed. We hope to carry out such experiments in the future. We now include a brief discussion of this interesting point on P. 14-15 Lines 455-459 writing: “First, further inactivation experiments would shed light on the timing at which RSC activity is necessary for the integration of contextual information. Specifically, it would be of great interest to inactivate RSC at different time points such as during the intertrial interval or at the transition between contexts.”
  
  We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors aim to understand the neural basis of context-dependent sensory processing and decision-making.
  
  Strengths:
  
  They used an innovative behavioral paradigm where the action-outcome association changes independent of the sensory stimulus. This theoretically allows the authors to disentangle the effect of behavioral context on sensory processing. Using this approach combined with optogenetic silencing, they discover that RSC activity is necessary for suppressing a lick response when the stimulus switches to the unrewarded context.
  
  Weaknesses:
  
  Sensory processing appears to be entangled with jaw/tongue movement initiation. Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information. If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate. It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.
  
  We thank the reviewer for the comments on our work and we agree that separating sensory processing and movement initiation is very important. In the revised manuscript, we have carried out several new analyses to specifically address the points of the reviewer. The most important point is that context-dependent activity in RSC emerges at ~50 ms after the whisker stimulus, which precedes any differences in movements of the jaw or whisker. Although sensory and motor representations become increasingly entangled after stimulus delivery, we think that the first ~100 ms after the whisker stimulus is a relatively safe period for analysing sensory processing and decision making before overt context-dependent differences in movements.
  
  Addressing the specific point “Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information.” - We have now directly compared the pattern of cortical activity evoked by whisker and auditory stimuli in correct trials in the W+ context (new Figure 3 – figure supplement 2). As expected, activity in wS1/wS2 and A1 is stronger in whisker and auditory trials respectively, following their sensory modalities. However, we also find a stronger response of wM1/wM2 in whisker trials as early as 40 to 60 ms following the stimulus, showing the specificity to the whisker system. We also observe a stronger response of RSC to whisker than to auditory stimulus. The auditory and whisker-evoked responses are therefore different.
  
  Addressing the specific point “If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate.” – As stated above, the responses to auditory and whisker stimuli are different.
  
  Addressing the specific point “It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.” - We think that the data shown in Figure 3F-H indicate that differences in S1 activity when comparing W+ and W- stimulation are not directly caused by context-sensitive sensory processing. On P. 9 Lines 270-273 we write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” Indeed, context separation in wS1/wS2 only emerged later than 100 ms, which is confounded by the difference in movement evoked by the sensory stimulus (now quantified in new Figure 3 – figure supplement 4). By contrast, RSC and wM1/2 responses to the whisker stimulus were different in W+ and W- at early time points (~50 ms for RSC and ~80 ms for wM1/2), which is consistent with context-dependent sensory processing. At least 2 hypotheses could explain the absence of an early difference in whisker-evoked activity in wS1/wS2 between W+ and W-. The first is that sensory activity in wS1/wS2 is not modulated by contextual information at all, while the alternative would imply that sensory activity is mediated by different neuronal populations depending on context, with an overall similar average response. We think this is an interesting question, which we hope to address in future experiments using Neuropixels recordings and multiphoton cellular imaging to examine the single-neuron representation of the whisker stimulus in wS1/wS2 according to context in the task presented here.
  
  We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Suggestions to strengthen the manuscript (no new data collection)
  
  (1) The block-switch dynamics were clearly demonstrated behaviorally. It would be very powerful to mirror this with an analysis of neural data around the block-switch: how do the various areas adjust immediately after a shift in the continuous contextual sound? Does the RSC show any evidence of changing activity patterns? How does the within-trial activity dynamic look as a function of the number of trials from the context switch? This could be done with the data collected for Figure 3 (for within-trial dynamics), but also for the pre-stimulus baseline activity data (Figure 4A-B).
  
  We thank the reviewer for raising this interesting point. We have now investigated the change of cortical activity at the transition between contexts (new Figure 3 – figure supplement 5). At the context transition, both to W+ and to W- contexts, we observed a rapid activation of the auditory cortex (new Figure 3 – figure supplement 5A). In addition, there appeared to be a slightly higher activation of RSC when transitioning to W- rather than to W+ (new Figure 3 – figure supplement 5A). In the future, it will be of great interest to further investigate this phenomenon.
  
  We also evaluated the whisker deflection-evoked responses of the different cortical regions according to the number of whisker trials from the context switch (new Figure 3 – figure supplement 5B&C). This analysis revealed that while the sensory responses in wS1 and wS2 were constant over the time course of a context block, the responses of wM1/2 and especially RSC became progressively lower in the W- context, consistent with the behavioral results in Figure 1 supporting time-dependent contextual integration.
  
  Overall, these results strengthen the role of RSC and wM1/2 in integrating contextual information to guide the response to the whisker stimulus, and we thank the reviewer for raising this important point.
  
  (2) It might be useful to state 'earliest among the imaged dorsal cortical areas,' and briefly acknowledge potential subcortical contributors (since those were not explored and could be earlier than cortical areas).
  
  We agree with the reviewer. In the Summary, on P. 2 Lines 39-40 we now write: “Widefield calcium imaging revealed that retrosplenial cortex was the first dorsal cortical area to show context discrimination in response to whisker stimulation”. On P. 8 Lines 257-258, we now write: “To investigate the spatiotemporal neural dynamics underlying task execution, we recorded calcium activity across the dorsal cortex in transgenic mice”. On P. 13 Lines 416-420 we now write: “Functional imaging of cortical activity with two different genetically-encoded calcium indicators each showed similar spatiotemporal dynamics of whisker sensory processing with the earliest context-dependent divergence in signalling being detected in RSC, out of the imaged dorsal cortical areas (Figure 3).” On P. 15 Lines 470-473, we now write: “Finally, it is of course important to note that many subcortical regions (as well as non-dorsal cortical regions, which were not imaged) are likely to contribute importantly to context-dependent task performance.”
  
  (3) Fit a simple exponential/logistic to lick probability vs time-since-switch (your Figure 1H-style analysis) to report a time constant with CIs; it will help quantify the integration of the continuous cue.
  
  We thank the reviewer for this suggestion. We have fitted an exponential to the grand average data to quantify the time constants for integration of contextual information before the presentation of the first whisker stimulus of the block (see new Figure 1H). On P. 6 Lines 170-173 we now write: “To assess whether this temporal integration would differ between contexts we fitted an exponential to the time evolution of the lick probability. This suggested a faster transition to the W+ context than to the W- context (W+ time constant: 9.4 s, W- time constant: 15.5 s) (Figure 1H).”
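  The reviewer asked for a time constant; as a minimal sketch of one way such a fit can be done (not necessarily the procedure behind Figure 1H), the code below assumes the model p(t) = a + b * exp(-t / tau), with t the time since the context switch. The function name `fit_exponential`, the grid of candidate tau values and the synthetic data are hypothetical. For a fixed tau the model is linear in a and b, so those are solved by ordinary least squares and the tau with the smallest squared error is kept.

```python
import math


def fit_exponential(t, p, taus):
    """Least-squares fit of p(t) = a + b * exp(-t / tau) (hypothetical helper).

    For a fixed tau the model is linear in (a, b), so a and b are solved in
    closed form; the candidate tau with the smallest squared error is kept.
    Returns (tau, a, b).
    """
    n = len(t)
    mean_p = sum(p) / n
    best = None
    for tau in taus:
        x = [math.exp(-ti / tau) for ti in t]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate regressor: a and b cannot be separated
        sxp = sum((xi - mean_x) * (pi - mean_p) for xi, pi in zip(x, p))
        b = sxp / sxx
        a = mean_p - b * mean_x
        sse = sum((pi - (a + b * xi)) ** 2 for xi, pi in zip(x, p))
        if best is None or sse < best[0]:
            best = (sse, tau, a, b)
    if best is None:
        raise ValueError("no usable tau in the candidate grid")
    _, tau, a, b = best
    return tau, a, b


# Synthetic example: lick probability rising after a switch with tau = 9.4 s.
t = list(range(61))  # seconds since context switch
p = [0.8 - 0.6 * math.exp(-ti / 9.4) for ti in t]
taus = [0.1 * i for i in range(1, 1001)]  # candidate tau from 0.1 s to 100 s
tau, a, b = fit_exponential(t, p, taus)
print(round(tau, 1), round(a, 3), round(b, 3))
```

  A falling curve (as in a W- block) is handled by the same code through a positive b. Refitting on data resampled across mice with replacement would give a bootstrap confidence interval for tau, which this sketch omits.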
  
  (4) Because catch-trial false alarms are higher in W+ than W−, report per-context d′ and criterion for whisker trials (using signal detection theory); this separates sensitivity from bias and makes the behavioral shift more interpretable. It is also further proof that the behavior is contextual (versus a compound stimulus, for example).
  
  We have computed the d’ and criterion for the whisker trials in the W- and W+ contexts (see new Figure 1 – figure supplement 1D). As suggested by the reviewer, this further supports the conclusion that the behavior is driven by contextual information.
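  For readers less familiar with signal detection theory, the sketch below shows the standard equal-variance formulas: d' = z(H) - z(FA) separates sensitivity from bias, and the criterion c = -(z(H) + z(FA)) / 2 captures the overall tendency to lick. It assumes, as the reviewer's comment suggests, that the false-alarm rate comes from catch trials within the same context; the helper names, the log-linear correction and the counts are illustrative assumptions, not the authors' pipeline.

```python
from statistics import NormalDist


def corrected_rate(count, n_trials):
    """Log-linear correction (assumed): keeps rates strictly inside (0, 1) so z is finite."""
    return (count + 0.5) / (n_trials + 1)


def dprime_criterion(hit_rate, fa_rate):
    """Return (d', c) under equal-variance Gaussian signal detection theory.

    d' = z(H) - z(FA) measures sensitivity; c = -(z(H) + z(FA)) / 2 measures
    bias (c < 0: liberal, more licking; c > 0: conservative).
    """
    z = NormalDist().inv_cdf
    zh, zf = z(hit_rate), z(fa_rate)
    return zh - zf, -0.5 * (zh + zf)


# Hypothetical counts for one session in one context: licks on whisker
# trials (hits) and licks on catch trials (false alarms).
h = corrected_rate(40, 50)
fa = corrected_rate(12, 50)
d, c = dprime_criterion(h, fa)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

  A higher catch-trial false-alarm rate with an unchanged hit rate lowers d' and makes c more negative, which is why reporting both per context helps distinguish a sensitivity change from a shift in licking bias.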
  
  (5) For the pre-stimulus seed-correlation analysis, can you regress out the pupil/jaw/whisker activity to confirm whether the context modulation is (or is not) movement-driven? It would be helpful to better understand whether the baseline correlation is driven by differences in low-level factors between the contexts, versus the higher-level decision rule/context.
  
  The reviewer raises an interesting point. However, we did not find a straightforward way to regress out movements, and thus we leave this point for future in-depth analysis. On P. 11 Lines 354-357 we now write: “It is important to note that these context-dependent changes in resting-state functional connectivity could relate to the overt context-dependent movements in the prestimulus baseline (Figure 1I&J) and/or a manifestation of higher-level internal rule representations.”
  
  (6) For the earliest divergence analysis, is this consistent across animals and across sessions within animals? Can you show per-mouse distributions of first-crossing times (d′>2) for RSC vs wM1/2/wS2? This would help provide confidence in this key finding.
  
  The d’ presented in Figure 3H is computed as the discriminability between contexts at the population level, meaning that at each timepoint (from Figure 3F) we compared the 2 distributions built on N=6 mice. As such, if the divergence between contexts were not consistent across animals, this d’ would be low. That said, as suggested by the reviewer, we further investigated this context divergence at the single-mouse and single-session levels. Our analysis supporting the main finding (Figure 3F-H) is shown in new Figure 3 – figure supplement 3.
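  A minimal sketch of a population-level d' at each timepoint, and of the first-crossing time the reviewer asked about (threshold d' > 2), is given below. The pooled-SD convention (square root of the mean of the two sample variances), the helper names and the toy traces are assumptions for illustration; the exact formula behind Figure 3H is not stated here. Applying `first_crossing` separately to each region's traces, per mouse or per session, would give the regional ordering of divergence times.

```python
import math
from statistics import mean, variance


def population_dprime(a, b):
    """d' between two samples, e.g. per-mouse responses in W+ (a) and W- (b).

    Mean difference divided by the square root of the average sample
    variance (an assumed convention; other pooled-SD choices exist).
    """
    diff = mean(a) - mean(b)
    pooled_sd = math.sqrt(0.5 * (variance(a) + variance(b)))
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def first_crossing(times, wplus, wminus, threshold=2.0):
    """First time at which |d'| exceeds threshold, or None if it never does.

    wplus[i] and wminus[i] hold one value per mouse (or session) at times[i].
    """
    for t, a, b in zip(times, wplus, wminus):
        if abs(population_dprime(a, b)) > threshold:
            return t
    return None


# Toy traces for one region: six mice, four timepoints (ms after stimulus).
times = [0, 20, 40, 60]
base = [1, 2, 3, 4, 5, 6]
wminus = [base] * 4
wplus = [base, [x + 1 for x in base], [x + 5 for x in base], [x + 6 for x in base]]
print(first_crossing(times, wplus, wminus))
```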
  
  First, we show the results for a single mouse across sessions in Figure 3 – figure supplement 3A. We show the stimulus-aligned activity in correct whisker trials in both contexts for the 3 recording sessions. For each session we quantified the main effect size, defined as the difference between the trial-averaged responses in the two contexts. Plotting this difference of mean responses, we consistently observed that RSC ramps up before wM1/2 in all 3 sessions.
  
  Second, across all individual mice, we further aggregated the session-averaged responses to show discriminability between contexts for each region at the single-mouse level (Figure 3 – figure supplement 3B). We show that RSC is the first region to exhibit context separation in 4 out of the 6 mice that we recorded. In the other 2 mice, all regions seemed to show context separation but without clear temporal ordering.
  
  Finally, when averaging across mice, we observed a clear separation and first discrimination in RSC (Figure 3F-H and Figure 3 – figure supplement 3C).
  
  Overall, these further analyses suggest that the early divergence of RSC activity appears to be robust with a consistent mean difference in single sessions and single mice, as well as across the population of mice. We think this analysis has strengthened our manuscript and we thank the reviewer for the valuable suggestion.
  
  (7) For the opto mapping data, could you provide P(lick) effect sizes with CIs per grid site? It would also be nice to summarize the qualitative dichotomy: RSC/tjS1 increases licking in W−; canonical wS1/wS2/wM/ALM decreases licking across contexts (to my understanding).
  
  We now provide the P(lick) effect sizes for the main cortical areas studied in the paper in Figure 2 – figure supplement 1C. This shows the relative change in lick probability in optogenetic trials compared with control trials for each mouse.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Do mice move their whiskers after stimulus onset? If so, are these movements dependent on behavioral context? What causes the increase in S1 activity during auditory-evoked response trials?
  
  To answer the reviewer’s questions we have further investigated whisker movements following the sensory stimuli (whisker and auditory correct trials) in both contexts. The results of this analysis are presented in new Figure 3 – figure supplement 4.
  
  We find that mice move their whiskers shortly after the whisker stimulus in both contexts. The time course of whisker angle in correct whisker trials is similar in both contexts with a discriminability index (d’) consistently below 1. The whisker speed in response to stimulus is slightly higher in the W+ context compared to W- with a d’ slightly above 1 after ~100 ms. We also observed evoked whisker movements in auditory trials independent of context. Thus, whisker movements are indeed evoked by the sensory stimuli, but the overall context-dependent modulation of whisker movements is weak. The early differences in whisker-evoked cortical activity in W+ compared to W- contexts are therefore more likely related to the integration of contextual information than to differences in evoked movements.
  
  The reviewer is correct to point out that wS1 activity increases in auditory trials (Figure 3E). The response is initially very weak, but becomes more prominent after ~100 ms following the auditory tone. We do not know the underlying mechanisms, but there are several likely explanations. First, as discussed above, there are indeed some whisker movements evoked in response to the auditory stimulus (Figure 3 – figure supplement 4), which could result in sensory input to wS1. Equally, the increase could relate to licking, given the broad representation of movements in cortex and an appropriate reaction time in auditory trials (Figure 3C). Alternatively, wS1 activity in auditory trials could also be related to input connectivity from auditory cortex, top-down input from frontal cortex or subcortical regions such as high-order POm.
  
  (2) What do the authors think is causing the W+ vs W- difference in S1/S2 activity approximately 100ms after whisker deflection?
  
  The late W+ vs W- difference in wS1/wS2 activity could be explained by several factors. First, this could be due to the difference in whisker movements after ~100 ms, as shown in Figure 3 – figure supplement 4. Second, this could be driven by lick vs no-lick activity (reaction times in whisker trials are ~110 ms; Figure 3C). Finally, this could be partly due to some movement-independent top-down contextual information reaching wS1/wS2 at late time points. Overall, our claim in the paper is that there was no contextual difference in primary and secondary whisker somatosensory cortices at early time points (before movement). On P. 9 Lines 270-273 we explicitly write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” In contrast, our main findings are grounded in the divergence of cortical activity in RSC and wM1/2 at early time points (<100 ms).
  
  (3) The choice of PC3 seems arbitrary. Is there no task-relevant information in PC1 and PC2?
  
  We appreciate the point raised by the reviewer and have clarified the reasoning leading to PC3 selection in the main text, where on P. 12-13 Lines 384-391 we now write: “The loadings of the first principal component were uniformly distributed and could reflect a late movement-driven activation distributed across all cortical areas (Figure 4 – figure supplement 2C&D). PC2 loadings show variation along the anteroposterior axis that could reflect differences between sensory and motor regions but its time course does not separate between lick and no lick in control conditions (Figure 4 – figure supplement 2C&D). The loadings of PC3 highlighted task-related cortical regions and its time course exhibited clear differences comparing lick and no-lick trials.” In addition, we now also show the time courses for PC1 and PC2 in Figure 4 – figure supplement 2D.
  
  Overall, the reasoning is the following:
  
  PC1 has spatially homogeneous positive loadings (Figure 4 – figure supplement 2C) and activity along PC1 gradually ramps up following sensory stimulation (Figure 4 – figure supplement 2D). It is likely driven by widespread activation of the cortex following the whisker stimulus and the lick response. As such, we believe that the task-related information captured by PC1 is movement-related and not necessarily informative about the processing of the whisker stimulus and context.
  
  PC2 has loadings varying along the antero-posterior axis (Figure 4 – figure supplement 2C), which could be relevant for the task, but its time course does not discriminate between lick and no lick in either W+ or W- (Figure 4 – figure supplement 2D).
  
  PC3 has loadings that vary between several cortical regions involved in the task (Figure 4 – figure supplement 2C) and a time course that separates lick from no-lick trials in both contexts (Figure 4 – figure supplement 2D). We thus focus on PC3 to investigate the effect of optogenetic inactivation on whisker stimulus-evoked activity.
  
  The remaining components beyond PC3 contain a very small fraction of variance and were thus not considered.
  
  (4) Figure 3 - Supplement 1: What explains the change in fluorescence in GFP/tdT mice during W+ stimulation? Is it brain movement on the z-dimension? Could this explain differences in calcium imaging results?
  
  We thank the reviewer for this question. The nature of intrinsic signals is a complex topic, but brain movement is unlikely to contribute importantly, because under similar behavioral conditions we (and others) typically find brain movements to be on the scale of a few microns. The three most widely-reported contributions to intrinsic optical changes in cortex relate to:
  
  (i) Light scattering – as neurons integrate synaptic inputs and fire action potentials, the neuronal elements swell slightly due to the ionic and water fluxes (see for example Vincis et al. Cell Reports 2015, doi: 10.1016/j.celrep.2015.06.016). This reduces the refractive index mismatch between the intracellular and extracellular space. This in turn reduces light scattering, which could result in fluorescence increases.
  
  (ii) Hemodynamics – changes in blood volume and changes in oxygenation/deoxygenation will change the absorption of light at different wavelengths, in an activity-dependent manner (also forming the basis of BOLD fMRI signals).
  
  (iii) Flavoproteins – endogenous fluorescent proteins, such as flavoproteins present at high levels in mitochondria, have been reported to change their fluorescence depending upon neuronal activity, presumably in relationship to increased mitochondrial activity.
  
  We therefore think it is very important to image GFP/tdTomato-expressing mice as controls, and we would suggest that this should be carried out more commonly in the field. Indeed, similar to our results, another study (Yogesh et al., eLife 2025, doi: 10.7554/eLife.104914) recently reported upon the importance of carefully examining intrinsic fluorescence changes, which were found to be present in both wide-field and two-photon imaging of GFP expressing mice.
  
  Our results reported in Figure 3 – figure supplement 1 show that GFP/tdTomato signals over the first ~120 ms following whisker stimulation were much smaller than the equivalent changes in GCaMP6f/jRGECO1a-expressing mice, and therefore would only have a minor contribution to our analyses. However, we refrained from analysing fluorescence changes at later post-stimulus times, because the intrinsic signals indeed become increasingly prominent as the mice initiate licking.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.10.31.685738v2
osf.io

Continuous developmental changes in word recognition support language learning across early childhood

1. Public_Reviews 19 May 2026
  
  in eLife (unscoped)
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  General note
  
  We have issued a new release of the general Peekbank database, 2026.1, which includes more data integrity checks and several more datasets. As a result of this release, the underlying dataset we use in our paper has shifted slightly. The shifts represent a relatively small proportion of the total data and thus these changes have caused only relatively minor changes to our numerical results. We also highlight that we now include a small amount of data regarding children younger than 12 months, increasing the developmental range of our analysis (see Figure 1).
  
  Reviewer 1 (Public review):
  
  The limitations of the study are acknowledged to some extent, but their treatment needs to be strengthened and carried throughout the manuscript. Thus, in the discussion, the authors note that the approach is observational and exploratory, and highlight what is for me a key alternative explanation of the findings, namely that faster children could be faster due to their larger vocabulary, rather than faster children learning more words. Indeed, the latter explanation for the relationship is called into question, given that growth in speed was not related to growth in vocabulary. Here, the authors note that the null result may be related to the fact that they do not have sufficiently precise estimates of growth slopes, rather than seriously considering the alternative explanation that there may not be as causal a link between being a faster word learner and a better word learner (learning more words).
  
  Thank you very much for your challenging and thoughtful comments. In hindsight we did not realize that the way we were writing about our results was ambiguous between several interpretations (one of which we endorse and one of which we do not).
  
  We respond below to the specific suggestions about causal directionality in the longitudinal analysis, but we certainly believe that we cannot draw strong conclusions about causality from our dataset and have attempted throughout the paper to remove causal language that might have crept into our interpretation.
  
  In response to your comments, we have made a number of key revisions aimed at qualifying and clarifying our points:
  
  The abstract now prominently notes that our design is observational: “In an observational study…”
  
  The abstract notes a positive and a negative result in the relationship between word recognition and vocabulary: “Further, across a range of longitudinal models, speed, accuracy, and vocabulary were coupled. Children with overall faster word recognition tended to show faster vocabulary growth, though developmental growth in word recognition skill was not specifically associated with growth in vocabulary.”
  
  The abstract removes potential causal language in the final sentence: “... these findings support the view that word recognition is a skill that develops gradually across early childhood and that this skill is deeply intertwined with early language learning.”
  
  A new paragraph in the Results introduces the potential hypotheses investigated via the longitudinal models.
  
  The final paragraph of the Results section sharpens the contrast between two possible growth hypotheses: “However, we did not find evidence for the stronger version of this claim: in neither the non-linear growth model nor the linear SEM did we find evidence that increases in speed were related to increases in vocabulary size. Thus, our findings do not support a ‘virtuous cycle’ model in which increases in recognition specifically lead to increases in vocabulary size.”
  
  We hope these changes lead to a manuscript that better aligns with the limitations of the study.
  
  This is especially since, but correct me if I’m wrong here, the current vocabulary size is not taken into consideration in the model examining vocabulary growth. Given the increasing number of studies showing that current vocabulary knowledge predicts vocabulary growth (Laing, Kalinowski et al, Siew & Vitevitch), one simple alternative explanation is that current vocabulary knowledge predicts both current word recognition skill and later vocabulary knowledge. Is there anything in the data speaking against this hypothesis?
  
  We think the reviewer’s overall point is generally correct, as we described above, but we want to clarify a specific statistical point. The non-linear longitudinal model of vocabulary growth does in fact take into account a child’s average vocabulary size. (This point feels tricky in a non-linear model but it’s actually quite similar to a linear model for the purposes of this discussion). Basically, vocabulary (at all timepoints) is modeled as a function of age, with both main effects and interactions with age. Critically, each participant is also modeled as having a random intercept capturing their deviation from the average growth pattern across ages (as expressed by the fixed effects). In this model, the “main effect” (here captured by the intercept for the logistic curve in the model) that we observe for speed indicates that vocabulary growth for individuals is predicted to be faster (their curve is shifted left) if their RTs are fast. The presence of the random effects in this model thus “controls” for the fact that some participants have overall higher vocabularies (and are shifted up relative to the average growth curve).
  
  But, we note that this model does not show an “interaction effect” (here captured by the null effect of RT on the slope parameter in the logistic model). That’s one of the null effects that we now call out much more prominently in the abstract and end of the results (per our response above).
  
  Equally, while the SEM examines vocabulary growth controlling for age, I wonder about the other way around. What would happen to the effect of age on word recognition skill (in the LME model, S8) if one were to add concurrent vocabulary size? So does chronological age explain word recognition skill or vocabulary knowledge? Right now, the manuscript describes this effect purely related to chronological age, but is it age per se or other cognitive abilities, including a key change across development, namely, vocabulary size? Thus, the presentation of the skill learning hypothesis suggests that age is a proxy for experience, while you actually have here a very nice proxy for experience in terms of children’s vocabulary size.
  
  Again, thank you for engaging with this tricky set of issues. Overall, our goal is to adjust the manuscript to reflect points of agreement; in particular, we agree that age is a proxy for language experience, vocabulary, and other cognitive changes, and we have stated this explicitly now in the intro to the factor analyses: “In our prior analyses, chronological age acts as a proxy for greater language experience and larger vocabulary as well as a host of other correlated developmental changes in cognition. Now we explicitly explore relations to vocabulary growth and the triadic relationship between age, word recognition, and vocabulary.”
  
  On the statistical side, we do think that the NLME (non-linear mixed effects; the logistic growth model) effectively controls for average vocabulary size, as described above. The longitudinal SEM also relates vocabulary growth to growth in word recognition skill. In both models, we find no evidence for coupled growth; instead, the evidence points to children with higher baseline word recognition skill showing faster growth in vocabulary (speed intercept significantly related to vocabulary slope, -.14, p < .01) but not the reverse (vocabulary intercept not strongly related to speed slope; -.01, ns).
  
  More generally, we hope our edits to the paper, detailed above, both clarify this tricky set of issues and remove inappropriate causal language throughout.
  
  Critically, while the discussion is more nuanced, the way the abstract concludes and the way the Introduction is phrased suggest that the study is able to answer a causal question, which, as the authors themselves note, is not possible. The abstract, for instance, states that word recognition becomes faster, more accurate and less variable ... consistent with a process of skill learning, and also that this skill plays a role in supporting early language learning, which is very causal language. I don’t think you can really claim that you are testing the two hypotheses you suggest here. The work is definitely embedded in the context of these hypotheses, but are you really able to test them? My worry is that, while the discussion is more nuanced, this study will be cited down the line as showing that children learn more words because they are faster at recognizing words, and anything that you can do to temper such interpretations would be good for the literature. For me, this should not just be relegated to the discussion but should be touched upon in the abstract and Introduction.
  
  Thanks for pushing us to be more precise with how we frame and describe our findings. We agree with the reviewer that our findings do not warrant strong conclusions about the causal role of word recognition skill in vocabulary growth. Per our response above, we have now tried to carefully revise our language throughout the paper (in particular, in the abstract and introduction, as noted by the reviewer).
  
  Finally, it would help to talk more about the mechanisms at work in any relationship between word recognition and language learning. It seems to me that this would rely on some predictive processing framework, given the description on page 4, and it would be good to make this clear (faster and more accurately you can recognize a ball, better use this evidence to infer the speaker’s intended meaning).
  
  Thanks, this is a great point. We’ve revised this text and added references to predictive processing, unpacking a problematic paragraph into two:
  
  “Familiar word recognition -- as measured by LWL -- is hypothesized to play a key role in language learning (19). The idea, in a nutshell, is that the faster and more accurately a child can process incoming words, the more opportunities they have for learning. Consider a child hearing the utterance "Can you put the ball in the crate?" The better the child can recognize the word "ball", the better they can use this evidence to help infer the speaker's intended meaning, allowing possible inferences about the meaning of the less familiar word, "crate" (20).
  
  “Real time language processing, including word recognition, relies heavily on predictive processing, in which comprehenders integrate expectations from prior linguistic context with noisy and ephemeral incoming signals (21, 22). The more input a child receives, the better their predictions are likely to be, and hence the more they can learn (19, 23). Indeed, measurements of children's language input at home are consistently associated with their vocabulary size (24, 25). And, in line with this predictive processing framework, one important study found that children's word recognition speed mediated the longitudinal relationship between home language input and vocabulary growth (26). Thus, word recognition is thought to be a key support for ongoing word learning.”
  
  Equally, when referring to word recognition, it would be good to clarify what this refers to - how well a child knows what a word refers to (and in the context of LWL, what it does not refer to) or how quickly it directs attention to what is referred to.
  
  Thanks, we’ve added a capsule definition in the second paragraph, and added the sentence “This procedure [LWL] measures the general construct of word recognition by operationalizing knowledge of a meaning as visual attention to a specific named referent.” We hope this clarifies the relationship between LWL and word recognition.
  
  With regards to the data, I wonder if there is a clustering of kids past 24 months that is happening here, looking at Figures 1 and 2, where it seems like there is less change past the 24-month point. Is there any way to look at whether the effect of age or vocabulary on word recognition is not linear but asymptotic?
  
  Thanks for pointing this out; we do see what you are talking about but think it’s being handled appropriately in the analysis. In Figure 1 it clearly looks like changes to RT are asymptotic – this is why we analyze the logarithm of RT throughout the paper. In Supplement S6 we show that reaction time is indeed best fit by a log-log function. Your question about Figure 2 asks whether there is further structure beyond the log-log fit; in Supplement S7 we show some analyses that suggest a polynomial fit is not better than the log-log fit; there is some small additional linear effect of age over and above the log-log fit, but it’s minor and pretty hard to interpret in our view.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Page 3. Word production may manifest in overt behaviour but need not reflect complete knowledge. A child can say the word dog and use it to refer to a cat.
  
  This is a good point. Since we are not able to speak to the precision of meaning representations (an important issue in its own right), we have omitted the phrase “with incomplete knowledge.”
  
  Page 4. The first two sentences of the paragraph beginning with word recognition ability... don’t go together. The second sentence does not support the claim that word recognition plays a role in language learning.
  
  Thanks, we’ve tried to smooth out this transition as part of unpacking the role of predictive processes.
  
  Page 4. “predicts children’s standardized test scores years later” - make clear what test scores are here.
  
  We added some additional details. The specific tests were the CELF (expressive language) and the KABC (IQ), but we thought too much detail might be distracting.
  
  Page 5. I love Table 1, but would like for the data to be weighted somehow. So, given that some studies had a lot more trials and more children, what percentage of the data did this study contribute? That allows a clearer view of how biased the sample is in certain studies. The x in the CDIs and longitudinal columns could be aligned to the right. I kept wondering why there was an x near some trials.
  
  Thanks, we’ve adjusted the table to add the percentage of the total dataset (in trials) due to each study and fixed the alignment issue.
  
  Page 6. 12 million individual samples: what samples are these? Individual data points per trial per time point. Making this clear would be great.
  
  Clarified, thanks.
  
  Page 9. Your accuracy measures only seem to consider the target. From what I remember of my preferential looking days, this measure usually also includes the distractor. Why do you not do this? This is especially since you have such a wide age range, so if a 12-month-old only looks for about 50 per cent of the trial and spends that time looking at the target, that is very different from a child who looks at the screen all of the trial and spends less time looking at the target here.
  
  Sorry for any lack of clarity: we do in fact compute accuracy as the ratio of looking to target over looking to target plus looking to distractor. We have added this information to the parenthetical referenced above: “... accuracy (more target looking; computed as the ratio of target to target plus distractor looking)”.
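  A small sketch of the accuracy ratio described above shows how it addresses the reviewer's 12-month-old example: looking away from both images does not enter the ratio, so a child who looks for only half the window, all of it at the target, scores 1.0. The function name and the millisecond values are hypothetical, and the sketch assumes looking times are already summed over a fixed analysis window.

```python
def looking_accuracy(target_ms, distractor_ms):
    """Accuracy = target / (target + distractor) looking.

    Looking away from both images does not enter the ratio.
    Returns None when there is no looking to either image.
    """
    total = target_ms + distractor_ms
    if total == 0:
        return None  # accuracy is undefined without any looks to the images
    return target_ms / total


# Two hypothetical children in the same 3000 ms window:
# A looks at the target for 1500 ms and away for the rest; B splits 2000/1000.
print(looking_accuracy(1500, 0))     # child A: only target looking
print(looking_accuracy(2000, 1000))  # child B: mixed looking
```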
  
  Page 12. I only found out that age was in this model by looking at S9.
  
  Thanks for mentioning this omission, we’ve clarified in the text: “We initially add age as an additional variable to our models to explore whether this factor structure relates to age; later we treat age as a predictor of latent factors.”
  
  Page 12. Isn’t it trivial that speed and accuracy show negative covariance, especially given how you measure accuracy? Thus, if I take longer to fixate the target, I have less time to look at the target during the trial. If, however, I included the distractor in my accuracy measure, then I could still take longer to look at the target, but still look more at the target than the distractor.
  
  Thanks for raising this. This covariance is not the key result of interest, and that point did not come across in the text. Now we note that this covariation is “... as expected since they [speed and accuracy] are derived from the same data.” Note, per above, that accuracy is computed as target / (target + distractor) looking; even so, your observation is correct: slower looking at the target means lower accuracy, at least to some degree.
  
  Page 19. If you excluded data from trials with less than 50% of timepoints, how did this vary across age? Arguably, your study has to worry less about this, given your sample size, but it would be nice to know, which you could include in the percentage of data that each study contributed to the final sample.
  
  Thanks, we’ve added this information to a new table in S1.
  
  Reviewer #2 (Public review):
  
  First, I wasn’t entirely clear about what the authors meant by “word recognition ability”. For much of the manuscript (including the use of the term “word recognition ability” itself), this comes across as an intrinsic ability or skill that improves with development. Alternatively, the speed and accuracy metrics taken from studies in Peekbank might capture children’s increasing knowledge of the common, concrete words typically used in these studies. To me, this is a somewhat different construct from a general skill at recognizing words. It would be helpful if the authors could clarify which construct they intend to capture, or if it is not possible to distinguish between these constructs from the Peekbank data.
  
  In response to this comment and related comments above, we’ve added text to the first two paragraphs trying to clarify the general construct that we’re talking about – recognizing the meaning of a word in real-time language comprehension. We’ve also clarified several times throughout the introduction that we’re talking about familiar word recognition, that is, the ability to recognize specific known words. Further, we directly acknowledge the issue above in the introduction:
  
  “Critically, most word recognition paradigms use words that children at the target age are reported to understand and produce. They are thus not indices of vocabulary size but rather measures of how quickly and accurately the child can recognize a familiar spoken word and use it to guide their visual attention to a referent. However, the extent to which specific responses reflect an individual child's general speed of language processing versus their familiarity with specific words is unknown.”
  
  Second, and relatedly, if the source of the age-related improvements is increasing experience with the common concrete words used in the Peekbank studies, then one might expect word recognition and improvements with age to be related to word frequency, given that more frequent words are experienced more often. Word frequency predicts word knowledge when assessed using CDI data. Can effects of frequency be detected in Peekbank word recognition metrics? If not, why? Similarly, is the speed and accuracy of word recognition in Peekbank data related to CDI-derived word age of acquisition, and again, if not, why?
  
  This is a fascinating set of ideas, and one that we’ve pursued extensively using the Peekbank data. Unfortunately, we think it is out of scope for the current paper, which focuses on child-level metrics (including vocabulary and processing measures). The current paper does not include any analysis of individual words.
  
  Just to expand a bit on the problem here: unfortunately, modeling word recognition as a simple linear function of (log) word frequency is only possible when distractors are held constant (e.g., “ball” always has “book” as its distractor), because distractor frequency plays an important role in the recognition process. In our dataset, however, words are paired with many different distractors across studies. As a result, a fairly complex model of the looking-while-listening (LWL) decision process would be needed to successfully predict effects for individual words. While such a model is an exciting research goal, it’s not something we can include in the current manuscript.
  
  Finally, there is a bit of a risk of the main findings of this paper coming across as a foregone conclusion. I.e., how could it be otherwise that word recognition improves with development?
  
  Reviewer #2 (Recommendations for the authors):
  
  Regarding the feedback about the risk of the findings coming across as a foregone conclusion - perhaps a primary place in the paper where it would be useful to clarify this point is on page 6, in the paragraph beginning, “We investigate two specific hypotheses here. First, one influential theory...”. Here, it might be worth clarifying whether there are alternative ideas about the emergence of word recognition in childhood that predict different patterns, so that the findings of the current paper can be framed as shedding new light on word recognition in development, rather than a confirmation of the common-sense idea that word recognition must improve over development.
  
  Thanks, we appreciate this feedback and it’s something we’ve struggled with in this project. Our conclusion is that this paper does not constitute a binary hypothesis test of e.g., whether word recognition is linked to vocabulary development. Instead, we lean into the idea that there are empirical issues (rather than hypotheses) that have not been quantified sufficiently. Thus, we end the revised introduction with the following paragraph:
  
  “Across both of these issues, the contribution of our work here lies in the detailed quantitative description of development. Nearly every theory of language learning assumes some role for continuous developmental change in word recognition, but these assumptions have not previously been anchored to specific measurements. Hence neither the functional form of the assumed changes nor their concurrent and predictive relationships to vocabulary have been quantified. We leverage the Peekbank dataset to accomplish these goals.”
  

Tags: AuthorResponse

Annotators: Public_Reviews

URL: osf.io/preprints/psyarxiv/dtv2f_v3
www.biorxiv.org

Brainstem neurons coordinate the bladder and urethral sphincter for urination

1. Public_Reviews 19 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We would like to express our deep appreciation to the editor and reviewers for their constructive comments and suggestions, which have significantly improved the quality of our manuscript. In response, we have carefully revised the manuscript, addressed all comments, and performed additional experiments and analyses to strengthen our findings.
  
  (1) We repeated retrograde tracing using CTB-647 to verify precise targeting of SPN and DGC neurons, as shown in the new Figure 7.
  
  (2) We performed dual retrograde tracing combined with fiber photometry or optogenetic activation to investigate the role of PMC dual-projecting neurons in the control of urination, as shown in Figure supplements 11 and 12.
  
  (3) We conducted new experiments activating PMC<sup>ESR1+</sup> neurons after pudendal nerve transection (PDNx) to assess their role in urination, as shown in new Figure 6.
  
  (4) We added a more detailed analysis of the dynamics of neural responses in PMC<sup>ESR1+</sup> neurons in Figure supplements 3F-3G.
  
  (5) We analyzed peak Ca<sup>2+</sup> signals in the PMC during and after the onset of EMG bursting, as shown in Figure supplement 4F.
  
  (6) We added a comparison of spontaneous and light-induced spikes in PMC<sup>ESR1+</sup> neurons, as shown in Figure supplements 3B–3C.
  
  (7) We expanded the Discussion to address how PMC<sup>ESR1+</sup> neurons coordinate bladder contraction and sphincter relaxation to control both the initiation and suspension of urination.
  
  We hope these revisions meet the reviewers' expectations and contribute to the improvement of our manuscript.
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.
  
  Strengths:
  
  Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single unit recordings, combined optogenetics, and urodynamics, particularly those following distinct peripheral nerve lesions.
  
  We are grateful for your insightful and constructive comments, which affirmed the importance and technical depth of our work. Thank you for dedicating your expertise and time to reviewing our manuscript. Guided by your suggestions, we have revised the paper as detailed below.
  
  Weaknesses:
  
  (1) My major concern is the novelty of this study. Keller et al. 2018 have shown that BarEsr1 neurons are active during urination and play an essential role in relaxing the external urethral sphincter (EUS). At a minimum, substantial content that merely confirms previous findings (e.g. Figures 1A-E; Figures 3A-E) should be moved to the supplementary datasets.
  
  Thank you for this valuable and constructive comment. We fully agree that the novelty of our study relative to Keller et al., 2018 must be made explicit. Keller et al. established that PMC<sup>ESR1+</sup> neurons are active during socially evoked urine-marking behavior (voluntary urination) and demonstrated their essential role in relaxing the EUS. Their study mainly focused on behavioral context and EUS relaxation. In contrast, our work addresses a distinct, mechanistic question: how these same neurons participate in reflexive, physiological urination and coordinate both bladder detrusor contraction and EUS relaxation.
  
  Novel aspects of the present study:
  
  (1) Temporal dynamics of PMC<sup>ESR1+</sup> neurons during reflexive micturition.
  
  Using opto-tagging and single-unit recordings, we reveal the precise firing pattern of PMC<sup>ESR1+</sup> neurons during reflexive voiding. Simultaneous fiber photometry, cystometry, and EUS-EMG recordings demonstrate that population-level activity of PMC<sup>ESR1+</sup> neurons precedes and tightly correlates with both bladder contraction and EUS relaxation, a coordination not previously demonstrated.
  
  (2) Causal role in reflexive urination.
  
  Manual closed-loop optogenetic inhibition at the onset of reflexive voiding acutely terminates EUS bursting and bladder contraction, immediately halting urine release.
  
  (3) Dual control of bladder and EUS.
  
  Optogenetic activation combined with selective pelvic or pudendal nerve transection shows that PMC<sup>ESR1+</sup> neurons drive both bladder contraction and EUS relaxation, revealing a coordinating role beyond EUS relaxation alone.
  
  (4) Anatomical substrate for coordinated control of bladder contraction and EUS relaxation in reflexive urination.
  
  Retrograde tracing identifies three spinal-projecting sub-populations: SPN-only, DGC-only, and dual-targeting neurons, providing a circuit-level explanation for the simultaneous control of bladder and EUS.
  
  Following your suggestion, panels that merely replicate Keller et al. (former Figures 1A–1E and Figures 3A–3E) have been moved to new Figure Supplements 1 and 7, respectively, so that the main figures now emphasize the new mechanistic findings.
  
  (2) I also have concerns regarding the results showing that the inactivation of BarEsr1 neurons led to the cessation of EUS muscle firing (Figures 2G and S5C). As shown in the cartoon illustration of Figure 8, spinal projections of BarEsr1 neurons contact interneurons (presumably inhibitory) that innervate motor neurons, which in turn excite the EUS. I would therefore expect that the inactivation of BarEsr1 should shift the EUS firing pattern from phasic (as relaxation) to tonic (removal of relaxation), rather than stopping their firing entirely. Could the authors comment on this and provide potential reasons or mechanisms for this finding?
  
  Thank you for this crucial comment. We apologize that the representative EUS-EMG traces in Figures 2G and S5C were too small to be clearly seen and that the corresponding description of the results was not sufficiently accurate. We have now replaced these EMG traces with enlarged versions (revised Figures 2G and S5C) and revised the corresponding Results section (lines 184, 197, 340-341). The enlarged traces show that acute photoinhibition of PMC<sup>ESR1+</sup> neurons at the onset of phasic EUS-EMG bursting shifted the EUS firing pattern from large-amplitude phasic bursts to low-amplitude tonic firing, rather than abolishing firing entirely. This suggests that ongoing activity of PMC<sup>ESR1+</sup> neurons is required to maintain phasic EUS bursting. Keller et al., 2018 (Figure supplement 8C) reported a similar shift from phasic to tonic EUS-EMG activity during optogenetic silencing of PMC<sup>ESR1+</sup> neurons, confirming the reproducibility of this phenotype. We propose that this low-amplitude tonic activity may be mediated in part by a spinal reflex pathway that prevents urination (the guarding reflex): loss of the supraspinal facilitation provided by PMC<sup>ESR1+</sup> neurons reduces inhibition of spinal interneurons, enhancing the baseline excitability of EUS motor neurons in response to bladder afferent input during bladder distension (William C. de Groat et al., Comprehensive Physiology. 2015, PMID: 25589273).
  
  (3) Current evidence is insufficient to support the claim that the majority of BarEsr1 neurons innervate the SPN but not DGC. The current spinal images are uninformative, as the fluorescence reflects the distribution of Esr1- or Crh-expressing neurons in the spinal cord, along with descending BarEsr1 or BarCrh axons. Given the close anatomical proximity of these two nuclei, a more thorough histological analysis is required to demonstrate that the spinal injections were accurately confined to either the SPN or the DGC.
  
  Thank you for raising this important concern. To rigorously verify that our spinal injections were confined to either the SPN or the DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. We injected a mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 specifically into the SPN or DGC (Methods, lines 465-466). Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without detectable spread to the adjacent region, were included in the analysis (new Figures 7A and 7E). These results confirm our original observation that PMC<sup>ESR1+</sup> neurons comprise three distinct spinal-projection subpopulations: one (19.0%) targeting the SPN, one (52.2%) innervating the DGC, and a third (28.8%) projecting to both regions (Results, lines 304–306; new Figures 7F–7H). In addition, the majority of PMC<sup>CRH+</sup> neurons project to the SPN but not the DGC (new Figures 7B–7D; Results, lines 297–301). We have assembled new Figure 7 using the newly acquired spinal images and the validated data.
  
  Reviewer #1 (Recommendations for the authors):
  
  From the abstract: "Anatomically, PMCESR1+ cells possess two subpopulations projecting to either the pelvic or pudendal nerve". I don't think these neurons directly project to either nerve.
  
  Thank you for this precise comment. We apologize for incorrectly stating that PMC<sup>ESR1+</sup> cells project directly to the pelvic or pudendal nerves. In the revised Abstract (lines 32–36) we have rephrased the sentence to clarify the actual anatomy: “Anatomically, PMC<sup>ESR1+</sup> neurons consist of three distinct spinal-projection-based subpopulations: one targeting the sacral parasympathetic nucleus (SPN), one innervating the dorsal gray commissure (DGC), and a third that projects to both regions, thereby enforcing the coordination of bladder contraction and sphincter relaxation in a rigid temporal sequence.” We trust this revision now accurately reflects the anatomical findings.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control the coordination of bladder and sphincter muscles during urination. This is an important extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.
  
  Strengths:
  
  These data are thorough and convincing in showing that ESR1+PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.
  
  We sincerely thank you for highlighting the rigor of our study and for recognizing the advance in understanding how PMC<sup>ESR1+</sup> neurons exert coordinated, anatomically segregated control over bladder and sphincter. We also appreciate the constructive suggestions that helped us further improve clarity, which we address point-by-point below.
  
  Weaknesses:
  
  The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remains unclear. Not all ESR1+ neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single and doubly-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input can likely modulate the activity of these neurons, but this is perhaps a future direction.
  
  Thank you for this insightful comment. First, we agree that not all ESR1+ neurons are consistently engaged during urination (Figure 1B). Because bladder pressure was not measured during the opto-tagging experiments, we cannot determine whether this reflects trial-to-trial variability in population activity or pressure-dependent recruitment of additional neurons. We speculate that stronger starting bladder pressures may recruit a larger subset of ESR1+ neurons, analogous to graded, pressure-dependent recruitment observed in peripheral sensory neurons (Bruns et al., J Neural Eng. 2011, PMID: 21878706; Marshall et al., Nature. 2020, PMID: 33057202).
  
  Second, using fiber photometry recording and optogenetic activation, we examined the dynamics of dual-projecting neurons in the PMC that were retrogradely labeled from the SPN and DGC. Their activity correlated with bladder contraction and sphincter relaxation, and optogenetic activation sequentially induced these events to trigger urination (see Recommendation #8). Although retrograde labeling captured only a subset of dual-projecting neurons, the results indicate that they coordinate bladder and sphincter activity.
  
  Third, previous studies suggest that PMC<sup>CRH+</sup> cells are associated with bladder contraction and likely serve as an integration center for context-dependent micturition behavior (Hou et al., Cell. 2016, PMID: 27662084; Ito et al., Elife. 2020, PMID: 32347794). We therefore propose that PMC<sup>CRH+</sup> cells establish the baseline conditions and contextual readiness for voiding, whereas PMC<sup>ESR1+</sup> cells act as the executive command to reliably initiate and execute the event.
  
  Finally, we agree that sensory inputs likely modulate PMC<sup>ESR1+</sup> neuron activity. Although this falls beyond the scope of the present study, it represents an important avenue for future investigation.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) In the introduction, the authors write that Keller 2018 only showed this ESR1 population to induce EUS relaxation, but those results also do show bladder contraction with photostimulation of this population. While the authors' work extends this finding in important ways, this should be acknowledged (line 60).
  
  Thank you for this important correction. We have now revised the Introduction to explicitly acknowledge that stimulation of neurons expressing estrogen receptor 1 (ESR1) in the PMC (PMC<sup>ESR1+</sup>) contributes to sphincter relaxation and increased bladder pressure (Introduction, lines 60-62), as originally reported by Keller et al., 2018.
  
  (2) I think a more detailed analysis of the dynamics of neural responses in the PMC ESR1 neurons would be valuable. For example: are the same cells always engaged before micturition, or do different populations activate on different trials? Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity. Figure 1H shows cumulative sessions, but what do single sessions look like?
  
  Thank you for these valuable comments. In response, we have performed refined single-trial analyses of neuronal activity, as detailed in the point-by-point replies below.
  
  For example: are the same cells always engaged before micturition, or do different populations activate on different trials?
  
  Among 11 PMC<sup>ESR1+</sup> units that showed urination-related excitation, 8 units exhibited a consistent firing increase in every voiding trial, whereas the remaining 3 increased their discharge in >78 % of trials (Figure 1B; new Figure supplement 3F). Thus, the same PMC<sup>ESR1+</sup> cells are recruited repeatedly, rather than distinct populations being activated on different trials. We have added this clarification to Results (lines 106–108).
  
  Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity.
  
  Approximately half of the opto-tagged PMC<sup>ESR1+</sup> cells showed no increase in firing rate during urination, yet exhibited spontaneous spikes at other times (new Figure supplement 3G), confirming their electrical competence. Because the PMC also participates in defecation, uterine activity, and other pelvic functions (Rouzade-Dominguez et al., Eur J Neurosci. 2003, PMID: 14686905; Schellino et al., Frontiers in Neuroanatomy. 2020, PMID: 33013330; Quaghebeur et al., Auton Neurosci. 2021, PMID: 34391125), these ESR1+ neurons may serve functions other than urination. We have now added this cell-by-cell analysis and discussion to the manuscript (Results, lines 108-112).
  
  Figure 1 H shows cumulative sessions, but what do single sessions look like?
  
  As shown in new Figure supplements 3F–3G, single-session raster plots reveal that PMC<sup>ESR1+</sup> neurons display consistent firing patterns across individual trials. Neurons whose firing rate increased during urination did so in most trials (Figure supplement 3F), whereas neurons unrelated to voiding remained silent or showed no discernible rate change during voiding across trials (Figure supplement 3G). These single-session observations are consistent with the cumulative population analysis shown in Figure 1H (new Figure 1B).
  
  (3) Supplemental Figure 4: It seems clear from this figure that NVCs are only occurring when the sphincter fails to engage. Can the authors quantify how often this is the case?
  
  Thank you for this important point. We have now quantified the occurrence of non-voiding contractions (NVCs) across all 229 bladder contraction events from 3 mice shown in Supplemental Figure 4. NVCs were observed exclusively when the external urethral sphincter failed to relax, accounting for 62/229 events (27.1 %), whereas coordinated voiding contractions (VCs) occurred in the remaining 167 events (72.9 %). These new data are presented in Figure supplement 4C.
  
  (4) Continuing from the above point: the authors say that the insufficient top-down drive or strength of activity from PMC ESR1 neurons is why NVCs occur. In looking closely, it also seems there is a small hump and subsequent increase in the calcium signal when the EUS bursting begins (particularly clear in Supplementary Figure 4). Could this instead mean that the bursting/urethral activity itself is feeding back onto the PMC to continue/enhance its activity, and it is instead the lack of sphincter bursting that results in the NVC? Could the authors analyze the signal during and after bursting starts? This model is consistent with one of the classic reflexes defined by Barrington, in which urethral fluid flow/activation enhances bladder contraction. The Figure 4 transection experiments do not fully answer this, as the authors are driving activity in the PMC at this time, but they could test this using PDN transection with fiber photometry recording.
  
  Thank you for this important point. We fully agree that EUS bursting may provide excitatory feedback to the PMC that sustains or even amplifies its activity, and that the absence of such feedback could underlie NVCs. To test this possibility, we re-analyzed the fiber-photometry traces aligned to the onset and offset of each EUS bursting episode (new Figure supplement 4). A small but consistent hump in the Ca<sup>2+</sup> signal appeared before bursting onset, and the Ca<sup>2+</sup> signal continued to rise throughout bursting (Figure supplement 4B, yellow arrow). The amplitude at bursting offset was significantly higher than both the NVC peak and the level recorded at bursting onset. These observations support the interpretation that urethral fluid flow/activation supplies excitatory feedback that reinforces PMC activity and bladder contraction, consistent with Barrington’s classic reflex. We have incorporated these new analyses into the revised manuscript (lines 145–155 and Figure supplement 4F).
  
  We agree that the positive-feedback loop described by Barrington’s classic urethra-to-bladder reflex is an intriguing mechanism. However, the PDN-transection experiment in Figure 4 was designed to determine if bladder contractions triggered by PMC<sup>ESR1+</sup> cells can proceed in the absence of sphincter bursting, not to evaluate this reflex. Incorporating simultaneous fiber-photometry recording into the PDN-transection experiment would therefore go beyond the scope of the present study. In future work we are keen to combine PDN transection with fiber photometry to further determine whether the urethra-to-bladder reflex contributes to the sustained PMC activity observed in our paradigm.
  
  (5) In Figure 4, is the timing of sphincter engagement different with ChR2 stimulation from what normally occurs? It appears that the bursting happens immediately upon activation whereas bladder contraction is a bit delayed.
  
  Thank you for this important observation. We have carefully re-examined the EMG traces from all animals shown in Figure 4. We confirm that the onset of sphincter bursting activity during ChR2 stimulation is indeed more rapid than during natural reflex voiding; nevertheless, the onset of phasic sphincter bursting during ChR2 stimulation remained delayed relative to the intravesical pressure rise (see Figure 8B).
  
  The immediate sphincter discharge visible in some trials was tonic EUS discharge or rare irregular bursting, not the typical EUS bursting. This tonic pattern corresponds to the spinal guarding reflex that suppresses urine leakage (Fowler et al., Nature Reviews Neuroscience. 2008, PMID: 18490916; Keller et al., Nature Neuroscience. 2018, PMID: 30104734). These segments were identified by their amplitude and spectral content and excluded from burst-onset analysis. Our analysis protocol therefore distinguishes tonic guarding activity from true phasic bursting, ensuring that only the latter was used to determine burst timing.
  
  (6) The explanation on line 299 about how spinal reflexes are impinging on this circuit is confusing. I agree that the bladder contraction stopping later than the EUS signal likely has something to do with spinal reflexes, but it seems this could instead be feedback from the urethral fluid flow, which continues bladder contractions (urethra-detrusor facilitative reflex). Could the authors clarify their thoughts here?
  
  Thank you for highlighting this ambiguity. We agree that the delayed cessation of bladder contraction could equally reflect either (1) the urethra-to-bladder facilitative reflex driven by ongoing urethral fluid flow or (2) spinal reflexes that we described. In the revised manuscript (Results, lines 343–349), we have re-worded the paragraph to make this dual possibility explicit, thereby avoiding an overly strong emphasis on spinal mechanisms alone.
  
  (7) A note on phrasing: the authors frequently say PMCESR1 cells drive sphincter relaxation, but then show an effect on sphincter bursting. Experienced readers might realize that relaxation and bursting are connected, but this might be confusing for readers and should be clarified in the text.
  
  Thank you for highlighting the potential ambiguity. We agree that the sentence “PMC<sup>ESR1</sup> cells drive sphincter relaxation” can seem paradoxical when our data show increased EUS bursting. In adult mice, the EUS does not remain continuously relaxed during voiding; instead, it generates rhythmic bursting composed of high-frequency spike clusters (active periods) alternating with low tonic activity (silent periods), resulting in rhythmic contractions and relaxations of EUS. This phasic activity acts as a pump that facilitates urine flow through the narrow rodent urethra (Kadekawa et al., Am J Physiol Regul Integr Comp Physiol, 2016, PMID: 26818058). The EUS bursting activity we recorded is consistent with the results reported in previous studies (Keller et al., Nat Neurosci, 2018, PMID:30104734; Ito et al., Elife, 2020, PMID:32347794).
  
  Consequently, when PMC<sup>ESR1</sup> neurons initiate bursting, they simultaneously generate the relaxation phases that separate the spikes. To make this explicit we have replaced the phrase “PMC<sup>ESR1+</sup> cells drive sphincter relaxation” with “PMC<sup>ESR1</sup> neurons trigger EUS bursting, which generates rhythmic sphincter contractions and relaxations.” (Results, page 7, lines 219-221). We have applied similar clarifications throughout the revised manuscript (Results, lines 125-129). We hope this revision eliminates any apparent contradiction.
  
  (8) The question remains as to which neurons (dual projecting, single projecting, or all?) are active in natural urination. This is possible to do through dual injection of retrograde virus in SPN and DGC that could coordinately turn on Gcamp, but this challenging experiment is perhaps beyond the scope of this paper. Even still, the authors could discuss their model for whether the dual- and single-projecting neurons are all engaged at once in a natural urination event. Do the authors have any data that could provide insight as to when these sub-populations are active? Results from the opto-tagging in Figure 1 (and comment #2 about single neuron firing properties) might provide a foundation for hypotheses or insights.
  
  Thank you for this valuable suggestion. We have now performed the experiment you proposed: dual injection of retrograde viruses (AAV-Retro-Cre and AAV-Retro-DIO-GCaMP6s) into the SPN and DGC was used to selectively label PMC dual-projecting neurons, and a 200-µm optic fiber was implanted above the PMC to record their Ca<sup>2+</sup> dynamics during natural urination (Figure supplement 11A and Methods, lines 470–474, 652-655). Dual-projecting neurons exhibited robust activation throughout the entire voiding phase that was tightly correlated with the intravesical pressure rise and EUS bursting (Figure supplements 11A–11H). However, the technical limits of current retrograde tools preclude selective isolation of single-projecting (SPN-only or DGC-only) subsets for independent fiber-photometry recordings: an injection restricted to one target unavoidably labels both single- and dual-projecting cells. We now state this technical limitation explicitly (Discussion, lines 426-430).
  
  Accordingly, in the revised Discussion (lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how dual- and single-projecting PMC<sup>ESR1+</sup> neurons are engaged during natural urination: “Based on population dynamics obtained by fiber photometry (Figures 1D-1H, Figure supplements 1A-1F, and Figure supplements 11A-11H) and single-neuron firing properties recorded via optrode (Figures 1A-1C), we propose several mechanistic models for the engagement of dual- and single-projecting PMC<sup>ESR1+</sup> neurons during natural micturition. One possibility is that all three populations (dual-projecting, SPN-projecting and DGC-projecting neurons) are co-activated, with the dual-projecting subset acting as a “bridging amplifier” that sustains rising bladder pressure while coordinating EUS relaxation. Alternatively, SPN-projecting neurons may be recruited first to initiate bladder contraction, followed by DGC-projecting neurons that evoke EUS bursting and facilitate urine entry into the urethra; once flow begins, the urethro-detrusor facilitative reflex could recruit dual-projecting neurons to further enhance voiding efficiency. In addition, contextual or state-dependent urination—such as scent-marking behavior characterized by multiple voiding events with smaller volumes than reflexive urination—may predominantly rely on sequential and cooperative activation of single-projecting neurons. Other recruitment sequences remain conceivable. Future studies combining diverse urination-related behavioral paradigms with simultaneous recordings from projection-specifically labeled PMC neurons will be required to validate and refine these models.”
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The paper by Li et al. explored the role of estrogen receptor 1 (Esr1)-expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al. 2016, Keller et al. 2018). First, the authors conducted bulk Ca<sup>2+</sup> imaging and unit recordings from PMC<sup>ESR1</sup> neurons to investigate the correlation of PMC<sup>ESR1</sup> neural activity with voiding behavior in conscious mice and with bladder pressure and external urethral sphincter muscle activity in urethane-anesthetized mice. Next, the authors used optogenetic inactivation and activation of PMC<sup>ESR1</sup> neurons to confirm their contribution to voiding behavior, and combined peripheral nerve transection with optogenetic activation to confirm the independent control of bladder pressure and the urethral sphincter muscle.
  
  We sincerely thank you for providing a thoughtful summary and insightful comments on our study.
  
  Weaknesses:
  
  (1) The study demonstrates that pelvic nerve transection reduces urinary volume triggered by PMC<sup>ESR1+</sup> cell photoactivation in freely moving mice. Could the role of pudendal nerve transection also be examined in awake mice to provide a more comprehensive understanding of neural involvement?
  
  Thank you for this valuable suggestion. We conducted an additional experiment to determine the contribution of the pudendal nerve to PMC<sup>ESR1+</sup> neuron-driven voiding in awake mice. Bilateral pudendal nerve transection (PDNx) reduced the optogenetically evoked urine volume compared with sham-operated controls, yet photoactivation of PMC<sup>ESR1+</sup> neurons still reliably induced urination after PDNx (new Figure 6). Thus, bilateral integrity of the pudendal nerve is required for efficient PMC<sup>ESR1+</sup> neuron-driven voiding, most likely by transmitting the signals that entrain rhythmic EUS bursting. These data and experimental details have been incorporated into Figure 6, Results (lines 272–276), and Methods (lines 542–545).
  
  (2) While the paper primarily focuses on PMC<sup>ESR1+</sup> cells in bladder-sphincter coordination, the analysis of PMC<sup>ESR1+</sup>-DGC/SPN neural circuits, given their distinct anatomical projections in the sacral spinal cord, feels underexplored. How do these circuits influence bladder and sphincter function when activated or inhibited? Also, do you have any tracing data to confirm whether bladder and sphincter innervation come from distinct spinal nuclei?
  
  Thank you for this critical comment. To determine how PMC<sup>ESR1+</sup> neurons that target distinct sacral nuclei influence bladder–sphincter coordination, we first focused on the dual-projecting subset in a new experiment (Figure supplement 11 and Methods, lines 470–477, 652–655, 669–673). Dual retrograde virus injections into SPN and DGC selectively labeled PMC dual-projecting neurons, a subset of which are ESR1+. Fiber-photometry recordings showed that these cells were active during bladder contraction and sphincter relaxation (Figure supplements 11E–11H), and optogenetic activation reliably initiated urination: bladder pressure rose immediately and was followed by rhythmic EUS bursting (Figure supplements 11I–11N and 12B; Results, lines 309–313, 332–335). Thus, the dual-projecting subpopulation is sufficient to coordinate bladder contraction with sphincter relaxation. Current retrograde tools do not allow selective isolation of single-projecting (SPN-only or DGC-only) subsets; injecting only one target unavoidably labels both single- and dual-projecting cells. Consequently, we cannot yet compare the functional impact of pure SPN-only versus DGC-only PMC populations. This limitation is now stated explicitly in the revised Discussion (lines 426–430).
  
  In our 2025 paper (Yan et al., Commun Biol, 2025, PMID: 40259086), we used PRV-based retrograde tracing to show that SPN and DGC constitute two separate spinal nuclei controlling the bladder and the EUS, respectively. Classic studies have reached the same conclusion (Yao et al., Nat Neurosci, 2018, PMID: 30361547; Karnup & De Groat, IBRO Reports, 2020, PMID: 32775758; Karnup, Auton Neurosci, 2021, PMID: 34391124). These citations and a concise summary have been added to the Results (lines 289–294).
  
  (3) Although the paper successfully identifies the physiological role of PMC<sup>ESR1+</sup> cells in bladder-sphincter coordination, the study falls short in examining the electrophysiological properties of PMC<sup>ESR1+</sup>-DGC/SPN cells. A deeper investigation here would strengthen the findings.
  
  Thank you for this thoughtful suggestion. While a detailed electrophysiological characterization of PMC<sup>ESR1+-DGC/SPN</sup> neurons would provide complementary information, the primary goal of the present study was to define the in vivo functional dynamics and behavioral role of these neurons during natural urination. As you suggested, further electrophysiological analysis of PMC<sup>ESR1+-DGC/SPN</sup> neurons will be an important direction for our future work.
  
  (4) The parameters for photoactivation (blue light pulses delivered at 25 Hz for 15 ms, every 30 s) and photoinhibition (pulses at 50 Hz for 20 ms) vary. What drove the selection of these specific parameters? Moreover, for photoactivation experiments, the change in pressure (ΔP = P<sub>5 sec</sub> - P<sub>0 sec</sub>) is calculated differently from photoinhibition (Δpressure = P<sub>peak</sub> - P<sub>min</sub>). Can you clarify the reasoning behind these differing approaches?
  
  Thank you for this opportunity to clarify our experimental design. The photoactivation protocol (25 Hz, 15 ms pulses) was chosen because PMC<sup>ESR1+</sup> neurons faithfully follow this frequency without depolarisation block and it reliably triggers voiding (Keller et al., Nat Neurosci, 2018, PMID:30104734). For photoinhibition we originally stated “50 Hz, 20 ms pulses”, but this was an error. Consistent with the same study (Keller et al., Nat Neurosci, 2018, PMID:30104734), we used continuous light (constant illumination) to maintain sustained suppression. The Methods section has been corrected (lines 659-661, 690-691).
  
  The ΔP formula was tailored to the temporal profile of each manipulation. For activation, ΔP (P<sub>5 sec</sub> - P<sub>0 sec</sub>) captures the rapid pressure rise after light onset; Hou et al. (Cell, 2016, PMID: 27662084) used the same window. For inhibition, because saline infusion produces rhythmic reflex voiding, we delivered light at the onset of EUS bursting (i.e., when pressure was already near its peak). Inhibition abruptly stops the bladder contraction, so the bladder cannot return to its pre-void baseline. The Δpressure (P<sub>peak</sub> - P<sub>min</sub>) was therefore used to quantify the extent to which the ongoing pressure wave was aborted by photoinhibition. P<sub>min</sub> is the lowest value reached before the next infusion-driven upswing, which makes the metric insensitive to the slow baseline drift produced by continuous infusion. These clarifications have been added to the Methods (lines 676–677, 679–680, 692–693).
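  To make the two readouts concrete, the sketch below computes both from a sampled pressure trace. It is a minimal illustration, not the authors' analysis code: the function names, the use of linear interpolation to read P at light onset and 5 s later, and the assumption that the caller supplies the time of the next infusion-driven upswing (the response does not say how it is detected) are all hypothetical. The traces are synthetic.

```python
import numpy as np


def delta_p_activation(t, p, light_on, window_s=5.0):
    """Activation readout: dP = P(light_on + 5 s) - P(light_on).

    t, p: 1-D arrays of sample times (s, increasing) and bladder pressure.
    Linear interpolation between samples is an assumption of this sketch.
    """
    p0 = np.interp(light_on, t, p)
    p5 = np.interp(light_on + window_s, t, p)
    return float(p5 - p0)


def delta_p_inhibition(t, p, light_on, next_upswing):
    """Inhibition readout: dpressure = P_peak - P_min.

    Assumes light is delivered at (approximately) the pressure peak, so P_peak is
    the maximum and P_min the minimum within [light_on, next_upswing), i.e.
    before the next infusion-driven rise in pressure.
    """
    t = np.asarray(t)
    p = np.asarray(p)
    mask = (t >= light_on) & (t < next_upswing)
    segment = p[mask]
    return float(segment.max() - segment.min())


# Synthetic traces for illustration only (not experimental data).
t = np.linspace(0.0, 20.0, 2001)
ramp = 10.0 + 2.0 * np.clip(t - 1.0, 0.0, None)  # pressure rises after light onset at 1 s
aborted = np.where(t < 2.0, 30.0,
                   np.where(t < 5.0, 30.0 - 5.0 * (t - 2.0), 15.0 + 3.0 * (t - 5.0)))

print(delta_p_activation(t, ramp, light_on=1.0))                       # about 10
print(delta_p_inhibition(t, aborted, light_on=2.0, next_upswing=8.0))  # about 15
```

  Restricting the inhibition window to end before the next upswing mirrors the stated reason for using P<sub>min</sub>: the minimum is taken before infusion drives pressure up again, so slow baseline drift does not enter the difference.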
  
  (5) The discussion could further emphasize how PMC<sup>ESR1+</sup> cells coordinate bladder contraction and sphincter relaxation to control urination, highlighting their central role in the initiation and suspension of this process.
  
  Thank you for this valuable comment. We have revised the Discussion to emphasize that PMC<sup>ESR1+</sup> neurons coordinate urination by sequentially driving bladder contraction followed by sphincter relaxation through their dual projections to the SPN and DGC. We also emphasized that this coordination is essential for the initiation and effective execution of voiding (Discussion, lines 369-388). In addition, in the revised Discussion (Discussion, lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how PMC<sup>ESR1+</sup> cells are engaged during natural urination.
  
  (6) In Figure 8, the authors analyze the temporal sequence of bladder pressure and EUS bursting during natural voiding and PMC activation-induced voiding. It is acceptable to consider the existence of a lower spinal reflex circuit; however, the interpretation of the data contains speculation. It is hard to argue that bladder pressure reflects efferent pelvic nerve activity in real time. (Biologically, bladder contraction is mediated by smooth muscle and does not reflect efferent pelvic nerve activity in real time. Experimentally, the bladder pressure measurement lags the actual bladder pressure because of the tubing, whereas EUS bursting is recorded without delay.) These factors bear especially on the interpretation of the inactivation experiment. This reviewer recommends rewriting the section in light of these limitations. Most of the section is suitable for the Results.
  
  We agree with the reviewer that bladder pressure, which is mediated by smooth muscle contraction, provides an indirect measure of efferent pelvic nerve activity and is subject to both physiological and experimental delays. Regarding potential delay from the tubing system, pressure propagates in fluid at approximately 1000 m/s (Kela & Pekka, Proceedings of World Academy of Science Engineering & Technology, 2009, DOI: 10.5281/zenodo.1080526). Given that the total tubing length in our setup is 0.5–1 m, the estimated transmission delay is only 0.5–1 ms, which is negligible compared with the observed time difference (~700 ms) between the cessation of EUS bursting and the termination of bladder contraction. Pressure transmission through the tubing is therefore not expected to introduce a meaningful delay. However, we cannot exclude the possibility that the pressure signal itself lags behind the underlying neural command, because bladder pressure does not necessarily reflect efferent pelvic nerve activity in real time. Future studies using simultaneous recordings of bladder pressure and pelvic nerve discharges will help clarify whether a true temporal delay exists. We also agree that additional physiological or peripheral factors may contribute to this difference in timing. As suggested by the reviewer, we have revised the text to consider the potential influence of such factors, including the urethro-detrusor facilitative reflex (Results, lines 343–349).
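  The tubing-delay estimate follows from dividing the tubing length L by the propagation speed c. The ~1000 m/s value is the one cited in the response, and the comparison with the ~700 ms lag is the one the response draws:

```latex
\Delta t = \frac{L}{c}: \quad
\frac{0.5\ \mathrm{m}}{1000\ \mathrm{m\,s^{-1}}} = 0.5\ \mathrm{ms}, \qquad
\frac{1\ \mathrm{m}}{1000\ \mathrm{m\,s^{-1}}} = 1\ \mathrm{ms} \approx 0.14\%\ \text{of}\ 700\ \mathrm{ms}.
```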
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) In opto-tag experiments, a comparison of the average AP waveform during behavior and during light stimulation should be included as an inclusion criterion. The two waveforms should be nearly identical.
  
  Thank you for bringing this to our attention. We have now added this comparison as an inclusion criterion in the revised manuscript. Figure supplement 3B shows representative examples of the average waveforms, and Figure supplement 3C displays the distribution of correlation coefficients between spontaneous and light-evoked spikes for all recorded PMC<sup>ESR1+</sup> units, all of which exhibited r > 0.8.
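  The waveform criterion can be sketched as a Pearson correlation between the two average waveforms, with units kept only if r > 0.8 (the threshold reported above). This is an illustrative sketch, not the authors' pipeline: it assumes single-channel spike snippets of equal length, and the function names and synthetic waveforms are hypothetical.

```python
import numpy as np


def mean_waveform_correlation(spont_spikes, light_spikes):
    """Pearson r between the average spontaneous and average light-evoked waveform.

    Each input has shape (n_spikes, n_samples): one row per spike snippet from a
    single channel (a simplifying assumption for this sketch).
    """
    mean_spont = np.asarray(spont_spikes, dtype=float).mean(axis=0)
    mean_light = np.asarray(light_spikes, dtype=float).mean(axis=0)
    return float(np.corrcoef(mean_spont, mean_light)[0, 1])


def passes_optotag_waveform_check(spont_spikes, light_spikes, r_min=0.8):
    """Inclusion criterion: the average waveforms must correlate with r > r_min."""
    return mean_waveform_correlation(spont_spikes, light_spikes) > r_min


# Synthetic example: a biphasic template plus small noise (illustrative only).
rng = np.random.default_rng(0)
x = np.arange(40)
template = -np.exp(-((x - 10) / 2.0) ** 2) + 0.3 * np.exp(-((x - 18) / 4.0) ** 2)
spont = template + rng.normal(0.0, 0.01, size=(50, x.size))
light = 1.1 * template + rng.normal(0.0, 0.01, size=(50, x.size))
other = -template + rng.normal(0.0, 0.01, size=(50, x.size))  # a differently shaped unit

print(passes_optotag_waveform_check(spont, light))  # True
print(passes_optotag_waveform_check(spont, other))  # False
```

  Because Pearson r ignores overall amplitude, a light-evoked waveform that is simply scaled (as in the `light` example) still passes. Only differences in shape lower r.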
  
  (2) Optical fiber implantation seems to be done in two different methods. In Figure 1 and Figure 2, the fiber tip is positioned just above PMC but in Figure 3 it seems to be angled. The information should be included in the Methods section.
  
  Thank you for this important comment. We have now clarified in the Methods that for Figures 1 and 2, the optical fibers were implanted vertically above the PMC, whereas for Figure 3, the left optical fiber was implanted at a 33° lateral angle targeting the PMC (Methods, lines 499-503).
  
  (3) In the closed-loop inhibition experiments of Figure 2, the parameters to start closed-loop photo-inactivation were not described in the method. If it is a manual closed loop, it should be described clearly.
  
  Thank you for raising this important point. We apologize for omitting these details in the original Methods. We have now added a complete description of the manual closed-loop photo-inhibition protocol, including the triggering criteria and operator-controlled timing, in the revised Methods section (lines 602–605).
  
  (4) In Figure 7A/E the authors provide a spinal cord image to show the injection site, but the image is misleading. The figure only shows AAV-infected CRH/ESR1 neurons in the spinal cord section. It does not indicate the AAV injection site or the terminal distribution.
  
  Thank you for your important comment. We apologize for providing a spinal cord image that did not accurately depict the injection site. To rigorously verify that our spinal injections were confined to SPN or DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. A mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 was injected specifically into SPN or DGC. Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without spread to the adjacent region, were included (new Figures 7A and 7E). These data confirmed our original observations and have been pooled in Figure 7. The manuscript and figure have been updated accordingly (Results, lines 297-301, 304-306; Methods, lines 465–466).
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.01.616107v2
www.biorxiv.org

Quantifying microbial fitness in high-throughput experiments

1
1. Public_Reviews 19 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank both editors and the three reviewers for their constructive criticism of our work. As a result of these comments, we have made several significant revisions to the paper that we believe strengthen and clarify our major results:
  
  (1) Following suggestions from Reviewers #1 and #3, we have improved our introduction to the different fitness concepts (lines 105–148) and streamlined the discussion of the logit encoding (lines 175–190). In particular, we have moved the most technical points to the SI (Sec. S3).
  
  (2) Based on criticisms of our usage of the population dynamics model from Reviewers #1 and #3, we significantly revised our explanation of the motivation and interpretation of this model (lines 284–310 and 323–336) and our discussion of the generalizability of these results (lines 678–728), including the possible effects of interactions besides resource competition.
  
  (3) Following a request from Reviewer #3, we have expanded our analysis of epistasis to systematically test all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 344–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).
  
  (4) Following concerns from Reviewers #2 and #3 about the limited empirical data, we have expanded our analysis of the LTEE data (new main text Fig. 4, revised text on lines 416–439, and revised SI Figs. S16–S18) and have analyzed two new benchmarking datasets for bulk fitness to test our predictions (new main text Fig. 6, new Results subsection on lines 561–590, and new SI Figs. S24 and S25).
  
  (5) Following the criticism of Reviewer #3 about the lack of a clear recommendation on fitness quantification that provides the greatest value for a given scientific question, we have better explained what we think the scientific consequences of fitness are as a motivation for our analysis (lines 82–88, 319–322, and 615–630) and replaced the final flowchart figure with a step-by-step guide in the Methods to implement our recommendations in practice (lines 964–982).
  
  Reviewer #1 (Public review):
  
  The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition, or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski’s Long-Term Evolution Experiment (LTEE), they derive a set of best-practice rules aimed at extracting the optimal amount of information from such experiments.
  
  The study is very complete on a technical level and I have no suggestions for further analyses. However, I feel the readability and the conceptual focus of the manuscript could be significantly improved by rearranging the material with regard to the contents of the main text vs. the Methods and the Supplement. Detailed recommendations:
  
  (1) Regarding readability, the large number of references to material in the Methods and Supplement fragment the main text and make it difficult to follow.
  
  We understand the challenges these references pose to the flow of the main text; we have attempted to keep those references to a minimum, while ensuring that technical details of the work are fully documented and referenced for completeness.
  
  (2) Conceptually, it seems to me that the current presentation obscures the reasons why we should care about fitness in the first place. In the first paragraph of Results, the authors define fitness “as any number that is sufficient to predict the genotype’s relative abundance x(t) over a short-time horizon”. To me, this seems like an extremely narrow and not very interesting definition. Instead, I view fitness as an intrinsic property of a genotype that allows us to predict its performance under a range of conditions, including in particular conditions that are different from the experimental setup that was used to obtain the fitness estimates. The latter viewpoint is well expressed in Supplementary Section S1, where the authors discuss the notion of fitness potential. I would recommend to move at least part of this discussion to the main text.
  
  We appreciate the reviewer’s viewpoint and have moved that conceptual discussion from the SI to the beginning of the Results section to give readers a broader perspective on fitness (lines 105–148). We use “potential” in analogy with potential energy in physics and have clarified this on lines 126–135.
  
  What we call fitness potential, like the other notions of fitness we discuss in this paper (relative and absolute fitness), is still specific to an environmental condition. Fitness as a property intrinsic to a genotype and independent of any environment, as the reviewer mentions, is an interesting concept but beyond the scope of this paper, which is focused on analyzing fitness measurements that are inevitably environment-specific and we have clarified this on lines 142–148. While it is true that this definition of fitness is narrow, it is what can be empirically measured directly, and thus we believe it is crucial to understand how to best interpret that data.
  
  By comparison, the arguments in favor of the logit encoding that currently open the Results section are rather straightforward and could be shortened significantly.
  
  We agree and have condensed this section (lines 175–192).
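  The core argument for the logit encoding can be checked numerically. Under the logistic null model (which the authors cite as the basis of the encoding), a mutant's relative abundance x(t) follows a sigmoid, but logit(x) = ln[x/(1 - x)] is a straight line whose slope is the selection coefficient. The sketch below is a toy illustration under that assumption. The parameter values are arbitrary and the function names are hypothetical.

```python
import numpy as np


def logit(x):
    """Logit encoding of relative abundance: ln(x / (1 - x))."""
    x = np.asarray(x, dtype=float)
    return np.log(x / (1.0 - x))


def logistic_trajectory(t, s, x0):
    """Relative abundance solving the logistic model dx/dt = s * x * (1 - x)."""
    odds0 = x0 / (1.0 - x0)
    return 1.0 / (1.0 + np.exp(-s * t) / odds0)


t = np.arange(10.0)
x = logistic_trajectory(t, s=0.3, x0=0.01)

# In the logit encoding the trajectory is linear, so a straight-line fit recovers s.
slope, intercept = np.polyfit(t, logit(x), 1)

# In the raw (linear) encoding the same trajectory is curved: a straight line
# leaves systematic residuals, and its slope depends on the sampling window.
raw_fit = np.polyval(np.polyfit(t, x, 1), t)
raw_max_residual = np.max(np.abs(x - raw_fit))

print(round(slope, 6))  # 0.3
```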
  
  (3) Similarly, the modeling strategy used in this work is quite subtle and needs to be explained more fully in the main text. The authors use growth traits (lag time, growth rate, and yield) extracted from monoculture experiments on a yeast knockout collection and feed them into a specific mathematical model to simulate pairwise and bulk competition scenarios. Since a key claim of the work is that monoculture experiments are generally poor predictors of competitive fitness, the basis for this conclusion and the assumptions on which it is based need to be described clearly in the main text. In the current version of the manuscript, this information has been largely relegated to the Methods section.
  
  We agree that our motivation for the population dynamics model and growth curve data was not clearly explained. We have significantly revised this section of the Results in the main text (lines 284–310).
  
  In particular, we recognize the potential for misunderstanding this material: we do not intend the relative fitness values calculated from this model to be interpreted as predictions of the true relative fitness between yeast deletion strains. Rather, we use the population dynamics model for our proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). We have added a statement to highlight existing work on monoculture predictors for competition outcomes [32, 34, 36, 37] on lines 453–459.
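  For readers unfamiliar with this class of model, a minimal version can be sketched as follows. This is not the authors' implementation. It assumes that each strain stays at its initial size during a lag time and then grows exponentially at a constant rate, consuming a shared resource at 1/yield per new cell, and that the cycle ends when the resource is exhausted. Parameter values and names (`Strain`, `grow_cycle`, `per_cycle_fitness`) are hypothetical. Per-cycle fitness is computed in the logit encoding, which for a pairwise competition reduces to the change in the log ratio of mutant to wild-type counts.

```python
import math
from dataclasses import dataclass


@dataclass
class Strain:
    n0: float      # initial cell number
    lag: float     # lag time before growth starts
    rate: float    # exponential growth rate after the lag
    yield_: float  # cells produced per unit of resource ("yield" is a Python keyword)


def cells_at(s: Strain, t: float) -> float:
    """Population size at time t: constant during the lag, exponential afterwards."""
    return s.n0 if t <= s.lag else s.n0 * math.exp(s.rate * (t - s.lag))


def grow_cycle(strains, resource):
    """Grow all strains together until the shared resource is exhausted."""
    def consumed(t):
        return sum((cells_at(s, t) - s.n0) / s.yield_ for s in strains)

    lo, hi = 0.0, 1.0
    while consumed(hi) < resource:  # bracket the saturation time
        hi *= 2.0
    for _ in range(100):            # bisection on consumed(t_sat) == resource
        mid = 0.5 * (lo + hi)
        if consumed(mid) < resource:
            lo = mid
        else:
            hi = mid
    return [cells_at(s, hi) for s in strains]


def per_cycle_fitness(mutant, wild_type, resource):
    """Change in logit(mutant frequency) over one pairwise growth cycle."""
    n_mut, n_wt = grow_cycle([mutant, wild_type], resource)
    return math.log(n_mut / n_wt) - math.log(mutant.n0 / wild_type.n0)


wt = Strain(n0=0.5, lag=1.0, rate=1.0, yield_=1.0)
faster = Strain(n0=0.5, lag=1.0, rate=1.1, yield_=1.0)
slower_start = Strain(n0=0.5, lag=2.0, rate=1.0, yield_=1.0)

print(per_cycle_fitness(faster, wt, resource=100.0) > 0)        # True
print(per_cycle_fitness(slower_start, wt, resource=100.0) < 0)  # True
```

  Even this stripped-down version makes the saturation time depend on every strain in the culture, which is the kind of coupling through which the response argues that discrepancies between fitness statistics arise.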
  
  Reviewer #1 (Recommendations for the authors):
  
  In the discussion of the LTEE in Section S8, the authors write on page 8 that “we couldn’t fit the fitted values a,b in ref. 29 so we were unable to check it”. I don’t understand this sentence - is the claim that the fit in ref. 29 was incorrect?
  
  We have clarified this point in the SI (now Sec. S9). Our point was not that the fit in Wiser et al. 2013 is incorrect, but merely that we could not find the exact values of the fitted parameters they obtained documented in their paper, so we could not compare our own fitted parameters directly to theirs.
  
  Also, at the end of the section, the authors refer to theory work on the long-term fitness trend in the LTEE. Here, two early references arguing for a logarithmic increase in fitness could be mentioned as well:
  
  Paolo Sibani, Michael Brandt, and Preben Alstrøm. Evolution and Extinction Dynamics in Rugged Fitness Landscapes. International Journal of Modern Physics B 12:361–391 (1998).
  
  Su-Chan Park and Joachim Krug. Evolution in random fitness landscapes: the infinite sites model. J. Stat. Mech. (2008) P04014.
  
  We thank the reviewer for providing these two references and have added them to the list of previous works on long-term fitness trends at the end of the section (now Sec. S9).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The manuscript “Quantifying microbial fitness in high-throughput experiments” provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight how differing fitness definitions and analysis methodologies can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments. Although this manuscript addresses a critical issue in the quantification of fitness in high-throughput experiments, it relies heavily on a single experimental dataset (Warringer et al. 2003) from a single organism, yeast (Saccharomyces cerevisiae), grown in a defined medium, so the influence of the environment is not fully captured. While the theoretical framework is strong, analyzing and comparing more experimental examples from more organisms (i.e., more datasets) would enhance the manuscript, especially its conclusion.
  
  We have expanded our analysis of competition data from the Long-Term Evolution Experiment in E. coli (lines 416–439), including adding a main text figure (Fig. 4) along with the three SI figures (Figs. S16–S18). We have also added two completely different data sets that directly test our predicted discrepancies in fitness estimates from bulk competition experiments. From this data we have added a new main text figure (Fig. 6), two new SI figures (Figs. S24 and S25), and a new section at the end of the Results (lines 563–590).
  
  We wish to clarify, though, that the aim of this study is to develop theory on fitness quantification choices and minimal examples to demonstrate the potential for discrepancies between these choices. While we appreciate the reviewer’s interest in understanding how discrepancies in fitness statistics vary across organisms and environments, that is an empirical question beyond the scope of this paper.
  
  Strengths:
  
  The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.
  
  Weaknesses:
  
  The theoretical framework is robust, but the manuscript could benefit from more empirical examples to illustrate how different fitness quantification methods lead to varied conclusions in experiments.
  
  Please see our response to the previous comment on this point.
  
  The discussion on the choice of reference subpopulation could be expanded with the influence of the environment or the condition. Different types of reference groups might yield different implications for fitness calculations, and further elaboration would enhance this section.
  
  While we agree that studying how environmental conditions affect fitness is an important and interesting problem, it goes beyond the scope of this paper, which focuses on the basic theory of quantifying microbial fitness from high-throughput experiments. Applications of this theory to empirical questions about environmental variation would be best served by their own studies. We have added a statement clarifying this goal (lines 144–148).
  
  We are unsure how the choice of reference subpopulation is related to this issue. In our view, if the goal of a mutant fitness measurement is to predict how that mutant would behave when arising spontaneously and competing against its immediate ancestor, the gold-standard reference subpopulation must always be the mutant’s immediate ancestor, or another mutant that is known to be phenotypically equivalent to the ancestor (e.g., neutral mutants in the case of a large mutant library). Other choices of reference subpopulations would not provide directly meaningful information in this regard.
  
  The authors overgeneralize some findings; for instance, the implications of fitness measurement choices could vary significantly across different microbes or experimental conditions. A more detailed discussion would strengthen the conclusion.
  
  We certainly agree that the consequences of fitness quantification choices could vary significantly across organisms and environments; our goal for this paper is to demonstrate what discrepancies are possible in principle and in particular how they depend on basic features of microbial population dynamics (e.g., variation in yield). We have added two separate paragraphs in the Discussion section to address the generalizability of our results in the context of pairwise (lines 678–710) and bulk fitness measurements (lines 711–728).
  
  Overall, this manuscript is a significant contribution to the field of evolutionary biology, addressing a critical issue in the quantification of fitness but lacks more experimental support to make it a wider claim. By systematically exploring the factors that influence fitness measurements, the authors provide valuable insights that can guide future research - the framework is computationally thorough but needs a more detailed explanation of concepts instead of generalizing.
  
  We have improved our explanation of several of the important concepts. In particular, we have significantly revised our explanation of the population dynamics model (lines 284–310) to emphasize its role as a null model to demonstrate how fundamental aspects of microbial growth are sufficient to cause discrepancies between fitness statistics. We have also revised two paragraphs on the generalizability of our results in the Discussion section (lines 678–728).
  
  Further work is needed, particularly to incorporate empirical examples and expand certain discussions to include environmental variation and their impact, which would improve clarity and applicability.
  
  We have added a sentence at the beginning of the Results section to acknowledge the environmental dependence of fitness (lines 142–148). We believe further discussion of that issue is beyond the scope of this paper, as it would require a significant amount of additional data and/or environmental modeling.
  
  Reviewer #2 (Recommendations for the authors):
  
  In addition to the comments from the previous sections, other specific comments:
  
  (1) Figure 5 needs to be populated with additional parameter details. For example, include brief descriptions of each parameter involved in the encoding, time scale, and reference choices. This will help users understand the implications of each choice. Adding these details will make the flow diagram more comprehensive, aiding researchers in implementing these steps more clearly.
  
  Following this comment and another comment about this figure from Reviewer #3, we decided to replace this figure with a new Methods section with step-by-step instructions (lines 964–982).
  
  (2) Duplication in Line 620: “Nevertheless, the fact that we see the fact that we see...” This redundancy needs to be corrected.
  
  We thank the reviewer for pointing this out; we have rewritten this paragraph.
  
  (3) More experimental data comparisons and their assessment concerning various microbial systems and multiple environmental conditions are recommended to support the claim.
  
  Please see our responses to the related public comments.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors present analyses of different fitness measures derived from empirical data from yeast knockout mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different “encodings” of relative abundance data and conclude that logit transformations are preferred because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.
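  The time-scale issue can be illustrated with a toy calculation. Here per-cycle fitness is the change in ln(mutant/wild-type) over one growth cycle, and per-generation fitness divides it by the number of wild-type generations in that competition, taken as log2 of the wild-type fold change. That normalization is an assumption for this sketch (conventions differ, and the paper's exact definitions are given in its Methods), and the fold changes below are invented for illustration. Because the divisor depends on how much the wild type grew in each competition, two mutants can swap rank between the two time scales.

```python
import math


def per_cycle(mut_fold, wt_fold):
    """Per-cycle fitness: change in ln(mutant / wild-type) over one cycle."""
    return math.log(mut_fold) - math.log(wt_fold)


def per_generation(mut_fold, wt_fold):
    """Per-cycle fitness divided by the number of wild-type generations (log2 fold change)."""
    return per_cycle(mut_fold, wt_fold) / math.log2(wt_fold)


# Hypothetical fold changes over one cycle in two separate pairwise competitions.
mutant_a = dict(mut_fold=64.0, wt_fold=32.0)  # wild type completes 5 generations
mutant_b = dict(mut_fold=14.0, wt_fold=8.0)   # wild type completes 3 generations

print(per_cycle(**mutant_a) > per_cycle(**mutant_b))            # True: A ranks higher per cycle
print(per_generation(**mutant_a) < per_generation(**mutant_b))  # True: B ranks higher per generation
```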
  
  Strengths:
  
  The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamics and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.
  
  Weaknesses:
  
  The study has several limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question of which fitness measure is best “in the light of first principles”. The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as the “golden standard”), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies).
  
  We agree on the importance of considering the scientific questions researchers want to answer in determining the best way to quantify fitness. We have revised both the Introduction (lines 82–88) and the Discussion (lines 615–630) to more clearly explain possible downstream questions researchers may wish to answer with fitness data, and thus why discrepancies in that data based on analysis choices may be important.
  
  We believe that the text does provide a specific recommendation (second subsection of the Discussion, lines 635–658) for how to quantify relative fitness: using the logit encoding (rather than other encodings), measuring fitness per-cycle (rather than per-generation), and using the wild-type or a phenotypically-equivalent proxy as reference subpopulation to calculate pairwise fitness in a bulk competition (rather than using the mutant library as a whole). This recommendation is based on first principles: the logit encoding is based on the principle of the logistic equation as the null model of relative abundance dynamics (lines 635–637), the choice of the per-cycle timescale is based on the principle that in non-steady state environments the time scale for measuring selection should not depend on the wild-type growth (lines 640–645), and the choice of reference population is based on the principle that a mutant’s fitness should serve as a predictor of its dynamics when arising de novo at low frequency and competing against its wild-type (lines 648–653).
  
  A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites, or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited.
  
  We agree that other interactions are important in many microbial ecosystems and could affect measurements of fitness. We discuss the possibility of these other interactions and their potential consequences for fitness on lines 697–710.
  
  We focus on resource competition in this paper, however, for two reasons. One is that we are using it as a null model: resource competition is always present, and thus it provides an important baseline for discrepancies in fitness statistics in the absence of any other assumptions. Indeed, our results show that this minimal assumption alone is sufficient to produce a wide range of significant discrepancies, which provides the proof of principle that choices of fitness quantification matter. We have clarified this in a revised explanation of the population dynamics model on lines 294–304.
  
  The second reason is that fitness measurements of the type discussed in this paper are typically performed on mutants that have only small genetic differences with their ancestor (e.g., a point mutation or gene deletion). While more complex interactions between such similar genotypes are not impossible, we expect them to be rare, in which case resource competition is the only interaction. Explicit modeling of other interactions is an important question for future work, but would require more detailed models and data of those phenomena, and thus would go beyond the scope of the present study. We have added a sentence to explain our emphasis on resource competition on lines 298–301 and 690–697.
  
  In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Figure 2B), where the replicate estimates for the wildtype show a similar negative correlation.
  
  The tradeoff between growth traits was only an incidental observation and is not necessary for the fitness statistic discrepancies we analyze in this paper; the only important pattern in the growth traits is the existence of mutants with reduced yields (so as to reduce the wild-type log fold-change in a competition) as well as variation in one other trait under selection (lag time or growth rate in this model). We have clarified this mechanism on lines 328–336, which is demonstrated by Fig. S7. Since these tradeoffs are not relevant to the results and we agree that their significance may be unreliable due to the noisiness of the data, we have removed mention of them.
  
  Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures.
  
  The goal of our modeling with the yeast growth trait data is not to test the ability to predict competition experiments from monoculture data; that has been the focus of previous studies [32, 34, 36, 37]. Rather, we use the population dynamics model for a proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). The yeast growth curve data merely provides realistic parameters for this model, to ensure we are studying a biologically relevant regime of the dynamics. To avoid this misconception, we have revised our explanation of this model and the data on lines 284–310.
  
  Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Figure 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are.
  
  We agree that this analysis was incomplete and missed an opportunity to emphasize this important consequence of fitness quantification. We have thus expanded this analysis into a systematic test of all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 346–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).
  
  Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamics and evolutionary processes.
  
  We appreciate this concern as we do hope to make the paper as broadly accessible as possible, especially to experimentalists who measure microbial fitness. To this end, we have reduced the technical discussion of encodings in the first section of the Results (lines 164–187); revised explanations of the population dynamics model (lines 284–310), importance of growth trait variation (lines 328–336), and epistasis (lines 346–395) to better emphasize the conceptual intuition of these parts; and added a step-by-step guide for our recommended best practices of quantifying fitness in bulk competition experiments (lines 964–982).
  
  In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Figure 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and “improves the quality of predictions”, is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion.
  
  The motivation for the discussion of encodings is that it is one of the choices made differently by researchers, mainly using either the logit (more common in experimental evolution and population genetics studies) or log encoding (more common in TnSeq analyses). As such we believe it is important to explain where this choice comes from (a transformation of relative abundance data to make it approximately linear in time, and thus amenable to characterization by a single slope parameter) and why we believe the logit encoding is more logical in most cases. We have streamlined and revised this subsection to make it clearer (lines 164–187).
  
  Our argument for favoring the logit encoding in most cases is based on the logistic model being a null model for relative abundance dynamics (Sec. S3). In light of the reviewer’s comments, we have realized this may be confusing because there are two common usages of logistic dynamics that are biologically distinct. What we mean by the logistic model is the dynamics of the relative abundance x of a mutant in competition with other genotypes:
  
  $$\frac{dx}{dt} = s\,x\,(1 - x). \qquad (1)$$
  
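  Here s turns out to be the relative fitness under the logit encoding. On the other hand, researchers also use a logistic ODE to describe the dynamics of the absolute abundance N of a single strain in monoculture (e.g., as in a growth curve):
  
  $$\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right), \qquad (2)$$
  
  where r is the intrinsic growth rate and K is the carrying capacity.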
  
  We believe the reviewer’s last point refers to Eq. (2), whereas our argument about the logit encoding is based on Eq. (1). We have added a note to clarify this distinction for the reader (lines 192–196).
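To make the argument concrete, here is a minimal Python sketch (parameter values are arbitrary illustrations, not data from the study). It uses the closed-form solution of the relative-abundance model dx/dt = s·x·(1 - x): the odds x/(1 - x) grow as e^(st), so logit x(t) is exactly linear in t with slope s. A least-squares line through the logit-encoded trajectory therefore recovers s, while the same fit on the log encoding underestimates it once the mutant is no longer rare.

```python
import math

def logistic_relative_abundance(x0: float, s: float, t: float) -> float:
    """Closed-form solution of dx/dt = s * x * (1 - x) with x(0) = x0."""
    odds_t = (x0 / (1.0 - x0)) * math.exp(s * t)  # the odds grow exponentially at rate s
    return odds_t / (1.0 + odds_t)

def logit(x: float) -> float:
    return math.log(x / (1.0 - x))

def ls_slope(ts, ys):
    """Ordinary least-squares slope of ys regressed on ts."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    den = sum((t - mean_t) ** 2 for t in ts)
    return num / den

# Hypothetical parameters, chosen only for illustration
s_true, x0 = 0.5, 0.05
ts = [0.0, 2.0, 4.0, 6.0, 8.0]
xs = [logistic_relative_abundance(x0, s_true, t) for t in ts]

s_logit = ls_slope(ts, [logit(x) for x in xs])   # recovers s_true (up to rounding)
s_log = ls_slope(ts, [math.log(x) for x in xs])  # about 0.34 here: d(log x)/dt = s(1 - x) < s
print(f"logit slope = {s_logit:.3f}, log slope = {s_log:.3f}")
```

The log encoding only approximates s while x stays small, which is why its fitted slope drifts downward as the mutant rises in frequency.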
  
  Reviewer #3 (Recommendations for the authors):
  
  In addition to my general comments in the public review, I have several more specific recommendations:
  
  (1) Line 183-189: unclear why logit-based relative fitness is preferred. Abundance data are not typically binomial.
  
  We agree this claim about abundance data was incorrect and have removed it. We have revised the section to focus on motivating the logit encoding from logistic dynamics of relative abundance as a null model for most systems (main text lines 175–187 and Sec. S3).
  
  (2) Line 205: it may be mentioned that s(logit) is the same as the “selection rate constant” often used in microbial studies.
  
  We have added a sentence clarifying the equivalence of the logit-encoded relative fitness to the selection coefficient in population genetics (lines 188–190).
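As a small check of this equivalence (a sketch with hypothetical counts), the per-cycle change in logit mutant frequency equals the difference in realized log growth of mutant and wild-type, ln(N_m(1)/N_m(0)) - ln(N_w(1)/N_w(0)), which is how the selection rate constant is commonly computed for two-genotype competitions:

```python
import math

def logit_fitness_per_cycle(nm0, nw0, nm1, nw1):
    """Change in logit(mutant frequency) over one growth cycle."""
    x0 = nm0 / (nm0 + nw0)
    x1 = nm1 / (nm1 + nw1)
    return math.log(x1 / (1 - x1)) - math.log(x0 / (1 - x0))

def selection_rate_constant(nm0, nw0, nm1, nw1):
    """Difference in realized log growth: ln(Nm1/Nm0) - ln(Nw1/Nw0)."""
    return math.log(nm1 / nm0) - math.log(nw1 / nw0)

# Hypothetical mutant and wild-type counts at the start and end of one cycle
counts = (1e5, 9e5, 2.4e7, 1.6e8)  # (nm0, nw0, nm1, nw1)
a = logit_fitness_per_cycle(*counts)
b = selection_rate_constant(*counts)
print(round(a, 4), round(b, 4))  # both equal ln(1.35), about 0.3001
```

The two expressions agree algebraically because x/(1 - x) is just the mutant-to-wild-type count ratio, so the totals cancel.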
  
  (3) Line 368: why do mutations that increase biomass yield also increase WT LFC? Is this, because they grow slower and hence allow the WT more time to grow?
  
  Mutants with higher yield allow the wild-type to achieve higher log fold-change because those mutants consume fewer resources per cell, which frees up more resources for the wild-type to consume and increase its overall growth. It’s not about growth rate or time, as this would occur even for mutants whose growth rates are identical to the wild-type’s. We have revised our explanation of how variation in growth traits differentially affects fitness statistics (lines 323–340) and epistasis (lines 361–378).
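The resource argument can be illustrated with a deliberately simplified batch model, a hypothetical sketch rather than the paper's population dynamics model: no lag, equal growth rates for mutant and wild-type, and growth stopping abruptly when one shared resource runs out. Because a higher-yield mutant uses less resource per cell, the shared growth phase lasts longer and the wild-type's log fold-change (LFC) increases, even though growth rates are identical.

```python
import math

def wt_log_fold_change(r0, nw0, yw, nm0, ym):
    """WT natural-log fold-change over one cycle in a toy batch competition.

    Hypothetical simplifications: no lag phase, mutant and wild type share the
    same exponential growth rate, and growth stops as soon as a single shared
    resource (initial amount r0) is exhausted. A genotype with yield y uses
    1/y units of resource per cell produced.
    """
    # With a common growth factor F = exp(g*T), resource use at saturation is
    # (F - 1) * (nw0/yw + nm0/ym) = r0, which we solve for F.
    fold = 1.0 + r0 / (nw0 / yw + nm0 / ym)
    return math.log(fold)

r0, nw0, nm0, yw = 1e8, 5e5, 5e5, 1.0  # illustrative values, arbitrary units
lfcs = [wt_log_fold_change(r0, nw0, yw, nm0, ym) for ym in (0.5, 1.0, 2.0)]
print([round(v, 2) for v in lfcs])  # WT LFC rises with mutant yield
```

Because both genotypes grow by the same factor in this toy model, the mutant's logit fitness per cycle is zero for every yield; only the wild-type LFC changes, which matters for any statistic that rescales by wild-type growth.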
  
  (4) Line 382-386: you may want to cite Ram et al. (2019, 10.1073/pnas.1902217116), who also did such analyses for experimental data from E. coli.
  
  We have cited this work as Ref. [34].
  
  (5) Line 415: perhaps use “bulk relative fitness” instead of “total relative fitness”, to contrast with “pairwise relative fitness”.
  
  We acknowledge the language in this section can be subtle. However, “bulk” is not a sufficient identifier for the concept of total relative fitness as bulk competition experiments (with many genotypes competing simultaneously) can be used to measure either total relative fitness or pairwise relative fitness. (In pairwise competition experiments with only two genotypes, these two types of fitness are identical.) As such we adhere to our original language but have added words to clarify which type of experiment (bulk or pairwise) we are talking about in a given context (e.g., on lines 495–504).
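The distinction can be sketched in a few lines of Python. The counts below are hypothetical, and we assume here that total relative fitness compares the focal genotype against the rest of the pool combined, while pairwise relative fitness compares it against a single reference genotype such as the wild-type. A mutant that grows exactly like the wild-type then has pairwise fitness 0 but positive total fitness when the rest of the library is mostly deleterious:

```python
import math

def pairwise_fitness(n0, n1, focal, ref="WT"):
    """Log-ratio change of focal vs. a single reference genotype over one cycle."""
    return math.log(n1[focal] / n1[ref]) - math.log(n0[focal] / n0[ref])

def total_fitness(n0, n1, focal):
    """Logit change of focal vs. everything else in the pool (assumed definition)."""
    rest0 = sum(n0.values()) - n0[focal]
    rest1 = sum(n1.values()) - n1[focal]
    return math.log(n1[focal] / rest1) - math.log(n0[focal] / rest0)

# Hypothetical bulk competition: WT, one neutral mutant, two deleterious mutants
n0 = {"WT": 100, "neutral": 100, "del1": 400, "del2": 400}
n1 = {"WT": 10_000, "neutral": 10_000, "del1": 20_000, "del2": 12_000}

print(pairwise_fitness(n0, n1, "neutral"))  # 0.0: grows exactly like WT
print(total_fitness(n0, n1, "neutral"))     # > 0: inflated by the slow-growing library
```

In a pairwise competition the "rest of the pool" is the wild-type alone, so the two functions return the same value.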
  
  (6) Line 451-453: why does a population in bulk competition consume resources more slowly than in pairwise competitions?
  
  Mutant libraries used in bulk competition experiments usually include a large number of deleterious mutants, which grow more slowly than the wild-type. Thus these populations typically consume resources more slowly than a population in a pairwise competition would, where a large part of the population is the wild-type.
  
  (7) Line 565: I don’t understand how one can compare relative fitness to other timescales.
  
  Relative fitness, as we’ve defined it, has units of rate, since it describes the rate of change of relative abundance (or an encoding of it) over some time scale (e.g., a batch growth cycle or a generation). Therefore it can be compared to other time scales of the system, such as the rate at which new mutations arise or the rate of genetic drift fluctuations, as long as they are measured in the same units. This comparison is important to population genetics analyses, such as determining whether the population is in the strong selection-weak mutation limit or the clonal interference regime.
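To illustrate why the choice of time unit matters (a sketch with hypothetical numbers, assuming per-generation fitness is obtained by dividing per-cycle fitness by the number of wild-type doublings in the same competition), two mutants can swap rank when converted from per-cycle to per-generation units if they alter wild-type growth differently:

```python
import math

def per_generation_fitness(s_cycle, wt_log_fold_change):
    """Rescale per-cycle fitness by the number of WT generations in that cycle.

    Assumes generations are counted as WT doublings: log2 of the WT fold-change.
    """
    wt_generations = wt_log_fold_change / math.log(2)
    return s_cycle / wt_generations

# Hypothetical mutants: (per-cycle logit fitness, WT natural-log fold-change in that competition)
mutants = {
    "A": (0.10, 6 * math.log(2)),  # WT undergoes 6 doublings alongside A
    "B": (0.09, 4 * math.log(2)),  # WT undergoes only 4 doublings alongside B
}

by_cycle = sorted(mutants, key=lambda m: mutants[m][0], reverse=True)
by_gen = sorted(mutants, key=lambda m: per_generation_fitness(*mutants[m]), reverse=True)
print(by_cycle, by_gen)  # ['A', 'B'] ['B', 'A']: the rank order flips
```

Any comparison with another rate (for example a mutation rate quoted per generation) has to use one of these units consistently, which is why the conversion is not just cosmetic.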
  
  (8) Line 620 repeats text.
  
  Thank you, we have revised this paragraph and removed the typo.
  
  (9) Figure 1C+D: the link between the scenarios on the left and the graphs on the right may be better explained. For example, it may help to make explicit that the 4 scenarios in panel C show the same relative fitness per cycle and that mutant and wildtype have the same growth rate, but different growth periods in both scenarios in panel D. It is also unclear whether the grey dot links to the upper scenario in D.
  
  We have clarified this issue in the caption and changed the colors to avoid this confusion.
  
  (10) Figure 2E: it is unclear why “mutants with equal fitness are assigned the lowest rank”.
  
  This was a technical comment about how to handle ties in our analysis of mutant rankings, but it is moot since no exact ties actually occur in our simulations. We have removed this remark to avoid confusion.
  
  (11) Figure 2F: the axis labels are confusing, as for the WT estimates no LFC mutant exists. It would also help to make explicit in the legend against which WT replicate/reference strain each strain has competed.
  
  We agree the inclusion of wild-type replicates in this plot was confusing and unnecessary, so we have removed them. The mutants compete against a wild-type with traits defined by their median values across all wild-type replicates; this is noted in Fig. 2A and the Methods section on our analysis of this data (lines 809–813).
  
  (12) Figure 5: I am not sure this is needed, as its information is rather limited.
  
  We agree and have removed this figure.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.20.608874v2
www.biorxiv.org

Nucleolar dynamics are determined by the ordered assembly of the ribosome

1. EMBOpress 18 May 2026
  
  in Review Commons
  
  Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
  
  
  Reply to the reviewers
  
  Reviewer #1
  
  Summary: This manuscript presents a high-throughput fluorescence recovery after photobleaching (HiT-FRAP) platform to screen for genes affecting the dynamics of the nucleolar scaffold nucleophosmin (NPM1). The platform included siRNA-based screening of 65 RNA helicases, 9 phylogenetically related helicase pairs, and 290 ribosomal proteins along with selected assembly factors. These factors were classified as accelerating or decelerating NPM1 dynamics based on t1/2 measurements. Combined with nucleolar morphological changes, the authors found that depletion of early-stage (A-F) and later-stage (G-H) LSU assembly factors resulted in different nucleolar phenotypes, suggesting that pre-ribosome assembly can impact nucleolar morphology. Further exploration of the potential mechanism suggested that NPM1's intrinsically disordered region (IDR) contributes to nucleolar organization and dynamics.
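The summary does not say how t1/2 was extracted from each recovery curve. A common approach, assumed here and not necessarily the HiT-FRAP pipeline, is to fit a single exponential f(t) = A(1 - e^(-kt)) to background-corrected, normalized intensities (post-bleach value at 0) and report t1/2 = ln 2 / k. The sketch below uses NumPy only: for each candidate k the best amplitude A has a closed-form least-squares value, so only k is grid-searched.

```python
import numpy as np

def fit_frap_half_time(t, f, k_grid=None):
    """Fit f(t) = A * (1 - exp(-k * t)) and return (A, k, t_half).

    For each candidate rate k, the amplitude A (mobile fraction) that minimizes
    the squared error has a closed form, so only k needs a grid search.
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    if k_grid is None:
        k_grid = np.logspace(-3, 1, 4000)  # candidate rates, in 1/(time unit of t)
    basis = 1.0 - np.exp(-np.outer(k_grid, t))          # shape (n_k, n_t)
    amps = (basis @ f) / np.sum(basis * basis, axis=1)   # best A for each k
    sse = np.sum((f - amps[:, None] * basis) ** 2, axis=1)
    i = int(np.argmin(sse))
    k = k_grid[i]
    return amps[i], k, np.log(2) / k

# Synthetic, noise-free recovery curve (hypothetical values)
t = np.linspace(0.0, 30.0, 61)
f = 0.8 * (1.0 - np.exp(-0.2 * t))
A, k, t_half = fit_frap_half_time(t, f)
print(f"mobile fraction ~ {A:.2f}, t1/2 ~ {t_half:.2f}")
```

Real curves often need an offset term or a two-exponential model; the closed-form amplitude trick extends to those by solving a small linear least-squares problem per k.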
  
  Together, this well-designed study shows that ribosome assembly, involving both early and late ribosomal precursors, can influence the biophysical properties of the nucleolus. Below please find our concerns for the authors to consider in order to strengthen the major conclusions.
  
  Major comments:
  
  The main conclusion that NPM1's biophysical states directly impact its interaction strength with ribosome intermediates (and thereby nucleolar dynamics) should be further strengthened as listed below:
  
  1). Given the nucleolus's complexity, an additional GC factor and/or one more marker of other nucleolar regions should be examined to substantiate the proposed impact of LSU-associated factors on nucleolar morphology (Figures 3, 4).
  
  We thank the reviewer for this very important point. We have now included representative images of hits from major phenotypic clusters co-stained for SURF6, another GC marker, which shows a localization pattern similar to that of NPM1 (Fig. S4B). For other nucleolar subcompartments, we have included images obtained from a cell line harboring endogenously tagged FBL-mNeonGreen (a marker for the DFC) for representative hits (Fig. S4A). We see a similar overall distribution of the DFC within the GC (i.e. DFCs distribute to fill the area of the disrupted GC), confirming our screen results. We look forward to further examining the changes in nucleolar subcompartment architecture in future work.
  
  As additional support, we note that we probed NOG2, NOP53, and NOP2 in our IF results, all of which are GC-localized factors. We see a very similar distribution for these factors in our hits as for NPM1 (see Fig. S8D). In addition, FISH data for pre-rRNA precursors show similar morphological patterns as NPM1, further confirming our results (Fig. S7). We have noted this in text and have also included representative images in supplement.
  
  2). Additional experiments are needed to support the proposed model that ribosomal intermediates, especially the pre-LSU complexes, could determine nucleolar biophysical properties through their interaction with NPM1. Biochemical evidence of this direct interaction should be provided. Also, when analyzing the interaction with other nucleolar factors, the authors should provide data showing that NPM1 mutant expression levels are comparable to endogenous levels (Figures 4, 6).
  
  We agree that directly probing NPM1's interactions with LSU precursors is critical to supporting our model, and we have addressed this through several complementary biochemical approaches. First, we performed immunoprecipitation of tagged NPM1 (NPM1-mScarlet, IP-ed using RFP-trap agarose) and assessed interaction with pre-LSU rRNA transcripts via Northern blot (Fig. 5D). We find that NPM1 interacts strongly with the 32S pre-rRNA. Second, we performed sucrose gradient sedimentation and find that NPM1 preferentially co-migrates with pre-60S complexes (Fig. 5B). Together with previous reports of NPM1-pre-LSU interactions, these data provide direct biochemical support for the proposed interaction.
  
  To test whether interaction strength with pre-LSUs could regulate NPM1 dynamics, we next asked whether our NPM1 mutants that differ in their dynamics in turn interact differentially with pre-LSU complexes. Using co-IP Northern blot for ITS2 and sucrose co-sedimentation, we find that NPM1 mA3 pulls down more 32S and co-sediments more robustly with pre-60S complexes, while NPM1 mB2 shows reduced association (Fig. 5D, E; Fig. S10F, G). These data support that the strength of the NPM1-pre-LSU interaction is a determinant of NPM1 exchange dynamics, and, by extension, of nucleolar biophysical properties.
  
  Exogenous mutant NPM1 is expressed at approximately 10% of endogenous levels (Fig. S10A). We address this in two ways. First, all interaction comparisons are made between WT and mutant exogenous constructs, not against endogenous NPM1, controlling for expression level differences. Second, we observe similar effects on interactions both in the presence of endogenous NPM1 and in null backgrounds, indicating that the differences we detect reflect NPM1 mutation, not expression level.
  
  3). Northern Blotting should be done to dissect which pre-rRNA intermediates interact with NPM1 and contribute to the nucleolar dynamics (Figures 4B, D, F). These additional experiments should be feasible within a reasonable timeframe.
  
  We agree with the reviewer and have performed Northern blots for major hits in our different nucleolar phenotypes, and the results reinforce what we see by FISH and qPCR (Fig. S6B). Briefly, depletion of the “RNA Exosome” hit SKIV2L2 results in smearing of pre-rRNA precursors that harbor both ITS1 and ITS2 and an accumulation of the 12S, in keeping with its role in end-processing of these transcripts. For “Other” hit PHF5A, we see an enrichment for 47S/45S/41S species, consistent with an early precursor stall. Notably, we do not see this phenotype for depletion of “Other” hit CNOT1, which suggests multiple processing defects may lead to a similar nucleolar phenotype. Treatment with Pol I inhibitor CX5461 shows a depletion in ITS1-containing transcripts, and minimal impact on ITS2-containing transcripts, similar to FISH results. Lastly, depletion of “LSU” hits NOP53 and RPF2 leads to accumulation of the 32S and 12S species, in keeping with accumulation of abortive pre-LSUs.
  
  In addition, the authors should provide the code and the hardware control procedures for HiT-FRAP to ensure reproducibility.
  
  We thank the reviewer for this thoughtful suggestion. We have made our software available on GitHub (https://github.com/jess-sheu/colony_blob_bleacher) and archived it on Zenodo (https://doi.org/10.5281/zenodo.20275447).
  
  According to the authors' statement, all the experiments are adequately replicated, and the statistical analysis is adequate.
  
  Minor comments:
  
  To enhance clarity and focus, consider the following:
  
  1). Simplifying the HiT-FRAP screening section (Fig. 1-3) would emphasize the significant findings.
  
  We have simplified text throughout to better highlight significant findings.
  
  2). Expanding the analysis and experimental validation could help to solidify the interdependency between rRNA/ribosome precursors and NPM1-driven nucleolar dynamics (Fig. 4-5). Indeed, the additional experiments suggested above in the major concerns should be included here.
  
  We have performed additional experiments to demonstrate the interdependency between ribosomal precursors and their interaction with NPM1 in shaping nucleolar dynamics, as described above.
  
  Reviewer #1 (Significance (Required)):
  
  This work has established a powerful toolkit, named HiT-FRAP, to identify factors involved in the organization and regulation of the membrane-less nucleolus, which will be useful for understanding the complexity not only of the nucleolus but likely of other condensates in cells in the future. Using this platform and with the Granular Component (GC)-localized NPM1 as an indicator of nucleolar morphology, the authors found that the biophysical properties of the nucleolus are sensitive to the ordered assembly of ribosomes, in particular the LSU maturation steps at the GC. This finding is important as it suggests an interdependency between dynamic rRNA processing and the functional assembly and morphology of the nucleolus. Further studies are warranted to analyze the dynamics of other nucleolar constituents, particularly those localized at other sub-nucleolar regions, to fully depict how exactly nucleolar function is coordinated with its biophysical properties.
  
  Reviewer #2
  
  Reviewer #2 (Evidence, reproducibility and clarity (Required)):
  
  Summary: The nucleolus is a multiphase biomolecular condensate whose primary function is ribosome biogenesis. There is mounting evidence that the material state of condensates is important for their function. Here the authors have probed how the material properties of the nucleolus respond to inhibition of ribosome biogenesis.
  
  They have assessed nucleolar dynamics (molecular diffusivity) of a nucleolar protein, NPM1, by fluorescence recovery after photobleaching (FRAP). NPM1 is a protein that labels the periphery of the nucleolus (the so-called granular component, GC). (The nucleolus has 3 main subcompartments: the internal fibrillar centers, the middle dense fibrillar components, and the GC).
  
  One of the main findings of the work is that inhibition of late steps of ribosome biogenesis increases fluidity (faster recovery of NPM1), while inhibition of earlier steps (and of mRNA processing, but see below) instead increases rigidification (slower recovery). They then attempt to correlate what is interpreted as biophysical changes with pre-ribosomal intermediates and their interaction with NPM1.
  
  Practically, the authors have produced reporter cell lines (HeLa) stably expressing (via CRISPR engineering) mono- or bi-allelic fluorescent versions of NPM1; they have developed a powerful platform to conduct high-throughput FRAP (this is really good); they have calibrated their system, initially with basic perturbations (ATP depletion, proteasome inhibition, etc.), and then they focused on a family of trans-acting factors, the helicases, systematically investigating their effect on NPM1 recovery. They then extended their initial candidate-based screen to additional factors (using STRING interactions). This is nice and useful. Later in the work, they include in their analysis additional (morphological) features of nucleoli to functionally cluster their hits, as was done earlier by others in similar works. Finally, using recently published structural data (CryoEM), they attempt to correlate groups in the cluster with particular pre-ribosomal species. This part is less advanced and weaker than the initial part of the paper (screens and FRAP measurements).
  
  Major comments:
  
  -A major comment is that the compositional analysis of precursor intermediates should be better defined. The stage assignment of particles is not as strong as the screening part of the paper. At the RNA level, the authors provided FISH as histograms of quantifications (see e.g. Fig 4D and Fig S6E). It would be necessary to show images and to perform biochemistry. At the protein level, the authors provide immunostaining, but it does not really prove that the detected protein is part of a particle.
  
  We thank the reviewer for this important critique. We have taken several steps to address both the stage assignment and biochemical characterization concerns.
  
  Regarding stage assignment: We have consolidated our LSU phenotypic clusters (previously LSU1 and LSU2) into a single "late pre-LSU" group based on their shared features and proximity in PCA space. We want to be clear that this consolidation is intended to more accurately represent what our data can support: the screen reliably identifies factors whose perturbation produces a coherent late LSU assembly phenotype, and we do not wish to overstate the resolution of state assignment from imaging data alone. Sub-cluster distinctions are retained in supplementary materials for transparency. We have revised language throughout to reflect this framing.
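Proximity in PCA space can be illustrated with a short NumPy sketch. The feature set, values, and group labels below are hypothetical, and the study's actual features, normalization and clustering are not specified here: per-knockdown features are z-scored, projected onto the first two principal components via SVD, and group centroids are compared.

```python
import numpy as np

def pca_scores(features, n_components=2):
    """Project z-scored features onto their leading principal components."""
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # put features on a common scale
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T                    # coordinates on PC1, PC2, ...
    explained = S**2 / np.sum(S**2)                     # fraction of variance per component
    return scores, explained[:n_components]

def centroid_distance(scores, groups, a, b):
    """Euclidean distance between the PC-space centroids of groups a and b."""
    labels = np.asarray(groups)
    ca = scores[labels == a].mean(axis=0)
    cb = scores[labels == b].mean(axis=0)
    return float(np.linalg.norm(ca - cb))

# Hypothetical per-knockdown features: area, eccentricity, NPM1 intensity, FRAP t1/2
features = [
    [12.0, 0.55, 1.40, 4.0],
    [11.5, 0.58, 1.35, 4.2],
    [12.4, 0.52, 1.45, 3.9],
    [10.8, 0.60, 1.30, 4.5],
    [11.1, 0.57, 1.32, 4.4],
    [18.0, 0.30, 0.80, 9.0],
]
groups = ["LSU1", "LSU1", "LSU1", "LSU2", "LSU2", "Other"]

scores, explained = pca_scores(features)
print(f"PC1 + PC2 explain {explained.sum():.0%} of the variance")
print("LSU1 to LSU2:", round(centroid_distance(scores, groups, "LSU1", "LSU2"), 2))
print("LSU1 to Other:", round(centroid_distance(scores, groups, "LSU1", "Other"), 2))
```

Small centroid distances between sub-clusters relative to other groups are one way to motivate merging them, though the choice of features and scaling strongly affects these distances.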
  
  Regarding biochemical characterization of intermediates: We have now performed Northern blots on strong hits within our phenotypic groups (Fig. S6B). For LSU cluster hits, we observe accumulation of the 32S and 12S species, indicating a stall in ITS2 processing, which is directly consistent with our ITS2 FISH results and confirms that the RNA-level phenotypes reflect genuine pre-rRNA processing defects rather than indirect effects. For "Other" group factor PHF5A, we observe 47/45/41S accumulation consistent with an early processing stall. We have also added representative FISH images to Fig. S7 to allow direct visual assessment of RNA-level phenotypes.
  
  Regarding protein-level particle assignment: We agree that IF alone cannot establish that assembly factors are incorporated into discrete pre-ribosomal particles rather than existing as free factors. To more directly test whether the LSU cluster phenotypes reflect accumulation of genuine pre-ribosomal particles rather than mislocalized free factors, we used NOP53 knockdown as a representative LSU cluster perturbation and, similar to RPF2 knockdown, saw an accumulation of ITS2 and NOG2 in the nucleolus by FISH and IF (Fig. 4E). We then performed nuclear sucrose gradient fractionation and found that NOG2 co-migrates with the LSU peak and does not enrich in soluble fractions (Fig. 4F-H), supporting the interpretation that late pre-LSU particles accumulate in the nucleolus upon disruption of LSU cluster genes. Importantly, we also observe a strong decrease in co-sedimentation of NPM1 with the LSU peak upon depletion of NOP53 (Fig. 4G,H). This result, together with the Northern blot and FISH data, provides biochemical and cell biological evidence that the nucleolar phenotypes we identified by HiT-FRAP are associated with accumulation of late LSU assembly intermediates.
  
  -Another concern is whether NPM1, a GC component located at the periphery of the condensate and a late assembly factor, is an appropriate marker for assessing the effects of all (including early and late) inhibitions on the nucleolar material state.
  
  Would factors involved in earlier ribosomal assembly steps, and localized more internally, not be better tools to evaluate changes in material state caused by alterations in early steps?
  
  We appreciate this important point and agree that NPM1 reports primarily on GC dynamics. However, we would argue this is a feature rather than a limitation for two reasons.
  
  First, the GC is the terminal assembly compartment through which pre-ribosomal particles must transit before nuclear export. Perturbations to earlier assembly steps, including FC/DFC-localized processes, likely propagate into GC dynamics, because stalled or aberrant particles accumulate in or are excluded from the GC. NPM1 FRAP thus functions as a downstream integrator of upstream assembly status, not only a reporter of GC-proximal events. This interpretation is consistent with our observation that depletion of early factors (and, therefore, depletion of downstream intermediates) does produce detectable NPM1 phenotypes in our screen. Second, the pattern of our screen results supports rather than undermines this logic: the striking enrichment of late LSU factors and near-complete absence of SSU hits is precisely what one would predict if NPM1 reports selectively on pre-LSU flux through the GC. A sensor that reported indiscriminately on all condensate perturbations would not produce this specificity.
  
  We do acknowledge, however, that NPM1 cannot report on material state changes that are compartmentally confined to the FC or DFC and do not propagate outward. Extending this approach to internal markers remains an important future direction. To clarify the scope of our readout, we have revised the text to specify that we are monitoring GC dynamics, and we have added representative images of fibrillarin localization in Supplemental Figure S4A to illustrate the relationship between DFC and GC compartments in our experimental system.
  
  -About the engineered cell lines used for screening by FRAP (Fig 1S), NPM1-mNeonGreen (biallelic, with reduced expression of NPM1) and NPM1-mScarlet (heterozygous): pre-rRNA processing needs to be characterized in both cell lines to show that ribosome biogenesis is not affected. This is important information, since the entire work is based on these cells.
  
  We have performed a Northern blot comparing the cell lines used in this paper with their parental cell line and see no substantial difference in rRNA processing. We have included these data as Supplemental Figure 1D.
  
  The screen uses HeLa cells, in which p53 is not physiologically regulated. Nucleolar surveillance is a key regulatory loop triggered by inhibition of ribosome biogenesis, leading to p53 stabilisation. How could this affect this work? Should key findings be confirmed in diploid, p53-positive cells?
  
  We acknowledge that our choice of HeLa cells limits our ability to distinguish cell-type-specific responses from more universal mechanisms and have added an explicit discussion of cell choice to the main text. To begin exploring the impact of p53, we depleted representative hits across phenotypic clusters in untransformed, diploid hTERT-RPE cells lentivirally transduced with NPM1-mScarlet and assessed nucleolar morphological phenotypes at a smaller scale (Figure S6C, Supplementary Text). At baseline, RPE cells show more and smaller nucleoli than HeLa cells, which may reflect a difference in basal nucleolar assembly and, potentially, ribosome biogenesis, in keeping with previous observations that transformed cells rely more heavily on ribosome biogenesis than non-transformed cells.
  
  Upon gene depletion, we found that hits from the "RNA exosome" cluster show a different phenotype from that seen in HeLa cells: we observe a smaller size difference and a marked decrease in eccentricity, which may reflect a p53- or cell-type-specific response. Depletion of the “Other” cluster gene PHF5A results in a milder, though qualitatively similar, phenotype to that seen in HeLa cells, with nucleolar rounding and an increase in NPM1 intensity. Depletion of “LSU”-associated hits in RPE cells very robustly replicated most of the nucleolar features we observed in HeLa cells, which suggests that these are likely generalizable responses to LSU disruption. We have included these data in Supplementary Figure 5C. We note that we did not directly test whether p53 is stabilized upon depletion of our hits in RPE cells, and whether p53 activation feeds back on condensate dynamics remains an open question for future work. However, the concordance of LSU-associated phenotypes across HeLa and RPE cells, which differ substantially in p53 status, transformation state, and baseline nucleolar architecture, supports the generalizability of our core findings.
  
  -About factor depletion, e.g. helicases, it's important to consider direct versus indirect effects on ribosome biogenesis, the timeline of depletion should be well described in the paper. Apparently, most factors, including the helicases were depleted for 72 hours, this is very long considering most of them play important roles in essential processes for cell homeostasis implying severely reduced growth at the time of capture (and the possibility of indirect effects).
  
  We thank the reviewer for this important point. To directly address the depletion timeline, we performed time courses for strong hits and monitored nucleolar morphology at 24 and 48 hours (now included in Fig. S3D). Morphological changes begin to emerge by 48 hours across phenotypic classes; for the RPF2 LSU phenotype specifically, nucleolar expansion and decreased NPM1 intensity are detectable as early as 24 hours, which is inconsistent with a general stress response and more consistent with a direct downstream consequence of LSU assembly disruption. Moreover, although all targeted genes are essential for homeostasis, phenotypic profiles are cluster-specific and associated with multiple genes of coherent function, which suggests that the observed effects are downstream of specific pathway inhibition rather than a general cellular stress response.
  
  -Another cause for concern is that some perturbations (factor depletions) very deeply affect nucleolar structure/morphology (e.g., uL2 depletion shown in Fig 2C). How easy or difficult was it to make sure that the correct area was photobleached in the FRAP experiment using the (remarkable) data-adaptive approach? In cases where the nucleolus was deeply affected, how did you check that a significant nucleolar area had been selected for analysis? It would be good to describe this in the text.
  
  During screen development, we manually verified for all depletion cases that our segmentation protocol accurately captured nucleoli, defined as regions of higher NPM1 intensity. Because segmentation determines where the bleach point is placed, most bleaches, even in disrupted cases, targeted the nucleolar interior. To address this point, we have included figures in the supplement (Fig. S4D) that show bleaching time courses for selected highly disrupted hits, uL2 and eL39.
  
  Fig 6C, interaction of NPM1 constructs with pre-ribosomes: the authors have tested interaction with selected nucleolar proteins (NOP53, NOP2, NOG2, and uL2), which is not the same as pre-ribosomes.

  It would be important to see the interactions with precursors (Fig S9C, currently shown as histograms). Please show the actual data: this was tested by qPCR, and classical northern blots should be shown, as RT-qPCR has shown its limits in such applications.
  
  Indeed, we cannot distinguish between assembly factors/ribosomal proteins that are associated with NPM1 in their latent, non-pre-LSU-bound state and those that are part of a developing ribosome. We have addressed this gap in several ways. First, we have performed IP-northern blots for tagged NPM1 mutants, as suggested, and find that the mA3 mutant co-IPs more 32S pre-rRNA than WT, while the mB2 mutant binds less (Fig. 5D). We also performed sucrose gradient analysis of pre-ribosomal complexes and find that the mA3 mutant co-sediments more with the pre-60S peak, while mB2 co-sediments less (Fig. 5E). These findings are consistent with in vitro findings in the field that B2 mediates interactions with rRNA, while A3 occludes B2 through intramolecular interactions. Together with our co-IP western data, we believe the evidence strongly suggests that NPM1 mutants interact differentially with pre-LSU complexes.
  
  -Minor comments:
  
  -The effects of mRNA processing disruption on nucleolar dynamics could be (and most likely are) very indirect (the so-called "slow hits"). The time course of each inhibition should be described.
  
  We refer the reviewer to our response above regarding the other phenotypes. For our "slow hit" / "Other" cluster, we also used the splicing inhibitor PladB as an orthogonal approach. Strikingly, nucleolar rounding was detectable within one hour of treatment, well before any general effects on cell health would be expected, whereas changes in dynamics required approximately 24 hours. This suggests that morphological and biophysical responses are kinetically separable and that the early morphological response is directly downstream of splicing inhibition. We have included a representative rounding time course in Fig. S8E.
  
  Reviewer #2 (Significance (Required)):
  
  -General assessment: strengths and limitations
  
  Strengths: -The automated platform for high throughput FRAP
  
  -The authors develop a potentially interesting model in which they attempt to connect rigidification/fluidity of a condensate to its function in the assembly of large ribonucleoprotein complexes.

  -The manuscript reads very well; it has been prepared with great care (figures). Some complicated concepts are explained very well (Introduction/Discussion).

  Limitations: -Particle stage assignment is based on FISH and immunostaining only. The authors have not demonstrated that the LSU1 cluster corresponds to state F and the LSU2 cluster to states G/H.
  
  -Advance: -Technological advance, high throughput FRAP, a powerful platform to interrogate macromolecular diffusivity.
  
  -Several nucleolar screens have been conducted in the past (but at steady-state, not using FRAP), in these works textural and morphological features were used together with dimensionality reduction techniques to define functional clusters of genes that impact the homeostasis of the nucleolus. Often these references are cited but it could be useful to expand a bit on some of the earlier findings to bring the new ones in perspective. Some clusters (typically, the transcriptional cluster that disrupts the nucleolus; and the late binder ribosomal proteins) have been well identified before.
  
  -Audience: Cell biologists, scientists involved in ribosome biogenesis research, scientists with an interest in helicases. The growing condensate community.
  
  -Describe your expertise: ribosome biogenesis, structure-function relationships in the nucleolus, technological development in microscopy.
  
  Reviewer #3 (Evidence, reproducibility and clarity (Required)):
  
  Summary: The authors use high throughput FRAP (HiT-FRAP) in arrayed genetic screens of HeLa cells expressing nucleophosmin (NPM1)-fluorescent protein variants to monitor the biophysical properties of the nucleolus in response to genetic perturbations. HiT-FRAP uses a data-adaptive imaging strategy to automatically identify and photobleach fluorescently labeled organelles in living cells and acquire movies for FRAP. Quantitative analysis of FRAP curves includes t1/2 and mobile fraction. NPM1 was monitored since it is an important nucleolar scaffolding protein that is thought to interact with many pre-ribosome intermediates.
  
  The authors depleted 65 RNA helicases (+ 9 pairs) with siRNA and found that 15 of them either increased or decreased t1/2. Knockdowns were confirmed with western blotting. RNA helicase knockdowns with faster NPM1 diffusion were associated with large subunit (LSU) assembly. Most RNA helicase knockdowns with slower NPM1 diffusion were associated with early rRNA processing via the small subunit (SSU) intermediate. The authors screened an additional 290 gene depletions, covering many ribosomal proteins and assembly factors. With this expanded set of perturbations, they categorized nucleoli based on four morphological features in addition to t1/2 and mobile fraction. Using principal component analysis (PCA), the authors identified clusters of genes with similar effects on NPM1 dynamics and nucleolar morphology. In this secondary screen, the majority of depletions exhibited slower NPM1 dynamics. The knockdowns associated with faster NPM1 dynamics were associated with LSU assembly, similar to the helicase experiments. The authors further analyzed several mutants of NPM1 to elucidate the likely interactions between the scaffolding protein and ribosome biogenesis factors. The accumulation of early ribosomal intermediates was associated with decreases in NPM1 dynamics, and accumulation of late intermediates led to increased NPM1 dynamics. The findings established a link between the biophysical properties of the nucleolus and the stages of ribosome biogenesis.
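  For readers less familiar with FRAP, the two readouts named in this summary (t1/2 and mobile fraction) can be illustrated with a minimal sketch. This is not the HiT-FRAP analysis described in the manuscript: it assumes a single recovery trace that has already been background- and acquisition-bleaching-corrected, takes the plateau as the mean of the last few frames, and defines t1/2 as the time after the bleach at which the trace first reaches half of its recovered amplitude (with linear interpolation between frames). The function name `frap_metrics` and the `plateau_points` parameter are hypothetical; many pipelines instead fit an exponential recovery model.

```python
from typing import Sequence, Tuple


def frap_metrics(
    times: Sequence[float],
    intensity: Sequence[float],
    pre_bleach: float,
    bleach_index: int,
    plateau_points: int = 5,
) -> Tuple[float, float]:
    """Return (t_half, mobile_fraction) for one corrected FRAP recovery trace.

    Hypothetical helper. bleach_index is the first frame after photobleaching;
    pre_bleach is the ROI intensity before bleaching (same units as intensity).
    """
    f0 = intensity[bleach_index]                 # intensity right after the bleach
    tail = intensity[-plateau_points:]
    f_inf = sum(tail) / len(tail)                # assumed plateau: mean of the last frames
    if f_inf <= f0:
        raise ValueError("trace shows no recovery after the bleach")

    # Mobile fraction: share of the bleached signal that recovers.
    mobile_fraction = (f_inf - f0) / (pre_bleach - f0)

    # t1/2: first time the trace reaches half of the recovered amplitude.
    half_level = f0 + 0.5 * (f_inf - f0)
    for i in range(bleach_index + 1, len(intensity)):
        if intensity[i] >= half_level:
            # Interpolate linearly between frames i-1 and i (f_prev < half_level here).
            f_prev, f_cur = intensity[i - 1], intensity[i]
            frac = (half_level - f_prev) / (f_cur - f_prev)
            t_cross = times[i - 1] + frac * (times[i] - times[i - 1])
            return t_cross - times[bleach_index], mobile_fraction
    raise ValueError("trace never reaches half recovery")
```

  A threshold-crossing definition like this is simple and model-free, but it is sensitive to frame-to-frame noise; fitting a recovery model is the usual alternative when traces are noisy.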
  
  Major comments:
  
  The claims are supported by experimentation.
  
  No additional experiments requested.
  
  The experiments are adequately replicated, and statistical analysis is sufficient.

  • Methods are very detailed, which should facilitate reproducibility.

  Minor comments:
  
  Prior studies are referenced appropriately.
  
  A bit more coverage of background on the nucleolar scaffolding protein nucleophosmin (NPM1) would be helpful in the introduction, perhaps in place of some of the details on ribosome biogenesis.

  o Paragraph 2 could be shorter or placed elsewhere.
  
  We thank the reviewer for this suggestion and have now included some background on NPM1 in the introduction and have shortened paragraph 2.
  
  • Figures
  
  o In Figures 2 - 5: explicitly state in the figure caption what dotted lines are encircling (entire cell?)
  
  We have now included this in the figure captions (they encircle the nucleus).
  
  o In Figures 2 - 5: explicitly state what the mp-inferno LUT intensity in the images is quantitating (amount of NPM1?)
  
  We have now included this in the figure captions (NPM1/mScarlet intensity).
  
  o Figure 7: more detail in the figure caption
  
  We have now expanded our model figure caption.
  
  • The paper is quite dense with a lot of nice work, discussing many different genetic perturbations. It feels a bit overwhelming, and I think the biological significance gets somewhat lost in the presentation of all the data. Perhaps some of the results could be moved to the supplement in favor of a "leaner" main text. Currently, the supplement contains only figures, but I feel that some of the text that is not central to the key conclusions could be moved there. I found myself getting a bit bogged down and having to re-read several times to catch the takeaway messages. Some of the clarifying statements found in the Discussion could be moved to the Results. In short, some reorganization would help readability. One suggestion is to move the sections "Inhibition of rRNA transcription or the RNA exosome leads to nucleolar fragmentation" and/or "Perturbation of mRNA processing pathways results in slowed NPM1 dynamics and accumulation of rRNA precursors in the nucleolus" to the supplement.
  
  We thank the reviewer for this helpful suggestion. In response to this and other reviewers' comments, we have simplified the discussion of phenotypic groups, including combining the “LSU” phenotypes into a single group and discussing LSU1/2 in the supplementary text. In addition, while we have chosen to keep the “rRNA transcription/exosome” and “Other” descriptions in the main text, they have been condensed and combined into one main section with the other ribosome biogenesis phenotypes to highlight this key takeaway. The remaining discussion of phenotypes is now in the supplemental text, as suggested.
  
  Reviewer #3 (Significance (Required)):
  
  • General Assessment: The main claim of the paper is that nucleolar phenotype (measured by morphology and NPM1 diffusivity) is correlated with stages in ribosome assembly, i.e., the stage of ribosome assembly determines the biophysical properties of the nucleolus. A strength of the study is the wide range of genetic perturbations tested, enabled by high throughput FRAP. With FRAP, I do worry a bit about using t1/2 as the sole dynamic measurement, but it is not a deal breaker. The authors introduce morphology as another way to characterize the nucleoli.

  • The claims are well supported by extensive experiments and data. The experiments are well designed, and proper controls were conducted. To validate the method, the authors used perturbations of NPM1 dynamics from the literature, including ATP depletion, blocking glycolysis and oxidative phosphorylation, inhibition with MG132, and treatment with sodium arsenite. They observed slower NPM1 diffusivity under all validation conditions.

  • Advance: The authors have introduced a high-throughput technique for extracting diffusivity with FRAP, yielding a lot of data, but I think the paper suffers a bit from trying to present so much data in the main text. The mechanistic biological insights are compelling but get a bit overshadowed. Improved organization can help the messages come across more clearly.

  • To my knowledge, there is no similar study in the literature, as the detailed mechanisms of ribosome biogenesis are not well studied.

  • Audience: The audience for this manuscript seems to be biophysical researchers, though there may be broader interest due to the wide screening of genetic perturbations.

  • Expertise: I have evaluated this manuscript from the perspective of a single-molecule biophysicist who studies protein-protein interactions between ribosome biogenesis factors. I am not an expert in FRAP, but I use FCS.
  
  PeerReviewed

Tags

PeerReviewed

Annotators

EMBOpress

URL

biorxiv.org/lookup/doi/10.1101/2023.09.26.559432
www.biorxiv.org

Brawn before bite in endemic Asian eutherian mammals after the end-Cretaceous extinction

1. Public_Reviews 18 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis: mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.
  
  Strengths:
  
  This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper and I think the results will be of interest to a broad audience.
  
  Weaknesses:
  
  For the original draft of the manuscript, I had four major concerns with the study, especially related to the sampling, diet, and evidence for the 'brawn before bite' hypothesis. I still believe that the original issues that I raised may be weaknesses of the study. For example, there is still limited discussion on diets (even though the dental topographic analyses used in the study are designed for inferring diets). And I find the results a little challenging to interpret because teeth of multiple positions are included in the same samples, which seems problematic. That said, the authors have addressed each of my previous concerns and have made major revisions, including running new analyses, and thus I support the paper.
  
  This revised submission includes only minor changes aimed at clarifying the main text.
  
  Reviewer #2 (Recommendations for the authors):
  
  I appreciate that the authors made many improvements to their study based on reviewers' comments. I don't have any remaining major issues with the paper, but I do have several minor comments.
  
  Thank you for taking the time to provide additional helpful feedback on our study. We have made minor revisions to the manuscript based on your suggestions. Please see our point-by-point response below.
  
  Lines 48-50. I reiterate my suggestion in my previous review to explicitly state which clade is being discussed, which is important because several major mammal groups beyond placentals (metatherians, multituberculates, dryolestoids, gondwanatherians) survived the K-Pg and had very different diversification patterns. You mention "mammal taxonomic diversity" but in the next sentence say "This initial placental mammals diversification ..." and later mention "stem placental/eutherian lineages." To stay consistent, you might replace "mammal" (L48) and "placental mammals" (L50) with "eutherian(s)" (usually defined as stem + crown placentals). If you follow this suggestion, then elsewhere in the paper I recommend replacing "mammals" with "eutherians" for consistency.
  
  Thank you for this suggestion. We modified the use of “mammals” throughout the text to general reference to the group only; specific mentions of the dataset analyzed are revised to “eutherians.”
  
  Lines 75-83. I respect the authors' hesitancy to reconstruct specific diets for the fossil taxa (L75-83), especially considering that dental topographic analyses (DTAs) often struggle to differentiate diets in extant taxa (e.g., Pineda-Munoz et al. 2016 Methods Ecol Evol). I still think that the authors might be able to interpret dietary trends from their results (e.g., an increase in average OPCR values indicating a shift toward more herbivorous diets) - I think discussing dietary trends would be an interesting discussion topic later in the paper. That said, I also recognize that different DTA results seem to show conflicting dietary trends (based on my limited knowledge of those metrics) so maybe that complicates things too much.
  
  We concur with Reviewer 2 that dietary inferences from DTA data are premature, especially given the ongoing controversies over its use in studies of extant mammal teeth. We have kept the current scope of our discussion unchanged.
  
  Lines 75-77. "early mammals ... are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction." But your fossils (eutherians) are certainly within 'phylogenetic brackets' of modern clades (therians, i.e. Eutheria + Metatheria). Maybe you're alluding to the fossils being stem lineages of extant subgroups like Ungulata, which means we can't bracket them specifically within those eutherian subgroups? So, I recommend revising or expanding your statement for clarity. Also, the considerable phylogenetic uncertainty for Paleocene groups (e.g., Halliday et al. 2015) complicates this issue, which you could mention.
  
  We modified the sentence to now say “Additional complications with ecomorphological analysis of these stem eutherians include the uncertainty in their dietary ecology, having diverged prior to the crown radiation, and uncertainty in phylogenetic positions of Paleocene taxa [7]; thus, they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction.”
  
  Line 84. "We investigated dental topography-performance shifts ...". You haven't introduced dental topography or even mentioned teeth yet, and "performance shifts" is vague. So, this phrase might confuse readers. Maybe you can just erase it and start the sentence with "We investigated the timing of ecomorphological ..."?
  
  We made the recommended revision.
  
  Lines 104-105 (and elsewhere). "Dental traits paralleled Paleocene global and regional environmental conditions" and "We found that dental topographic trait variability in Paleocene mammals in south China tracked global and regional climatic changes". These conclusions seem a little too assertive to me. Your sample is grouped into 3 rough time bins (of somewhat uncertain ages) and is from a relatively small geographic range - that seems like very limited information for inferring links between dental patterns and climatic changes, especially global patterns. I think it's worth HYPOTHESIZING that dental traits are linked to environmental/climatic changes (with results like those in Figure 2A & B as evidence to support that hypothesis), but I wouldn't make that claim with any confidence. So, I recommend that you temper your relevant conclusion statements. For example, for Line 105, you could replace "We found ..." with "We posit ..." (L105). I would make similar changes to similar statements throughout the paper (e.g., L243).
  
  Thank you for this suggestion to temper our phrasing. We edited throughout the text to make our interpretations less assertive.
  
  Figure 1 (and your response to reviewers). Why was the timescale changed to 65.5 Ma for the K-Pg boundary? The K-Pg is 66 Ma (not 65.5), which is the age you mention in the text (e.g. Pg 3 L39) and is well established in the literature - see recent papers from the Paul Renne lab for a more exact age.
  
  We revised the figure to have the K-Pg at 66 Ma.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.24.678280v3
www.biorxiv.org

Visuomotor mismatch EEG responses over occipital cortex of freely moving human subjects

1. Public_Reviews 18 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are:
  
  (1) We have performed additional experiments to increase the number of recordings from frontal and occipital electrodes (previously 51 (occipital: O1+O2) and 26 (frontal: Fp1+Fp2), now 133 and 102). The additional data have strengthened many of our results, including for example the trend for a latency difference between occipital and frontal electrodes that was likely underpowered and is now significant (Figure 3E). We have updated all relevant figures to include the additional data (Figures 2–6, Figure S4, Figure S5). None of the main conclusions have changed.
  
  (2) As suggested by reviewer 1, we have conducted additional experiments to rule out the possibility that the observed effects were driven by the temporal order of open and closed loop sessions (new Figure S6). We also found another 9 participants who were willing to go on the ‘vomit comet’ of six degrees of freedom (6DOF) playback (previously 5, now 14). These data have further strengthened our conclusion that playback halt responses in 4DOF and 6DOF playback are not substantially different (Figure S4).
  
  (3) To address the point of reviewers 2 and 3, that mismatch negativity (MMN) responses would be larger on temporal electrodes, we conducted additional experiments in which we also recorded from temporal electrodes T3–T6. We have now added a comparison of visuomotor mismatch and MMN responses on T3–T6 electrodes as Figures S8–S9. On all electrodes, visuomotor mismatch responses were larger than MMN responses.
  
  (4) As suggested by reviewer 1, we have added an analysis of the experience-dependent changes in mismatch responses comparing frontal and occipital responses early and late in the session (new Figure 4).
  
  (5) As suggested by reviewer 2, we conducted additional experiments in an independent cohort of participants (note, without concurrent EEG) to measure eye movements triggered by visuomotor mismatches. We found eye-movement speed and blink/eye-closure changes, but these had longer latency than visuomotor mismatch responses (Figure S7).
  
  (6) Finally, as suggested by reviewers 2 and 3, we applied independent component analysis (ICA) and time–frequency analyses to the EEG data. We show these results and explain why they are not applicable or useful in our case in the responses below.
  
  Please note, during the revision, we found that a part of our analysis used a bandpass of 0.2-100 Hz while a 1-100 Hz bandpass filter was used elsewhere. This has now been standardized to a 1-100 Hz bandpass filter, and the corresponding methods were updated. This resulted in no relevant changes to the figures. Additionally, the 50 Hz band-stop filter was erroneously described in the methods as 49-51 Hz. The filter used was 40-60 Hz, and the methods have been updated to reflect this.
  
  Reviewer #1 (Public review):
  
  In this paper, the authors wished to characterize human visuomotor mismatch responses in EEG in a VR setting. Participants were required to walk through a virtual corridor, where a mismatch was created by halting the display for 0.5 s. This occurred every 10-15 seconds. They observe an occipital mismatch signal at 180 ms. They determine the specificity of this signal to visuomotor mismatch by subsequently playing back the same recording passively. They also show qualitatively that the mismatch response is larger than one generated in a standard auditory oddball paradigm. They conclude that humans therefore exhibit visuomotor mismatch responses like mice, and that this may provide an especially powerful paradigm for studying prediction error more generally.
  
  Asking about the role of visuomotor prediction in sensory processing is of fundamental importance to understanding perception and action control, but I wasn't entirely sure what to conclude from the present paradigm or findings. Visuomotor prediction did not appear to have been functionally isolated. I hope the comments below are helpful.
  
  (1) First, isolating visuomotor prediction by contrasting against a condition where the same video stream is played back subsequently does not seem to isolate visuomotor prediction. This condition always comes second, and therefore, predictability (rather than specifically visuomotor predictability) differs. Participants can learn to expect these screen freezes every 10-15 s, even precisely where they are in the session, and this will reduce the prediction error across time. Therefore, the smaller response in the passive condition may be partly explained by such learning. It's impossible to fully remove this confound, because the authors currently play back the visual specifics from the visuomotor condition, but given that the visuomotor correspondences are otherwise pretty stable, they could have an additional control condition where someone else's visual trace is played back instead of their own, and order counterbalanced. Learning that the freezes occur every 10-15 s, or even precisely where they occur, therefore, could not explain condition differences. At a minimum, it would be nice to see the traces for the first and second half of each session to see the extent to which the mismatch response gets smaller. This won't control for learning about the specific separations of the freezes, but it's a step up from the current information.
  
  In theory, it is correct that the open loop (playback) session is predictable. However, this is relatively unrealistic. The open loop session is a 5-minute sequence that participants have only experienced once before, when they were generating it in the closed loop session a couple of minutes earlier. It is unlikely that participants would remember the entire sequence to a precision of less than a second, which is what they would need to predict the mismatch event. However, the reviewer is correct that it is possible that the mismatch events lose salience with time, for example as a consequence of participants losing interest in the task with time, or by undergoing some form of adaptation. To address this, we repeated the experiments with the sequence of closed and open loop sessions reversed (Figures S6A-S6C), and we analyzed the responses as a function of time within the session (Figures S6D and S6E), as suggested.
  
  The reversed-order design consisted of (1) open loop session: a playback, in which participants viewed the recorded closed loop session of a previous participant. This was followed by (2) a closed loop session, in which participants actively walked through the tunnel and experienced visuomotor mismatch events. Using this design, we again found that responses in the closed loop session were significantly larger than in the open loop session (Figures S6A-S6C).
  
  In addition, we analyzed both new and previously collected data as a function of time in the session. We computed moving average responses across 10 mismatch or playback halt trials at different percentages of progress through the paradigm (Figures S6D and S6E). This analysis revealed no consistent experience-dependent changes that could account for the observed differences between closed and open loop sessions. While there was indeed some experience-dependent attenuation of visuomotor mismatch responses (see new Figure 4), the difference at the transition from mismatch to playback halt (and vice versa) far exceeded these adaptation effects (Figures S6D and S6E). This analysis was performed only on data from participants for whom we had both closed and open loop sessions and who met our inclusion criteria.
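  The moving-average analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a sliding window of 10 consecutive trials, each trial stored as an equal-length list of EEG samples aligned to the mismatch or playback halt, and reports each window's position as the percentage of trials completed by its last trial. The function name `moving_average_responses` and the progress definition are hypothetical; the exact windowing used in Figures S6D and S6E may differ.

```python
from typing import List, Sequence, Tuple


def moving_average_responses(
    trials: Sequence[Sequence[float]], window: int = 10
) -> List[Tuple[float, List[float]]]:
    """Average response traces over sliding windows of `window` consecutive trials.

    Hypothetical helper. Each trial is an equal-length sequence of samples.
    Returns (progress_percent, mean_trace) pairs, where progress is the number
    of trials completed by the window's last trial, as a percentage of all trials.
    """
    n = len(trials)
    if n < window:
        raise ValueError("need at least `window` trials")
    results = []
    for end in range(window, n + 1):
        block = trials[end - window:end]
        # zip(*block) groups the same time sample across the trials in the window.
        mean_trace = [sum(samples) / window for samples in zip(*block)]
        results.append((100.0 * end / n, mean_trace))
    return results
```

  With 20 trials and the default window, this yields 11 averaged traces, the first positioned at 50% of the session and the last at 100%. A sliding (rather than non-overlapping) window gives a smoother time course at the cost of adjacent points sharing most of their trials.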
  
  We used a similar analysis to test whether early and late responses within a session differed systematically (new Figure 4). Here, to maximize the chance of finding a difference, we compared early (first five) and late (last five) trials. Behaviorally, participants reduced their walking speed following mismatch events, with a significantly larger reduction during early trials (14.3%) than during late trials (5.7%) (Figure 4A). Neural responses mirrored this pattern primarily on frontal electrodes: frontal activity showed a clear attenuation from early to late trials (Figure 4B), consistent with the reduction in behavioral responses. In contrast, responses on occipital electrodes changed much less between early and late trials (Figures 4C and 4D). Thus, experience-related modulation is substantially stronger in frontal than in occipital regions.
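  
  The behavioral measure and the early/late split can be sketched as follows. This is a hypothetical illustration: walking speed is assumed to be averaged in some pre- and post-event window per trial, the helper names are ours, and the statistical test behind Figure 4 is not reproduced.

```python
import numpy as np

def percent_speed_reduction(speed_pre, speed_post):
    """Per-trial drop in walking speed after an event, as % of pre-event speed."""
    speed_pre = np.asarray(speed_pre, dtype=float)
    speed_post = np.asarray(speed_post, dtype=float)
    return 100.0 * (speed_pre - speed_post) / speed_pre

def early_late_means(values, n=5):
    """Mean over the first n and the last n trials, in trial order."""
    values = np.asarray(values, dtype=float)
    if values.size < 2 * n:
        raise ValueError("need at least 2*n trials so early and late sets do not overlap")
    return values[:n].mean(), values[-n:].mean()

# Hypothetical participant with 10 trials: stronger slowing early than late
reduction = percent_speed_reduction([1.0] * 10, [0.8] * 5 + [0.9] * 5)
early, late = early_late_means(reduction)   # approximately 20.0 and 10.0
```

  Applied per participant, the resulting (early, late) pairs would then enter a paired comparison across participants; the choice of test is left open here.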
  
  In sum, we do not believe that the difference between visuomotor mismatch responses and playback halt responses can be explained by differences in the predictability of mismatch and playback halt events.
  
  (2) Second, the authors admirably modified their visual-only condition to remove nausea from 6 df of movement (3D position, pitch, yaw, and roll). However, despite the fact it's far from ideal to have nauseous participants, it would appear from the figures that these modifications may have changed the responses (despite some pairwise lack of significance with small N). Specifically, the trace in S3 (6DOF) and 2E look similar - i.e., comparing the visuomotor condition to the visual condition that matches. Mismatch at 4/5 microvolts in both. Do these significantly differ from each other?
  
  Yes, the 6DOF playback halt response shown in the previous Figure S3 and the mismatch response shown in previous Figure 2E are significantly different (Author response image 1).
  
  Author response image 1.
  
  Comparison of visuomotor mismatch response (A) and 6DOF playback halt response (B) from the original submission with statistics of the comparison (C).
  
  Nevertheless, to strengthen this conclusion, we collected additional data in the 6DOF condition. Figure S4 shows the comparison for the 14 participants for whom both closed loop (active) and open loop (6DOF) sessions were recorded within the same recording session. Consistent with our previous findings, visuomotor mismatch responses were significantly larger than 6DOF playback halt responses (Figures S4A-S4C). We also found no evidence of a difference between 6DOF and 4DOF playback halt responses (Figures S4D and S4E).
  
  (3) It generally seems that if the authors wish to suggest that this paradigm can be used to study prediction error responses, they need to have controlled for the actions performed and the visual events. This logic is outlined in Press, Thomas, and Yon (2023), Neurosci Biobehav Rev, and Press, Kok, and Yon (2020) Trends Cogn Sci ('learning to perceive and perceiving to learn'). For example, always requiring Ps to walk and always concurrently playing similar visual events, but modifying the extent to which the visual events can be anticipated based on action. Otherwise, it seems more accurately described as a paradigm to study the influence of action on perception, which will be generated by a number of intertwined underlying mechanisms.
  
  We are not entirely sure we understand the point correctly. If the reviewer is suggesting that visuomotor coupling cannot be described by the ideas of predictive processing, we disagree. However, the papers the reviewer points to are premised on what seems to be a somewhat unorthodox interpretation of predictive processing as it applies to cortical circuits, and we suspect this contributes to the misunderstanding. Let us briefly explain. In the two papers, Press and colleagues argue that most experiments cannot distinguish between “predictive cancellation” and “gated suppression”. This is indeed tricky, even with single-neuron data. The question is: does movement simply suppress sensory feedback (as is likely the case, e.g., in the famous example of the cricket), or does movement result in a precise removal of only the self-generated sensory reafference? The first good evidence of the latter in any system is quite recent (Keller and Hahnloser, 2009). The premise on which Press and colleagues build their argument is that the theory posits that “the brain predictively ‘cancels’ expected action outcomes from perception” (from the abstract of one of the papers). This is incomplete. The minimal circuit for predictive processing is composed of three neuron types: positive prediction error neurons, negative prediction error neurons, and internal representation neurons. Only the positive prediction error neurons have the predictive cancellation property discussed in those papers. This is not the case for either the negative prediction error neurons or the internal representation neurons. Negative prediction error neurons are excited by predictions and suppressed by sensory input (i.e., if anything, they are “predictively amplified”). This circuit is relatively well characterized in mouse cortex; for a brief summary, see (Keller and Mrsic-Flogel, 2018).
  
  Note that this is of course not our idea: the original formulation of predictive processing (Rao and Ballard, 1999) was built to explain end-stopping, that is, responses to the absence of an expected line that were stronger than classical theories would predict (i.e., negative prediction error responses). In mouse visual cortex, we know that a sudden break in the coupling between locomotion and visual flow selectively activates layer 2/3 negative prediction error neurons. Thus, if human cortex also implements a predictive-processing-like circuit with positive and negative prediction error neurons, we would expect a break in visuomotor coupling to drive a measurable response in visual cortex by exciting the population of negative prediction error neurons. This is also why we are quite excited by the phase reversal of visual and mismatch responses: because negative prediction error neurons are located more superficially in cortex (O’Toole et al., 2023), the reversal could indicate that mismatch activates negative prediction error neurons first and positive prediction error neurons later, and vice versa for visual stimulation. We do indeed find a response over occipital cortex consistent with the negative prediction error response we observe in mouse cortex. The difficulty in distinguishing “predictive cancellation” from “movement-driven suppression” arises only for positive prediction error responses (which are suppressed by predictive inputs) and does not apply to negative prediction error responses. The predictive processing circuit we are testing is the one described by (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999), in which a break in visuomotor coupling is a stimulus that drives negative prediction error responses.
  
  Other authors who have considered cortical implementations of predictive processing (e.g., (Bastos et al., 2012)) have glossed over the problem that individual neurons cannot trivially encode both positive and negative errors. Prediction errors are a signed quantity. If neurons signal prediction errors in their firing rates and fire at close to zero at baseline (as is the case in layer 2/3 of cortex), they cannot (short of rather exotic ideas) encode a signed prediction error. Hence, such proposals are of limited use for thinking about prediction error responses in cortex. For these reasons, we see no problem with referring to the response as a prediction error response. This is in line with a large body of mouse research on the topic that uses a nearly identical paradigm.
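  
  A toy illustration of why a signed error has to be split across two populations when rates cannot go below zero. This is our simplification, not a model of cortical dynamics; in particular, the linear prediction (visual flow equals running speed with a gain of 1) is an assumption. A halt in visual flow during running then drives only the negative prediction error signal.

```python
import numpy as np

def rectified_prediction_errors(visual_flow, predicted_flow):
    """Split a signed visuomotor error into two non-negative, rate-like signals (toy model)."""
    signed_error = np.asarray(visual_flow, dtype=float) - np.asarray(predicted_flow, dtype=float)
    positive_pe = np.maximum(signed_error, 0.0)   # excited by input, suppressed by prediction
    negative_pe = np.maximum(-signed_error, 0.0)  # excited by prediction, suppressed by input
    return positive_pe, negative_pe

# Closed loop: visual flow normally tracks running speed (gain 1, assumed);
# time step 2 is a visuomotor mismatch (visual flow halts while running continues).
running_speed = np.array([1.0, 1.0, 1.0, 1.0])
visual_flow = np.array([1.0, 1.0, 0.0, 1.0])
pos, neg = rectified_prediction_errors(visual_flow, running_speed)
# pos stays at zero throughout; neg is 1 only at the halt
```

  Subtracting the two rectified signals recovers the signed error, but neither population alone can represent both signs.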
  
  One could of course argue that gated suppression could also mean that movement relieves suppression. Thus, one could assume that some neurons are suppressed by movement while others are enhanced. If one allows for enough neuron and stimulus specificity in the precision of the movement related suppression and enhancement of responses, the two models (predictive processing and gated suppression) become equivalent, and the discussion becomes semantic. See (Vasilevskaya et al., 2023) for an extended discussion on this point, and the reasons why we think predictive processing is a more useful model than gated suppression (keep in mind, gated suppression only explains the data if we allow for stimulus/neuron specific gain factors of the suppression, in which case the two models are equivalent).
  
  More minor points:
  
  (1) I was also wondering whether the authors may consider the findings in frontal electrodes more closely. Within the statistical tests of the frontal electrodes against 0, as displayed in Figure 3c, the insignificance of the effect of Fp2 seems attributable to the small included sample size of just 13 participants for this electrode, as listed in Table S1, in combination with a single outlier skewing the result. The small sample size stands out especially in comparison to the sample size at occipital electrodes, which is double and therefore enjoys far more statistical power. It looks like the selected time window is not perfectly aligned for determining a frontal effect, and also the distribution in 3B looks like responses are absent in more central electrodes but present in occipital and frontal ones. I realise the focus of analysis is on visual processing, but there are likely to be researchers who find the frontal effect just as interesting.
  
  That is correct; our data from frontal electrodes were likely underpowered. We have fewer data from frontal electrodes because eye-blink artifacts are particularly strong in frontal channels, so a larger proportion of trials failed to meet our data inclusion criteria. We have now added more data from frontal and occipital electrodes by including additional experimental sessions. In addition, we applied less stringent trial-exclusion criteria, requiring that no artifacts occur within the time window −0.5 to 1 s relative to the event trigger (instead of −0.5 to 2 s). This adjustment allowed us to retain a larger number of trials. As anticipated by the reviewer, this increase in data was sufficient to reveal a significant response to the visuomotor mismatch event at both frontal electrodes (Figure 3C). The expanded dataset also revealed a significant difference in response onset times between occipital and frontal electrodes (Figure 3E), an effect that was previously not significant. Finally, we have included an analysis comparing early and late mismatch responses on frontal and occipital electrodes (Figure 4).
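  
  The effect of shortening the artifact-check window can be sketched as below. We assume here that an "artifact" is any sample whose absolute amplitude reaches the 100 µV threshold mentioned elsewhere in this response; whether the criterion is absolute amplitude or peak-to-peak is an assumption, and `keep_trials` is a hypothetical helper.

```python
import numpy as np

def keep_trials(epochs_uv, times_s, t_min=-0.5, t_max=1.0, threshold_uv=100.0):
    """Boolean mask of trials with no threshold crossing inside [t_min, t_max].

    epochs_uv: array (n_trials, n_samples) for one channel, in microvolts.
    times_s: array (n_samples,) of sample times relative to the event trigger.
    """
    epochs_uv = np.asarray(epochs_uv, dtype=float)
    times_s = np.asarray(times_s, dtype=float)
    in_window = (times_s >= t_min) & (times_s <= t_max)
    peak_uv = np.abs(epochs_uv[:, in_window]).max(axis=1)
    return peak_uv < threshold_uv

# Three synthetic trials sampled every 10 ms from -0.5 to 2 s
times = np.linspace(-0.5, 2.0, 251)
epochs = np.zeros((3, times.size))
epochs[1, 200] = 150.0    # artifact at t = 1.5 s: outside the new window, kept
epochs[2, 70] = -150.0    # artifact at t = 0.2 s: rejected under both windows
mask_new = keep_trials(epochs, times)               # -0.5 to 1 s
mask_old = keep_trials(epochs, times, t_max=2.0)    # previous -0.5 to 2 s
```

  Trials whose only artifact falls after 1 s survive under the shorter window, which is how the change increases the number of retained trials.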
  
  (2) It is claimed throughout the manuscript that the 'strongest predictor (of sensory input) - by consistency of coupling - is self-generated movement'. This claim is going to be hard to validate, and I wonder whether it might be received better by the community to be framed as an especially strong predictor rather than necessarily the strongest. If I hear an ambulance siren, this is an especially strong predictor of subsequent visual events. If I see a traffic light turn red, then yellow, I can be pretty certain what will happen next. Etc.
  
  This is a statistical argument. Every movement, throughout life, is directly and immediately coupled to sensory feedback, and this has been the case throughout evolutionary history. The vast majority of the visual input you receive (we estimate well above 99%) is a consequence of your own movements (e.g., every few hundred milliseconds, your eye movements cause a full-field change in your visual input). The same is likely true of proprioceptive and somatosensory input: the vast majority is the direct consequence of your own movements (not of other people poking you). This is likely different in the auditory system, where a much larger fraction of the input is externally driven (depending a bit on how much one likes to talk). But even here, the best predictor is self-motion (most non-self-generated sounds one experiences in life are very difficult to predict with millisecond precision). The reviewer's example illustrates this well. Take the siren that heralds the appearance of an ambulance. The siren tells us that an ambulance will appear, but not what it will look like, not exactly when it will appear, and only at very low resolution where it will appear. Incidentally, if you ask people to draw an ambulance, they tend to draw a WWII-style white boxy vehicle with a red cross on the side, a style of ambulance they have likely never seen in real life. Their visual predictions of what they are about to see are of very low resolution. We fail catastrophically at making pixel-perfect predictions from learned stimulus associations of this kind. The traffic light example is difficult to compare to visual feedback control of movement, as it is a much simpler prediction: a single bit, in the form of a change in color of an existing object.
  
  In addition, consider how often in your life you have seen an ambulance after hearing it: 100 times, maybe? Maybe fewer. How often have you seen traffic lights change: 10,000 times? 100,000 times? Now consider how often you have experienced the visual consequences of moving your head or eyes to the left (keep in mind that this includes microsaccades). At a conservative estimate of once per second, this is on the order of 1,000,000,000 times. This is not even in the same ballpark. Our brains can certainly learn to make ambulance- and traffic-light-type predictions to some extent, but by far the best predictor of sensory feedback (simply by virtue of the physics of how our body interacts with the world) is self-motion.
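  
  The order-of-magnitude estimate can be made explicit. The rate of one event per second comes from the text; the 16 waking hours per day and 50 years of experience are our illustrative assumptions, and the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365

def lifetime_events(rate_per_s=1.0, waking_hours_per_day=16, years=50):
    """Events experienced at a constant rate during waking hours (illustrative assumptions)."""
    return rate_per_s * SECONDS_PER_HOUR * waking_hours_per_day * DAYS_PER_YEAR * years

self_motion = lifetime_events()   # 1.0512e9, i.e. on the order of 10**9
traffic_lights = 100_000          # upper figure from the text
ratio = self_motion / traffic_lights
print(f"{self_motion:.2e} self-motion events, about {ratio:.0f} times the traffic light count")
```

  Even with considerably shorter waking days or fewer years, the estimate stays within an order of magnitude of 10^9, several orders above the traffic light figure.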
  
  We think this is an argument that can be made from first principles, and one that is frequently overlooked in the field: experiments often focus on training people or animals to learn novel associations, and, especially in the case of mice, we often have no idea whether cortical circuits can even learn these associations. We should instead focus experiments on the predictive systems our brains evolved long before ambulances and traffic lights appeared. We understand that the reviewer may disagree, but unless the reviewer has a concrete example of an even stronger predictor (as measured by frequency of experience, consistency of coupling, and precision of timing; we cannot think of one), it is a point we will continue to make.
  
  (3) The checkerboard inversion response at 48 ms is incredibly rapid. Can the authors comment more on what may drive this exceptionally fast response? It was my understanding that responses in this time window can only be isolated with human EEG by presenting spatially polarized events (cf. c1, e.g., Alilovic, Timmermans, Reteig, van Gaal, Slagter, 2019, Cerebral Cortex).
  
  We don’t know, but it is not inconsistent with previous reports. For example, compare the “standing” and “fast walking” target ERP responses in Figure 5 of (Gramann et al., 2010). Both there and in our data, the fast response peak is only really apparent when visual responses recorded while participants were walking are compared directly with those recorded while they were stationary.
  
  While we have taken great care to calibrate the timing of the visual display relative to the EEG recording, one could worry that the alignment is off by as much as tens of milliseconds. However, even if this were the case, one could use P1 as a reference: the fast peak precedes P1 by about 40 ms, which, assuming P1 peaks at about 90 ms, again places the fast walking peak at a latency of about 50 ms. In sum, we have added a reference to the previous work (which we found thanks to the reviewer’s comment) but fear we have nothing intelligent to say beyond that.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study investigates whether visuomotor mismatch responses can be detected in humans. By adapting paradigms from rodent studies, the authors report EEG evidence of mismatch responses during visuomotor conditions and compare them to visual-only stimulation and mismatch responses in other modalities.
  
  Strengths:
  
  (1) The authors use a creative experimental design to elicit visuomotor mismatch responses in humans.
  
  (2) The study provides an initial dataset and analytical framework that could support future research on human visuomotor prediction errors.
  
  Weaknesses:
  
  (1) Methodological issues (e.g., volume conduction, channel selection, lack of control for eye movements) make it difficult to confidently attribute the observed mismatch responses to activity in visual cortical regions.
  
  (2) A very large portion of the data was excluded due to motion artefacts, raising concerns about statistical power and representativeness. The criteria for trial inclusion and the number of accepted trials per participant appear arbitrary and not justified with reference to EEG reliability standards.
  
  (3) The comparison across sensory modalities (e.g., auditory vs. visual mismatch responses) is conceptually interesting, but due to the choice of analyzing auditory mismatch responses over occipital channels, it has limited interpretability.
  
  We have responded to these points in the more detailed itemization below.
  
  The authors successfully demonstrate that visuomotor mismatch paradigms can, in principle, be applied in human EEG. However, due to the issues outlined above, the current findings are relatively preliminary. If validated with improved methodology, this approach could significantly advance our understanding of predictive processing in the human visual system and provide a translational bridge between rodent and human work.
  
  Reviewer #2 (Recommendations for the authors):
  
  Overall, the study addresses an interesting and underexplored question (translation of the visuomotor mismatch responses observed in rodents to humans). Below, please find a list of specific suggestions for improvement.
  
  Introduction:
  
  (1) "updating internal representations and internal models" - what is the difference between the two, and why is it relevant to this study?
  
  In a nutshell, an internal model is the synaptic weight matrix that transforms between coding spaces, whereas an internal representation is the activity pattern coding for what is currently being represented. See (Aizenbud et al., 2025; Keller and Mrsic-Flogel, 2018) for longer elaborations. The fact that the mechanism used to update representations can also be used to update internal models (i.e., to solve the credit assignment problem) is likely the prime advantage of predictive processing (see work from the Bogacz lab). This is relevant to the current study because it is part of the justification for considering predictive processing a reasonable hypothesis for the function of cortex.
  
  (2) "Certain stimuli can be predicted from the preceding sensory input" vs. "Predictions can also be based on memory" - how are these two different? Do you mean specific (e.g., long-term associative or episodic) memory types in the latter?
  
  Correct, this is an arbitrary distinction that primarily makes sense in the light of experimental approaches. In this particular case, we were talking about spatial memory. We made this explicit to increase clarity.
  
  (3) "the strongest predictor - by consistency of coupling - is self-generated movement"
  
  (a) Externally induced movement, while not self-generated and therefore not predicted, will also generate sensory coupling, so is it really only about consistency?
  
  Externally induced movement (as in somebody else moving one’s arm; we are not sure this is what the reviewer means) will induce sensory-sensory coupling but not sensorimotor coupling. We might be misunderstanding the point. If the reviewer means stimuli that trigger movement (as in us asking participants to walk, or a sudden startle stimulus that makes them jump), then in all such cases there are of course sensorimotor predictions. Sensorimotor predictions are driven by efference copies of the motor command; thus, all movements, whether ‘voluntarily’ executed or triggered by an external stimulus, will drive sensorimotor predictions. (All of this of course assumes that the predictive processing theory is correct.)
  
  (b) Do you mean temporal consistency (minimal lags), statistical contingencies (same movements linked to the same sensory inputs), or both? How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?
  
  Both. We have rephrased the sentence to try to make this clearer. See also response to reviewer 1 minor point 2 above.
  
  How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?
  
  Most cross-modal associations are much less consistent (the exact sound of a glass shattering is always slightly different and impossible for us to predict), and orders of magnitude less frequently experienced, than sensorimotor associations. Again, see also response to reviewer 1 minor point 2 above.
  
  (4) "Every movement is directly coupled to sensory feedback throughout life"
  
  This may be the case for proprioceptive and/or somatosensory feedback, but not necessarily for visual feedback (e.g., a mouse moving its tail), which is the topic of the study.
  
  Correct, there are movements that can be decoupled from visual feedback. Most of the time, however, most movements are not, and we are studying one of the more prominent ones that is clearly not decoupled: locomotion. The contrast we aim to highlight here is that there is still a vague idea in the field that one can take a participant, or a mouse, expose them to a few tens or hundreds of trials of some sensory stimulus contingency, and then probe for prediction error responses to a pattern that has only recently been learned, if at all. Given the lifelong experience of participants and mice, is it really surprising that oddball responses are weaker than sensorimotor mismatch responses?
  
  (5) "However, the overall level of this motor-related activity is much higher than one would expect simply from predictions of visual feedback that are compared against visual input."
  
  Could you please clarify what one would expect in this case, and/or back it up with citations?
  
  This refers to the fact that there are very strong movement-related signals in mouse visual cortex that persist even when the mouse is in complete darkness. In darkness, movements should not trigger any change in visual feedback; hence, this activity is difficult to explain as a movement-related prediction of visual flow. We have rephrased this section of the introduction to make this clearer.
  
  (6) "The more precise the prediction and comparison, the less motor-related activity should be detectable in visual cortex."
  
  I think this conflates two issues. A good match between prediction and input would indeed result in sensory attenuation. However, sensory precision, at least in active inference, can upregulate prediction error responses. Since predictions cannot be assumed to be perfect (due to external or internal noise), increased precision may therefore augment activity. See e.g. https://doi.org/10.1007/s10339-013-0571-3
  
  We agree with the reviewer – the phrasing here was misleading. We do not mean precision in the predictive processing sense, but the precision of sensorimotor control necessary for the behavior. We have rephrased the corresponding section of the manuscript.
  
  (7) Neither the introduction nor the discussion refers to previous human EEG studies on sensorimotor mismatch responses, where sensory feedback doesn't match motor actions (e.g. https://doi.org/10.3758/s13423-021-01992-z ; https://www.sciencedirect.com/science/article/pii/S0028393214003777 ; https://www.sciencedirect.com/science/article/pii/S0028393219301265).
  
  The studies cited by the reviewer primarily test how discrete violations of learned action–outcome associations are represented in the brain, whereas our visuomotor mismatch paradigm probes violations of continuous sensorimotor coupling during ongoing action. The paradigms are conceptually different both in how strong the coupling is (lifelong vs. learned in the experiment) and in how prediction errors are likely used (visuomotor control vs. stimulus detection). We have added a brief passage discussing this to the introduction.
  
  Results:
  
  (1) A very large proportion of the dataset was excluded due to movement artefacts. This is rather problematic as
  
  (a) the rationale behind finding mismatch responses is that motion-related (neural) signals should affect visual cortical activity, so it's essential to disentangle these neural signals from artefacts;
  
  Correct, we excluded 21.7% of the total data for the visuomotor mismatch paradigm. Note that this percentage is comparable to that of other similar studies of EEG recordings during movement (Oliveira et al., 2016). By “problematic”, we assume the reviewer means the fact that we have artefacts, not that we exclude trials with artefacts. The movement artefacts are typically caused by the acceleration during stepping in participants with a heavy gait. None of these movement artefacts are time-locked to any of the responses we investigate. Thus, if not excluded, they should simply appear as increased noise. We do not understand why the reviewer considers this particularly problematic for our analysis or conclusions (beyond the trivial consequence of increased noise, which would only cause us to underestimate the strength of the mismatch signals we report).
  
  (b) the criterion for the number of trials of 15 triggers (per condition?) is arbitrary and lower than widely used in the literature, so authors should demonstrate that this is a sufficient number to observe a measurable ERP even for those participants with 15 triggers;
  
  We have between 16 and 25 visuomotor mismatch events per participant. Author response image 2 shows a selection of single-participant examples with different numbers of trials. The number of mismatch events is limited by the fact that we introduce them approximately every 10 to 15 s and that the closed loop session has a total duration of 5 minutes. Thus, on average, we expect 24 mismatch events. However, we are not sure we understand the logic of the comment: if we set the minimum number of trials for inclusion too low, we merely risk losing a response in the noise. And we clearly obtain stronger mismatch responses, with a higher signal-to-noise ratio, with an average of 20 trials than we obtain for visual responses during movement with an average of 40 trials or for MMN responses with an average of 28 trials.
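  
  The expected count of 24 follows from simple arithmetic. The sketch assumes that inter-event intervals are spread evenly over the 10 to 15 s range, so that their mean is the midpoint of the range.

```python
SESSION_S = 5 * 60                # 5-minute closed loop session, in seconds
INTERVAL_RANGE_S = (10.0, 15.0)   # approximate spacing between mismatch events

# Assumption: intervals are evenly spread over the range, so the mean is the midpoint
mean_interval_s = sum(INTERVAL_RANGE_S) / 2      # 12.5 s
expected_events = SESSION_S / mean_interval_s    # 24.0
print(expected_events)
```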
  
  Author response image 2.
  
  Reliable ERPs can be observed with as few as 16 trials across EEG channels. (A) Histograms showing the distribution of the number of valid mismatch trials per participant for each electrode pair (Fp1–2, C3–4, P3–4, O1–2). (B) Representative EEG responses to visuomotor mismatch events from a single participant, recorded at electrode pairs Fp1–2, C3–4, P3–4, and O1–2. Waveforms were computed using the indicated number of trials (shown above each trace). Dashed vertical red lines are onset and offset of the visuomotor mismatch.
  
  (c) it seems that the seemingly static "visual" condition resulted in a larger proportion of data rejected due to movement (or, as later mentioned, nausea) than the "visuomotor" condition, which is counterintuitive and needs further explanation;
  
  This is a misunderstanding: the ‘visual paradigm’ the reviewer is referring to comprises the experiments shown in Figure 1. Here, we recorded visual responses in both sitting and walking participants. In this experiment, as in the others, exclusion was primarily driven by the part of the paradigm in which the participants were moving. To make this clearer, we have added Table S2 to the manuscript, which provides an overview of trials excluded by paradigm and session.
  
  (d) authors mention eye movements as a potential issue, which should be possible to detect from frontal channels. Additionally, it's not entirely clear how many datasets were discarded (the results section mentions 19/48 in the visual condition, then 4+11 in the playback condition - isn't this the same condition?)
  
  The visual paradigm corresponds to the data shown in Figure 1, in which participants viewed a flipping checkerboard in both a walking and a stationary session. The open loop session is part of the visuomotor paradigm shown in Figure 2, where participants were exposed to a replay of the visual flow that had been self-generated during the preceding closed loop session, including the visual flow halts that constituted visuomotor mismatches in the closed loop session. Please note, to avoid such confusion, we have attempted to standardize the usage of paradigm (visual vs. visuomotor) and session (sitting vs. walking, and closed loop vs. open loop) throughout. In addition, we have added a table to summarize the number of excluded trials by paradigm and session as Table S2 to the manuscript.
  
  In comments 1 and 2 of the public review, the reviewer also points out that we did not control for eye movements and, we presume relatedly, that we did not follow common EEG reliability standards. Regarding the first point, we performed additional experiments in an independent cohort of participants to test whether eye movements could account for the visuomotor mismatch responses. We recorded eye movements during closed loop sessions and found that changes in eye speed (Figure S7A) and blink rate (Figure S7B) following the mismatch stimulus had a longer latency than the visuomotor mismatch responses in the EEG. This suggests that the visuomotor mismatch response cannot be explained by eye blinks or changes in eye movement speed. Regarding the second point, we are not sure we understand. Trial exclusion based on a fixed voltage threshold of 100 µV is relatively common, and our rejection rates are on par with (and, particularly on occipital electrodes, even lower than) those of other work on EEG recordings during locomotion or movement (see, e.g., (Oliveira et al., 2016)).
  
  Nevertheless, we did attempt to apply independent component analysis (ICA) based filtering to the EEG data (Delorme and Makeig, 2004). However, these methods were designed for high-density recordings. With only 8 channels, ICA cannot reliably isolate eye movement or motion artefact components of the EEG. To illustrate this, we tested two artifact-rejection strategies. In the first approach, components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if at least 90% of the component’s variance was assigned to a single artifact class (Author response image 3A). In the second, more permissive approach, aimed specifically at reducing eye movement artifacts, components were removed if artifact-related activity exceeded 90% for non-eye artifacts, while the threshold for eye-related components was lowered to 60% (Author response image 3C). We lowered the threshold for excluding eye-related components to ensure that EEG signals influenced by eye movements were effectively removed. In both cases, whether the eye-component threshold was set to 90% or 60%, the averaged responses to visuomotor mismatch trials remained largely similar to the previously reported data, despite higher noise in some traces. Interestingly, when we followed the ICA filtering with our voltage-threshold-based exclusion at 100 µV, the resulting traces closely resembled the patterns described in the paper (Author response image 3B and 3D). Thus, we conclude that the non-ICA-filtered responses are easier to interpret, free of any potential ICA filtering artifacts, and far less dependent on the parameter choices of the ICA filtering.
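  
  The two component-rejection rules can be written down compactly. The sketch assumes that each ICA component comes with per-class fractions over seven categories like those used by the ICLabel classifier (brain, muscle, eye, heart, line noise, channel noise, other); it does not call any EEGLAB or ICLabel function, treats "brain" and "other" as classes that never trigger removal (our assumption), and `components_to_remove` is a hypothetical name.

```python
import numpy as np

# Assumed class order, mirroring ICLabel's seven categories
CLASSES = ["brain", "muscle", "eye", "heart", "line_noise", "channel_noise", "other"]

def components_to_remove(class_fractions, artifact_threshold=0.9, eye_threshold=None):
    """Indices of components whose fraction for any artifact class reaches its threshold.

    class_fractions: array (n_components, 7), rows in the CLASSES order.
    eye_threshold: optional separate (e.g. more permissive) threshold for 'eye'.
    """
    fractions = np.asarray(class_fractions, dtype=float)
    thresholds = np.full(len(CLASSES), np.inf)   # inf: class never triggers removal
    for i, name in enumerate(CLASSES):
        if name not in ("brain", "other"):
            thresholds[i] = artifact_threshold
    if eye_threshold is not None:
        thresholds[CLASSES.index("eye")] = eye_threshold
    return np.flatnonzero((fractions >= thresholds).any(axis=1))

example = [
    [0.95, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005],  # mostly brain: kept by both rules
    [0.30, 0.00, 0.70, 0.00, 0.00, 0.000, 0.000],  # eye 0.7: removed only by the second rule
    [0.05, 0.92, 0.01, 0.01, 0.005, 0.005, 0.00],  # muscle 0.92: removed by both
    [0.02, 0.01, 0.95, 0.01, 0.005, 0.005, 0.00],  # eye 0.95: removed by both
]
strict = components_to_remove(example)                       # first approach (90%)
permissive = components_to_remove(example, eye_threshold=0.6)  # second approach (90% / 60%)
```

  With only 8 channels there are at most 8 components, so each removed component takes a large share of the signal with it, which is one reason the resulting traces become noisier.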
  
  Author response image 3.
  
  Removal of artifacts identified with ICA does not change the visuomotor mismatch responses. (A) Visuomotor mismatch responses recorded from occipital electrodes after artifact correction. Components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if ≥90% of the component’s variance was attributed to a single artifact class. Solid black line represents the mean, and shading indicates the SEM across participants. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but excluding trials with amplitudes exceeding 100 µV. (C) As in A, but components were removed if artifact-related activity exceeded 90% for non-ocular artifacts, while the threshold for eye-related components was lowered to 60%. (D) As in C, but excluding trials with amplitudes exceeding 100 µV.
  
  (2) The finding that mismatch responses are observed at all channels, with differences in amplitudes but not latencies, indicates that volume conduction may affect the results. I would strongly suggest accounting for this using a method appropriate for the very small number of channels, e.g., phase lag index.
  
  We are not sure we understand. The phase lag index is a method to estimate functional connectivity in a way that corrects for volume conduction (using phase lag). We make no claims about functional connectivity; thus, we are not sure what the reviewer is suggesting we do. The fact that the visual and visuomotor mismatch responses were measurable on all electrodes could indeed be in part explained by volume conduction, but we see no way to estimate the volume conduction contribution. From mouse calcium imaging data, we know that both visual and visuomotor mismatch responses spread across large parts of dorsal cortex (including frontal regions like the ACC).
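  For readers unfamiliar with the measure, the phase lag index (PLI) can be sketched as follows. This is an illustrative implementation, not part of the authors' analysis: it assumes instantaneous phases for the two channels have already been extracted (for example from the analytic signal of band-pass filtered data), and `phase_lag_index` is a hypothetical name. The example shows why the index discounts volume conduction: purely zero-lag coupling yields a PLI of 0.

```python
import math
from typing import Sequence


def phase_lag_index(phase_x: Sequence[float], phase_y: Sequence[float]) -> float:
    """Phase lag index between two channels.

    PLI = |mean over samples of sign(sin(phase_x - phase_y))|, in [0, 1].
    Zero-lag coupling (e.g. one source seen by both electrodes through
    volume conduction) gives sin(0) = 0 and adds nothing to the numerator.
    """
    if len(phase_x) != len(phase_y) or len(phase_x) == 0:
        raise ValueError("phase series must be non-empty and of equal length")

    def sign(v: float) -> int:
        return (v > 0) - (v < 0)

    total = sum(sign(math.sin(a - b)) for a, b in zip(phase_x, phase_y))
    return abs(total) / len(phase_x)


phases = [0.1 * t for t in range(50)]
lagged = [p - 0.5 for p in phases]      # consistent 0.5 rad lag
print(phase_lag_index(phases, lagged))  # 1.0
print(phase_lag_index(phases, phases))  # 0.0 (pure zero-lag coupling)
```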
  
  With the addition of new data, the latency difference between occipital and frontal electrodes, previously observed only as a trend, is now statistically significant (Figure 3E). Occipital responses emerge earlier than frontal responses, suggesting that mismatch-related activity originates in visual sensory regions and subsequently propagates to more frontal areas, similar to what has been reported in mouse cortex (Heindorf and Keller, 2024).
  
  (3) The authors compare different types of mismatch responses (including auditory oddballs) in the same set of (occipital) channels, but doesn't this undermine the spatial specificity of the results? Classical auditory mismatch negativity is typically observed over central channels, so weaker amplitudes of auditory mismatch responses in occipital channels are likely trivially explained by modality differences. As such, I'm not convinced that this comparison is informative even in a qualitative manner.
  
  To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. The amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (new Figures S8 and S9).
  
  (4) On a similar note, is the polarity reversal found for visual vs. mismatch responses specific to occipital channels?
  
  Thank you for this interesting question. In fact, polarity reversal was consistently observed across all recorded channels; this has now been added as a main figure to the manuscript (Figure 5).
  
  (5) Figure S4C seems to cut off one outlier, and I don't see this outlier included in the boxplot.
  
  Correct; that is why we describe the boxplots in the figure legend as: “Boxes mark median, quartiles, and range of data not considered outliers.” We have now adjusted the axes to include all data points.
  
  Discussion:
  
  "A central tenet of the cortical circuit for predictive processing is the split into separate populations of neurons that compute positive and negative prediction errors (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999)" - this may be the case for visuomotor mismatch signals or reward prediction errors, but signed PEs do not play a central role in other proposed microcircuits for predictive processing in the perceptual domain (e.g. Bastos)
  
  Signed prediction errors do not play a central role in proposed cortical microcircuits for predictive processing that do not burden themselves with making a concrete proposal for the implementation of the prediction error computation. The (Bastos et al., 2012) work is a good example of this. The equation for the error term provided in that paper is clearly signed (nothing stops the error from going negative), but no proposal is made for how layer 2/3 excitatory neurons are supposed to signal this quantity. With baseline activity levels close to zero in layer 2/3, there really is only one way to do this, and that is separate populations of negative and positive prediction error neurons. With a non-zero baseline firing rate, one could do this bidirectionally around a mean firing rate (as is typically assumed for dopaminergic RPE neurons). There are more abstract Bayesian implementations that assume logarithmic transformations and could also implement a prediction error-like system without negative firing rates. But given the absence of any physiological evidence, we will refrain from discussing these. Most importantly, however, there is now considerable evidence for the existence of both negative and positive prediction error neurons in layer 2/3 of mouse visual cortex. Thus, by “cortical circuit for predictive processing” we here mean those that make biologically plausible proposals for prediction error computations. Also note that the (Rao and Ballard, 1999) model is probably the prime example of what the reviewer calls a proposed microcircuit for predictive processing in the “perceptual domain”.
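  The argument about separate populations can be illustrated with a toy sketch, assuming firing rates are simply rectified at zero (a deliberate simplification; the sign convention and function names are hypothetical). With a baseline near zero, a single population cannot represent negative errors, whereas two rectified populations carry the full signed error.

```python
from typing import Tuple


def split_prediction_error(bottom_up: float, prediction: float) -> Tuple[float, float]:
    """Represent a signed error with two rectified, non-negative populations."""
    error = bottom_up - prediction
    positive_pe = max(0.0, error)   # fires when input exceeds the prediction
    negative_pe = max(0.0, -error)  # fires when the prediction exceeds input
    return positive_pe, negative_pe


def single_population_rate(error: float, baseline: float, gain: float = 1.0) -> float:
    """One population coding the error bidirectionally around a baseline rate.

    Firing rates cannot go below zero, so negative errors larger than
    baseline / gain are clipped and lost.
    """
    return max(0.0, baseline + gain * error)


pos, neg = split_prediction_error(bottom_up=1.0, prediction=3.0)
print(pos, neg, pos - neg)                         # 0.0 2.0 -2.0
print(single_population_rate(-2.0, baseline=0.0))  # 0.0: the error is invisible
print(single_population_rate(-2.0, baseline=5.0))  # 3.0: recoverable as 5.0 - 3.0
```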
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Solyga, Zelechowski, and Keller present a concise report of an innovative study demonstrating clear visuomotor mismatch responses in ambulating humans, using a mobile EEG setup and virtual reality. Human subjects walked around a virtual corridor while EEGs were recorded. Occasionally, motion and visual flow were uncoupled, and this evoked a mismatch response that was strongest in occipitally placed electrodes and had a considerable signal-to-noise ratio. It was robust across participants and could not be explained by the visual stimulus alone.
  
  Strengths:
  
  This is an important extension of their prior work in mice, and represents an elegant translation of those previous findings to humans, where future work can inform theories of e.g., psychiatric diseases that are believed to involve disordered predictive processing. For the most part, the authors are appropriately circumspect in their interpretations and discussions of the implications. I found the discussion of the polarity differences they found, in light of separate positive and negative prediction errors, intriguing.
  
  Weaknesses:
  
  The primary weaknesses rest in how the results are sold and interpreted.
  
  Most notably, the interpretation of the results of the comparison of visuomotor mismatches to the passive auditory oddball-induced mismatch responses is inappropriate, given suboptimal electrode choices, unclear matching of trial numbers, and other factors. To clarify, regarding the auditory oddball portion in Figure 5, the data quality is a concern for the auditory ERPs, and the choice of occipital electrodes is a likely culprit. Typically, auditory evoked responses are maximal at Cz or FCz, although these contacts don't seem to be available with this setup. In general, caution is warranted in comparing ERP peaks between two different sensory modalities, especially if attention is directed elsewhere (to a silent movie) during one recording and not during the other. The authors discuss this as a purely "qualitative" comparison in the text, which is appreciated, and do acknowledge the limitations within the results section, but the figure title and, importantly, the abstract set a different tone. At the very least, for comparisons between auditory mismatch and visuomotor mismatch, trial numbers need to be equated, as ERP magnitude can be augmented by noise (which reduces with increased numbers of trials in the average).
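  The point that noise inflates the magnitude of an average computed from fewer trials can be made concrete under a simple, assumed model: each trial is $x_k(t) = s(t) + \varepsilon_k(t)$, with a fixed evoked response $s(t)$ and independent, zero-mean noise of variance $\sigma^2$. Squaring the trial average then gives, in expectation,

```latex
\bar{x}(t) = \frac{1}{N}\sum_{k=1}^{N} x_k(t)
           = s(t) + \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(t),
\qquad
\mathbb{E}\!\left[\bar{x}(t)^2\right]
  = s(t)^2 + 2\,s(t)\,\mathbb{E}\!\left[\bar{\varepsilon}(t)\right] + \mathbb{E}\!\left[\bar{\varepsilon}(t)^2\right]
  = s(t)^2 + \frac{\sigma^2}{N}.
```

  Under this model the squared average overestimates $s(t)^2$ by $\sigma^2/N$, a bias that shrinks with the number of trials $N$; this is why trial counts (and noise levels) matter when magnitudes from two paradigms are compared.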
  
  To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. Nevertheless, the amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (these results are now shown in the new Figures S8 and S9), and the response power was significantly greater for the visuomotor mismatch than for mismatch negativity. Regardless of which electrode we test, the visuomotor mismatch response has 5 to 10 times the power of the MMN response. Moreover, the number of trials per participant that met quality criteria was comparable between the visuomotor mismatch paradigm (mean = 23 trials) and the auditory mismatch paradigm (mean = 28 trials) (Author response image 4).
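  One common way to equate trial numbers, as the reviewer requests, is to randomly subsample the condition with more trials. The sketch below is an assumption about how such matching could be done, not a description of the authors' procedure (they report comparable mean trial counts instead); `match_trial_counts` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def match_trial_counts(
    trials_a: Sequence[T], trials_b: Sequence[T], seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly subsample the larger set so both conditions have equal trial counts."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = min(len(trials_a), len(trials_b))
    return rng.sample(list(trials_a), n), rng.sample(list(trials_b), n)


sub_a, sub_b = match_trial_counts(list(range(23)), list(range(100, 128)))
print(len(sub_a), len(sub_b))  # 23 23
```

  Repeating the subsampling with different seeds and averaging the resulting measures would reduce dependence on any single random draw.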
  
  Author response image 4.
  
  Number of trials included for analysis is comparable between visuomotor and oddball paradigm. (A) Histogram showing the distribution of the number of valid trials per participant for O1-2 electrode pair in visuomotor mismatch paradigm. (B) Same as in A but for deviant stimulus presentations in the oddball paradigm.
  
  And more generally, the size of the mismatch event at the scalp does not scale one-to-one with the size at the level of the neural tissue. One can imagine a number of variables that impact scalp-level magnitudes and are orthogonal to actual cortex-level activation: the size, spread, and polarity variance of the activated source (all of which would diminish amplitude at the scalp due to polyphasic summation/cancellation); the variance of phase to a stimulus across trials (cross-trial phase locking) versus the magnitude of underlying power, where the former, in theory, relates to bottom-up activity and the latter can reflect feedback (which has more variability in time across trials); and the distance of the scalp electrode from the activated tissue (which, for the auditory system, would be larger (FCz to superior temporal gyrus) than for the visual system (O1 to V1/2)). None of this precludes the inclusion of the auditory mismatch, which is a strength of the study, but interpretations that this supports a supremacy of sensory-motor mismatch, regardless of validity, are not warranted. I would recommend changing the way this is presented in the abstract.
  
  We agree with the point that the EEG response does not need to reflect the total cortical activation. However, the discussion in the abstract (and elsewhere) is in the context of clinical experiments where the underlying cortical activity pattern is irrelevant if it does not trigger a clinically measurable (by EEG in this case) response. The abstract only makes a comparison to MMN implicitly in this sentence “Second, a paradigm that can trigger strong prediction error responses and consequently requires shorter recording times could simplify experiments in a clinical setting.” We are not sure how to phrase this even more carefully – the statement at face value is a truism. The reviewer, we assume, takes exception to the unstated implication that visuomotor prediction errors trigger stronger responses than MMN. Given the data we have, we assume most authors would not consider it an overstatement to make that claim outright.
  
  Otherwise, the data are of adequate quality to derive most of their conclusions.
  
  The authors claim that the mismatch responses emanate from within the occipital cortex, but I would require denser scalp coverage or a demonstration of consistent impedances across electrodes and across subjects to make conclusions about the underlying cortical sources (especially given the latencies of their peaks). In EEG, the distribution of voltage on the scalp is, of course, related to but not directly reflective of the distribution of the underlying sources. The authors are mostly careful in their discussion of this, but I would strongly recommend changing the word choice of "in occipital cortex" to "over occipital cortex" or even "posteriorly distributed". Even with very dense electrode coverage and co-registration to MRIs for the generation of forward models that constrain solutions, source localization of EEG signals is very challenging and not a simple problem. Given the convoluted and interior nature of human V1, the ability to reliably detect early evoked responses (which show the mismatch in mouse models) at the scalp in ERP peaks is challenging, especially if one is collapsing ERPs across subjects. And given the latency of the mismatch responses, I'd imagine that many distributed cortical regions contribute to the responses seen at the scalp.
  
  This is an excellent point; we have replaced “in occipital cortex” with “over occipital cortex” throughout.
  
  I think that a version of Figure 3C showing the difference between visuomotor mismatch and halting flow alone (in the open loop) might be additionally informative, as it clarifies exactly where the pure "mismatch" or prediction error is represented.
  
  We performed the analysis as suggested (Author response image 5). Visuomotor mismatch responses are stronger on all electrodes compared to playback halt responses. This difference is also larger in data recorded on occipital electrodes.
  
  Author response image 5.
  
  Comparison of the difference between visuomotor mismatch and playback halt on all electrodes. Average response strength was calculated within a 100 ms window centered on the peak of the average visuomotor mismatch response across all electrodes. Boxes mark median, quartiles, and range of data not considered outliers. Each circle represents data from one participant. **: p<0.01, *: p<0.05, Fp1-2: 20 participants, C3-4: 31 participants, P3-4: 35 participants, O1-2: 32 participants.
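  The response-strength measure in this caption can be sketched as follows. Assumptions not stated in the caption: the peak is the sample with the largest absolute deflection of the grand-average trace, traces share a known sampling rate, and the window spans ±50 ms around the peak (inclusive, so one sample longer than 100 ms). The name `window_mean_around_peak` is hypothetical.

```python
from typing import List, Sequence


def window_mean_around_peak(
    grand_average: Sequence[float],
    participant_traces: Sequence[Sequence[float]],
    sampling_rate_hz: float,
    window_ms: float = 100.0,
) -> List[float]:
    """Mean of each participant's trace in a window centered on the grand-average peak."""
    # Assumption: "peak" = largest absolute deflection of the grand average.
    peak = max(range(len(grand_average)), key=lambda i: abs(grand_average[i]))
    half = int(round(window_ms * sampling_rate_hz / 2000.0))  # half-window in samples
    start = max(0, peak - half)
    stop = min(len(grand_average), peak + half + 1)
    return [sum(trace[start:stop]) / (stop - start) for trace in participant_traces]


grand = [0.0] * 20
grand[10] = -3.0  # negative peak at sample 10
print(window_mean_around_peak(grand, [[2.0] * 20, list(range(20))], sampling_rate_hz=100.0))
# [2.0, 10.0]
```

  Per-participant differences (e.g., visuomotor mismatch minus playback halt) would then be computed from two such lists.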
  
  As a suggestion, the authors are encouraged to analyse time-frequency power and phase locking for these mismatch responses, as is common in much of the literature (see Roach et al 2008, Schizophrenia Bulletin). This is not to say that doing so will yield insights into oscillations per se, but converting the data to the time-frequency domain provides another perspective that has some advantages. It fosters translations to rodent models, as ERP peaks do not map well between species, but e.g., delta-theta power does (see Lee et al 2018, Neuropsychopharmacology; Javitt et al 2018, Schizophrenia research; Gallimore et al 2023, Cereb Ctx). Further, ERP peaks can be influenced by the actual neuroanatomy of an individual (especially for quantifying V1 responses). Time frequency analyses may aid in interpreting the "early negative deflection with a peak latency of 48 ms " finding as well.
  
  We have performed time–frequency power and phase-locking analyses for both visual responses (Author response image 6 and Author response image 7) and visuomotor mismatch and playback halt responses (Author response image 8 and Author response image 9), as suggested. We include the results of these analyses here in the response rather than in the manuscript, as they are not yet fully developed. We may add them to a future publication, in which we would want to properly quantify the stability of these effects.
  
  In brief, time–frequency representations of power did identify potentially interesting differences between walking and sitting sessions in the visual paradigm. Inter-trial phase coherence (ITPC) revealed an early increase in alpha-band synchronization suggesting that phase alignment of alpha oscillations may contribute to the early differences in visual responses between walking and sitting. The same analyses were applied to visuomotor mismatch and playback halt responses. Time–frequency power analysis revealed an increase in delta-band power during visuomotor mismatch, consistent with previous reports linking delta activity to prediction error processing, including reward prediction errors (Cavanagh, 2015), unexpected final words (Webb and Sohoglu, 2025), and visual deviance detection (West et al., 2024). Notably, it appears as if the increase in delta power emerged first over occipital electrodes and appeared later over more frontal electrodes, forming a spatiotemporal gradient of onset across the scalp.
  
  Delta power changes were markedly reduced in the playback halt responses at the time of visual flow cessation. While some power changes were observed, they occurred primarily at visual flow onset rather than at flow offset. Inter-trial phase coherence analysis further revealed delta-band synchronization over occipital electrodes following visuomotor mismatch, whereas the playback halt response showed strong phase synchronization in both delta and theta bands following visual flow onset.
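  Inter-trial phase coherence is commonly defined as the length of the mean unit phase vector across trials at a given time and frequency. A minimal sketch, assuming per-trial phase angles have already been obtained from a time–frequency decomposition (the response does not specify which, e.g. wavelet or short-time Fourier transform); `itpc` is a hypothetical name.

```python
import cmath
import math
from typing import Sequence


def itpc(phases: Sequence[float]) -> float:
    """Inter-trial phase coherence at one time-frequency point.

    phases: phase angle (radians) of each trial at that point.
    Returns |mean of unit phasors|: near 0 for random phases, 1 for identical ones.
    """
    if not phases:
        raise ValueError("need at least one trial")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


print(itpc([0.3] * 10))                      # identical phases, close to 1
print(round(itpc([0.0, math.pi / 2]), 4))    # 0.7071
print(itpc([0.0, math.pi]) < 1e-12)          # True: opposite phases cancel
```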
  
  Author response image 6.
  
  Time–frequency representations of EEG power changes during the visual paradigm. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line indicates the time of the checkerboard reversal (0 s). (B) As in A, but recorded while participants were walking.
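  The captions describe power changes "relative to baseline" without specifying the normalization. One common convention is a decibel change from the mean baseline power; the sketch below assumes that convention (a percent change would be an equally valid alternative), and `baseline_db` is a hypothetical name.

```python
import math
from typing import List, Sequence


def baseline_db(power: Sequence[float], baseline_idx: Sequence[int]) -> List[float]:
    """Convert power at one frequency over time to dB relative to baseline.

    Each value becomes 10 * log10(P(t) / mean baseline power), so 0 dB means
    no change, +10 dB a tenfold increase, and -10 dB a tenfold decrease.
    """
    base = sum(power[i] for i in baseline_idx) / len(baseline_idx)
    if base <= 0:
        raise ValueError("baseline power must be positive")
    return [10.0 * math.log10(p / base) for p in power]


print(baseline_db([2.0, 2.0, 20.0, 0.2], baseline_idx=[0, 1]))
```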
  
  Author response image 7.
  
  Inter-trial phase coherence (ITPC) for visual trials during sitting and walking. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line marks the time of the checkerboard reversal (0 s). (B) As in A, but recorded during walking.
  
  Author response image 8.
  
  Time–frequency representations of EEG power changes during visuomotor mismatch and playback halt responses. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.
  
  Author response image 9.
  
  Inter-trial phase coherence (ITPC) for the visuomotor mismatch and playback halt responses. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.
  
  Finally, the sentence in the abstract that this paradigm " can trigger strong prediction error responses and consequently requires shorter recording times would simplify experiments in a clinical setting" is a nice setup to the paper, but the very fact that one third of recordings had to be removed due to movement artifact, and that hairstyle modulates the recording SnR, is reason that this paradigm, using the reported equipment, may have limited clinical utility in its current form. Further, auditory oddball paradigms are of great clinical utility because they do not require explicit attention and can be recorded very quickly with no behavioral involvement of a hospitalized patient. This should be discussed, although it does not detract from the overall scientific importance of the study. The authors should reconsider putting this statement in the abstract.
  
  We have added a paragraph to the discussion to address these points. Note that we get robust and strong responses with very few trials (Author response image 2). The fact that we need to discard up to 21.7% of trials due to movement or eye blink artifacts does little to change the fact that we need far fewer trials and obtain larger and more robust responses than with other EEG paradigms. Finally, we understand that not requiring participants to pay attention to the task is sometimes useful. However, a paradigm that is engaging and fun for participants and takes 5 minutes of recording time is probably equally often an advantage.
  
  Reviewer #3 (Recommendations for the authors):
  
  Minor points:
  
  (1) In the Introduction, I'm not sure that the logic comes through as to what the authors aim to illustrate by comparing mice to humans, in terms of precision and "movement modulation". In some cases, the precision of the comparison is referred to, and in others, the precision of the prediction (I think?). I'm not sure if they mean for this to be different or not. Similarly, on line 81, "If indeed the precision of visuomotor coupling determines the amount of motor modulation of visual responses": here I'm a little confused by "amount of motor modulation", as to me the term "modulation" refers to a conditional modifier (if moving, then suppress visual movement responses; if not moving, then amplify visual movement responses) rather than movement-driven activity. The way I'm reading it, the authors mean the latter, but I could be misunderstanding.
  
  We have rephrased this section of the introduction.
  
  (2) I think it could be helpful, in the sentence starting on line 65, to reiterate that this observation of higher-than-expected motor activity in V1 is in mice (if I'm understanding it correctly). I also found myself tangled up in the difference between motor-related activity in V1 and motor-modulation in V1 in this paragraph.
  
  We have rephrased this section of the introduction.
  
  (3) For signal power, was the amplitude squared on individual trials prior to averaging, or after averaging? If prior, it would help with separating amplitude modulations from phase variance.
  
  In our previous analysis, power was computed by squaring the amplitude after trial averaging (Author response image 10A). We repeated the analysis using the alternative approach, in which power was calculated for individual trials and then averaged (Author response image 10B). Although this method yields substantially higher absolute power values, the overall pattern of results remains unchanged: visuomotor mismatch responses continue to show significantly higher power than visual responses. To examine phase variance, we additionally analyzed inter-trial phase coherence (Author response image 7 and Author response image 9).
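  The difference between the two power computations can be sketched as follows (function names hypothetical; the 0 to 0.5 s window is assumed to have been selected beforehand). Squaring after averaging keeps only the phase-locked (evoked) component, while squaring each trial first also retains activity that cancels in the average, which is why the second approach yields higher absolute values.

```python
from typing import Sequence


def power_of_average(trials: Sequence[Sequence[float]]) -> float:
    """Average the trials first, then square: mean power of the evoked response."""
    n_trials, n_samples = len(trials), len(trials[0])
    evoked = [sum(trial[t] for trial in trials) / n_trials for t in range(n_samples)]
    return sum(v * v for v in evoked) / n_samples


def average_of_power(trials: Sequence[Sequence[float]]) -> float:
    """Square each trial first, then average over trials and samples."""
    n_trials, n_samples = len(trials), len(trials[0])
    return sum(v * v for trial in trials for v in trial) / (n_trials * n_samples)


# Two trials with opposite sign cancel in the average but not in single-trial power.
opposite = [[1.0, -1.0], [-1.0, 1.0]]
print(power_of_average(opposite), average_of_power(opposite))  # 0.0 1.0
```

  Because the mean of squares equals the square of the mean plus the variance, `average_of_power` equals `power_of_average` plus the across-trial variance averaged over samples, so it can never be smaller.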
  
  Author response image 10.
  
  Visuomotor mismatch responses have more power compared to visual responses. (A) Comparison of power between visuomotor mismatch and visual responses, calculated within a 0 - 0.5 s time window following stimulus onset. Power was computed by squaring the amplitude after trial averaging. Boxes indicate the median and interquartile range, with whiskers showing the range excluding outliers; circles represent data from individual participants. ***p < 0.001. (B) Same comparison as in (A), but with power calculated by squaring the amplitude of individual trials prior to averaging.
  
  (4) The "the world suddenly flew forward!" response from the participant, I understand, and I believe that it is useful to illustrate a point. I do not understand the "Are you printing this? - Hi Mom! " part of the participant response, and I'm not sure it adds to the paper, beyond amusement, which seems inappropriate.
  
  One of the authors (the one who did none of the experiments) finds this endlessly hilarious and as the reviewer notes, it might add amusement more generally. “Inappropriate” might be a bit harsh – according to our favorite AI chatbot: “Amusement provides significant mental, physical, and social value by offering a necessary escape from routine, reducing stress, and fostering a connection. It enhances well-being through endorphin-releasing experiences and encourages social bonding, learning, and joy.” Nevertheless, we have censored the offending passage.
  
  Aizenbud, I., Audette, N., Auksztulewicz, R., Basiński, K., Bastos, A.M., Berry, M., Canales-Johnson, A., Choi, H., Clopath, C., Cohen, U., Costa, R.P., Filippo, R.D., Doronin, R., Errington, S.P., Gavornik, J.P., Gillon, C.J., Granier, A., Hamm, J.P., Hertäg, L., Kennedy, H., Kumar, S., Ladd, A., Ladret, H., Lecoq, J.A., Maier, A., McCarthy, P., Mei, J., Mejias, J., Mikulasch, F., Mudrik, N., Najafi, F., Nejad, K., Nejat, H., Oweiss, K., Petrovici, M.A., Priesemann, V., Rudelt, L., Ruediger, S., Russo, S., Salatiello, A., Senn, W., Sennesh, E., Sima, S., Uran, C., Vasilevskaya, A., Vezoli, J., Vinck, M., Westerberg, J.A., Wilmes, K., Xiong, Y.S., 2025. Neural mechanisms of predictive processing: a collaborative community experiment through the OpenScope program. https://doi.org/10.48550/arXiv.2504.09614
  
  Bastos, A.M., Usrey, W.M., Adams, R.A., Mangun, G.R., Fries, P., Friston, K.J., 2012. Canonical microcircuits for predictive coding. Neuron 76, 695–711. https://doi.org/10.1016/j.neuron.2012.10.038
  
  Cavanagh, J.F., 2015. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216. https://doi.org/10.1016/j.neuroimage.2015.02.007
  
  Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009
  
  Gramann, K., Gwin, J.T., Bigdely-Shamlo, N., Ferris, D.P., Makeig, S., 2010. Visual evoked responses during standing and walking. Front. Hum. Neurosci. 4, 202. https://doi.org/10.3389/fnhum.2010.00202
  
  Heindorf, M., Keller, G.B., 2024. Antipsychotic drugs selectively decorrelate long-range interactions in deep cortical layers. eLife 12, RP86805. https://doi.org/10.7554/eLife.86805
  
  Keller, G.B., Hahnloser, R.H.R., 2009. Neural processing of auditory feedback during vocal practice in a songbird. Nature 457, 187–190. https://doi.org/10.1038/nature07467
  
  Keller, G.B., Mrsic-Flogel, T.D., 2018. Predictive Processing: A Canonical Cortical Computation. Neuron 100, 424–435. https://doi.org/10.1016/j.neuron.2018.10.003
  
  Oliveira, A.S., Schlink, B.R., Hairston, W.D., König, P., Ferris, D.P., 2016. Proposing Metrics for Benchmarking Novel EEG Technologies Towards Real-World Measurements. Front. Hum. Neurosci. 10, 188. https://doi.org/10.3389/fnhum.2016.00188
  
  O’Toole, S.M., Oyibo, H.K., Keller, G.B., 2023. Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses. Neuron 111, 2918–2928.e8. https://doi.org/10.1016/j.neuron.2023.08.015
  
  Rao, R.P.N., Ballard, D.H., 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87. https://doi.org/10.1038/4580
  
  Vasilevskaya, A., Widmer, F.C., Keller, G.B., Jordan, R., 2023. Locomotion-induced gain of visual responses cannot explain visuomotor mismatch responses in layer 2/3 of primary visual cortex. Cell Rep. 42, 112096. https://doi.org/10.1016/j.celrep.2023.112096
  
  Webb, J.M., Sohoglu, E., 2025. Cortical tracking of prediction error during perception of connected speech. https://doi.org/10.1101/2025.07.18.665498
  
  West, C.L., Bastos, G., Duran, A., Nadeem, S., Ricci, D., Groves, A.M.R., Wargo, J.A., Peterka, D.S., Leeuwen, N.V., Hamm, J.P., 2024. A lasting impact of serotonergic psychedelics on visual processing and behavior. https://doi.org/10.1101/2024.07.03.601959
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.14.670295v2
www.biorxiv.org

Starvation of the bacterium Vibrio atlanticus induces simultaneous attacks on the dinoflagellate Alexandrium pacificum

1. Public_Reviews 18 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Rolland and colleagues investigated the interaction between Vibrio bacteria and Alexandrium algae. The authors found a correlation between the abundance of the two in the Thau Lagoon and observed in the laboratory that Vibrio grows to higher numbers in the presence of the algae than in monoculture. Timelapse imaging of Alexandrium in coculture with Vibrio enabled the authors to observe Vibrio bacteria in proximity to the algae and subsequent algae death. The authors further determine the mechanism of the interaction between the two and point out similarities between the observed phenotypes and predator prey behaviours across organisms.
  
  Strengths:
  
  The study combines field work with mechanistic studies in the laboratory and uses a wide array of techniques ranging from co-cultivation experiments to genetic engineering, microscopy and proteomics. Further, the authors test multiple Vibrio and Alexandrium species and claim that the observed phenotypes are widespread.
  
  Comments on revisions:
  
  I thank the authors for their additional work on the manuscript. My comments were addressed to my satisfaction.
  
  Dear Reviewer #1, we thank you for your careful evaluation of our manuscript and for the time and effort you dedicated to this review. We are pleased that the revised version has addressed your concerns to your satisfaction.
  
  Reviewer #2 (Public review):
  
  Goal summary
  
  The authors sought to (i) demonstrate correlations between the dynamics of the dinoflagellate Alexandrium pacificum and the bacterium Vibrio atlanticus in natural populations, (ii) demonstrate the occurrence of predation in laboratory experiments, (iii) demonstrate that predation is induced by predator starvation, and (iv) test for effects of quorum sensing and iron-uptake genes on the predation process.
  
  Strengths include
  
  - Data indicating correlated dynamics in a natural environment that increase the motivation for study of in vitro interactions
  
  - Experimental design allowing clear inference of predation based on population counts of both prey and predators in addition to microscopy-based evidence
  
  - Supplementation of population-level data with molecular approaches to test hypotheses regarding possible involvement of quorum sensing and iron uptake in predation
  
  Weaknesses include
  
  - A quantitative analysis of effects of manipulating V. atlanticus density on rates of predation would have been valuable
  
  - Lack of clarity in some of the methodological descriptions
  
  Appraisal
  
  The authors convincingly demonstrate that V. atlanticus can prey on A. pacificum, provide strongly suggestive evidence that such predation is induced by starvation and clearly demonstrate that both iron availability and correspondingly the presence of genes involved in iron uptake strongly influence the efficacy of predation.
  
  Discussion of impact
  
  This paper will interest those interested in the diversity of forms of microbial predation and how microbial predatory behavior responds to environmental fluctuations. It will also interest those investigating bacteria-algae interactions and potential ecological controls of algal blooms. It may also interest researchers of microbial cooperation in light of the suggestion of communication between predator cells.
  
  Dear Reviewer #2, we sincerely thank you for the time you devoted to this second review of our manuscript. We greatly appreciate your thoughtful comments, which helped us further improve the clarity and precision of the manuscript. All your additional recommendations have been carefully considered and addressed in the revised version and in our responses below.
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  (2) The authors' reference to Fig. 4a did not address our concern about density potentially affecting the outcomes shown in Fig. 3. Fig. 4a does not provide any quantitative effects of manipulating Vibrio density. But the new density numbers the authors added in response to point (33) do seem to address our concern, because Vibrio densities become lower in the older cultures, excluding the possibility that the increased predation in older cultures might have been due to higher Vibrio densities. We think this should be stated explicitly.
  
(33) See point (2) above. We think the authors should explicitly state in the text that the increased predation in older cultures was not due to higher Vibrio densities in those older cultures, referring to their data.
  
As recommended by Reviewer #2, we added the sentence “Importantly, Vibrio densities decreased with culture age, ruling out the possibility that the stronger predation observed in older cultures was driven by higher bacterial densities” to the Results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”
  
(45) Is it known that bacterial predators collectively feed more on other bacteria than on microbial eukaryotes in natural habitats? While this certainly seems most likely, it is stated as fact, so the statement should either be supported with relevant citations or be phrased as a likely hypothesis.
  
  As suggested, we rephrased this sentence “Predatory bacteria are found in a wide variety of environments and are commonly described as feeding on other bacteria, although some cases of predation on microbial eukaryotes have also been hypothesized” in the discussion section.
  
(46) Perhaps "Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggests that Vibrios engage in a novel form of predation in which they kill and feed on algae."
  
  The reference to 'developing' a predator behavior is not clear. What is meant by 'develop'? It seems unnecessary.
  
  The use of italics when writing Vibrio is inconsistent.
  
  We agree that the reference to “developing” a predatory behavior was unclear and unnecessary. We therefore revised the sentence as follows: “Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggests that Vibrio engages in a novel form of predation in which it kills and feeds on algae.” We also corrected the inconsistent use of italics for Vibrio throughout the manuscript.
  
(48) The authors might wish to revise this sentence: although M. xanthus does have a contact-dependent killing mechanism, it is our understanding that both Lysobacter and myxobacteria can kill some prey at a distance with diffusible secretions.
  
  The sentence “These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species” was replaced by “These bacteria may require close proximity to their prey to cause lysis and utilize their biomass, although some can also kill prey at a distance through diffusible secretions”.
  
(50) Why not directly say 'predatory behavior'?
  
  We totally agree and have reworded the sentence.
  
  Line by line feedback:
  
  28 '...the phycosphere, an interface ...'
  
  We agree and have revised the wording.
  
  24 'In the attack stage, Vibrios...'
  
  This sentence has been rephrased as recommended.
  
  35 surrounds -> surround
  
  The correction has been done.
  
  36 The lysis is induced by the cells not by the 'stage'. We would rephrase to 'in which the lysis and consumption of the dinoflagellates occurs'
  
  This sentence has been rephrased as recommended.
  
  41 'a new mechanism that could to be involved' -> 'a new mechanism that could be involved ...'
  
  The correction has been done.
  
  61 forms
  
  The correction has been done.
  
  98 'the role...in'
  
  The suggested correction has been performed.
  
  103 'Qpcr' -> 'qPCR'
  
  Thank you for spotting this typo. “Qpcr” was corrected to “qPCR” in the manuscript.
  
  125 Misplaced punctuation
  
  The punctuation was corrected.
  
  152 The use of '.' vs 'x' to indicate multiplication when writing numbers is inconsistent. In some cases both are missing.
  
  Numbers have been corrected throughout the manuscript.
  
  231 I would rephrase 'poor nutrient stress' to 'little nutrient stress' or 'no nutrient stress'
  
  The rephrasing was carried out as suggested.
  
  310 R and used packages are not cited
  
  We added the citation (R Core Team, 2024). Linear models, QQ plots (which are part of linear models), tests, and AICs are included in R by default and are credited to the R Core Team.
  
  The sentence “Statistical analyses were performed using R 3.6.3 software” was replaced by “Statistical analyses were performed using R 3.6.3 software (R Core Team, 2024) using Rstudio”.
  
  358 'are capable of simultaneously attacking'
  
  The expression “are capable of simultaneously attacking” was revised in the manuscript to improve clarity and readability.
  
  366 'exponential growth phase'
  
  We have corrected the wording to “exponential growth phase” in the revised manuscript.
  
  430 The large difference in incubation time between the sea-water vs nutrient-rich treatments and use of different media are unfortunate. These additional variables compromise the ability to directly ascribe observed differences to starvation.
  
  We agree, the sentence “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins modulated by nutrient stress (Fig. S2)” was replaced by “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins that were differentially abundant under these two contrasting conditions (Fig. S2)”
  
  443 Somewhat unclear sentence. I would rephrase this to "Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03."
  
  To clarify this point, the sentence “Remarkably, among the 10 proteins identified by proteomic analysis only V. atlanticus LGP32 mutant lacking pvuB failed to attack A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001)” was replaced by “Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001).”
  
  445 'attack simultaneously' -> 'simultaneously attack'
  
  The suggested modification has been done.
  
  450 H3BO4 is written as Boron later, it would be good to call it boron here as well so that it is easier to make the connection for the reader.
  
  We agree, we modified the manuscript and called it boron.
  
  459 'no linked' -> 'no link'
  
  The text was modified accordingly.
  
  483 'which induces' -> 'which induce'
  
  The correction has been made.
  
  519 The use of Vibrio atlanticus and V. atlanticus is inconsistent within the text.
  
  We have checked and modified the manuscript in accordance with the recommendations.
  
  807-808 The use of the phrase 'Akaike information criterion (AICc) models' is confusing. Aren't these models just generalized linear models? It should be rephrased to make clear that the AICc is just a test that is used to select which model to use.
  
We clarified this point by revising the Figure 1 legend. The sentences “(C) Result of Akaike information criterion (AICc) models tested to explain the mean value of degraded Alexandrium cells (dead cells) in spring. (D) Wald test of the AICc model attributing the mean value of degraded cells of Alexandrium in spring to free Vibrio” were replaced by “(C) Results of the Akaike Information Criterion (AICc) test conducted to select a model for explaining the mean value of dead Alexandrium (degraded cells) in spring. (D) Wald test of the AICc model explaining the mean value of dead Alexandrium in spring by free Vibrio”.
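The distinction the reviewer draws can be made concrete. AICc is a score computed for each candidate model, and the candidate with the lowest score is selected; a separate test (here, a Wald test) is then applied to the selected model. Below is a minimal Python sketch of the selection step only, using the standard small-sample correction AICc = AIC + 2k(k + 1)/(n - k - 1). The model names and log-likelihood values are hypothetical and are not the authors' results.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected Akaike Information Criterion (lower is better).

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations (must exceed k + 1)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def select_model(candidates: dict, n: int) -> str:
    """Return the name of the candidate model with the lowest AICc.

    candidates maps a model name to a (log_likelihood, k) pair.
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get)


# Hypothetical fits to n = 20 observations.
best = select_model({"free_vibrio": (-10.0, 2), "free_plus_attached": (-9.5, 4)}, n=20)
print(best)  # free_vibrio
```

In this toy case the richer model fits slightly better (higher log-likelihood) but is penalized for its two extra parameters, so the simpler model is selected.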
  
  827 The chronological sequence of snapshots is not very clear. Perhaps it would be clearer if pictures over a shorter timeframe were used to clearly show the gathering of the V. atlanticus cells near the algal cells.
  
  To address this point, we removed the first and the last 14 seconds of the snapshots to clearly show the gathering of the V. atlanticus cells near the algal cells, and we added an arrow on Fig. 2D to indicate the chronological order.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.12.18.629110v3
www.biorxiv.org

Comprehensive characterization of human color discrimination thresholds

1
1. Public_Reviews 18 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We would like to thank the editors and the reviewers for the thorough and insightful comments and suggestions. Addressing them has strengthened our manuscript. We have carefully addressed all reviewer comments, as described in detail below, as well as additional comments we received from others. In addition, we made two substantive updates to the manuscript:
  
(1) We improved the estimation of uncertainty in the model predictions by computing 95% confidence intervals from 120 bootstrapped datasets (instead of the full range, i.e. 100%, across 10 bootstrapped datasets in the original submission), to match the number of bootstrap resamples used for the validation dataset.
  
  (2) We selected a slightly different hyperparameter value based on follow-up analyses suggested by Reviewer 1, which provided very useful information.
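Item (1) can be illustrated with a percentile bootstrap. The sketch below is a generic version, not the authors' pipeline. In their analysis each bootstrapped dataset would be refit with the WPPM, whereas here a simple `statistic` function stands in for that refit. The function name `bootstrap_ci`, the default seed, and the use of NumPy's `default_rng` are assumptions.

```python
import numpy as np


def bootstrap_ci(data, statistic=np.mean, n_boot=120, level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` of `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each resample draws n observations with replacement; in the study, each
    # resample would instead be a full dataset refit with the model.
    boot_stats = np.array(
        [statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_stats, [tail, 1.0 - tail])
    return float(lower), float(upper)


lower, upper = bootstrap_ci(np.arange(100.0))
print(f"95% interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

The interval is read off the 2.5th and 97.5th percentiles of the bootstrap distribution, so more resamples give a better-resolved estimate of those tails.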
  
  Importantly, none of these changes alter the main results or conclusions of the paper.
  
Beyond these changes and those outlined below, we also worked to improve the clarity of the prose throughout and added further citations to the literature.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This paper presents an ambitious and technically impressive attempt to map how well humans can discriminate between colours across the entire isoluminant plane. The authors introduce a novel Wishart Process Psychophysical Model (WPPM) - a Bayesian method that estimates how visual noise varies across colour space. Using an adaptive sampling procedure, they then obtain a dense set of discrimination thresholds from relatively few trials, producing a smooth, continuous map of perceptual sensitivity. They validate their procedure by comparing actual and predicted thresholds at an independent set of sample points. The work is a valuable contribution to computational psychophysics and offers a promising framework for modelling other perceptual stimulus fields more generally.
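To make the idea of noise that "varies across colour space" concrete: Wishart-process models describe a field of covariance matrices Σ(x), one per reference location x. Below is a minimal sketch. It assumes, for illustration only, that Σ(x) = U(x)U(x)ᵀ plus a small diagonal term, with each entry of the 2 × m matrix U(x) a weighted sum of 2D Chebyshev polynomials. The matrix size m, the polynomial degree, the jitter value, and the random weights are hypothetical choices, not the authors' settings. This construction guarantees a symmetric positive-definite Σ(x) at every location.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def covariance_field(W, x1, x2, jitter=1e-4):
    """Return Sigma(x) = U(x) U(x)^T + jitter * I at each grid location.

    W has shape (2, m, deg + 1, deg + 1); W[a, b, i, j] weights T_i(x1) T_j(x2)
    in entry (a, b) of the 2 x m matrix U(x).
    """
    deg = W.shape[-1] - 1
    basis = C.chebvander2d(x1, x2, [deg, deg])        # (..., (deg + 1) ** 2)
    weights = W.reshape(W.shape[0], W.shape[1], -1)   # (2, m, (deg + 1) ** 2)
    U = np.einsum("...k,abk->...ab", basis, weights)  # (..., 2, m)
    return U @ np.swapaxes(U, -1, -2) + jitter * np.eye(2)


rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(2, 3, 5, 5))   # hypothetical weights, degree 4
g = np.linspace(-1.0, 1.0, 11)
x1, x2 = np.meshgrid(g, g, indexing="ij")
Sigma = covariance_field(W, x1, x2)           # shape (11, 11, 2, 2)
```

Each 2 × 2 Σ(x) defines a threshold ellipse at x. Because U(x) is a smooth function of x, neighbouring ellipses change gradually.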
  
  Strengths:
  
  The approach is elegant and well-described (I learned a lot!), and the data are of high quality. The writing throughout is clear, and the figures are clean (elegant in fact) and do a good job of explaining how the analysis was performed. The whole paper is tremendously thorough, and the technical appendices and attention to detail are impressive (for example, a huge amount of data about calibration, variability of the stim system over time, etc). This should be a touchstone for other papers that use calibrated colour stimuli.
  
  Weaknesses:
  
  Overall, the paper works as a general validation of the WPPM approach. Importantly, the authors validate the model for the particular stimuli that they use by testing model predictions against novel sample locations that were not part of the fitting procedure (Figure 2). The agreement is pretty good, and there is no overall bias (perhaps local bias?), but they do note a statistically-significant deviation in the shape of the threshold ellipses. The data also deviate significantly from historical measurements, and I think the paper would be considerably stronger with additional analyses to test the generality of its conclusions and to make clearer how they connect with classical colour vision research. In particular, three points could use some extra work:
  
  (1) Smoothness prior.
  
  The WPPM assumes that perceptual noise changes smoothly across colour space, but the degree of smoothness (the eta parameter) must affect the results. I did not see an analysis of its effects - it seems to be fixed at 0.5 (line 650). The authors claim that because the confidence intervals of the MOCS and the model thresholds overlap (line 223), the smoothing is not a problem, but this might just be because the thresholds are noisy. A systematic analysis varying this parameter (or at least testing a few other values), and reporting both predictive accuracy and anisotropy magnitude, would clarify whether the model's smoothness assumption is permitting or suppressing genuine structure in the data. Is the gamma parameter also similarly important? In particular, does changing the underlying smoothness constraint alter the systematic deviation between the model and the MOCS thresholds? The authors have thought about this (of course! - line 224), but also note a discrepancy (line 238). I also wonder if it would be possible to do some analysis on the posterior, which might also show if there are some regions of color space where this matters more than others? The reason for doing this is, in part, motivated by the third point below - it's not clear how well the fits here agree with historical data.
  
  Thank you for raising this important point. We have now added analyses of the effects of the two smoothness-related hyperparameters, ε and γ (see Appendix 10).
  
  First, we swept a range of values for each hyperparameter (ε: 0.1 – 1; γ: 0.000001 – 0.003) and evaluated model performance using 5-fold cross-validation of the dataset used to fit the WPPM, quantifying predictive accuracy on held-out test data. We used the mean negative log likelihood averaged across the held-out data in the cross validation as our measure of predictive accuracy (Figs. S27-31).
  
  The two hyperparameters affect cross-validation accuracy in a similar manner. With γ fixed at 0.0003, predictive accuracy is highest for ε in the range of approximately 0.3–0.5 and drops quite rapidly for ε < 0.3. We attribute this drop to oversmoothing. Cross-validation accuracy also decreases, albeit more gradually, for ε > 0.5. We attribute this to increased variance due to undersmoothing relative to the power of our datasets. Similarly, with ε fixed at 0.4, predictive accuracy is highest for γ values between approximately 0.0001 and 0.001, declines rapidly for smaller γ (oversmoothing), and more slowly for larger γ (undersmoothing).
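The cross-validation procedure can be illustrated on a toy problem. The sketch below is not the WPPM. It swaps in a one-dimensional Chebyshev regression with Gaussian noise of known SD and assumes, for illustration only, that the prior variance of the k-th weight decays as γ·ε^k, so that a smaller ε means a smoother prior. The WPPM instead works with trial-by-trial discrimination responses. The names `fit_map` and `cv_nll`, the noise level, the degree, and γ = 1 are all hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fit_map(x, y, eps, gamma, sigma, deg):
    """MAP (ridge-like) weights for a 1D Chebyshev regression.

    Hypothetical prior: weight k ~ Normal(0, gamma * eps**k), so a smaller eps
    suppresses high-order terms more strongly (a smoother fit).
    """
    X = C.chebvander(x, deg)                          # (n, deg + 1)
    prior_var = gamma * eps ** np.arange(deg + 1)
    A = X.T @ X / sigma**2 + np.diag(1.0 / prior_var)
    return np.linalg.solve(A, X.T @ y / sigma**2)


def cv_nll(x, y, eps, gamma=1.0, sigma=0.3, deg=10, n_folds=5, seed=0):
    """Mean Gaussian negative log-likelihood on held-out points, over n_folds folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    losses = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(x)), test_idx)
        w = fit_map(x[train_idx], y[train_idx], eps, gamma, sigma, deg)
        resid = y[test_idx] - C.chebvander(x[test_idx], deg) @ w
        losses.append(0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * (resid / sigma) ** 2)
    return float(np.concatenate(losses).mean())


# Toy data: a smooth function plus Gaussian noise (not psychophysical data).
data_rng = np.random.default_rng(1)
x = data_rng.uniform(-1.0, 1.0, 200)
y = np.sin(3 * x) + data_rng.normal(0.0, 0.3, x.size)

for eps in (0.05, 0.1, 0.3, 0.5, 1.0):
    print(f"eps={eps:.2f}  mean held-out NLL={cv_nll(x, y, eps):.3f}")
```

A very small ε shrinks the higher-order terms that the data need, so the held-out likelihood worsens (oversmoothing). A large ε lets the fit chase noise, which costs accuracy more gradually.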
  
  Second, we examined how the hyperparameter ε affected the agreement between the WPPM fit and the MOCS validation data. Specifically, at each ε, for each participant, we computed the linear regression between WPPM thresholds and validation thresholds at 25 reference locations. Then, we examined the slope and correlation coefficient of all participants as a function of ε. We found a classic bias–variance tradeoff. Excessive smoothness introduces bias by failing to capture structure in the data, whereas insufficient smoothness increases variance in model predictions. These results further support a choice of ε = 0.4 as lying near the optimal balance between bias and variance (Fig. S32).
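A per-participant version of this agreement check can be sketched as follows. The response does not state which variable is regressed on which, so the sketch assumes WPPM thresholds are regressed on validation thresholds. The function name and the example thresholds are hypothetical. Under that convention, predictions shrunk toward the mean (one symptom of oversmoothing) give a slope below 1 while leaving the correlation high. Noisier predictions mainly lower the correlation.

```python
import numpy as np


def agreement_by_participant(model_thr, valid_thr):
    """Per-participant regression slope and Pearson r.

    model_thr, valid_thr: arrays of shape (n_participants, n_references).
    Model thresholds are regressed on validation thresholds (an assumption).
    """
    model_thr = np.asarray(model_thr, dtype=float)
    valid_thr = np.asarray(valid_thr, dtype=float)
    slopes, rs = [], []
    for m, v in zip(model_thr, valid_thr):
        slope, _intercept = np.polyfit(v, m, deg=1)   # highest power first
        slopes.append(slope)
        rs.append(np.corrcoef(v, m)[0, 1])
    return np.array(slopes), np.array(rs)


# Hypothetical validation thresholds for 2 participants at 25 references.
valid = np.tile(np.linspace(0.01, 0.05, 25), (2, 1))
# An oversmoothed model that shrinks predictions halfway toward each participant's mean.
shrunk = 0.5 * valid + 0.5 * valid.mean(axis=1, keepdims=True)
slopes, rs = agreement_by_participant(shrunk, valid)
```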
  
  Based on these analyses, we selected for the final analysis ε = 0.4, slightly smaller than the preregistered value used in the original submission (0.5), while retaining the original value of γ (0.0003).
  
  We now discuss these reasons for changing this value in the revision, as well as provide a more general discussion of the importance and practicalities of hyperparameter choice in Bayesian approaches to analyzing data (Discussion / Prior specification).
  
  (2) Comparison with simpler models. It would help to see whether the full WPPM is genuinely required. Clearly, the data (both here and from historical papers) require some sort of anisotropy in the fitting - the sensitivities decrease as the stimuli move away from the adaptation point. But it's >not< clear how much the fits benefit from the full parameterisation used here. Perhaps fits for a small hierarchy of simpler models - starting with isotropic Gaussian noise (as a sort of 'null baseline') and progressing to a few low-dimensional variants - would reveal how much predictive power is gained by adding spatially varying anisotropy. This would demonstrate that the model's complexity is justified by the data.
  
  In the 5-fold cross-validation analysis described above (and now presented in Appendix 10), we found that when ε or γ is small, the stronger smoothness constraint leads to threshold ellipses that are nearly identical to each other across color space. Under these conditions, model predictions show poor accuracy on held-out test data and lead to poor predictions of the validation data. This observation addresses the underlying point raised by the reviewer, albeit in a different way than suggested: it shows that a degree of spatially varying anisotropy is necessary to capture the structure of the data. We now make this point in the paper (Discussion / Prior specification).
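Why a small ε or γ produces nearly identical ellipses can be illustrated under an assumed prior. Suppose, hypothetically, that each entry of U(x) is a sum of 2D Chebyshev terms T_i(x1) T_j(x2) with independent zero-mean Gaussian weights of variance γ·ε^(i+j). The WPPM's actual parametrization may differ. Under this assumption, the prior variance of the part of U(x) that changes across space shrinks as either ε or γ shrinks.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def spatially_varying_prior_variance(eps, gamma, deg=4, n_grid=41):
    """Expected variance of the non-constant part of one U(x) entry, averaged over [-1, 1]^2.

    Hypothetical prior: the weight on T_i(x1) T_j(x2) ~ Normal(0, gamma * eps**(i + j)).
    """
    g = np.linspace(-1.0, 1.0, n_grid)
    x1, x2 = np.meshgrid(g, g, indexing="ij")
    basis = C.chebvander2d(x1, x2, [deg, deg])   # column (deg + 1) * i + j is T_i T_j
    i, j = np.meshgrid(np.arange(deg + 1), np.arange(deg + 1), indexing="ij")
    prior_var = (gamma * eps ** (i + j)).ravel()
    prior_var[0] = 0.0   # drop T_0 T_0, the only term that is constant over space
    # For independent weights, Var[sum_k w_k phi_k(x)] = sum_k var_k * phi_k(x) ** 2.
    return float(np.mean(basis**2 @ prior_var))


for eps in (0.1, 0.4, 1.0):
    print(f"eps={eps}: {spatially_varying_prior_variance(eps, gamma=3e-4):.2e}")
```

If a fixed diagonal term is added to U(x)U(x)ᵀ (also an assumption here), a small γ leaves that term dominant, so ellipses approach identical circles.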
  
  More broadly, we employed the WPPM as a prior that imposed smoothness but not much other obvious structure, and used this to learn about the psychometric field. We are currently working to understand how we can best use our current data to improve the prior we would apply to future measurements. There are a number of approaches to this. One would be to seek a parametric mechanistic model that can describe the current data, and to the extent this is possible formulate prior distributions over the parameters of the model. The results reported here thus provide a foundation for deriving and evaluating more structured priors that would even more efficiently leverage future datasets, but with the feature that they impose more structure. We have added this perspective to the Discussion / Extensions of the WPPM framework.
  
(3) Quantitative comparison to historical data. The paper currently compares its results to MacAdam, Krauskopf & Gegenfurtner, and Danilova & Mollon only by visual inspection. It is hard to extract and scale actual data from historical papers, but from the quality of the plotting here, it looks like the authors have achieved this, and so quantitative comparisons are possible. The MacAdam data comparisons are pretty interesting - in particular, the orientations of the long axes of the threshold ellipses do not really seem to line up between the two datasets - and I thought that the orientation of those ellipses was a critical feature of the MacAdam data. Quantitative comparisons (perhaps overall correlations, which should be immune to scaling issues, axis-ratio, orientation, or RMS differences) would give concrete measures of the quality of the model. I know the authors spend a lot of time comparing to the CIE data, and this is great. But re-expressing the fitted thresholds in CIE or DKL coordinates, and comparing them directly with classical datasets, would make the paper's claims of "agreement" much more convincing.
  
  Although we are sympathetic to this request, we have chosen not to implement the sort of quantitative comparison requested by the reviewer. The reason is that an important feature of color thresholds is that they depend on the spatial (e.g. Kelly, 1974; Poirson & Wandell, 1996; Danilova & Mollon, 2025) and temporal (e.g. Kelly, 1974) properties of the stimuli, and on the observer’s state of adaptation (e.g. Loomis & Berger, 1979; Krauskopf & Gegenfurtner, 1992). Because (as the reviewer notes below) the spatial and temporal properties of our stimuli were not matched to those of the comparison datasets, our purpose in making these comparisons was to examine qualitative agreement, as well as to situate our results in the literature and to demonstrate that our approach allows us to read out thresholds around the references and in the color spaces used in other studies. We would not expect detailed quantitative agreement with the current dataset because of differences in stimuli.
  
  As a consequence of this, we think we would be overreaching to quantify the differences between our data and classic datasets. This consideration is particularly important for the MacAdam measurements, where because of the matching adjustment procedure used, the observer’s state of adaptation is likely to have varied (by amounts that are difficult to estimate) from one reference to the next (e.g. Danilova & Mollon, 2025). We have clarified the manuscript with respect to these points (Results / Comparison with previous measurements).
  
  A point to make on this topic is that an important and interesting future direction that emerges from our work is to develop efficient methods to characterize the dependence of the full discrimination field on ancillary variables, such as those that describe spatial and temporal properties and/or the state of adaptation, which we now also mention in the paper (Discussion / Implications for the mechanisms of color perception). Although not the primary motivation, doing so would enable comparison of data with a wider range of studies.
  
  We do agree that the comparisons to CIELAB predictions work better when we express them in CIELAB, and have now done so (Fig. 3D; Fig. S24-S26).
  
  Kelly, D. H. (1974). "Spatio-temporal frequency characteristics of color-vision mechanisms." Journal of the Optical Society of America 64(7): 983–990.
  
Poirson, A. B. and B. A. Wandell (1996). "Pattern-color separable pathways predict sensitivity to simple colored patterns." Vision Research 36(4): 515–526.
  
  Danilova, M. V. and J. D. Mollon (2025). "Effect of stimulus size on chromatic discrimination." Journal of the Optical Society of America A 42(5).
  
  Loomis, J. M. and T. Berger (1979). "Effects of chromatic adaptation on color discrimination and color appearance." Vision Research 19(8): 891–901.
  
Krauskopf, J. and K. Gegenfurtner (1992). "Color discrimination and adaptation." Vision Research 32(11): 2165–2175.
  
  Overall, this is a creative and technically sophisticated paper that will be of broad interest to vision scientists. It is probably already a definitive method paper showing how we can sample sensitivity accurately across colour space (and other visual stimulus spaces). But I think that until the comparison with historical datasets is made clear (and, for example, how the optimal smoothness parameters are estimated), it has slightly less to tell us about human colour vision. This might actually be fine - perhaps we just need the methods?
  
  Related to this, I'd also note that the authors chose a very non-standard stimulus to perform these measurements with (a rendered 3D 'Greebley' blob). This does have the advantage of some sort of ecological validity. But it has the significant disadvantage that it is unlike all the other (much simpler) stimuli that have been used in the past - and this is likely to be one of the reasons why the current (fitted) data do not seem to sit in very good agreement with historical measurements.
  
  As the reviewer notes, our stimuli head in the direction of ecological validity (see also Hedjar et al., 2025) and indeed this was a consideration when we chose them, at the cost of limiting the degree of comparison we can make with prior studies (as discussed above). Another reason we chose our stimuli is that they enable the current data to be used as a basis of comparison with stimuli where we add specularity, change object shape, and vary object pose in the future. These manipulations are not possible with flat matte patches. Such experiments are of interest to us, as they will tell us about how effectively color may be used to differentiate stimuli in cases where other ecologically important variables co-vary. We now mention this motivation in the paper (Results / Task and Stimuli).
  
  Hedjar, L., M. Toscani and K. R. Gegenfurtner (2025). "Importance of hue: color discrimination of three-dimensional objects and two-dimensional discs." Journal of the Optical Society of America A 42(5).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Hong et al. present a new method that uses a Wishart process to dramatically increase the efficiency of measuring visual sensitivity as a function of stimulus parameters for stimuli that vary in a multidimensional space. Importantly, they have validated their model against their own hold-out data and against 3 published datasets, as well as against colour spaces aimed at 'perceptual uniformity' by equating JNDs. Their model achieves high predictive success and could be usefully applied in colour vision science and psychophysics more generally, and to tackle analogous problems in neuroscience featuring smooth variation over coordinate spaces.
  
  Strengths:
  
  (1) This research makes a substantial contribution by providing a new method to very significantly increase the efficiency with which inferences about visual sensitivity can be drawn, so much so that it will open up new research avenues that were previously not feasible. Secondly, the methods are well thought out and unusually robust. The authors made a lot of effort to validate their model, but also to put their results in the context of existing results on colour discrimination, transforming their results to present them in the same colour spaces as used by previous authors to allow direct comparisons. Hold-out validation is a great way to test the model, and this has been done for an unusually large number of observers (by the standards of colour discrimination research). Thirdly, they make their code and materials freely available with the intention of supporting progress and innovation. These tools are likely to be widely used in vision science, and could of course be used to address analogous problems for other sensory modalities and beyond.
  
  Weaknesses:
  
It would be nice to better understand what constraints the choice of basis functions puts on the space of possible solutions. More generally, could there be particular features of colour discrimination (e.g., rapid changes near the white point) that the model captures less well?
  
  This comment bears conceptual similarity to Reviewer 1’s question about the hyperparameters of our prior, as it is basically asking whether we might be oversmoothing through the choice of form and number of basis functions. The hyperparameter sweeps we now present suggest that within the choice of basis functions we used, we are operating at a reasonable point on the bias-variance tradeoff curve - we can see bias emerging with a smoother prior, and variance increasing with a less smooth prior. Our expectation is that varying the smoothness of the prior in other ways, such as by varying the form and number of the basis functions, would lead to similar tradeoffs.
  
  We did perform one additional check that shows, within our current framework, that adding more basis functions is unlikely to change things much. This was to plot the fit weights as a function of Chebyshev basis order (Figure S4 in Appendix 2). These decline to near zero at the highest order we used, suggesting that adding more would not alter the inferred psychometric field, given our hyperparameter choices. Although we could explore this question further by explicitly fitting the data using more basis functions along with different hyperparameter choices, or different functional forms for the basis functions, we decided not to pursue this in favor of performing the other additional analyses we now present.
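The check on fit weights can be sketched on a toy problem. The example fits a smooth scalar function on [-1, 1]² with a 2D Chebyshev basis by least squares and averages |weight| within each total order i + j. This is a stand-in: the WPPM's weights parametrize a matrix-valued field, and grouping by total order i + j is an assumption about how "basis order" is defined. The test function and grid are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def weight_magnitude_by_order(W):
    """Mean |W[i, j]| grouped by total order k = i + j."""
    n = W.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    order = i + j
    return np.array([np.abs(W[order == k]).mean() for k in range(2 * n - 1)])


# Toy smooth field on [-1, 1]^2 (a stand-in, not color-discrimination data).
deg = 7
g = np.linspace(-1.0, 1.0, 30)
x1, x2 = np.meshgrid(g, g, indexing="ij")
field = np.exp(0.5 * x1 + 0.3 * x2)

V = C.chebvander2d(x1.ravel(), x2.ravel(), [deg, deg])
w, *_ = np.linalg.lstsq(V, field.ravel(), rcond=None)
W = w.reshape(deg + 1, deg + 1)   # W[i, j] multiplies T_i(x1) T_j(x2)
per_order = weight_magnitude_by_order(W)
print(per_order)
```

For a smooth field the weights fall off rapidly with order, so terms beyond the highest order would contribute little. This mirrors, in miniature, the reasoning about adding more basis functions.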
  
  We resonate with the reviewer’s concern that assuming smoothness, both by assuming that isoperformance contours are elliptical and by assuming that these vary smoothly with reference, might cause us to miss features of the true underlying field in cases where that field varies rapidly or the isoperformance contours are asymmetric or non-elliptical. Our approach to this was to measure the validation thresholds and demonstrate that any bias in our WPPM-inferred field is small for these measurements. Because we shared the reviewer’s intuition that the adapting point is a candidate location where there might be less smooth variation, we measured a validation threshold at this reference for every subject. Nonetheless, we only measured in one direction around the adapting reference for each subject. We considered validation approaches where we measured full ellipses at a set of validation references, but we were worried about effects of uncertainty reduction and perceptual learning which might distort thresholds at highly sampled locations.
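The elliptical assumption can be stated concretely. Suppose discrimination performance at a reference depends only on the Mahalanobis distance between reference and comparison under a covariance Σ. Then every isoperformance contour is an ellipse, so asymmetric or non-elliptical contours cannot be represented however flexibly Σ varies across space. Below is a sketch under that simplifying assumption. The criterion `k` and the covariance values are hypothetical, and the WPPM's oddity-task likelihood is more involved.

```python
import numpy as np


def threshold_along(direction, cov, k=1.0):
    """Distance along `direction` at which the Mahalanobis distance under `cov` equals k."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    return k / np.sqrt(u @ np.linalg.solve(cov, u))


cov = np.array([[4.0, 0.0], [0.0, 1.0]])   # hypothetical covariance at one reference
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
contour = np.array([threshold_along(d, cov) * d for d in dirs])   # points on the ellipse
```

Here the threshold is 2 along the first axis and 1 along the second, and every contour point lies on the same ellipse.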
  
  It is the case that if one wanted to study the discrimination field in more detail around a particular reference, one could concentrate trials in a smaller model space around that reference, and for the same number of trials use a prior with less smoothness relative to the underlying stimulus space. Indeed, simply halving the size of the stimulus space that maps onto the [-1,1] model space and keeping the same prior over the model space effectively halves the degree of smoothness expressed with respect to the stimulus space. Thus our methods could prove useful in studying more rapid variations in the discrimination field if one hypothesized that they might occur around particular reference choices, but this would still rest upon the elliptical assumption. To relax that assumption, one could use the threshold field estimation methods implemented in AEPsych, which incorporate a smoothness assumption but do not assume elliptical isoperformance contours. Weakening the prior in this way would, however, increase trial demand to obtain similar measurement precision.
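The rescaling argument is simple arithmetic. Assume an affine map that sends the edges of the chosen stimulus region to ±1. Any smoothness scale that is fixed in model units then corresponds, in stimulus units, to that scale multiplied by the half-width of the region. The function names and numbers below are illustrative.

```python
import numpy as np


def to_model_space(s, center, half_width):
    """Affine map sending stimulus coordinates center -/+ half_width to -1/+1."""
    return (np.asarray(s, dtype=float) - center) / half_width


def scale_in_stimulus_units(model_scale, half_width):
    """A smoothness scale fixed in model units, re-expressed in stimulus units."""
    return model_scale * half_width


full = scale_in_stimulus_units(0.5, half_width=1.0)    # whole region mapped to [-1, 1]
zoomed = scale_in_stimulus_units(0.5, half_width=0.5)  # region halved around a reference
print(full, zoomed)  # 0.5 0.25
```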
  
  As a general matter, we don’t think it is possible to leverage smoothness for trial efficiency on the one hand and at the same time be completely sure that there isn’t some aspect to the underlying ground truth that has been smoothed over. Carefully choosing the degree of prior smoothness together with the number of experimental trials in the context of a particular content problem is an important part of bringing the WPPM and related methods to bear, and one where simulation and held-out data both play an important role.
  
  We now bring these points out more fully in the paper (Discussion / Extensions of the WPPM framework; Discussion / Prior specification).
  
  
  The substantial individual differences evident in Figure S20 (comparison with Krauskopf and Gegenfurtner, 1992) are interesting in this context. Some observers show radial biases for the discrimination ellipses away from the white point, some show biases along the negative diagonal (with major axes oriented parallel to the blue-yellow axis), and others show a mixture of the two biases. Are these genuine individual differences, or could the model be performing less accurately in this desaturated region of colour space?
  
  We agree that these differences are interesting. We have now added more complete bootstrapped confidence regions in these (Appendix 8) and the other comparison figures (Appendix 6, 7, 9), so that an estimate of measurement precision is directly available in these figures. These confidence regions suggest that the individual differences in this region of color space are real. A longer-term goal is to develop more mechanistic models that can account for individual subject data through parameter choice. This might lead to insight into what differs in the visual system across individuals.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study presents a powerful and rigorous approach for characterizing stimulus discriminability throughout a sensory manifold, and is applied to the specific context of predicting color discrimination thresholds across the chromatic plane.
  
  Strengths:
  
  Color discrimination has played a fundamental role in studies of human color vision and for color applications, but as the authors note, it remains poorly characterized. The study leverages the assumption that thresholds should vary smoothly and systematically within the space, and validates this with their own tests and comparisons with previous studies.
  
  Weaknesses:
  
  The paper assumes that threshold variations are due to changes in the level of intrinsic noise at different stimulus levels. However, it's not clear to me why they could not also be explained by nonlinearities in the responses, with fixed noise. Indeed, most accounts of contrast coding (which the study is at least in part measuring because the presentation kept the adapt point close to the gray background chromaticity, and thus measured increment thresholds), assume a nonlinear contrast response function, which can at least as easily explain why the thresholds were higher for colors farther from the gray point. It would be very helpful if a section could be added that explains why noise differences rather than signal differences are assumed and how these could be distinguished. If they cannot, then it would be better to allow for both and refer to the variation in terms of S/N rather than N alone.
  
  We agree with the reviewer. We are measuring SNR and attributing it to noise, but we cannot determine from the data whether changes in SNR across color space are due to changes in noise, to a nonlinear relationship between stimulus space and the observer’s response space with noise in the response space held fixed, or to both. We now make this point where we introduce the Results / Wishart Process Psychophysical Model and reiterate it in the Discussion / Extensions of the WPPM framework.
  
  Related to this point, the authors note that the thresholds should depend on a number of additional factors, including the spatial and temporal properties and the state of adaptation. However, many of these again seem to be more likely to affect the signal than the noise.
  
  We don’t disagree. Indeed, as we noted in our response to a comment by Reviewer 1 and above in the context of individual differences, we are very interested in developing a mechanistically plausible model that accounts for the data. If we or others are able to do so, that would provide a basis for parsing performance into separate signal and noise effects. And if such a model has natural ways in which additional variables affect its predictions, measuring the effects of these variables would be a way to provide evidence in favor of the model (Discussion / Implication for the mechanisms of color perception - Extensions of the WPPM framework).
  
  An advantage of the approach is that it makes no assumptions about the underlying mechanisms. However, the choice to sample only within the equiluminant plane is itself a mechanistic assumption, and such assumptions could potentially be leveraged when deciding how to sample, to improve both the characterization and its efficiency. For example, given what we know about early color coding, would it be more (or less) efficient to select samples based on DKL space?
  
  The more we are willing to assume about the structure of the psychometric field, the more efficiently we can measure it. As the reviewer correctly notes, this principle applies to trial placement as well. We are currently using an adaptive method (AEPsych) that starts with a fairly weak smoothness prior and attempts to place trials using heuristics that aim to minimize the expected uncertainty in the posterior. As we learn more about the discrimination field, we should be able to leverage stronger priors to increase trial efficiency. This point is closely related to one we made above about developing stronger priors that capture what we have learned in this study. Such priors could also help improve trial placement. For a prior that has a relatively small number of parameters, for example, perhaps a mechanistic prior, methods such as Quest+ (Watson, 2017) may be used for trial placement.
  
  Watson, A. B. (2017). "QUEST+: A general multidimensional Bayesian adaptive psychometric method." J Vis 17(3): 10.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  I do not think that the authors need to perform additional experiments. However, I would like to see some additional analyses regarding the assumptions made in the fitting procedure and how they affect the final maps.
  
  I also think some more quantitative comparisons with historical data would be valuable - at the moment, a lot of the comparisons are simply 'by eye'.
  
  It would have been nice to have the code and data available during the review procedure - I'm sure these will be released with excellent documentation?
  
  We addressed the first two points in the public review section. The code and data are now available online, and the links are provided in the paper (Methods and Materials / Data and code availability).
  
  Reviewer #2 (Recommendations for the authors):
  
  Minor points
  
  I have a few suggestions for additions and small changes.
  
  (1) Several examples of covariance matrix fields are shown in Figures 1 and 4, but these are simulated examples. It would be nice to see the fields actually fit to the data! I would be interested in seeing this for all participants in an Appendix, and maybe for participant CH in the main paper?
  
  We have made the changes (see Figure 4 and Figure S3).
  
  (2) I have not worked through all the math in the appendices line by line, but it seems to be complete, and the model validation results speak for themselves. I think the authors have done a pretty good job of explaining the model conceptually (not easy), but I struggled with the 'weighted sum' step in Figure 4 and the main text. I would appreciate a bit more hand-holding here, e.g., why an 'overcomplete' representation is needed as an intermediate, why there are 12 matrices in the overcomplete representation, and what each matrix in this representation represents.
  
  We have now added more explanations in the figure legend and text (Fig. 4 and Methods and Materials / The Wishart Process Psychometric Model).
  
  (3) Individual differences: There is a section on this in the manuscript, and it's concluded that there are only "modest" individual differences. However, in Figure S20, the individual differences, I think, are huge and place observers almost in qualitatively different categories! Some observers show a radial bias in discrimination ellipses, others seem to show basically a bias along the negative diagonal, and others a mixture of both biases. These ellipses are at a desaturated part of colour space - is it possible that there are some rapid changes in the underlying noise in this region that the Wishart fit has not captured due to relatively sparse sampling or the fact that the basis functions are all fairly low spatial frequency? I wondered whether the results are constrained by the choice of Cartesian rather than polar basis functions, e.g., polar basis functions might better allow fine-grained changes near the white point but slower changes at higher saturations away from the white point.
  
  We agree that the individual differences are meaningful and, in some cases, quite pronounced. Our intent in describing the differences as “modest” was to emphasize that the overall structure of the psychometric fields remains broadly consistent across observers. We have revised the Results to note and more fully describe these differences.
  
  Regarding the possibility that sharp changes in the underlying noise near the achromatic point might not be fully captured by the current model, we agree that this is an important consideration. The current implementation uses relatively low-order Chebyshev basis functions that primarily capture smooth global variations in the psychometric field. While validation analyses indicate that these basis functions capture the dominant structure in the data, they may be less sensitive to sharp local variations such as those that could occur near the white point. Future work could address this by mapping the model space to a smaller region around the achromatic reference or by exploring alternative basis sets (e.g., polar or Zernike functions) that may better capture such localized structure. This is discussed above in this response and now addressed in Discussion / Extensions of the WPPM framework.
  
  On sampling, I wondered if the results might have been biased by the strongly biased ellipse that occurs at the grey point. If not, and the model is accurate in this region of colour space, I think this figure does show some large individual differences, and it would be good to comment on these in the individual differences section of the manuscript.
  
  Based on our analysis of trial placement (Fig. S1), the adaptive algorithm does not appear to have disproportionately concentrated trials near the gray point. In fact, more trials were allocated to the edges of the stimulus space than to the center. This suggests that the WPPM estimates are unlikely to be driven primarily by performance in the gray region. In addition, we examined the threshold ellipses around the gray reference in DKL space and found that they are broadly consistent across participants (Figs. S22–S23). Together, these analyses suggest that the anisotropy observed near the gray point reflects a genuine property of the psychometric field rather than an artifact of the sampling procedure.
  
  As noted just above, we have added additional text about individual differences in the Results and referenced it in the Discussion.
  
  (4) The manuscript seems unusually free of typographical errors, but I noticed that in many places "Krauskopf and Karl 1992" is cited! Also, I think something has gone wrong with the legend to Figure 2 - perhaps the order of panels was swapped around, but the legend was not fully updated. There is a repeated reference to the "summary of regression slopes" which seems to be in 2 positions, after C and G. It would make more sense to label panel G as D and progress from there, or switch the order of the panels so that G is on the bottom row.
  
  Thank you for catching those errors. They are now fixed.
  
  Reviewer #3 (Recommendations for the authors):
  
  A minor point (or perhaps major if your last name is Gegenfurtner) is that the reference to Krauskopf and Karl is incorrect.
  
  This is now fixed.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.07.16.665219v3
www.biorxiv.org

Predicting functional topography of the human visual cortex from cortical anatomy at scale

1. Public_Reviews 15 May 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  In the paper, the authors compare the performance of their new version to two previous approaches. Figure 2b shows that the new toolbox performs similarly to the previous deep-learning-based toolbox, but requires only an anatomical scan, which is a significant improvement. They also compare it to an older method that uses an atlas without requiring deep learning. For eccentricity and pRF size predictions, both deep-learning methods perform better than the older approach. For polar angle, a critical parameter for delineating visual field maps, the gain is substantially less. Moreover, the comparison to the atlas method (Benson 2014) is not entirely fair, as, to our knowledge, there is also a more advanced atlas version that uses Bayesian fitting methods and already performs better than the old method. To better understand the gain of using deep learning, it would be beneficial if the authors also made the comparison to this more recent atlas-based approach. Moreover, it would be useful to know the correlations for the representative participant. Some examples of relatively "bad" maps would also be useful to have (and could be provided as supplementary information).
  
  We thank the reviewer for their constructive feedback. We plan to expand our benchmarking section to include the Bayesian model comparison. Note, however, that the additional accuracy gain afforded by the Bayesian model of retinotopy (Benson and Winawer, 2018) results from combining anatomical data with retinotopic maps estimated from a few minutes of functional data. Without such functional data, the Bayesian model of retinotopy is equivalent to Benson14. We plan to report the correlations (between predicted and empirical maps) for the representative participant shown in Figure 2 and to include an additional supplementary figure showing retinotopic map predictions for the participant whose predictions deviate most from the empirical maps, as suggested by the reviewer.
  
  Figure 2b shows that the toolbox is quite good at estimating eccentricity and polar angle parameters, but less good at estimating the population receptive field (pRF) size. I will return to this latter point.
  
  An interesting feature is that while the toolbox is trained on a specific data set (HCP), it can, "out-of-the-box", be applied to different existing data sets, without the need to retrain the model. This is quite important for the general utility of the method. The results for this are shown in Figure 3. Again, in panel b, it can be seen that the toolbox does a good job at estimating eccentricity and polar angle values, but performs rather poorly for pRF size: the deepRetinotopy toolbox has a strong tendency to only estimate very small pRFs, particularly when applying it across different datasets. For this reason, at the moment, these estimates appear hardly useful. It would be very helpful for readers if the authors could clarify or elaborate on this point, particularly regarding the limitations of pRF size predictions. They explain that this could be due to the use of different types of stimuli, but even within the same (HCP) dataset, the predictions primarily suggest tiny pRFs, even though the training dataset also contains larger ones (which can be better seen in supplementary Figure 4). Showing the predictions for higher-order brain areas, which have larger pRFs on average, could serve a similar evaluation purpose. Presumably, the underlying reasons are complex and could relate to the use of different stimuli, different analysis toolboxes, and how the deep learning model is currently being trained. Possibly, the abundance of small pRFs at lower eccentricity in the training set (which is usually the case in any empirical analysis) has given the model a very strong bias toward predicting small pRFs.
  
  There would be various ways to verify which of these components is critical. For example, the model could be trained only on the bar stimuli of the HCP dataset, or the pRFs for all stimuli and datasets could be estimated using the same software tool. The latter seems important. For example, Supplementary Figure 4 indicates a high correlation between the Stanford and NYU cohorts that have used the same stimulus and analysis package, despite having different resolutions and scanners. Further investigation into the underlying reasons for these discrepancies would strengthen the paper. It would also provide valuable guidance for users of the toolbox on which toolbox predictions to trust and which not, as well as how well the model generalizes to other stimulus types, scanners, and image resolutions.
  
  We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, analysis toolboxes used to estimate pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. As the reviewer pointed out, the underlying reasons are complex, and it is difficult to isolate all the potential contributing factors. However, in addition to our expanded discussion, we also intend to present results from additional experiments that assess the impact of different loss functions on the range of predicted pRF sizes (to explain how training may partly account for the differences observed in the HCP dataset). We will also perform pRF fitting on at least one dataset using the same software/encoding model as in the HCP dataset (the training data) to illustrate that the lower performance in pRF size prediction in out-of-distribution datasets is also partly explained by differences in how the empirical maps were obtained.
  
  An aspect that is not directly apparent from the title, abstract, and introduction is that the deepRetinotopy toolbox does not by itself produce estimates of visual area labels or boundaries. It predicts only polar angle and eccentricity values. To predict labels and boundaries, the authors combine the toolbox with an atlas (the aforementioned Bayesian atlas). For visual areas V1 - V3, it does a very good job, in that the predictions are as good as the empirical ones. Notably, the authors indicate that the predictions for V2 and, in particular, V3 are worse than for V1, but Figure 4 clearly shows that predictions are as good as the empirical ones. More cannot be expected from a model that is trained on such empirical data.
  
  We will edit the introduction and abstract to make it clearer that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own.
  
  Irrespective of the limitations with respect to predicting pRF size, the toolbox opens up functionally oriented analyses of very large cohorts of healthy participants for whom only anatomical data are available. The authors present an example of this by confirming the existence of differences in horizontal and vertical asymmetries in the field maps of the visual cortex of children and adults. While Figure 5 confirms the existence of differences, the analysis could be expanded to provide deeper insights, such as normalized developmental trajectories for both asymmetries, given the size of the dataset. This would better highlight the true power of their approach.
  
  Although providing insights into developmental trajectories for horizontal and vertical asymmetries is beyond the scope of the current work, as it would require aggregating datasets so that participants’ ages span a larger range (the ABCD dataset only contains individuals between 9 and 11 years old, and the HCP Young Adult dataset individuals between 22 and 36 years old), we plan to provide some complementary analyses (differences across age and sex within the ABCD dataset).
  
  While the authors address limitations with respect to studying experience-dependent atypical functional organization, they do not address how the deepRetinotopy toolbox would handle (acquired) brain lesions. Addressing this, even if only speculative, would be welcome. Another welcome addition would be to see the predictions for additional brain areas, even if those would (presumably) be worse at present. Such information would nevertheless be essential for users considering applying this toolbox. Moreover, this could be a valuable resource serving as a benchmark for future iterations of either deepRetinotopy or other approaches.
  
  We plan to expand and report performance evaluation across other visual areas (using Wang atlas’ parcels) to serve as a benchmarking resource. Moreover, we will expand our discussion on how deepRetinotopy would handle brain lesions.
  
  Reviewer #2 (Public review):
  
  (1) The weak point of the contribution is the choice to limit anatomical quality assessments and error quantifications to just three early regions, V1-V3, even though the deepRetinotopy toolbox can delineate over 20 regions (including parietal, ventral, and lateral regions, such as IPS0-5, hV4, VO1-2, V3A, PHC1-2, LO1-2, and TO1-2).
  
  (2) The limit is fine for their large-scale application of the toolbox to age groups, as here, a clear hypothesis on early cortex variability was tested.
  
  (3) However, the introduction of the toolbox itself warrants quality assessments and comparisons to prior models and ground truth beyond V1-V3, just like the authors did in their prior publication of the predecessor model.
  
  (4) This is important as the vast majority of applications of this toolbox will likely go beyond V1-V3 to delineate dorsal, ventral, and lateral regions.
  
  (5) For the present paper, this will require only 1 or 2 additional figures, or extending their present figures 2 and 4 along the lines of their previous figure 7 (Ribeiro et al 2021), which included error measures for high-level regions. Ideally, you provide sub-graphs separately for early visual, dorsal, ventral, and lateral regions.
  
  (6) Going beyond V1-V3 is important for several reasons: first, future studies applying the software beyond V3 will need quantification for reassurance and justification. Second, for the sake of transparency, even if results are noisy or on par with prior models. Third, as a benchmark or reference point for future approaches.
  
  We thank the reviewer for their constructive feedback, and we agree that expanding our performance assessment beyond V1-3 would be a valuable benchmarking resource. Thus, we plan to evaluate retinotopic map prediction accuracy across visual areas defined by the Wang atlas’ parcels, expanding on the results reported in Figure 2, and provide it as a supplementary figure. However, performance estimation ultimately depends on the quality of the dataset used for evaluation. The empirical maps, although treated as ground truth, may themselves misrepresent the underlying retinotopic organization. As a matter of fact, the quality of the empirical data (HCP dataset and others) is indeed lowest in some of the higher-order visual areas.
  
  It may be unclear from the text that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own. Accordingly, we illustrate how deepRetinotopy toolbox’s predictions can be combined with another tool [the Bayesian model of retinotopy from Benson and Winawer (2018)] to obtain visual area boundaries automatically. We will edit the introduction and abstract to make this clearer. Given the availability of empirical labels (currently only for V1-3) and the segmentation tool (which was only assessed for V1-3), we cannot expand Figure 4 to other visual areas as suggested.
  
  Reviewer #3 (Public review):
  
  Quantification of the Analysis: My main concern is that the analysis relies heavily on global summary measures such as correlation and Dice score. Those measures are useful, but the paper would be more informative if it also quantified boundary differences in millimeters, especially for comparisons such as the V1/V2 boundary in Figure 2. That kind of analysis would help readers understand how large the errors are in physically meaningful terms.
  
  We thank the reviewer for their constructive feedback. Following the reviewer’s suggestion, we plan to expand our segmentation evaluation to quantify the extent to which boundary predictions from deepRetinotopy’s maps deviate from those from empirical maps, in millimetres.
  
  Model fitting methods: I also think the discussion of prediction failures for pRF size should be more explicit. The mismatch is likely influenced by the fact that the training data and several evaluation datasets were fit with different models and different analysis software. In particular, the network was trained on non-linear size estimates from the HCP data, while the comparison datasets were derived using other packages and, in some cases, different model assumptions. That likely contributes to the spread in Figure 3b and should be discussed more directly. It is important to discuss that the pRF parameters were derived using different software tools.
  
  We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, different encoding models for estimating pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. In addition to our expanded discussion, we intend to also present results from additional experiments that assess the impact of those factors on pRF size prediction performance.
  
  Clarifying Model Accuracy: If deepRetinotopy generates a true "noise-removed" representation of functional mapping based on anatomy, then fitting it to one fMRI measurement should predict a second, independent fMRI run better than the noisy data from the first run does.
  
  The authors possess the exact data for this test. For the HCP dataset, the empirical fMRI data were explicitly separated into two halves: "fit 2" (the first half of the fMRI runs) and "fit 3" (the second half). They correlated these two halves to establish a "noise ceiling," the maximum possible reliability of the data. Looking at their results in Figure 2b, the correlation of the deepRetinotopy predictions falls below this noise ceiling. This means that the noisy functional Half 1 actually predicts functional Half 2 better than the anatomical model does.
  
  The authors should state this explicitly. A side-by-side plot of Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2 would show that the anatomical model regularizes map location well, but misses reliable subject-specific variation that anatomy alone cannot capture.
  
  We will expand our benchmarking section to make these comparisons (“Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2”) more explicit. It is important to highlight that there is subject-specific variation that our model does not currently capture; this comparison can also serve as a benchmarking resource for future model versions and newer approaches.
  
  The Hemodynamic Response Function: The assumptions used to generate the original empirical maps are permanently baked into the deep learning model. However, the authors explicitly mention the hemodynamic response function (HRF) only once, noting in the Methods that the modeled time series was "convolved with a canonical hemodynamic response function."
  
  Beyond this single mention, there is no direct discussion of how the assumption of a single canonical HRF across all 161 HCP training subjects might have systematically impacted or biased the network's predictions. The authors address cross-dataset differences broadly under the umbrella of "experimental design" and "fMRI preprocessing pipeline" biases, but the HRF is a core biological property that mediates the connection between the anatomy and the data. The authors should explicitly discuss how this canonical assumption limits or biases the resulting deepRetinotopy network.
  
  As Reviewers 3 and 1 have noted, the observed limitations in pRF size prediction stem from multiple underlying factors. One of those factors is indeed the HRF assumed in the encoding models. We will expand our discussion about factors that may introduce biases into deepRetinotopy predictions, including the HRF.
  
  Scoping the Input Data and Normative Use: The authors use FreeSurfer to generate a mean curvature map for the entire midthickness cortical surface. This full-hemisphere curvature map is resampled to a standard template surface space (32k_fs_LR), acting as the data frame that feeds input features into the neural network. However, while the network receives the full geometric structure of the hemisphere, it is explicitly trained to predict retinotopic parameters only within a restricted posterior ROI, based on the Wang et al. atlas and containing roughly 3,200 vertices per hemisphere.
  
  A useful experiment to try, and perhaps the authors have already considered this, would be to restrict the input features exclusively to the posterior vertices. Including all anterior vertices may make it harder for the network to fit the localized visual data. A brief commentary on why the full hemisphere was retained as input could be highly informative for researchers adapting this geometric deep learning pipeline.
  
  Thanks for this suggestion. We have not performed a systematic evaluation of using ROIs that span a larger portion of the cortex (including the full hemisphere). It is a great idea to do so and report it in our manuscript to inform other researchers interested in adapting our pipeline. We intend to also update our toolbox by retraining our models to take all posterior vertices as suggested, which would improve the coverage of current predictions.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.27.690210v3
www.biorxiv.org

Deployment of endocytic machinery to periactive zones of nerve terminals is independent of active zone assembly and evoked release

1. Public_Reviews 15 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  We thank the reviewers for their careful consideration of our work and constructive comments. We are glad that the reviewers appreciated the rigor and value of our work. In response to the reviewer comments, we have made the following changes:
  
  (1) Addition of new experiments on EndoA localization at the Drosophila NMJ (Fig. 2).
  
  (2) Addition of new experiments on Dap160 localization at the Drosophila NMJ (Fig. 2).
  
  (3) Addition of new experiments to validate Dynamin, Dap160 and EndoA antibodies (Fig. 2 – figure supplement 1).
  
  (4) Assessment of the activity-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 3).
  
  (5) Assessment of the liprin-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 8).
  
  (6) Addition of a limitations section to the discussion to directly address that spontaneous release was not fully ablated in our studies and might contribute to recruitment.
  
  (7) Addition of an outlook to the same section on what experimental avenues could address the limitations in the future.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to transient activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α. Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. 
It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.
  
  We thank the reviewer for the positive assessment of our study.
  
  Strengths:
  
  The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.
  
  We thank the reviewer for highlighting the technical strength of our work.
  
  Weaknesses:
  
  One notable limitation, however, is the absence of interrogation of endocytic proteins previously suggested to be recruited in an activity-dependent manner, in particular, endophilin.
  
  We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.
  
  We believe that there are multiple plausible explanations for the differences between our findings and previous work on Endophilin, which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord versus Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Taken together with our work, these data suggest that Endophilin constitutively, but not completely, localizes to the periactive zone.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using a combination of genetic and pharmacological perturbations in Drosophila and mouse neurons, the authors show that proteins such as Dynamin, Amphiphysin, AP-180, and others are still recruited to periactive zones even when evoked release or active zone architecture is disrupted. While the results are mostly negative, the study is methodologically solid and contributes to a more nuanced understanding of synaptic vesicle recycling machinery.
  
  We thank the reviewer for deeming our work solid and for highlighting its importance for the field.
  
  Strengths:
  
  (1) The experimental design is careful and systematic, covering both fly and mammalian systems.
  
  (2) The use of advanced genetic models (e.g., Liprin-α quadruple knockout mice) is a notable strength.
  
  (3) High-resolution imaging (STED, Airyscan) is well used to assess spatial localization.
  
  (4) The findings clarify that certain core assumptions - such as strict activity dependence of endocytic recruitment - may not hold universally.
  
  We thank the reviewer for pointing out these strengths.
  
  Weaknesses:
  
  (1) The study would benefit from a clearer positive control to demonstrate activity-dependent recruitment (e.g., Endophilin).
  
  We have added experiments to measure the localization of Endophilin (EndoA), a protein previously reported to localize to the synaptic vesicle cloud [1], at Drosophila NMJs (Figs. 2 and 3). We observed that EndoA localized both to the synaptic vesicle cloud and to the periactive zone. While stimulation did not enhance EndoA levels in either compartment, this outcome does not rule out shuttling of the protein between compartments during activity. Nevertheless, our data support a model in which EndoA, like the other endocytic proteins tested, is present at the periactive zone at rest.
  
  (2) The reliance on Tetanus toxin in the Drosophila NMJ experiments in my eyes is a limitation, as it does not block all presynaptic fusion events; this should be discussed more directly.
  
  We agree with the reviewer. To discuss this point more directly, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (lines 518-519), and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (lines 519-523).
  
  (3) The potential role of Dynamin in organizing other periactive zone proteins is not addressed and could be an important next step.
  
  We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”
  
  (4) Some small changes in protein levels upon silencing are reported; their biological meaning (e.g., compensation vs. variability) is not fully clarified.
  
  These changes might reflect homeostatic adaptations. In the revised manuscript, this is addressed on lines 135-137 and 405-407. Overall, we think it is difficult to assign biological meaning to small-magnitude changes, and we chose to highlight the main point that there are no large-magnitude changes.
  
  (5) While alternative organizing mechanisms (actin, lipids, adhesion molecules) are mentioned, a more forward-looking discussion of how to test these models would be helpful.
  
  Following the reviewer’s suggestion, we have added an outlook section to the discussion where we provide suggestions for future studies (lines 510-543).
  
  (6) The authors should consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.
  
  We have included new experiments on EndoA at the fly neuromuscular junction (Fig. 2, Fig. 3, Fig. 8, Fig. 3 – figure supplement 1) and have added appropriate discussion of these findings as outlined above.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.
  
  We thank the reviewer for reviewing our work.
  
  Strengths:
  
  The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.
  
  We thank the reviewer for pointing out these strengths.
  
  Weaknesses:
  
  The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.
  
  We are confident that our methods are sensitive enough to detect changes within synaptic compartments. First, for mouse neurons assessed with STED microscopy, we have demonstrated that we can distinguish between the N- and the C-termini of the presynaptic protein Bassoon, which are positioned only a few tens of nanometers apart [4]. We have subsequently been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart and have established that genetic manipulations of active zone proteins induce disruptions detectable by STED microscopy [4-12]. Given that the periactive zone is larger than the distances that we can resolve, we are confident that we can detect changes in this area with sufficient sensitivity. Second, for Drosophila NMJs, we use a carefully validated workflow that allows assessing the distribution of periactive zone proteins and can detect subtle changes [13]. Unfortunately, no known manipulation leads to periactive zone disassembly and could serve as a positive control, which reflects how little is currently known about periactive zone assembly. We acknowledge that there may be subtle changes in protein localization that escape the resolution of our microscopy methods or experimental design, but this would not undermine the conclusion that the periactive zone remains assembled across the manipulations that we have tested. Overall, none of the manipulations we tested induced a detectable disruption of the periactive zone. Naturally, we cannot exclude milder effects and have added a limitations section to discuss this possibility and some of the subtle changes we observe.
  
  This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.
  
  We thank the reviewer for the support of the conclusion of our study.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  This is a rigorous study that, while presenting largely negative data, delimits the processes that control periactive zone (PAZ) organization. In addition to the interpretive and technical comments below, we encourage the authors to consider extending this study in two areas. First, examining the activity dependence of the recruitment of Endophilin, and perhaps other factors, to the PAZ, for which previous research has indicated a positive role of activity. Second, further characterizing the potential contribution of miniature release events to PAZ organization. Overall, this was a rigorous and well-executed study.
  
  We thank the reviewing editor for this positive assessment of our work.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The rationale for comparing chronic inhibition to acute depolarization could be more clearly articulated. While this approach may be grounded in prior studies, the physiological consequences of chronic silencing differ markedly from those of transient activity, and these distinctions should be more explicitly addressed in the interpretation of results. For example, might lower intensity, chronic stimulation be a better comparison? Since fixation takes place immediately after stimulation, the time window to capture changes in protein recruitment may be curtailed.
  
  We thank the reviewer for this comment. The introduction now states this rationale on lines 110-112. By inhibiting evoked synaptic vesicle fusion throughout the lifespan of neurons, we assessed whether this process is necessary for periactive zone assembly and concluded that it is not. By acutely depolarizing neurons with 50 mM KCl or with a 40 Hz train of action potentials, we tested whether synaptic vesicle fusion triggers the rapid recruitment of endocytic proteins to the periactive zone and concluded that this is not the case for most of the endocytic proteins that we studied. While these results indicate that a constitutive pathway must exist to assemble the periactive zone, we remain agnostic as to whether stimulation paradigms not tested in our study can enhance the deployment of endocytic proteins, especially over long periods of time. This may be the case for low-intensity, chronic stimulation, as suggested by the reviewer. We address these limitations in a “Limitations and Outlook” section of the discussion (lines 510-543).
  
  (2) Amphiphysin stood out as the only protein showing a notable change in opposite directions under either active zone protein knockout/blockers and Liprin-α knockout. Given the predominance of negative results, it would be valuable to devote more discussion to why Amphiphysin behaves differently. What functional role might it play in this context that sets it apart from other endocytic components?
  
  As suggested by the reviewer, we have extended the discussion on Amphiphysin. One possible explanation for why Amphiphysin may respond differently to genetic manipulations or changes in stimulation is that different endocytic proteins might belong to distinct endocytic submachineries. This is addressed on lines 421-424. On lines 444-449, we further discuss the subtle decrease in the levels of Amphiphysin and AP-180 in Liprin-α mutants. We suggest that the actin cytoskeleton may link the active zone to the endocytic apparatus, and that this link may be partially disrupted in Liprin-α mutants. Overall, we note that Amphiphysin is still localized to the periactive zone at rest and hence fits the overall model of constitutive deployment that we propose.
  
  (3) The claim of activity-independence may need to be nuanced. Although the data suggest no recruitment in response to acute stimulation, the subtle changes following chronic inhibition complicate this interpretation, especially when considering redundancy. If activity-dependence is considered bidirectional, these findings might reflect a more complex regulatory mechanism. The interpretation in lines 188-190 more accurately captures this complexity than earlier generalizations.
  
  We agree with the reviewer that the dependence on activity should be discussed in a nuanced fashion. We have scrutinized the manuscript on this point and state throughout that recruitment is independent of evoked activity, not necessarily of all forms of activity. We believe that this interpretation is accurate because evoked neurotransmitter release was ablated by the pharmacological and genetic manipulations that we used. Furthermore, we have included a limitations section in the discussion where we openly address that spontaneous fusion of synaptic vesicles cannot be ruled out as a mechanism that sustains periactive zone assembly (lines 514-523). Finally, we have expanded on the complex relationship between activity and periactive zone assembly. In particular, homeostasis may contribute to increased levels of endocytic proteins upon chronic blockade of evoked transmission (lines 404-406).
  
  (4) Given published work on endophilin's role in activity-dependent endocytic recruitment, adding endophilin (at least in the Drosophila NMJ experiments) would be highly informative.
  
  We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.
  
  We believe that there are multiple plausible explanations for the differences between these findings and previous work on Endophilin [3], which we discuss on lines 407-410:
  
  “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are compatible with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.
  
  (5) Line 57 might have a typo in the citation.
  
  We thank the reviewer for pointing this out. The citations now include Bai et al., 2010; Jiang et al., 2024; Koh et al., 2007; Winther et al., 2013; and Winther et al., 2015. Please note that the last two citations are grouped as Winther et al., 2013, 2015 following our formatting style.
  
  (6) Line 208 might be missing a citation that justifies parameters.
  
  In the revision, this information is discussed on lines 222-224, where we cite our prior work describing these data: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023)”.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Please consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.
  
  We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.
  
  We believe that there are multiple plausible explanations for the differences between our findings and previous work on Endophilin [3], which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are consistent with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.
  
  (2) Expand the discussion of TeNT's limitations (specifically, that it does not block spontaneous fusion or alternative fusion pathways) and consider referencing more stringent tools (e.g., Botulinum toxins or SNARE mutants), even if they were not used here.
  
  Following the reviewer’s suggestion, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (lines 518-519), and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (lines 520-523).
  
  (3) We encourage the authors to briefly discuss whether Dynamin might contribute to periactive zone structure beyond its role in membrane fission. Loss-of-function data could be particularly informative in future work.
  
  We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”
  
  (4) Clarify the interpretation of increased endocytic protein levels upon chronic silencing - are these interpreted as homeostatic responses or experimental variability?
  
  We suggest that these changes might reflect homeostatic adaptations. On lines 405-406, we note that this increase is of the same magnitude as the increase in active zone proteins following a similar pharmacological manipulation, stating that “a mechanism for this effect might be a homeostatic response (Wen and Turrigiano, 2024) similar in magnitude to the increase in active zone protein levels following activity blockade (Held et al., 2020).”
  
  (5) The Discussion could be strengthened by sketching out more concrete experimental approaches to test candidate mechanisms (e.g., roles for actin, lipids, adhesion molecules) in organizing periactive zones.
  
  The potential roles of cell adhesion molecules (lines 430-440) and of the cytoskeleton and lipids (lines 442-452) are addressed in the discussion. Furthermore, following the reviewer’s suggestion, we have added the following statement (lines 541-543): “This work builds a foundation to assess alternative mechanisms and models of periactive zone assembly, including roles of the cytoskeleton, lipids, adhesion molecules, and intrinsic endocytic protein interactions”. We hope that the reviewer agrees that the discussion of our paper is not the right place for a concrete experimental plan for future work. In our view, the discussion should place the findings of our experiments in the context of the field.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) At a spine synapse, the endocytic zone is estimated to be 100-200 nm from the active zone. The focus of the authors' analysis is largely outside of this region (0-150 nm), raising the question of whether the area studied may be outside of the area affected by the manipulations made. While STED systems claim ~80 nm resolution, this is rarely achieved in practice, and the authors do not report the effective resolution of their system. Reporting the resolution achieved would address this issue. In addition, super-resolution imaging does not appear to have been used at the Drosophila NMJ. The authors should clarify whether resolution limitations influenced the choice of analysis region and whether their imaging approach is sufficient to detect changes in the endocytic zone.
  
  We believe that it is unlikely that the relevant signals were missed. First, in mouse synapses, most signal corresponding to endocytic proteins was detected inside the selected region of interest. Our rationale for selecting this area was that expanding the analyzed region would have reduced the sensitivity of our approach, as averaging over a larger area would dilute the signal. The resolution of our microscopy should not be a limitation either. In our previous work, we demonstrated that STED microscopy allows discriminating between the N- and C-termini of the presynaptic scaffold Bassoon, which are positioned only a few tens of nanometers apart [4]. This establishes that we can resolve differences of tens of nanometers in a biological context, which is more relevant than the resolution measured with fluorescent beads (which we have repeatedly assessed to be ~80 nm laterally). Subsequently, we have also consistently been able to resolve the localization of pre- and postsynaptic proteins that localize a few tens of nanometers apart [4-12]. Given that the periactive zone spans a larger area than the distances that we can resolve experimentally in the examples above, we are confident that our measurements are sensitive enough to detect changes in this area.
  
  Second, for Drosophila NMJs, the region of interest and the overall analysis were chosen following a workflow validated in our previous work [13]. This method analyzes regions both immediately adjacent to and more distant from the active zone, and does not exclude any region based on distance from the active zone, as described on lines 222-224: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023).” In our previous study, we analyzed the distribution of periactive zone proteins at rest with STED microscopy and with Airyscan confocal microscopy. The resolution provided by Airyscan is reported to be ~175 nm in XY and ~400 nm in Z, which is sufficient to assess localization to the periactive zone compartment and is not inferior to imaging methods previously used to report changes in the distribution of endocytic proteins (for examples, see [1,2]). In the revised manuscript, we have added new data measuring the levels and distribution of EndoA and Dap160 using STED microscopy (Figure 3 – figure supplement 1). The results acquired with STED microscopy and with Airyscan confocal microscopy are consistent with one another.
  
  Overall, the accuracy of the imaging methods and analyses used in this study is sufficient to assess periactive zone structure given its size and organization.
  
  (2) Interestingly, in a number of cases, the authors observe significant differences in endocytic markers (Figure 1q, 4k, 6k, 6r). However, little is made of these differences. The authors should provide more discussion of these changes and how they make sense of them alongside their claims of a lack of effect from their manipulations.
  
  The reviewer raises a good point. We interpret these changes in two different ways. First, we suggest that changes observed in response to block of action potentials or disassembly of the active zone might be homeostatic. This is addressed on lines 135-137. Second, we discuss that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus. Several active zone proteins interact with the actin cytoskeleton. One of them is Liprin-α. This interaction may explain the decrease in the level of Amphiphysin and AP-180 at the periactive zone in Liprin-α null neurons. This is addressed on lines 444-449. We hope that the reviewer agrees that overall, we should focus on the main conclusion that deployment of endocytic proteins persists over a number of manipulations and synapse types.
  
  (3) The graphs in Figure 1c and 1g, 3g, 4c, 4e, 6c, and 6g do not appear to be identical. If the solid line represents the mean and the lighter color represents the distribution of these data, these data appear to be different from one another. It is surprising that these differences are not significant. What statistical tests were used to determine whether the differences in these graphs are not significant? Is the issue that a relatively low number of synapses was examined (30-60)? Did the authors conduct a power analysis?
  
  We apologize if the display of our data and analyses was not clear. We do not perform statistical analyses on the line profiles themselves. Instead, we perform them on two values extracted from each line profile: (1) the distance between the peak intensities of the protein of interest and of the marker, and (2) the peak intensity values. For example, in Figure 1, distances are quantified and statistically analyzed in panel j, and peak levels in panel k. We have clarified this in the legends of current Figures 1, 4, 5, and 7.
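  For readers less familiar with this type of quantification, the two-value extraction described above can be sketched in Python. This is a minimal, hypothetical illustration rather than the analysis code used in the study: the function name peak_metrics is invented here, and the sketch assumes that both line profiles are sampled at the same positions (in nm) and that each peak is simply the maximum of its profile, with no smoothing or background subtraction.

```python
from typing import Sequence, Tuple


def peak_metrics(
    positions_nm: Sequence[float],
    protein: Sequence[float],
    marker: Sequence[float],
) -> Tuple[float, float]:
    """Hypothetical helper: return (peak-to-peak distance in nm, protein peak intensity).

    Assumes both intensity profiles are sampled at the same positions.
    """
    if len(protein) == 0 or not (len(positions_nm) == len(protein) == len(marker)):
        raise ValueError("profiles must be non-empty and share the same sampling positions")
    # Index of the maximum of each profile; max() keeps the first index on ties.
    i_protein = max(range(len(protein)), key=lambda i: protein[i])
    i_marker = max(range(len(marker)), key=lambda i: marker[i])
    # Value (1): distance between the two peak positions.
    distance_nm = abs(positions_nm[i_protein] - positions_nm[i_marker])
    # Value (2): peak intensity of the protein of interest.
    return distance_nm, protein[i_protein]


# Toy profiles (made-up numbers, for illustration only).
positions = [0, 20, 40, 60, 80]
endocytic = [1, 3, 5, 2, 1]
marker = [2, 6, 3, 1, 0]
print(peak_metrics(positions, endocytic, marker))  # (20, 5)
```

  In this design, each synapse contributes one distance and one peak value, and these per-synapse values, rather than the averaged profiles, are what would then be compared statistically across conditions.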
  
  (4) The authors clearly state that their experiments address the role of evoked activity in endocytic zone positioning, but they do not examine whether spontaneous vesicle fusion might play a role. Given the availability of Drosophila mutants that decrease (Doc2, Dunc-13) or increase (syt1) spontaneous release, this is a notable omission. Ideally, these mutants should be examined. And at a minimum, the authors should discuss whether spontaneous release could contribute to endocytic zone organization.
  
  We agree with the reviewer that spontaneous fusion of synaptic vesicles may contribute to periactive zone organization. Many of the genetic manipulations that we used in mouse neurons result in a significant decrease in spontaneous release. These include CaV2 triple knockouts with a ~60% decrease in spontaneous fusion [10], RIM+ELKS quadruple knockouts with a ~70% decrease in spontaneous fusion [9], and Liprin-α quadruple knockouts with a ~50% decrease in spontaneous fusion [7]. We cannot rule out that the remaining spontaneous release is sufficient to mediate assembly functions. The most conclusive way to address this possibility would be a manipulation that ablates spontaneous release without altering other pathways; however, to our knowledge, no such manipulation is available. The manipulations suggested by the reviewer might suffer from similar limitations, as they would change the frequency of spontaneous release without fully ablating it, and they would also affect evoked release. We have included a limitations section in the discussion that addresses this point (lines 514-523), specifically stating that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited. While many of the manipulations used here, including CaV2 knockout (Held et al., 2020), RIM+ELKS knockout (Tan et al., 2022; Wang et al., 2016) and Liprin-α knockout (Emperador-Melero et al., 2024) in hippocampal neurons, and TeNT expression in fly NMJs (Sweeney et al., 1995), result in 50% to 70% decreased spontaneous release rates, it is possible that the remaining spontaneous release supports periactive zone assembly. Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” We hope that the reviewer agrees that assessing these mutants should be the topic of future studies, given that we already test many mutants in this paper.
  
  (5) In Figures 1 and 6, the authors assess presynaptic protein localization in cultured neurons, but it is unclear whether these are synaptic sites. Many presynaptic proteins traffic together and can accumulate at sites lacking postsynaptic specializations. The authors should validate that the observed spatial organization occurs at bona fide synapses, ideally by co-labeling with postsynaptic markers as done in Figure 4. If methods like these were used, providing more details on how synapses were identified and selected would be useful to the reader.
  
  While we understand the reviewer’s point, we are confident that the structures analyzed are bona fide synapses for three reasons, as we have established before across many papers [4-8,10-12,17].
  
  The diameter of the structures detected using the synaptic vesicle marker Synaptophysin aligns much more closely with the size of the large vesicle clusters found at presynaptic terminals than with that of a few transport vesicles.
  
  In side-view synapses, the active zone marker (Bassoon or Munc13-1) forms a bar-like distribution at one edge of the vesicle cloud. This is consistent with synaptic architecture, in which active zone proteins are organized at one edge of the vesicle cluster.
  
  Synaptophysin is one of our key markers for detecting synapses. In our cultures, most of the Synaptophysin signal colocalizes with postsynaptic markers (either PSD-95 or Gephyrin), as we have established across many studies [4,7-12]. This indicates that the markers used here are sufficient to select synapses. Furthermore, the frequency at which synapses were identified using an active zone marker as the second marker was similar to that observed when using a postsynaptic marker, suggesting that we were not randomly including unrelated structures.
  
  (6) Many of the images, particularly of the Drosophila NMJ, are of low quality and are shown at a very small size. In addition, the quality of the images throughout the paper makes it difficult to assess the authors' analysis and results. The authors should provide larger, higher-quality images that show examples representative of the mean values for each of the conditions shown. This is an issue for most of the figures, but it is particularly prominent for the dNMJ. A minor additional point is that the authors should clearly state whether the dNMJ images were collected with super-resolution or conventional microscopy.
  
  We believe that the quality of our images is sufficient for the assessments made for the following reasons:
  
  These images were acquired with enough spatial resolution to assess levels at the PAZ as discussed in response to this reviewer’s first comment. In our previous work, we used images acquired at the same resolution and presented in the same manner for both mouse hippocampal synapses [6,7] and Drosophila NMJs [13,18]. In those previous studies, we drew conclusions at a similar level of detail as in the current study.
  
  In our view, our representative images are not inferior in quality to other papers in the field addressing similar questions [1,2,19,20].
  
  We have selected sample images based on the quantified mean values per condition. Hence, we strived to select panels that are objectively representative regarding the quantified parameters.
  
  We now state the microscopy method used for each experiment in the corresponding figure legend. Specifically, for Drosophila NMJs, we used Airyscan confocal microscopy and STED microscopy.
  
  References:
  
  (1) Winther, Å. M. E. et al. An Endocytic Scaffolding Protein together with Synapsin Regulates Synaptic Vesicle Clustering in the Drosophila Neuromuscular Junction. J Neurosci 35, 14756–14770 (2015).
  
  (2) Winther, Å. M. E. et al. The dynamin-binding domains of Dap160/intersectin affect bulk membrane retrieval in synapses. J Cell Sci 126, 1021–1031 (2013).
  
  (3) Bai, J., Hu, Z., Dittman, J. S., Pym, E. C. G. & Kaplan, J. M. Endophilin functions as a membrane-bending molecule and is delivered to endocytic zones by exocytosis. Cell 143, 430–441 (2010).
  
  (4) Wong, M. Y. et al. Liprin-alpha3 controls vesicle docking and exocytosis at the active zone of hippocampal synapses. Proc Natl Acad Sci U S A 115, 2234–2239 (2018).
  
  (5) Emperador-Melero, J., de Nola, G. & Kaeser, P. S. Intact synapse structure and function after combined knockout of PTPδ, PTPσ, and LAR. Elife 10, (2021).
  
  (6) Emperador-Melero, J. et al. PKC-phosphorylation of Liprin-α3 triggers phase separation and controls presynaptic active zone structure. Nat Commun 12, 3057 (2021).
  
  (7) Emperador-Melero, J. et al. Distinct active zone protein machineries mediate Ca2+ channel clustering and vesicle priming at hippocampal synapses. Nat Neurosci (2024). doi:10.1038/s41593-024-01720-5.
  
  (8) Tan, C., Wang, S. S. H., de Nola, G. & Kaeser, P. S. Rebuilding essential active zone functions within a synapse. Neuron 110, 1498-1515.e8 (2022).
  
  (9) Wang, S. S. H. et al. Fusion Competent Synaptic Vesicles Persist upon Active Zone Disruption and Loss of Vesicle Docking. Neuron 91, 777–791 (2016).
  
  (10) Held, R. G. et al. Synapse and Active Zone Assembly in the Absence of Presynaptic Ca(2+) Channels and Ca(2+) Entry. Neuron 107, 667-683.e9 (2020).
  
  (11) Chin, M. & Kaeser, P. S. The intracellular C-terminus confers compartment-specific targeting of voltage-gated calcium channels. Cell Rep 43, 114428 (2024).
  
  (12) Nyitrai, H., Wang, S. S. H. & Kaeser, P. S. ELKS1 Captures Rab6-Marked Vesicular Cargo in Presynaptic Nerve Terminals. Cell Rep 31, 107712 (2020).
  
  (13) Del Signore, S. J., Mitzner, M. G., Silveira, A. M., Fai, T. G. & Rodal, A. A. An approach for quantitative mapping of synaptic periactive zone architecture and organization. Mol Biol Cell 34, (2023).
  
  (14) Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted expression of tetanus toxin light chain in Drosophila specifically eliminates synaptic transmission and causes behavioral defects. Neuron 14, 341–351 (1995).
  
  (15) Kaeser, P. S. & Regehr, W. G. Molecular mechanisms for synchronous, asynchronous, and spontaneous neurotransmitter release. Annu Rev Physiol 76, 333–363 (2014).
  
  (16) Santos, T. C., Wierda, K., Broeke, J. H., Toonen, R. F. & Verhage, M. Early Golgi Abnormalities and Neurodegeneration upon Loss of Presynaptic Proteins Munc18-1, Syntaxin-1, or SNAP-25. J Neurosci 37, 4525–4539 (2017).
  
  (17) de Jong, A. P. H. et al. RIM C2B Domains Target Presynaptic Active Zone Functions to PIP2-Containing Membranes. Neuron 98, 335-349.e7 (2018).
  
  (18) Del Signore, S. J. et al. An autoinhibitory clamp of actin assembly constrains and directs synaptic endocytosis. Elife 10, (2021).
  
  (19) Imoto, Y. et al. Dynamin 1xA interacts with Endophilin A1 via its spliced long C-terminus for ultrafast endocytosis. EMBO J (2024). doi:10.1038/s44318-024-00145-x.
  
  (20) Imoto, Y. et al. Dynamin is primed at endocytic sites for ultrafast endocytosis. Neuron 110, 2815-2835.e13 (2022).
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.23.650151v2
www.biorxiv.org

Genome Restructuring around Innate Immune Genes in Monocytes in Alcohol-associated Hepatitis

1. Public_Reviews 15 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.
  
  Strengths:
  
  (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.
  
  (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.
  
  Weaknesses:
  
  (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.
  
  (2) Hi-C resolution and lack of paired RNA-seq: The data are presented at a resolution of 100 kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. In addition, the Hi-C and RNA-seq data are unpaired.
  
  (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.
  
  (4) Data Integration: The lack of integration of Hi-C with ATAC-seq and RNA-seq data handicaps the analysis and makes it superficial. In short, the study does not convincingly demonstrate a functional relationship.
  
  (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.
  
  Appraisal of the Aims and Results:
  
  The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.
  
  Impact on the Field and Utility to the Community:
  
  The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.
  
  Additional Context:
  
  The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.
  
  Conclusion:
  
  The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  Dr. Adam Kim and collaborators study the changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) compared with healthy controls (HC). Using high-throughput chromosome conformation capture (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From their analyses of these data in the two cohorts, the authors describe frequent pairs of regions with significant changes in contact frequency between cohorts. The accumulation of these pairs in specific regions of the genome, referred to as hotspots, motivated the authors to narrow their analyses to these disease-associated regions, many of which, the authors claim, contain key innate immune genes. Ultimately, the authors try to draw a link between the changes observed in chromatin architecture in some of these hotspots and the differential co-expression of the genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.
  
  Strengths:
  
  The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of chromatin architecture dysregulation in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important yet overlooked aspect of cellular dysregulation, namely changes in chromatin conformation, not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.
  
  Weaknesses:
  
  In my view, the two most important weaknesses of the work are more methodological than conceptual. The first concerns the perhaps insufficient description of how some key types of genomic regions are defined, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. Despite the importance of these concepts in the paper, the current version of the manuscript provides no explicit, operational description of how they are defined from a statistical point of view.
  
  Without these definitions, some of the claims that authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within, or between TADs. In my viewpoint, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned specific claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.
  
  The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.
  
  Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in Figures 5-8 are truly more pronounced than what would be expected genome-wide.
  
  Reviewer #3 (Public review):
  
  In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy donors and of patients with alcohol-associated hepatitis (AH).
  
  Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.
  
  In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.
  
  (1) A large body of literature describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.
  
  I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture, they should carefully FACS-sort the equivalent monocyte populations from the healthy and AH patients.
  
  (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.
  
  We thank the reviewers for their careful and thorough examination of our manuscript. We agree with all of their comments regarding the limitations of the study. Many of the criticisms focus on the small sample size of our study (n=4 for healthy controls and disease patients) in both Hi-C and single-cell RNA-seq experiments, and that these experiments are unpaired, or in other words, PBMCs came from different patients for each experiment.
  
  Unfortunately, these experiments are fairly complicated to perform, requiring patient cells and very expensive deep sequencing. We are not currently in a position to increase the sample size easily or cost-effectively. In the case of Hi-C, we still believe our study to be of value, as Hi-C is not commonly used to study the effects of disease on chromatin, and very few studies have employed a sample size large enough to perform statistical comparisons. Additionally, analyzing the data at a higher resolution would require deeper sequencing, and unfortunately we do not have the resources to sequence these libraries more deeply. Regarding the single-cell RNA-seq data, this dataset was generated for an earlier study [1] focusing on gene expression responses to LPS, and we were unable to obtain PBMCs from exactly the same patients to perform the Hi-C study.
  
  We disagree that our study has limited scientific value. Our study is the first to use Hi-C to show that the 3D genome architecture of primary monocytes is changed in a disease context. The only other study to follow a similar approach performed Hi-C in monocytes from 2 healthy donors and 2 patients with systemic lupus erythematosus (SLE), and in that study the data from both patients were combined prior to comparison. No statistics were performed, and the authors concluded that disease caused no differences in genome architecture. They did find differences between primary monocytes and the THP-1 monocytic cell line, but this comparison also lacked statistical analysis. Their conclusion was that inflammatory disease may not lead to genome-wide changes in architecture. Although AH is a very different disease from SLE, our study shows statistically significant differences between AH and healthy controls. We believe our study lays the groundwork for how Hi-C can be used to study genome architecture in human disease and its possible downstream effects.
  
  Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.
  
  This is an interesting suggestion. This dataset contains only 4 AH patients, for whom we have included basic clinical data in Supplemental Table 1 (age, HbA1c, bilirubin, AST, ALT, creatinine, albumin, and MELD score). Three of the four patients have severe AH, while one (AH2) has moderate AH. Despite one patient having moderate disease, all four AH patients had similar correlations with each other, suggesting that the disease-specific differences we observed are not indicative of disease severity. More patient samples are needed to determine whether genome architecture changes throughout disease progression. We have added this important discussion to the manuscript (page 12, lines 5-14).
  
  Recommendations for the authors:
  
  Reviewer #2 (Recommendations for the authors):
  
  The criteria used to determine which pairs of regions exhibit significant differences in contact frequency between alcohol-associated hepatitis (AH) and healthy controls (HC) are not disclosed. It would be beneficial for the authors to provide this information, including details such as the number of pairs tested, the nature of the statistical tests conducted, the method of multiple testing correction applied, as well as the significance thresholds used, and the number of loci-pairs below these thresholds for each chromosome. This information would greatly enhance the reader's understanding of the relevance of the reported findings.
  
  Thank you for this comment, although we are not certain we fully understand it. All of our statistics were performed using multiHiCcompare [2]: we input all 8 datasets (.hic files from Juicer) and then measured statistical differences between defined groups (HC vs AH). For our randomization studies, we randomized the group assignments so that each group contained a mix of HC and AH samples.
  
  Second, a formal statistical definition of what constitutes a hotspot would be valuable for clarity.
  
  Thank you for this suggestion. Initially, hotspots were defined simply as regions of the genome with a high frequency of very significant differential contacts. We have now formulated a more formal definition of “hotspot” based on similar criteria, in which a hotspot is defined by both adjusted p value and frequency of locations. First, we filtered all pairwise chromosomal interactions with a very stringent threshold (padj < 0.0000001) to focus only on the most changed coordinates (Supplemental Table 4). We then looked for regions of the genome with a high frequency of these differential locations. Borders for each hotspot were determined more liberally, using the full list of differential locations (padj < 0.05). Finally, we computationally listed the genes within each interacting region. We have added these details to the Methods (page 14, lines 11-14).
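The two-threshold hotspot procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline (which used multiHiCcompare): the input format (one tuple per differential bin pair), the minimum number of stringent hits per bin (`min_hits`), and the rule of extending borders across adjacent 100 kb bins touched by any padj < 0.05 contact are all hypothetical choices, since the response gives no numeric cutoff for "high frequency" and no exact border rule.

```python
from collections import Counter

BIN = 100_000  # 100 kb bins, matching the Hi-C resolution used in the study


def call_hotspots(pairs, strict=1e-7, loose=0.05, min_hits=5):
    """Sketch of hotspot calling; min_hits and the border rule are hypothetical.

    pairs: iterable of (chrom, bin1_start, bin2_start, padj) tuples,
    one per differential contact between two 100 kb bins.
    Returns a list of (chrom, start, end) hotspots with exclusive ends.
    """
    strict_hits = Counter()  # number of very stringent hits touching each bin
    loose_bins = set()       # bins touched by any contact with padj < loose
    for chrom, b1, b2, padj in pairs:
        for b in {b1, b2}:  # a set, so a self-contact counts once
            if padj < loose:
                loose_bins.add((chrom, b))
            if padj < strict:
                strict_hits[(chrom, b)] += 1

    # Seeds: bins with a high frequency of very significant contacts.
    seeds = sorted(k for k, n in strict_hits.items() if n >= min_hits)

    hotspots = []
    for chrom, start in seeds:
        # Skip seeds already covered by the previous hotspot.
        if hotspots and hotspots[-1][0] == chrom and hotspots[-1][1] <= start < hotspots[-1][2]:
            continue
        # Extend borders liberally over contiguous bins with padj < loose.
        lo = hi = start
        while (chrom, lo - BIN) in loose_bins:
            lo -= BIN
        while (chrom, hi + BIN) in loose_bins:
            hi += BIN
        hotspots.append((chrom, lo, hi + BIN))
    return hotspots


# Toy input (made-up values, for illustration only)
toy = [
    ("chr1", 0, 100_000, 1e-9),
    ("chr1", 100_000, 300_000, 1e-8),
    ("chr1", 100_000, 500_000, 1e-10),
    ("chr1", 200_000, 200_000, 0.01),
    ("chr1", 900_000, 1_000_000, 0.2),
]
print(call_hotspots(toy, min_hits=2))  # [('chr1', 0, 400000)]
```

Seeding on the stringent threshold and extending with the loose one mirrors the two-step logic in the response; a different border rule (for example, a fixed window around each seed) would be an equally plausible reading.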
  
  Third, a clear definition of the criteria used to identify different topologically associated domains (if these were indeed defined in the data and/or utilized in the analyses) would also be a helpful addition.
  
  Thank you for this suggestion. We did not identify TADs or use them in any of these analyses.
  
  Likewise, several statements throughout the paper lack support from specific analyses, although it should be feasible to implement such analyses (or at least present them if they have already been conducted) to substantiate these claims:
  
  If randomizing samples does not result in significant differences between (randomized) cohorts, it would be beneficial to provide insights into the number of loci pairs that exhibit differences in frequency when using both the actual and randomized cohorts.
  
  Thank you for asking this question, as this is an important point. Using multiHiCcompare, comparing HC (n=4) to AH (n=4) yields the results shown in the figures and supplementary data, whereas randomizing the samples into Group 1 (HC, HC, AH, AH) vs Group 2 (HC, HC, AH, AH) yields almost no significant changes in contact frequency. To show this more robustly, we performed 5 randomized comparisons and found far fewer changes in contact frequency between groups. This shows that the changes in contact frequency caused by disease are not random but reflect a real difference in AH. This point has been added to the Results (page 6, lines 15-17) and Methods (page 14, lines 16-21).
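The balanced randomization described above can be enumerated explicitly. The sketch below assumes eight samples labelled HC1-HC4 and AH1-AH4 (hypothetical names); it lists every way to split them into two groups of four that each contain two HC and two AH samples, then draws five of those splits at random. The differential-contact test itself (multiHiCcompare in the authors' analysis) is not reproduced here.

```python
import itertools
import random


def balanced_splits(hc, ah):
    """All unordered splits of the samples into two equal groups,
    each containing half of the HC and half of the AH samples."""
    all_samples = set(hc) | set(ah)
    seen = set()
    splits = []
    for hc_pick in itertools.combinations(hc, len(hc) // 2):
        for ah_pick in itertools.combinations(ah, len(ah) // 2):
            g1 = frozenset(hc_pick + ah_pick)
            g2 = frozenset(all_samples - g1)
            key = frozenset({g1, g2})  # (g1, g2) and (g2, g1) are the same split
            if key not in seen:
                seen.add(key)
                splits.append((sorted(g1), sorted(g2)))
    return splits


hc = ["HC1", "HC2", "HC3", "HC4"]
ah = ["AH1", "AH2", "AH3", "AH4"]
splits = balanced_splits(hc, ah)
print(len(splits))  # 18

# Draw 5 randomized comparisons, as in the response (seeded for reproducibility).
rng = random.Random(0)
for g1, g2 in rng.sample(splits, 5):
    print(g1, "vs", g2)
```

With n = 4 per group there are only 6 × 6 / 2 = 18 distinct balanced splits, so the number of independent randomized comparisons that can be run on this dataset is small.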
  
  If most changes in contact frequency occur locally, it would be useful to visualize the relationship between effect sizes and/or significance levels for the observed differences in frequency in relation to the distance between the involved loci. Additionally, comparing these results to the average baseline contact intensities as a function of distance would be informative. This comparison could help determine whether the distance decay in effect size/significance for the differences between AH and HC is faster or slower than the decay rates for baseline contact frequencies.
  
  This is a good suggestion. In our initial analysis, we made a number of figures relating chromosome positions, distance between loci, and statistics regarding the differential contact frequency. In the initial submission, we only showed Figure 3, which shows the logFC (log fold change) of the differential contact frequency by chromosomal position on both sides. To address this question, we have added a supplemental figure showing logFC as a function of the distance between the two loci (new Supplemental Figure 3).
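A distance-stratified summary of the kind the reviewer requests can be sketched as below. This is a minimal illustration assuming one tuple per tested bin pair with its logFC from the differential analysis (a hypothetical input format); it averages |logFC| at each genomic distance, measured in 100 kb bins. Passing mean baseline contact counts instead of logFC values would give the baseline distance-decay curve for comparison.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_by_distance(pairs, bin_size=100_000):
    """Average |value| per genomic distance, in units of bins.

    pairs: iterable of (bin1_start, bin2_start, value) for one chromosome,
    where value is a logFC (or, for the baseline curve, a contact count).
    """
    by_distance = defaultdict(list)
    for b1, b2, value in pairs:
        distance = abs(b2 - b1) // bin_size
        by_distance[distance].append(abs(value))
    return {d: mean(vals) for d, vals in sorted(by_distance.items())}


# Toy values, for illustration only
toy = [(0, 100_000, 1.0), (100_000, 200_000, -3.0), (0, 300_000, 0.5)]
print(mean_abs_by_distance(toy))  # {1: 2.0, 3: 0.5}
```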
  
  Similarly, the assertion that most changes do not affect the structure of topologically associated domains (TADs) but occur either within or between TADs should be supported by specific testing or else removed.
  
  Thank you; we have adjusted the language in the Discussion accordingly.
  
  Furthermore, the authors should clarify whether differences in chromatin conformation are more pronounced around immune genes compared to genome-wide expectations. If this is not the case, it would be helpful to quantify the intensity of these differences around the highlighted genes in relation to the rest of the genome. To achieve this, I would suggest the following:
  
  Conduct enrichment analyses on the genes located within the most prominent hotspots to determine whether they are significantly enriched in immune genes (and, or, alternatively, in any other functional category).
  
  Estimate the average absolute fold change in contact frequency within all topologically associated domains (TADs) identified in the study. This would allow the immune gene-containing TADs highlighted in Figures 5-8 to be placed in context, giving readers a quantitative understanding of how anomalous these genomic regions are, with regard to the magnitude of their alterations in AH, compared with the rest of the genome.
  
  While some of the selected gene clusters appear to co-localize well with topologically associated domains (e.g., Figures 5A, 8A), others seemingly encompass either multiple TADs (Figure 6) or only portions of them (Figure 7). This should be clarified.
  
  Thank you, this is a great suggestion. To be as unbiased as possible, we took all genes present in the regions with the most significant changes in contact frequency (Supplemental Table 4), which we used to identify the hotspots. As the reviewer anticipated, we do see enrichment of genes involved in innate immune signaling. This has been added to the Results (page 7, lines 19-25) and Figure 4.
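One standard way to run the enrichment test the reviewer requests is a one-sided hypergeometric test. The sketch below is an assumption on our part: the response does not state which enrichment tool or gene-set database was used, and the counts in the usage line are toy numbers, not study results. It computes the probability of observing at least k immune genes among n hotspot genes when K of the N background genes are immune genes.

```python
from fractions import Fraction
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all annotated genes)
    K: background genes in the category of interest (e.g. innate immune)
    n: genes in the hotspot regions
    k: hotspot genes in the category of interest
    """
    total = comb(N, n)
    # math.comb returns 0 when asked for more items than exist,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return float(Fraction(tail, total))  # exact ratio, converted at the end


# Toy numbers only: all 5 of 5 hotspot genes fall in a 5-gene category out of 10.
print(hypergeom_enrichment_p(5, 5, 5, 10))  # 1/252, about 0.00397
```

In practice the same test would be repeated for each functional category and corrected for multiple testing.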
  
  Finally, there are several minor issues concerning the figures that could be easily addressed to substantially enhance their readability:
  
  Font sizes in most figures should be increased, particularly for some axis labels and tick marks. This issue affects most figures; for instance, in Figure 4, it hinders the reader's ability to interpret the ranges of the data presented.
  
  Thank you; the figures have been adjusted.
  
  Figures 5 to 8 (panels A and B) would benefit significantly from a more consistent format. Specifically, the gene cluster boxes should also be included in the right panels, and the gene locations should be displayed on the left in a uniform format across all figures (e.g., formatting Figures 7 and 8 to match the style of Figures 5 and 6).
  
  Figures 5 and 6 have a similar structure to each other because we were focusing on all of the genes in that chromosomal region. Figures 7 and 8 are different because we are focusing on how the region around a certain hotspot of interest changes.
  
  It is also important to note that the genes plotted in Figures 8C and 8D are not the same. Concerning these two panels, it would be valuable to clarify whether the data presented pertains exclusively to monocytes. If so, information regarding the number of cells analyzed and the number of donors from which they were drawn would also be beneficial.
  
  These panels were generated from scRNA-seq data. They show all of the genes expressed in that region of the genome, in their chromosomal order; genes that are not expressed in the scRNA-seq data are not shown. I considered at length how best to show gene expression across a region of the genome, and I think this is the clearest option, since including genes with no expression would make the panels more confusing. As the reviewer notes, the lists of genes shown differ between HC and AH. We have clarified this in the figure legend.
  
  References
  
  (1) Kim, A., Bellar, A., McMullen, M. R., Li, X. & Nagy, L. E. Functionally Diverse Inflammatory Responses in Peripheral and Liver Monocytes in Alcohol-Associated Hepatitis. Hepatol Commun 4, 1459-1476 (2020). https://doi.org/10.1002/hep4.1563
  
  (2) Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916-2923 (2019). https://doi.org/10.1093/bioinformatics/btz048
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.08.07.607014v2
www.biorxiv.org

Acute opioid responses are modulated by dynamic interactions of Oprm1 and Fgf12

1. Public_Reviews 15 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The study by Lemen et al. represents a comprehensive and unique analysis of gene networks in rat models of opioid use disorder, using multiple strains and both sexes. It provides a time-series analysis of Quantitative Trait Loci (QTLs) in response to morphine exposure.
  
  Strengths:
  
  A key finding is the identification of a previously unknown morphine-sensitive pathway involving Oprm1 and Fgf12, which activates a cascade through MAPK kinases in D1 medium spiny neurons (MSNs). Strengths include the large-scale, multi-strain, sex-inclusive design; the time-series QTL mapping, which provides dynamic insights; and the discovery of an Oprm1-Fgf12-MAPK signaling pathway in D1 MSNs, which is novel and relevant.
  
  Weaknesses:
  
  (1) The proposed involvement of Nav1.2 (SCN2A) as a downstream target of the Oprm1-Fgf12 pathway requires further analysis/evidence. Is Nav1.2 (SCN2A) expressed in D1 neurons?
  
  The authors mentioned that SCN8A (Nav1.6) was tested as a candidate mediator of Oprm1-Fgf12 loci and variation in locomotor activity. However, the proposed model supports SCN2A as a target rather than SCN8A. This is somewhat unexpected since SCN8A is highly abundant in MSNs.
  
  Can the authors provide expression data for SCN2A, Oprm1, and Fgf12 in D1 vs. D2 MSNs?
  
  Author response image 1.
  
  We generated Author response image 1 to show that both Scn2a and Scn8a are ubiquitously expressed in MSNs and GABAergic neurons.
  
  (2) The authors should consider adding a reference to FGF12 in Schizophrenia (PMC8027596) in the Introduction.
  
  This is a relevant reference. We have cited it in the Discussion rather than the Introduction because we felt it is more relevant there.
  
  (3) There is recent evidence supporting the druggability of other intracellular FGFs, such as FGF14 (PMC11696184) and FGF13 (PMC12259270), through their interactions with Nav channels. What are the implications of these findings for drug discovery in the context of the present study? Could FGF12 be considered a potential druggable therapeutic target for opioid use disorder (OUD)?
  
  The recent success in targeting FGF14 and FGF13 protein-protein interactions with sodium channels suggests that FGF12 could indeed be a druggable target for OUD. We have added a section to the Discussion exploring the potential for developing small-molecule modulators of the FGF12-Nav interface as a novel therapeutic strategy.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.
  
  Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.
  
  Strengths:
  
  (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.
  
  (2) Adequately powered analysis incorporating both sexes and time course analyses.
  
  (3) Detection of time- and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.
  
  (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.
  
  (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.
  
  Weaknesses:
  
  (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors.
  
  We have performed a thorough review of the manuscript and corrected typographical errors, including "ddactivates" and other compositional issues.
  
  (2) There are multiple examples of overstating the possible significance of results that should be corrected or at least directly pointed out as weaknesses in the Discussion. These include:
  
  (a) Assumption that the Oprm1 gene is the causal candidate gene for the major morphine locomotor Chr10 QTL at the early time epochs. Oprm1 is 400,000 bp away from the support interval of the Mor10a QTL locus, and there is no mention as to whether the Oprm1 mRNA eQTL overlaps with Mor10a.
  
  We have clarified this in the text. While Oprm1 is located proximal to the peak, its massive size and the presence of a strong mRNA cis-eQTL in the NAc and hippocampus that precisely overlaps with the Mor10a QTL support interval provide robust evidence for its candidacy. We have added this detail to the Results section.
  
  (b) Although the Bayesian analysis of possible complex interactions between Oprm1, Fgf12, other interacting genes, and behaviors is very innovative and produces testable hypotheses, a more straightforward mediation analysis of causal relationships between genotype, gene expression, and phenotype would have added strength to the arguments for the causal role of these individual genes.
  
  We agree that mediation analysis would be a valuable addition. We revised the Results section to acknowledge that while the Bayesian network provides a comprehensive causal hypothesis, future studies employing formal mediation analysis could further strengthen these individual gene-to-behavior links.
  
  (c) The GWAS data analysis for Oprm1 and Fgf12 is incomplete in not mentioning actual significance levels for Oprm1 and perhaps overstating the nominal significance findings for Fgf12.
  
  We have updated the manuscript to include the specific significance levels for the human GWAS findings related to Oprm1 and Fgf12. We have clarified that the OPRM1 variant rs1799971 reached genome-wide significance (OR = 1.046, p = 4.92 × 10⁻⁹). Furthermore, we have ensured that the findings for FGF12 are described as nominally significant to avoid any overstatement of the results. For example, we now specify that the top FGF12 SNP rs1553460 achieved nominal significance (OR = 1.015, p = 0.021). The Results and Discussion sections have been revised to reflect these precise statistical values.
  
  Appraisal:
  
  The authors largely succeeded in reaching their goals with novel findings and methodology.
  
  Significance of Findings:
  
  This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This is a clearly written paper that describes the reanalysis of data from a BXD study of the locomotor response to morphine and naloxone. The authors detect significant loci and an epistatic interaction between two of those loci. Single-cell data from outbred rats is used to investigate the interaction. The authors also use network methods and incorporate human data into their analysis.
  
  Strengths:
  
  One major strength of this work is the use of granular time-series data, enabling the identification of time-point-specific QTL. This allowed for the identification of an additional, distinct QTL (the Fgf12 locus) in this work compared to previously published analysis of these data, as well as the identification of an epistatic effect between Oprm1 (driving early stages of locomotor activation) and Fgf12 (driving later stages).
  
  Weaknesses:
  
  (1) What criteria were used to determine whether the epistatic interaction was significant? How many possible interactions were explored?
  
  By design, we tested only for epistasis between the Oprm1 and Fgf12 loci: a single test of a non-linear interaction. As such, there is no correction for multiple tests and no need for permutation. In other words, the “nominal” P value in this case is the only relevant P value. We have added this clarification in the Results and Methods.
  
  (2) Results are presented for males and females separately, but the decision to examine the two sexes separately was never explained or justified. Since it is not standard to perform GWAS broken down by sex, some initial explanation of this decision is needed. Perhaps the discussion could also discuss what (if anything) was learned as a result of the sex-specific analysis. In the end, was it useful?
  
  We chose to analyze the sexes both separately and jointly because the locomotion data showed significant sex differences and sex-by-strain interactions. This rationale has been added to the Results section. We also discuss the sex-specific results in the revision.
  
  (3) The confidence intervals for the results were not well described, although I do see them in one of the tables. The authors used a 1.5 support interval, but didn't offer any justification for this decision. Is that a 95% confidence interval? If not, should more consideration have been given to genes outside that interval? For some of the QTLs that are not the focus of this paper, the confidence intervals were very large (>10 Mb). Is that typical for BXDs?
  
  The 1.5 LOD support interval is a standard metric for most QTL mapping studies, and does correspond approximately to a 95% confidence or support interval. Large intervals are common in BXD studies when effect sizes are moderate or recombination density is lower in specific regions. We have clarified the use of the 1.5 LOD interval in the Results section.
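  
  To make the 1.5 LOD rule concrete, the sketch below takes a toy LOD profile along one chromosome (positions and scores are invented for illustration, not data from this study) and returns the contiguous run of markers around the peak whose LOD stays within 1.5 units of the maximum. It is a minimal sketch assuming a single peak; QTL software may instead expand the interval to the flanking markers or interpolate, so exact boundaries can differ.

```python
def lod_support_interval(positions_mb, lods, drop=1.5):
    """Return (left, right) marker positions of the contiguous region around
    the LOD peak in which every marker has LOD >= peak LOD - drop."""
    if not lods or len(positions_mb) != len(lods):
        raise ValueError("need equal-length, non-empty position and LOD lists")
    peak = max(range(len(lods)), key=lods.__getitem__)  # index of the highest LOD
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:  # walk left until LOD drops below cutoff
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:  # walk right likewise
        right += 1
    return positions_mb[left], positions_mb[right]


# Toy profile (hypothetical): peak LOD 6.0 at 13 Mb, so the cutoff is 4.5.
positions = [10, 11, 12, 13, 14, 15, 16]
lods = [1.0, 3.0, 5.2, 6.0, 5.0, 4.2, 2.0]
print(lod_support_interval(positions, lods))  # (12, 14)
```

  A flatter profile around the peak, as expected when effect sizes are moderate or local recombination is sparse, keeps more markers above the cutoff and so yields a wider interval.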
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  In the vast majority of the figures, the text is too small to read.
  
  We have adjusted the font size in most of the figures.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors. Examples of these include:
  
  (a) Figure 2E&F lacks identification of Oprm1 as the gene for cis-eQTL studies.
  
  (b) Figure 2H is fairly uninterpretable given the small font sizes. It should be excluded, put as a supplemental figure, or reconfigured to highlight the most important findings in a more legible manner.
  
  (c) Figure 4b: columns in the table need to be identified by a header row.
  
  We thank the reviewer for these comments and have addressed them in the revised version.
  
  Oprm1 is now labeled in Figures 2E and 2F, Figures 2G and 2H have been moved to the Supplementary Material, and a header row has been added to the table in Figure 4b.
  
  Reviewer #3 (Recommendations for the authors):
  
  Abstract
  
  (1) For the abstract, it might be simpler to name the alleles as "the C57BL/6J allele", etc., since B allele will confuse people unfamiliar with mouse nomenclature.
  
  It is critical not to confound the organism known as C57BL/6J with the genotype, allele, or haplotype that a mouse happens to inherit. Diverse types of mice inherit reference alleles, but they may be only very distantly related to the C57BL/6J strain. Even the C57BL/6J strain is a moving target that accumulates mutations that are not considered reference. For example, the mutation in Gabra2 of C57BL/6J is a de novo mutation that is not carried by many of the BXD strains, since it arose in JAX foundation stock after the BXDs were first established by Dr. Ben Taylor in the 1970s.
  
  The convention is to refer to a mouse strain by its full strain name and RRID, to abbreviate the strain with a common code (often B6), and to abbreviate the allele, genotype, or haplotype with the italic letter B. This has been the recommendation of the Mouse Nomenclature Committee (of which one of the authors has been a member) for well over 50 years.
  
  (2) I wondered if "also associated with a high B allele" could be reworded somehow; I had to re-read that sentence several times.
  
  This sentence has been reworded for clarity.
  
  (3) Parts of the abstract are written in the present tense, but then it switches to past ("we generated" but then "a Bayesian network analysis supports...").
  
  We have thoroughly revised the abstract. Following standard scientific writing conventions, we now utilize the past tense to describe the specific experimental actions and results of this study. We have maintained the present tense for established biological facts and the broader significance of the findings.
  
  (4) While the -log(p) values are all impressive, the abstract should indicate what threshold is used for genome-wide significance and how that threshold was obtained.
  
  We have added the significance threshold to the Abstract.
  
  (5) Do the details of the MAP kinase cascade need to be explained in the abstract? It feels like a lot of detail for an abstract and represents one of the most speculative aspects of the paper. Maybe just say you identified a possible network, but save the details for the main paper.
  
  This is a valid suggestion. We removed the specific MAP kinase from the abstract.
  
  Introduction
  
  (1) You could add a sentence explaining why using an LMM (GEMMA) was an improvement over the prior analysis.
  
  We have added a sentence explaining that GEMMA improves mapping power and better controls for population structure compared to previous methods.
  
  (2) When mentioning Philips 2010, you could indicate that it identified Oprm1. This might be easier than "In addition to Oprm1" which confused me at first because it had not been mentioned before, so 'in addition' was jarring.
  
  We have revised the text to state that Philip et al. (2010) originally identified the Oprm1 locus.
  
  Results
  
  (1) There are additional instances of the tense switching between past and present in the results section.
  
  We have standardized the tenses in the Results section.
  
  (2) "Ostn, Uts2d, Ccdc50, Gm10823, Fgf12, and Mb21d2" - before giving arguments for fgf12, can you clarify if there are coding variants or eQTLs for any of these genes?
  
  We have added a statement clarifying the coding variants for other genes in this interval and highlighting their eQTL status.
  
  (3) "a total number of 4,495 high-quality nuclei transcriptomes". Consider removing the word "number".
  
  Removed.
  
  (4) "approximately 6 males and 6 females" - could you point the reader to a supplementary table that has the exact number of individuals at the end of this sentence?
  
  The exact number of mice used in each BXD strain is not reported in the original publication by Philip et al.; only the mean and maximum were given. We have clarified that 6 is the average.
  
  (5) "computed using a subset" - please explain how you selected this subset (I assumed LD pruning, but why not be explicit?). How many SNPs/markers were there originally, and how many are retained?
  
  We have specified that the subset of markers was selected via LD pruning to represent the genetic diversity of the BXDs.
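  
  The response does not give the pruning parameters, so the following is only a generic illustration of greedy LD pruning, not the authors' procedure: markers are scanned in genomic order, and a marker is kept only if its squared correlation (r²) with every already-kept marker within a window is below a cutoff. The 0/1 genotype coding, the `r2_max` cutoff, the marker-count `window`, and the toy genotypes are all hypothetical choices.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant vector; treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def ld_prune(genotypes, r2_max=0.8, window=50):
    """Greedy LD pruning. `genotypes` is a list of per-marker genotype vectors
    (one value per strain) in genomic order; returns indices of kept markers."""
    kept = []
    for i, g in enumerate(genotypes):
        if len(set(g)) < 2:
            continue  # monomorphic marker: carries no mapping information
        nearby = [j for j in kept if i - j <= window]  # kept markers within the window
        if all(pearson_r(g, genotypes[j]) ** 2 < r2_max for j in nearby):
            kept.append(i)
    return kept


# Hypothetical genotypes for 6 strains (0 = B, 1 = D); markers 0 and 1 are identical.
markers = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(ld_prune(markers))  # [0, 2]
```

  Because BXD strains are inbred, each strain carries one of two homozygous genotypes per marker, which is why a single 0/1 code per strain is used here.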
  
  (6) A few words about how the significant threshold was obtained (permutation?) are needed.
  
  We have clarified that the significance threshold was obtained through 1,000 permutations.
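  
  A permutation-based genome-wide threshold is usually obtained by shuffling phenotypes across strains, rescanning every marker, and keeping the maximum statistic from each scan; the 95th percentile of those maxima is the 5% genome-wide cutoff. The sketch below shows that logic under simplifying assumptions: it uses squared marker-phenotype correlation as a stand-in for the GEMMA LMM test statistic, ignores kinship, and runs on hypothetical random data.

```python
import random
from math import ceil, sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant vector; treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def max_scan_stat(genotypes, phenotype):
    """Largest per-marker statistic (r^2 here) across one genome scan."""
    return max(pearson_r(g, phenotype) ** 2 for g in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide threshold: the (1 - alpha) quantile of per-permutation maxima."""
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # reassign phenotypes to strains at random; genotypes stay fixed
        maxima.append(max_scan_stat(genotypes, y))
    maxima.sort()
    k = max(0, ceil((1 - alpha) * n_perm) - 1)  # nearest-rank quantile index
    return maxima[k]


# Hypothetical data: 50 markers typed in 20 strains and a random phenotype.
rng = random.Random(0)
genos = [[rng.randint(0, 1) for _ in range(20)] for _ in range(50)]
pheno = [rng.gauss(0, 1) for _ in range(20)]
threshold = permutation_threshold(genos, pheno, n_perm=1000)
observed = max_scan_stat(genos, pheno)
print(f"5% genome-wide threshold (r^2): {threshold:.3f}")
print(f"observed maximum (r^2): {observed:.3f}; significant: {observed > threshold}")
```

  Taking the maximum over the whole scan in each permutation is what makes the cutoff genome-wide: it controls the chance that any marker exceeds it under the null.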
  
  (7) Some of the GWAS results are presented for males and females separately (as well as combined). This is not typical, and so maybe a sentence explaining why the authors thought there might be sex specific GWAS results would be warranted.
  
  The rationale for the sex-specific analysis (a significant sex difference and a sex-by-strain interaction) is provided in the Results section.
  
  (8) The correlation between the sexes of 0.68 could be evidence that there are sex-specific genetic effects, but could it also just be due to increased noise as you reduce sample size? What is the confidence interval for that number? Does it include 1? Or 0? If you randomly split the dataset, rather than splitting on the basis of sex, would you obtain higher correlations? The idea of sex differences is interesting, but a bit more work is needed to clarify these concerns.
  
  The correlation of 0.68 (95% CI: 0.52–0.79) significantly excludes both 0 and 1. The drop from r ≈ 0.86 at earlier intervals suggests a biological shift rather than noise due to sample size, as n remains constant (n ≈ 6 per sex per strain) across all time points. This divergence is driven by sex-specific genetic modifiers, such as the Fgf12 locus, which is more than twice as strong in females (LOD 10.6) as in males (LOD 4.3). We have addressed this in the revision.
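  
  A 95% CI for a Pearson correlation is commonly computed with Fisher's z-transformation, sketched below. The response does not state how many strains contribute to r = 0.68, so the n used here is purely illustrative, and the sketch is not a claim about how the authors computed their interval.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("need n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)                 # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / sqrt(n - 3)         # standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Illustrative only: n = 60 is an assumed number of strains, not a value from the study.
low, high = pearson_ci(0.68, 60)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

  Because the interval is built on the z scale and back-transformed, it is asymmetric around r, as the reported 0.52–0.79 interval is.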
  
  (9) Maybe I missed it, but how did you determine the threshold for significance for the epistatic interaction? Could you also clearly indicate how many possible cases of epistasis were examined/considered, since that dictates the correction for multiple testing.
  
  We tested only the interaction between the Fgf12 and Oprm1 loci. Because this was a single test, no correction for multiple testing was required.
  
  (10) "To further examine whether Oprm1 and Fgf12 were co-expressed in the same cells of the NAc," can you first give an indication as to why you looked in NAc versus other brain areas you might have considered?
  
  We have added a sentence explaining that the NAc was chosen due to its central role in opioid reward and the observed strain differences in dopamine release in this region.
  
  (11) "...from every cell type conveyed a weak but significant positive correlation (r = 0.08, p = 1.8e-8) between the expression of Oprm1 and Fgf12 (Figure 7e). When we performed Pearson's correlation analysis within each individual cell cluster, only D1-MSN-3 had a significant positive correlation (r = 0.35, p = 6.1e-8, Figure 7f). In contrast, D1-MSN-2 had a significantly weak negative correlation (r = -0.12, p = 0.02, Figure 7g)." Can you explain why these correlations are relevant? What hypothesis are you testing?
  
  We have clarified that these correlations were used to test the hypothesis that Oprm1 and Fgf12 are co-expressed and potentially co-regulated within the same neuronal subtype to support their epistatic interaction.
  
  (12) "After the morphine locomotion tests were complete," can you give a specific timepoint? Like, was it exactly 180 minutes after the morphine injection?
  
  We have specified that naloxone was injected exactly 180 minutes after the morphine injection.
  
  (13) I appreciate the desire to relate the results of this paper to human GWAS results; however, I don't feel there is much worth discussing beyond the Oprm1 finding. Therefore, I would suggest removing this from the results section and instead just making it a discussion topic. The results presented are clearly the weakest part of this paper, and I personally think it is a shame to end the results section with something that is not very informative. But I suspect the authors may wish to retain this section, and I leave that decision to them and the editor.
  
  We have retained this section but moved some of the more speculative human data discussion to the Discussion section as suggested.
  
  Discussion
  
  (1) Typo "deactivates".
  
  Corrected to "activates".
  
  (2) The last sentence in the first paragraph again discusses the comparison to humans; I would remove this.
  
  That sentence is condensed.
  
  (3) "These data indicate that Oprm1 is a strong candidate gene for the Chr 10 locus associated with morphine-induced locomotion response." I would remind them of the eQTL for Oprm1 since this is a key piece of evidence supporting this gene as a candidate.
  
  We have added a reminder of the overlapping mRNA cis-eQTL for Oprm1.
  
  (4) "It is likely that differences in morphine-induced dopamine release are involved in the highly variable locomotor responses to morphine across the BXD family." I agree this might be true, but since you have no evidence to support this claim, is it worth mentioning at all?
  
  We have rephrased this as a hypothesis or cited relevant literature supporting this link in parental strains.
  
  (5) Could you include a sentence or two about why Philip 2010 didn't find Fgf12? Lack of markers? The difference between an LM and an LMM?
  
  We have added an explanation that the use of a high-density WGS-based marker set and the LMM (GEMMA) allowed for the detection of this novel locus that was previously missed.
  
  (6) Section titled "Cell-type specific gene expression in NAc". While this is interesting, you might also want to remind the reader that epistatic interactions do not necessarily require the genes to be expressed in the same cell or for their gene products to physically interact.
  
  We have added this caveat to the Discussion.
  
  (7) I think the Bayesian network section is not very strong. For example, they did not compare the results for their two chosen genes to the results they might have obtained if they had chosen other genes from their QTL intervals. My guess is that those other genes might have also produced results that were equally convincing. I'm not asking them to do that, but it reflects the risk of false positive results when taking an approach like this. Nevertheless, I am guessing the authors would prefer to include this section.
  
  We appreciate the reviewer pointing out this possibility and agree with this concern. We have added a statement acknowledging the risk of false positives in Bayesian modeling in this context and noting that these findings are intended as testable hypotheses.
  
  Methods
  
  (1) How were the 2 HS rats selected? I had the impression that Dr. Telese's lab had access to snRNA-seq data from more than 2 HS rats.
  
  We have clarified that these rats were selected based on their addiction-like behavior phenotypes from a larger cohort.
  
  (2) I didn't look back, but did the main paper point out that the rats are treated with oxycodone rather than morphine?
  
  We have clarified this distinction in the Methods section.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2022.03.11.483993v4
www.biorxiv.org

Multi-timescale neural adaptation underlying long-term musculoskeletal reorganization.

1. Public_Reviews 15 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  (1) I think this is an important paper, but I’m puzzled about a tension in the results. On the one hand, it looks like the behavioural gains post-TT happen rather smoothly over time (Figure 5). On the other hand, muscle synergy activations change abruptly at specific days (around day ~65 for Monkey A and around day ~45 for Monkey B; e.g., Figure 6). How do the authors reconcile this tension? In other words, how do they think that this drastic behavioural transition can arise from what appears to be step-by-step, continuous changes in muscle coordination? Is it “just” subtle changes in movements/posture exploiting the mechanical coupling between wrist and finger movements, combined with subtle changes in synergies, and they just happen to all kick in at the same time? This feels to me to be the core of the paper and should be addressed more directly.
  
  We thank the reviewer for this insightful comment, as it touches upon the central finding of our study. The apparent tension between the smooth behavioral recovery and the abrupt shift in neural strategy is indeed a key feature of the adaptation process. We propose that this reflects the interaction of two distinct, parallel processes operating on different timescales:
  
  A slow, gradual skill-learning process, where the monkeys incrementally developed and refined a compensatory motor strategy (i.e., the tenodesis effect). This slow refinement is responsible for the smooth improvement seen in the behavioral metrics over many weeks.
  
  A fast, switch-like adaptive process, which governs the activation of the primary muscle synergies. The initial ‘swap’ strategy, while simple, was biomechanically conflicting and inefficient. The CNS only abandoned this flawed strategy abruptly once the slow learning process had rendered the new compensatory strategy “good enough” to be a viable alternative.
  
  Therefore, the abrupt neural shift does not cause the behavioral improvement but is rather enabled by the gradual, underlying development of a better motor solution. To address this important point more directly within the manuscript, we added a new subheading to the Discussion section. This section is dedicated to explicitly framing our findings within this multi-timescale learning model, ensuring the link between the gradual behavioral recovery and the abrupt neural shift is clearly articulated.
  
  (2) The muscle synergy analyses, which are an important part of the paper, could be improved. In particular:
  
  (a) When measuring the cross-correlation between the activation of synergies, the authors should include error bars and should also look at the lag between the signals.
  
  We thank the reviewer for these excellent suggestions to improve our analysis.
  
  Error Bars: We agree that showing trial-to-trial variability is important. In our revision, we have added a shaded envelope (representing the SD across trials) to the cross-correlation plots in Figures 6, 9 and 10.
  
  Time Lag: We have performed the cross-correlation analysis allowing for variable time lags and extracted the lag yielding the maximum correlation coefficient (max CC) for each session, in addition to the zero-lag correlation presented in the main figures. As hypothesized, allowing variable lags often resulted in high max CC values throughout the adaptation period, potentially obscuring the clear swap-and-revert pattern visible in the zero-lag analysis. This is likely because the primary adaptation involved changes in synergy timing rather than fundamental shape. However, the analysis of the lag itself proved informative. We observed significant fluctuations in the optimal lag during the early and mid-adaptation phases, particularly around the time of the ‘switch-back’, before the lag stabilized closer to zero in the late phase.
  
  We have added a description of this analysis to the Methods section. The results of the lag analysis are now presented in new Supplementary Figures S6 and S7, and a sentence summarizing this finding has been added to the Results section.
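  
  The zero-lag versus variable-lag comparison described above can be sketched as follows: the Pearson correlation between two synergy activation profiles is computed at each lag within a window, and the lag with the largest coefficient (max CC) is reported alongside the zero-lag value. This is a minimal sketch assuming equal-length profiles on a common time base; the `max_lag` window and the toy profiles are hypothetical, and the authors' normalization may differ.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant segment; treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag] over the overlapping samples."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    return pearson_r(a, b)


def max_cc(x, y, max_lag):
    """Return (best_lag, cc) maximizing the lagged correlation within +/- max_lag."""
    best = max(range(-max_lag, max_lag + 1), key=lambda lag: lagged_corr(x, y, lag))
    return best, lagged_corr(x, y, best)


# Toy profiles (hypothetical): y is x delayed by 3 samples.
x = [0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0]
print(round(lagged_corr(x, y, 0), 3))  # zero-lag correlation is reduced by the delay
print(max_cc(x, y, max_lag=5))          # best lag is 3, with cc close to 1
```

  The toy case illustrates the point made above: a pure timing shift lowers the zero-lag correlation but leaves max CC high, so the optimal lag, not max CC, carries the timing information.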
  
  (b) Figure 7C and related figures, the authors state that the activation of muscle synergies reverts to pre-TT patterns toward the end of the experiments. However, there are noticeable differences for both monkeys (at the end of the “task range” for synergy B for monkey A, and around 50% task range for synergy B for monkey B). The authors should measure this, e.g., by quantifying the per-sample correlation between pre-TT and post-TT activation amplitudes. Same for Figures 8I, J, etc.
  
  We thank the reviewer for this detailed and insightful suggestion. We agree that our use of the term ‘reversion’ should be nuanced, as the recovery of the synergy activation patterns is substantial but not perfect.
  
  To formally quantify these remaining differences, we performed a rigorous quantitative comparison between the pre-surgery and final-day post-surgery activation profiles. We calculated the Cosine Similarity to assess the recovery of the temporal shape, and used a Permutation Test (n=10,000) to test for statistical distinctness between the pre- and post-surgery trajectories.
  
  Results: We found that while the temporal shapes were highly similar (Cosine Similarity > 0.90 for all synergies), the Permutation Test confirmed that the profiles remained statistically distinct (p < 0.0001) in both animals.
  
  We have added this quantification to the text (Results). This confirms our nuanced interpretation: while the primary temporal features of the synergies reverted, the recovered motor program represents a novel, ‘good enough’ solution that is robust and functional, rather than a mathematically perfect restoration of the original baseline.
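  
  The two checks described above can be sketched as follows: cosine similarity between the mean pre- and post-surgery activation profiles measures how closely the temporal shape recovered, while a label-permutation test asks whether the two sets of trials differ more than random relabeling would produce. The test statistic (Euclidean distance between mean profiles), the add-one p-value convention, and the toy trials are assumptions, since the response does not specify them. Under this convention, 10,000 permutations give a smallest attainable p of 1/10,001, just under 0.0001.

```python
import random
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles (1.0 = identical shape up to scale)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def mean_profile(trials):
    """Sample-by-sample mean across trials (each trial is a list of equal length)."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def mean_distance(group_a, group_b):
    """Euclidean distance between the two groups' mean profiles."""
    ma, mb = mean_profile(group_a), mean_profile(group_b)
    return sqrt(sum((a - b) ** 2 for a, b in zip(ma, mb)))


def permutation_test(pre, post, n_perm=10_000, seed=0):
    """Two-sample permutation test on trial labels; returns an add-one p-value."""
    rng = random.Random(seed)
    observed = mean_distance(pre, post)
    pooled = list(pre) + list(post)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign trials to "pre" and "post"
        if mean_distance(pooled[: len(pre)], pooled[len(pre):]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical trials: post-surgery profiles keep roughly the same shape but are larger.
pre = [[0, 1 + 0.01 * i, 2, 1 - 0.01 * i, 0] for i in range(8)]
post = [[0, 1.5 + 0.01 * i, 3, 1.5 - 0.01 * i, 0] for i in range(8)]
print(cosine_similarity(mean_profile(pre), mean_profile(post)))
print(permutation_test(pre, post))
```

  The toy data reproduce the qualitative pattern reported above: near-identical shape (high cosine similarity) combined with a statistically distinct amplitude.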
  
  (c) In Figures 9 and 10, the authors show the cross-correlation of the activation coefficients of different synergies; the authors should also look at the correlation between activation profiles because it provides additional information.
  
  We thank the reviewer for this comment and the opportunity to clarify our terminology. We agree that analyzing the correlation between the full activation profiles is the most informative approach. In our manuscript, the terms ‘activation coefficients’ and ‘activation profiles’ both refer to the complete, time-varying activation patterns of the muscle synergies. Therefore, the cross-correlation analysis presented in Figures 9 and 10 is indeed the correlation between these full activation profiles. To prevent any potential ambiguity for future readers, we have revised the manuscript to use the term ‘activation profiles’ exclusively and consistently when referring to these time-varying synergy activations.
  
  (d) The muscle synergy analysis for Monkey B is hindered by the fact that the authors lost the ability to record from the (very) functionally relevant FDS muscle. I’d repeat the synergy analyses without this muscle to understand to what extent the observed changes with respect to baseline are driven by the lack of this data.
  
  We thank the reviewer for raising this important methodological point. We agree that controlling for changes in the recorded muscle set is crucial for a valid comparison between pre- and post-surgical synergy structures. The reviewer’s concern is based on the premise that the FDS muscle was included in the pre-surgical analysis for Monkey B but absent from the post-surgical analysis.
  
  We would like to clarify that this is not the case. Due to the loss of the FDS signal post-surgery, we made the deliberate decision to exclude the FDS muscle from ALL synergy analyses for Monkey B, including the pre-surgical baseline period. This was done for the precise reason the reviewer identifies: to ensure a direct and unbiased “apples-to-apples” comparison and to avoid introducing the lack of this muscle as a confound. Therefore, the changes in synergy structure that we report for Monkey B can be confidently attributed to genuine physiological adaptation rather than an artifact of a changing input dataset.
  
  (e) Figure 11: The authors talk about a key difference in how Synergy B (the extensor finger) evolved between monkeys post-TT. However, to me this figure feels more like a difference in quantity (the time course) than in quality, since for both monkeys the aaEMG levels go back close to baseline levels, even if there’s a statistically significant difference only for Monkey B. What am I missing?
  
  We thank the reviewer for this insightful question, as it has prompted us to refine our interpretation of this key finding. The reviewer correctly notes that the recovery trajectories of Synergy B appear different, and we agree that our original explanation can be improved.
  
  A more parsimonious interpretation, and one that we believe aligns better with the data, is that both monkeys likely underwent a similar ‘arms race’, but we captured different phases of this process. In Monkey A, our recordings (starting Day 29) captured the escalating phase of this neuromuscular conflict. In contrast, for Monkey B, recordings began on Day 20, by which time this rapid escalation had likely already occurred and peaked. This difference in the timing of the ‘arms race’ is consistent with our behavioral observations; Monkey A struggled for a longer period before performing the task proficiently, suggesting a more protracted overall adaptation process. Thus, the apparent difference in the figures is likely a reflection of the observational window and the individual adaptation rate of each animal, rather than a fundamental qualitative difference in their adaptive strategy. We have revised the text to present this more unified and coherent interpretation.
  
  (f) Lines 408-09 and above: The authors claim that “The development of a compensatory strategy, primarily involving the wrist flexor synergy (Synergy C), appears crucial for enabling the final phase of adaptation”, which feels true intuitively and also based on the analysis in Figure 8, but Figure 11 suggests this is only true for Monkey B. How can these statements be reconciled?
  
  We believe the reviewer may be referring to Monkey A in their comment, as the strong compensatory effect is indeed seen in this animal. The core of this issue, which we have clarified in our revision, is that both monkeys developed a compensatory tenodesis grasp but used different neural strategies to achieve it.
  
  For Monkey A, strong evidence for this strategy is provided by a clear temporal shift in the activation of its dedicated wrist flexor synergy (Synergy C). As we have now clarified in the manuscript, the peak of this synergy’s activation moved from occurring just after object contact to just before it, a re-timing well-suited to enable a tenodesis grasp.
  
  For Monkey B, the strategy was one of subtle re-timing rather than scaling. While the total aggregated activation of its primary flexor synergy (Synergy A) did not significantly increase, its temporal profile shifted. Specifically, activation prior to object contact increased, providing the necessary wrist flexion for its assistive tenodesis grasp, which was kinematically confirmed in Figure 12. This was achieved by reallocating activation from the post-contact phase, resulting in an earlier activation peak for the synergy overall. Crucially, a finer-grained analysis reveals a precise temporal sequence within this synergy’s activation: the wrist flexor component (PL) consistently peaked just before object contact to enable hand opening, while the finger flexor component (FDP) peaked just after contact to secure the grasp.
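
  The distinction between scaling and re-timing can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the analysis code used in the study. It assumes a synergy activation profile sampled uniformly on the task-range axis (in %, with object contact at 0) and uses a plain sum as a stand-in for aggregated activation. The function name `pre_contact_fraction` and the example profiles are hypothetical. Two profiles with the same total can still differ in how much of that total falls before contact.

```python
def pre_contact_fraction(task_range_pct, activation, contact=0.0):
    """Fraction of summed activation occurring before object contact.

    Assumes uniform sampling, so a plain sum is proportional to the
    area under the activation curve (hypothetical simplification).
    """
    if len(task_range_pct) != len(activation):
        raise ValueError("time axis and activation must have equal length")
    total = sum(activation)
    if total <= 0:
        raise ValueError("total activation must be positive")
    pre = sum(a for t, a in zip(task_range_pct, activation) if t < contact)
    return pre / total


# Hypothetical profiles on a coarse task-range axis (%); contact is at 0.
axis = [-20, -10, 0, 10, 20]
early = [0.2, 0.6, 0.4, 0.2, 0.1]  # weighted toward the pre-contact phase
late = [0.1, 0.2, 0.4, 0.6, 0.2]   # same total, weighted after contact

print(pre_contact_fraction(axis, early), pre_contact_fraction(axis, late))
```

  Because the total is unchanged between the two hypothetical profiles, an amplitude-only comparison would miss the difference that the pre-contact fraction reveals.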
  
  This timing resolves the apparent biomechanical conflict. It also reveals that while both monkeys converged on the same biomechanical solution (a tenodesis grasp), the observable neural implementation appeared different. However, we must be cautious in directly comparing the computed synergy structures themselves, as the analysis for Monkey B was performed without the FDS muscle. The apparent “multi-functional synergy” in Monkey B is most likely a consequence of this missing data. What is clear and robust, however, is that both monkeys converged on a remarkably similar temporal solution: they both learned to re-time the activation of their key wrist flexor muscles to the pre-grasp phase.
  
  In Monkey A, this was observed in the temporal shift of its dedicated wrist flexor synergy (Synergy C). In Monkey B, this was observed in the temporal shift of the Palmaris Longus (PL) muscle itself (which, in our computed synergies, was grouped into Synergy A). This convergence on an identical temporal adaptation, regardless of the computed modular organization, is the key finding. We have revised the manuscript to articulate this more precise and defensible interpretation.
  
  (3) Experimental design: at least for the monkey who was trained on the “artificial task” (Monkey A), it would have been good if the authors had also tested him on naturalistic grasping, like the second monkey, to see to what extent the neural changes generalise across behaviours or are task-specific. Do the authors have some data that could be used to assess this even if less systematically?
  
  We thank the reviewer for raising this important point regarding the generalizability of our findings across different behaviors. We fully agree that a direct comparison of both tasks in the same animal would have been a valuable experiment. Unfortunately, we do not have systematic data on naturalistic grasping for Monkey A that would allow for such a direct comparison. We therefore view the two tasks as providing complementary evidence. Monkey A’s data shows the adaptation process during a highly stereotyped behavior, while Monkey B’s data demonstrates that a similar two-phase adaptive process occurs during a more naturalistic, unconstrained task. The convergence of these findings strengthens our overall conclusion that this multi-timescale adaptation is a robust principle of motor learning. Nonetheless, the reviewer raises a fascinating question about the task-specific tuning of motor synergies, which remains an excellent direction for future studies.
  
  (4) Monkey B’s behaviour pre-tendon transfer seems more variable than that of Monkey A (e.g., the larger error bars in Figure 5 compared to monkey A, the fluctuating cross-correlation between FDS pre and EDC post in Figure 6Q). This should be quantified to better ground the results since it also shows more variability post-TT.
  
  We thank the reviewer for this excellent suggestion to formally quantify the pre-surgery behavioral variability. We have performed the suggested analysis on the "Grip Formation Time" metric (Fig. 5A), the metric that was comparable between the two tasks. Our calculation of the Coefficient of Variation (CV) confirms the reviewer’s observation: Monkey B’s pre-surgery performance was substantially more variable (CV = 81.93%) than Monkey A’s (CV = 46.62%). Furthermore, a non-parametric test for equal variances (Ansari-Bradley test) confirmed that this difference is highly statistically significant (p < 0.0001). We have added a description of this analysis to the Methods and reported this finding in the Results section to provide clearer context for the baseline differences between the subjects.
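
  For readers who wish to reproduce the variability comparison, a minimal Python sketch of the CV computation follows. It assumes CV is defined as the sample standard deviation divided by the mean, expressed as a percentage; whether the sample or population standard deviation was used is an assumption here, and the trial values are hypothetical. The Ansari-Bradley test is not reimplemented; SciPy provides it as `scipy.stats.ansari`.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample SD")
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical grip formation times (s) for two subjects.
subject_1 = [0.30, 0.35, 0.28, 0.40, 0.33]
subject_2 = [0.20, 0.55, 0.15, 0.70, 0.30]
print(round(coefficient_of_variation(subject_1), 1))
print(round(coefficient_of_variation(subject_2), 1))
```

  CV is used rather than raw standard deviation because the two tasks may differ in mean duration; dividing by the mean puts the spread of both animals on a common relative scale.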
  
  (5) Minor: Figure 12 is interesting and supports the idea that monkeys may exploit the biomechanical coupling between wrist and fingers as part of their functional recovery. It would be interesting to measure whether there is a change in such coupling (tenodesis) over time, e.g., by plotting the change in wrist angle vs change in MCP angle as a scatter plot (one dot per trial), and in the same plot show all the days, colour coded by day. Would the relationship remain largely constant or fluctuate slightly early on? I feel this analysis could also help address my point (1) above.
  
  We thank the reviewer for this excellent and insightful suggestion. We have performed the suggested analysis for Monkey B, plotting the trial-by-trial relationship between wrist and MCP angles for all recording days (New Figure 13).
  
  The results clearly show the gradual refinement of the tenodesis coupling. Pre-surgery, there was no correlation (R²=0.00). Immediately post-surgery (Day 22), the relationship was weak and variable (R²=0.16), reflecting an exploratory phase. Over the following weeks, the coupling became progressively stronger and more consistent, with the R² value peaking at 0.58 around Day 56, indicating a robust exploitation of the new strategy. The relationship then stabilized at a moderate level (R² ~0.2-0.3) in the final days. This analysis provides direct kinematic evidence for the slow, gradual skill-learning component of our two-state model. It beautifully complements our response to the reviewer’s first point by visualizing the underlying refinement process that occurred concurrently with the more abrupt neural shifts. We have added this new figure and a description of these results to the manuscript.
  
  Reviewer #2 (Public review):
  
  Weaknesses:
  
  The most notable weakness of the study is the incompleteness of the data. [...] As a result, it is difficult to make general conclusions from the study, and it awaits further analysis or the addition of another subject.
  
  We thank the reviewer for this critical and accurate assessment of the study’s limitations. The reviewer is correct that the datasets for the two monkeys are incomplete in different ways and that the tasks were not identical. We fully acknowledge these limitations throughout the manuscript. Rather than viewing these differences as a weakness that prevents generalization, we propose that they offer a unique strength in the form of complementary evidence. We consider the two animals not as a direct replication, but as two distinct case studies that test the same underlying hypothesis under different conditions.
  
  Monkey A, with its high-quality EMG and highly stereotyped task, provides a detailed, quantitative view of the neural adaptation process, allowing us to precisely characterize phenomena like the ‘neuromuscular arms race’.
  
  Monkey B, with its kinematic data and more naturalistic task, provides crucial evidence that the same fundamental principles (a two-phase adaptation and the eventual development of a compensatory strategy) generalize to a less constrained, more behaviorally relevant context. We believe the key finding is the convergence of the results. Despite the differences in individual strategy, task demands, and available data, both animals demonstrated the same core "swap-and-revert" adaptive process. We propose that this convergence from heterogeneous sources lends support to the generalizability of our conclusions, suggesting that the multi-timescale adaptation we describe may be a general feature of motor learning following such perturbations. We agree that future studies with more subjects are needed to fully establish this principle. Nonetheless, we feel that the convergent evidence from these two complementary cases provides a valuable foundation for the model we present.
  
  A second weakness is the insufficient analysis of the movements themselves, particularly for Monkey A. [...] Since the authors have video data for both monkeys, it is surprising that it was not used to extract landmarks for kinematic analysis, or at least hand/endpoint trajectory, and how it is adjusted over time. Adding more behavior data and aligning it with the EMG data would be very helpful for characterizing motor recovery and is needed to support conclusions about underlying neural control strategies for functional improvement.
  
  We thank the reviewer for this important suggestion. The reviewer’s comment prompted us to re-examine our behavioral data, and we have now performed additional analyses that we agree provide a much clearer link between the neural changes and functional recovery.
  
  For Monkey A, we have quantified the ‘pull times’ on a day-by-day basis. This analysis reveals a clear, gradual learning curve: pull times were initially long and variable post-surgery but steadily decreased and stabilized over the recovery period. This provides a direct, quantitative measure of motor performance recovery for this animal.
  
  For Monkey B, we have performed a detailed analysis of the ‘grasp aperture’ prior to object contact. This kinematic analysis is particularly revealing, as it shows the development of the compensatory strategy in real-time. The grasp aperture was initially very small post-surgery, reflecting the monkey’s inability to open its hand. It then steadily increased over the next ~40 days as the monkey learned and refined the compensatory tenodesis grasp, before stabilizing at a new, functional baseline.
  
  We believe these new analyses directly address the reviewer’s concern by providing a more detailed picture of motor recovery. The grasp aperture data, in particular, offers a clear kinematic correlate for the slow, skill-learning process that we propose runs in parallel to the more abrupt neural reorganization. We have added these results as a new figure in the main text of our revised manuscript.
  
  Considering specific conclusions, the statement that the monkeys learned to use “tenodesis” over time by increasing activation of a wrist flexor muscle synergy does not seem to be fully supported by the data. [...] Given these issues, it is not clear how to align the EMG and kinematic data and interpret these findings.
  
  We thank the reviewer for this detailed and critical analysis. They raise an excellent point and have correctly observed that the adaptation is not a simple, uniform increase in wrist flexor synergy amplitude. Our interpretation, which we have clarified in the manuscript, is that the monkeys learned a more sophisticated strategy: a precise re-timing of the wrist flexor activation to occur earlier in the movement, specifically to pre-shape the hand for the grasp.
  
  For Monkey A: The reviewer correctly notes that the peak amplitude of Synergy C (the wrist flexor synergy) around the moment of grasp (0% task range) is lower in the final phase compared to baseline. However, the crucial change is temporal: the peak of this synergy’s activation shifts from occurring just after the grasp (~+1%) to occurring just before it (~-2%). This re-timing is perfectly suited to enable finger extension via the tenodesis effect immediately prior to object contact. The subsequent lower amplitude may reflect a more efficient, less forceful movement once this new skill was refined.
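
  A minimal Python sketch of how such a peak shift can be read off an averaged profile is shown below. It assumes the profile is sampled on a task-range grid (in %, contact at 0) and simply takes the position of the maximum, so the temporal resolution is limited by the grid spacing. The function name `peak_position` and the profiles are hypothetical; the numbers were chosen only to mimic the direction of the shift described above (a later, larger peak at baseline and an earlier, smaller peak late in recovery).

```python
def peak_position(task_range_pct, activation):
    """Task-range position (%) at which an activation profile peaks."""
    if not activation or len(task_range_pct) != len(activation):
        raise ValueError("need equal-length, non-empty inputs")
    peak_index = max(range(len(activation)), key=activation.__getitem__)
    return task_range_pct[peak_index]


# Hypothetical synergy profiles on a 1% grid around contact (0%).
axis = [-3, -2, -1, 0, 1, 2, 3]
baseline = [0.2, 0.3, 0.5, 0.8, 1.0, 0.7, 0.4]
late_phase = [0.4, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

shift = peak_position(axis, late_phase) - peak_position(axis, baseline)
print(peak_position(axis, baseline), peak_position(axis, late_phase), shift)
```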
  
  For Monkey B: The reviewer is right that this monkey does not have a dedicated wrist flexor synergy and that the overall amplitude of the PL muscle does not increase dramatically. However, a closer look at its activity profile (Fig. S2-AN) reveals a clear and consistent increase in activation specifically in the pre-contact phase (~7% task range). This is the precise neural signature of the assistive tenodesis grasp that is kinematically confirmed in Figure 12. The monkey is not simply scaling up the synergy; it is strategically activating it earlier to prepare for the grasp.
  
  In summary, the key evidence linking the EMG to the tenodesis strategy is in the temporal domain. The learned re-timing of the wrist flexor activation to the pre-grasp phase is the crucial link that aligns the neural and kinematic data. We have revised the manuscript to make this distinction between amplitude scaling and temporal shifting clearer.
  
  A more minor point regarding conclusions: statements about poor task performance and high energy expenditure being the costs that drive exploration for a new strategy are speculative and should be presented as such. Although the monkeys did take longer to complete the tasks after the surgery, they were still able to perform them successfully and in less than a second, and no measurements of energy expenditure were taken.
  
  We thank the reviewer for this important point regarding the precision of our language. We agree that statements regarding ‘high energy expenditure’ and the specific drivers for exploring a new strategy are interpretations of the data, not direct measurements, and should be framed as such.
  
  Our speculation about energetic cost is based on the significant increase in muscle co-activation we observed (e.g., Fig. 11), a phenomenon widely understood to be metabolically expensive. Similarly, while the monkeys were still successful, their prolonged movement times and inefficient motor patterns represent a clear performance deficit compared to their highly optimized presurgical baseline, which we propose acted as a driver for further adaptation. In the revision, we have softened these claims throughout the manuscript, using more speculative language, such as “we hypothesize that...”, “the likely cost of...”, or “may have provided the impetus for...”, to ensure that our interpretations are clearly distinguished from our direct empirical findings.
  
  A small concern is whether the tendon transfer effect may fail over time, either due to scar tissue formation or tendon tearing, and it would be ideal if the integrity of the intervention were re-assessed at the end of the study.
  
  We thank the reviewer for raising this important point regarding the long-term integrity of the tendon transfer. We agree that a terminal anatomical re-assessment would be an ideal control. While a terminal assessment was not performed as part of this study’s protocol, we were able to monitor the transfer’s integrity throughout the study. We are confident the transfer remained functionally intact for two key reasons:
  
  (1) Physical Monitoring: We periodically used ultrasound imaging to non-invasively visualize the tendon repair, which allowed us to confirm its continued physical integrity.
  
  (2) Functional Evidence: This physical confirmation was corroborated by the functional data. Both animals achieved stable, proficient task performance that was maintained for months. Furthermore, the late-phase neuromuscular control strategies became highly consistent. A significant failure, such as a tendon tear or prohibitive mechanical scarring, would be incompatible with this sustained behavioral and neural stability.
  
  Nevertheless, we agree that a terminal assessment is an excellent methodological suggestion that should be incorporated into the design of future long-term studies of this nature.
  
  Reviewer #3 (Public review):
  
  (1) First, I find myself wondering about the physical healing process from the tendon transfer surgery and how it might contribute to the learning. Specifically, how long does it take for the tendons to heal and bear forces? If this itself takes a few months, it would be nice to see some discussion of this.
  
  We thank the reviewer for this insightful question about the potential contribution of the physical healing process to the adaptation timeline. Our surgical protocol was specifically designed to ensure the tendon transfer was biomechanically robust from the outset, minimizing the role of healing as a rate-limiting factor.
  
  We used a Pulvertaft weave technique, which is known to achieve mechanical strength equivalent to that of a native tendon shortly after the procedure (Graham et al., 2023). The repair involved more than two weaves and utilized high-strength suture material to maximize its initial force-bearing capacity. While full fibrous integration around the suture site typically occurs within approximately six weeks, the repair itself was strong enough to bear physiological forces immediately post-surgery. Therefore, the prolonged, two-phase behavioral recovery spanning several months, and the neural reorganization we observed, cannot be attributed to a slow physical healing process. Instead, this supports our conclusion that the observed timeline reflects the challenges and constraints of a purely neural adaptation and skill-learning process. To make this crucial point clear to all readers, we have added these details about the surgical method to the Methods section and included a brief discussion of its implications in the Discussion.
  
  (2) Second, I see that there are some changes in the muscle loadings for each synergy over the days, though they are relatively small. The authors mention that the cosine distances are very small for the conserved synergies compared to distances across synergies, but it would be good to get a sense for how variable this measure is within synergy. For example, what is the cosine similarity for a conserved synergy across different pre-surgery days? This might help inform whether the changes post-surgery are within a normal variation or whether they reflect important changes in how the muscles are being used over time.
  
  We thank the reviewer for this excellent and insightful suggestion. Establishing a baseline for normal day-to-day variability is an important control for our synergy analysis.
  
  We have performed this analysis in full. Specifically, to quantify baseline stability, we calculated the cosine similarity between the spatial synergy weights (W) of each individual recording day and the pre-surgery average. This provides a rigorous measure of day-to-day variability relative to the stable baseline structure. We have added these data to Figure 7 (Panel I), which plots the pre-surgery similarity (blue traces) alongside the post-surgery adaptation (red traces).
  
  We found that baseline stability was remarkably high, with cosine similarity consistently exceeding 0.99 (e.g., Monkey A: 0.99 ± 0.001). This quantification allows the reader to verify that the changes observed post-surgery (e.g., drops to ~0.80 or ~0.60 in Monkey B) lie well outside the range of normal physiological fluctuation, representing subtle but genuine structural adaptation.
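
  The baseline-stability measure described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' pipeline: each day's spatial weights for one synergy are stored as a vector over the same ordered list of muscles, and the pre-surgery baseline is the element-wise mean of the pre-surgery days' vectors (including, in this sketch, the day being compared). The function names and the weight values are hypothetical.

```python
import math
import statistics


def cosine_similarity(u, v):
    """Cosine of the angle between two muscle-weight vectors."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_to_baseline(weights_by_day, pre_days):
    """Similarity of each day's W vector to the element-wise pre-surgery mean."""
    pre_vectors = [weights_by_day[d] for d in pre_days]
    baseline = [statistics.fmean(col) for col in zip(*pre_vectors)]
    return {d: cosine_similarity(w, baseline) for d, w in sorted(weights_by_day.items())}


# Hypothetical weights of one synergy over four muscles; negative days are pre-surgery.
weights_by_day = {
    -5: [0.80, 0.50, 0.20, 0.10],
    -2: [0.82, 0.49, 0.21, 0.10],
    30: [0.60, 0.70, 0.20, 0.30],
}
print(similarity_to_baseline(weights_by_day, pre_days=[-5, -2]))
```

  Cosine similarity ignores overall scale, so it reflects changes in the relative contribution of muscles to a synergy rather than changes in its activation amplitude.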
  
  (3) Last, and maybe most difficult (and possibly out of scope for this work): I would have ideally liked to see some theoretical modeling of the biomechanics so I could more easily understand what the tendon transfer did or how specific synergies affect hand kinematics before and after the surgery. Especially given that the synergies remained consistent, such an analysis could be highly instructive for a reader or to suggest future perturbations to further probe the effects of tendon transfer on long-term learning.
  
  We thank the reviewer for this excellent and forward-thinking suggestion. We completely agree that a detailed biomechanical model of the tendon transfer would be a powerful tool for understanding the mechanical consequences of the surgery and for interpreting the function of the recorded muscle synergies. However, creating a subject-specific musculoskeletal model with the fidelity required to accurately simulate synergy-to-kinematic transformations is a highly complex project that we feel is well beyond the scope of the current manuscript. Such an endeavor would constitute a major research project in its own right.
  
  Our study’s primary focus was to provide a detailed, longitudinal characterization of the in-vivo neural adaptation following this perturbation, a dataset that is itself rare and valuable. We aimed to document the physiological learning process as it unfolded over many months. Nonetheless, the reviewer’s point is exceptionally well-taken. Currently, we are constructing a monkey musculoskeletal model and performing tendon transfer on this model to investigate what kind of characteristics in the learning process reproduce the synergy changes observed in the experiments. Although this project is still in progress, to date, we have demonstrated that the robustness of synergies themselves is necessary for changes in muscle activity at the synergy level (Nakajima N, Wang S, Ogihara N, Oya T, Seki K, Funato T, Upper Limb Musculoskeletal Model of Macaque Monkey for Approaching Adaptation Mechanism to Tendon Transfer, Society for Neuroscience 2023, Washington DC, USA, 2023).
  
  The rich dataset we have collected in the present research could serve as an excellent foundation for developing and validating such a model in the future. We believe that combining these two approaches is a critical and exciting next step for the field, and we have highlighted this as a key future direction in our discussion.
  
  Recommendations for the authors:
  
  Reviewing Editor Comments:
  
  When revising the manuscript for resubmission, please try to improve the visual presentation of the data, which is a point highlighted by all three reviewers during the discussion, including making the presentation of monkey-specific results more consistent across subjects.
  
  We have comprehensively revised the figures to ensure a consistent and clear visual presentation, as requested. Specifically, we standardized the layout across all main and supplementary figures (placing Monkey A consistently in the top rows or left columns and Monkey B in the bottom rows or right columns) and applied unified color schemes throughout the manuscript. Furthermore, we harmonized the presentation of the analytical results, such as the specific cross-correlation pairings in Figures 9 and 10, to ensure that the data for both subjects are presented with identical logic, facilitating direct comparison.
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) Please revise the writing; some words are missing (line 90), and some sentences could be clarified slightly, even if the paper is well written (lines 317-320). The paragraph including the idea of tenodesis could also be further clarified, I think.
  
  Thank you for pointing these out. We have corrected the missing word (osteoarthritis) on line 90. We have also revised lines 317-320 to remove ambiguity. Furthermore, the section describing the tenodesis effect (now section "Distinct neural implementations...") has been substantially rewritten for improved clarity, incorporating a more detailed explanation of the biomechanics.
  
  (2) In the Introduction, the authors cite Hunter and Eckstein 2009 and Mercuri and Muntoni 2013 without describing the pathological conditions; this will not be clear to nonspecialists.
  
  Thank you. We have added brief descriptions ("osteoarthritis, a degenerative joint disease," and "muscular dystrophy, which involves progressive muscle weakness,") directly into the Introduction sentence where these references appear.
  
  (3) Data presentation: I often thought that the data could be presented more clearly:
  
  (a) For example, Figure 3D and 4D should show error bars around the mean to have a sense of the consistency of pre-lesion behaviour. Same for other figures like Figure 6.
  
  We appreciate the reviewer's suggestion to visualize data consistency. For Figures 3D, 4D, and 6 (EMG profiles), we opted to display mean traces and peak markers to clearly illustrate the temporal shifts and relationships between muscles; overlaying multiple standard deviation envelopes in these comparative plots would significantly reduce legibility. However, to fully address the reviewer's request to see the consistency of pre-lesion behavior, we direct attention to Supplementary Figure S1, which presents the complete EMG profiles with full error tubes (mean ± SD) for every recorded muscle. For the quantitative analysis figures, we ensured that variability is explicitly visualized in all statistical analyses. The cross-correlation time-courses in Figures 6 (G-Q), 9, and 10 are plotted with shaded error tubes to show variance. Similarly, the aggregated EMG analysis in Figure 11 utilizes bar plots with explicit error bars to quantify the statistical consistency of the changes.
  
  (b) The autocorrelation analysis in Figure 6 should also include measures of lag if it’s not at zero lag. If it’s the latter, please specify it in the Methods.
  
  We thank the reviewer for this question regarding the cross-correlation analysis presented in Figure 6 (Panels G-J, P-Q). We confirm that this analysis was performed at zero time lag. To clarify this, we have added a sentence to the Methods section (Subsection "Cross-correlation analysis") explicitly stating that the EMG cross-correlations shown in Figure 6 were calculated at zero lag. We have also added a clarifying note ("at zero time lag") to the description of these panels within the Figure 6 caption.
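
  To show what "zero lag" means in practice, here is a minimal Python sketch of a mean-removed, normalized cross-correlation between two averaged EMG envelopes. The normalization choice (full-length variances, so that the lag-0 value equals the Pearson correlation coefficient) is an assumption and may differ from the exact implementation in the manuscript; the function name and the envelope values are hypothetical.

```python
import math


def normalized_xcorr(x, y, lag=0):
    """Mean-removed, normalized cross-correlation of x[t] with y[t + lag].

    Normalization uses the full-length variances, so at lag 0 the value
    equals the Pearson correlation coefficient of the two envelopes.
    """
    n = len(x)
    if len(y) != n or n < 2:
        raise ValueError("need two equal-length series with >= 2 samples")
    if abs(lag) >= n:
        raise ValueError("lag must be smaller than the series length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if lag >= 0:
        pairs = zip(dx[: n - lag], dy[lag:])
    else:
        pairs = zip(dx[-lag:], dy[: n + lag])
    return sum(a * b for a, b in pairs) / denom


# Hypothetical averaged EMG envelopes for two muscles.
fds = [0.1, 0.4, 0.9, 0.6, 0.2]
edc = [0.2, 0.5, 0.8, 0.5, 0.1]
print(normalized_xcorr(fds, edc))         # zero lag
print(normalized_xcorr(fds, edc, lag=1))  # one-sample lag, for comparison
```

  Restricting the analysis to lag 0 asks only whether the two muscles are co-active at the same moments; any systematic lead or lag between them would appear only at nonzero lags.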
  
  (c) Seeing EMG patterns similar to those presented in Figures 3D and 4D at different times post-lesion (e.g., as a Supplementary figure) would also give readers a better intuition of the neural changes.
  
  We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. We have added explicit cross-references to these figures in the main text.
  
  (d) I couldn’t fully understand the analysis in Figure 4E; clarify.
  
  We thank the reviewer for noticing this oversight. The reviewer is correct that Figure 4E was not referenced in the main text. This panel was intended to show the baseline kinematic profiles (MCP and wrist angles) for Monkey B's control session, corresponding to the average EMGs shown in panel 4D. Given that our more comprehensive kinematic analyses are now presented in Figure 12 and the new Figure 13, we believe panel 4E is largely redundant. To improve the clarity and focus of Figure 4, we have removed panel 4E and its description from the revised manuscript.
  
  (e) Some figures showing neural changes (e.g., Figures 6G-J, 6P,Q, Figures 9 and 10, and even Figure 11 for different reasons) would become more understandable if they were accompanied by the behavioural changes (e.g., something like Figure 5A on top of them).
  
  We agree that visualizing the temporal link between neural reorganization and behavioral recovery is essential for interpreting the data. We have implemented this suggestion by overlaying behavioral metrics onto the right y-axes of Figures 6 (G-Q), 9, 10, and 11. However, regarding the specific behavioral metric, we opted to overlay the maladaptive behavior/aberrant reaching metric (from Figure 5B) rather than the grip formation time (Figure 5A). We found that the maladaptive behavior profile provided a clearer and more direct correlate to the neural data, as its peak coincides precisely with the ‘swapped’ synergy phase, thereby effectively illustrating the functional cost of that specific neural state.
  
  (f) Some figure captions could be improved by adding more detail (e.g., for Figure 6).
  
  We agree. We have substantially expanded and improved the captions for Figure 6 and Figure 7 to make them more self-contained and guide the reader more effectively through the key findings presented in the panels. We have also reviewed other captions for clarity.
  
  (g) I’d show the cosine distance between synergies across days as a main figure, e.g., as part of Figure 7, because this is an important result.
  
  We agree that the longitudinal stability of the synergy structures is a crucial result that deserves prominence. We have implemented this suggestion by adding new panels (Figure 7I, K for the primary synergies and Figure 8K, L for the secondary synergies) that plot the cosine similarity of the spatial synergy weights across the entire experimental timeline. These panels explicitly visualize the high stability of the pre-surgery baseline (blue traces, similarity > 0.99) and contrast it with the dynamic structural tuning observed during the post-surgery adaptation (red traces), providing a clear, day-by-day account of synergy evolution as requested.
  
  (h) In Figure 7C, D and G, H, it’d be interesting to also see in the background the EMG for the transferred muscle that belongs to each synergy, to appreciate their relationship.
  
  We thank the reviewer for this suggestion. To illustrate the close relationship between the primary synergies and their key constituent muscles, while avoiding visual clutter in the complex post-surgery plots, we have modified the pre-surgery panels of Figure 7 (C, D, G, H). In these panels, we have now overlaid the average pre-surgery EMG profile of the primary transferred muscle belonging to that synergy (e.g., FDS for Synergy A, EDC for Synergy B) as a thin, gray, dashed line. This visually confirms the tight correlation between the synergy profile and the muscle’s activity at baseline.
  
  (i) On page 10, the authors report as maladaptive behaviour the duration of the aberrant reaching component from day 29 (monkey A) and day 20 (monkey B). What was happening before those recording dates? Were the monkeys recovering?
  
  Thank you for this question. We have added two sentences to the start of the Results section (“Functional Recovery Follows...”) clarifying that the period between surgery and formal recordings included approximately one week of home cage recovery followed by several weeks of assisted task practice. Formal recordings began once the monkeys could perform the task consistently without assistance.
  
  (j) In the Methods (EMG Analysis), the authors state that they resumed their recordings post-TT “once they (the monkeys) were able to perform the task on their own”. It would be good if the authors made this more precise (e.g., based on success rate or another metric).
  
  We thank the reviewer for this suggestion to increase precision. We have revised the Methods section to include the specific criteria used for resuming post-surgical recordings. Recordings were restarted once the monkeys were able to perform the task independently (i.e., without assistance from the experimenter) and consistently achieved a successful trial count of at least 100 trials within a single experimental session.
  
  (k) Line 266 reads “Alternation of EMG activity in non-transferred muscle suggests one possibility: TT might alter the control strategy of coordinated muscle activity for hand movement by modifying the transferred muscles and their agonists as a cohesive unit”; however, some “muscles showed patterns that were incompatible with a simple swap” (Lines 255-256). Doesn’t this observation suggest that what happens is not a simple change in muscle synergies?
  
  We thank the reviewer for this insightful question regarding the interpretation of muscles with adaptive patterns incompatible with the primary ‘swap-and-revert’ pattern. We agree that these observations require careful consideration within the modular framework. Our interpretation is that these muscles do not represent evidence against modular control, but rather reflect the involvement of multiple modules adapting concurrently. Specifically, muscles like FCR and PL, which showed distinct patterns, are primary members of Synergy C (the wrist flexor synergy) in Monkey A. Their adaptive profile is therefore consistent with the task-specific recruitment and retiming of Synergy C as part of the compensatory tenodesis strategy, rather than being a deviation from the swap observed in Synergies A and B. Synergies represent the dominant, shared variance in muscle activity. While they capture the overall strategy, some degree of individual muscle variation or the influence of secondary synergies is expected. We have added a sentence to the Results section to clarify that these diverse patterns likely reflect the differential involvement of muscles in multiple adapting synergies. We believe the overall evidence still strongly supports the modulation of stable synergies as the primary mechanism of adaptation in this paradigm.
  
  (l) You may want to call synergy A and synergy B, synergy F and synergy E to make recall easier? (Same for synergy C and D, which could be F2 and E2).
  
  We thank the reviewer for this helpful suggestion aimed at improving clarity. We considered renaming the synergies based on function (e.g., F/E). However, given the number of figures and the complexity of a global change, and the fact that the functional roles of Synergies C and D differed between animals, we decided to retain the original A/B/C/D labels for consistency. To ensure clarity for the reader, we have carefully checked the manuscript to ensure that we consistently define the primary functional role of each synergy (e.g., "Synergy A, the primary finger flexor synergy") when it is discussed.
  
  (m) Lines 315-317 - “These patterns of changes in synergy 3 and 4, both contributed minimally to the EMG of transferred muscles”: this statement frames the causality as synergies causing muscles to activate according to certain patterns, which is supported by work from several groups (including the authors); however, the patterns could also reflect biomechanical and task constraints, as others have argued. Perhaps this tone would be better suited to the Discussion?
  
  We thank the reviewer for this nuanced point regarding the interpretation of synergy contributions. We agree that the causal relationship between computed synergies and muscle activity is complex and can reflect both neural commands and task constraints. To address this, we have revised the sentence in question in the Results section. Instead of stating that the synergies "contributed minimally," we now state that the changes in these synergies "were associated with minimal EMG activity in the transferred muscles." This phrasing is more descriptive of the observation and less implicitly causal, while retaining the key point within the flow of the results. The subsequent sentences, which offer interpretation, are already framed speculatively ("This suggests...", "may have served...").
  
  (n) Line 403 How do the authors conclude from the synergy patterns in Figure 11 that the early post-TT is characterised by “an unstable and inefficient neural control strategy”? To me, this is shown clearly in the behaviour, not in these plots, unless I’m missing something?
  
  We thank the reviewer for this comment, which highlights the need to clearly connect our neural findings to the behavioral outcome. The reviewer is absolutely correct that the behavioral data (Fig. 5) provides the most direct evidence of instability and inefficiency during the early adaptation phase. Our intention was to argue that the neural patterns observed in Figure 11 provide a physiological correlate for this behavioral inefficiency. Specifically, the escalating aggregated EMG activity observed in the conflicted extensor synergy (Synergy B), which we term the ‘arms race’, represents significant muscle co-activation. Such co-activation is widely understood to be energetically costly and reflects a suboptimal control strategy where the CNS is essentially "fighting itself" against the altered mechanics. To make this link clearer, we have revised the concluding sentence of the relevant paragraph in the Discussion ("The early adaptation phase...") to explicitly state that this escalating co-activation is a known marker of inefficient recruitment and that it occurred concurrently with the period of poor behavioral performance shown in Figure 5.
  
  (o) Lines 469-471. The authors suggest that muscle synergies may be preserved post-TT because a modular approach (to motor control) may be computationally easy and metabolically cheap. To me, recent data suggest that the most parsimonious explanation is what they later say: that the nervous system may not be plastic enough to change this (e.g., see Makin and Krakauer, “Against reorganisation” also in eLife).
  
  We thank the reviewer for raising this important theoretical point and for referencing the relevant literature on constraints on cortical reorganization. We agree that the preservation of muscle synergies in the face of such a profound perturbation is a key finding that warrants careful interpretation. In our revised Discussion (section "The CNS Defaults to a Modular Strategy..."), we have now explicitly incorporated the perspective that synergy stability may reflect inherent constraints on neural plasticity, citing Makin and Krakauer (2023), alongside our original hypothesis regarding computational and metabolic efficiency. We present these ideas not as mutually exclusive, but as potentially complementary factors that both contribute to the CNS’s apparent preference for modulating existing modules rather than fundamentally restructuring them.
  
  (p) Lines 501-503. Also on interpretation. Would the metabolic cost indeed be much higher? Couldn’t the observed change in strategy be explained purely based on performance metrics?
  
  This is an important point. We agree that statements regarding high energy expenditure are interpretations, not direct measurements. We have carefully revised the manuscript (Abstract, Results, and Discussion) to soften these claims, using more speculative language (e.g., "likely costly," "what we propose was...") to clearly distinguish our interpretations from direct empirical findings.
  
  (q) Lines 538-. The authors link the initial adaptation phase to the fast process reported in adaptation studies and say that this leads to poor retention. However, it seems from their data that the behaviour is stable across (early) days, so doesn’t this rule out such an interpretation?
  
  We thank the reviewer for this insightful question regarding the interpretation of the early adaptive phase within the two-state model framework. The reviewer correctly notes that the early post-surgical behavior, while maladaptive, appeared relatively stable across days and did not show the rapid decay sometimes associated with the "poor retention" characteristic of the fast system. We agree that this apparent stability requires careful interpretation. In our revised Discussion (section "A Multi-Timescale Model..."), we now propose that the fast system is primarily responsible for the initial, rapid adoption of the ‘swap’ strategy in response to the large error signal. The subsequent persistence of this flawed but stable state for several weeks is likely not due to strong retention by the fast system itself, but rather reflects the time required for the parallel slow system to gradually develop a more effective compensatory strategy (i.e., the tenodesis grasp). Once this alternative strategy became viable, it enabled the abrupt "switchback," which we also attribute to the fast system recalibrating away from the highly costly swap strategy. Therefore, we believe our data is consistent with the involvement of a fast system driving rapid strategic shifts, even if the typical "poor retention" phenotype is masked by the lack of a viable alternative strategy during the early phase.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The discussion would benefit greatly from a more careful comparison with prior work characterizing the response to experimental or clinical tendon or nerve transfer in different models.
  
  We thank the reviewer for suggesting these important references and for the recommendation to compare our findings more carefully with prior work. This is an excellent point, and we agree it will significantly strengthen the discussion. In our full revision, we have added a new paragraph to the Discussion section dedicated to this comparison. We discuss how our findings relate to classic work showing primate adaptive capacity beyond simple maladaptive responses (Sperry, 1947), EMG evidence for the persistence of original neural patterns alongside new ones in human patients (Illert et al., 1986), the critical role of altered peripheral biomechanics and myofascial force transmission in complicating adaptation (Maas & Huijing, 2012), and how our observation of synergy stability aligns with evidence for modular adaptation strategies (Berger et al., 2013). This comparison helps situate our unique findings of a multi-timescale process and synergy timing modulation within the broader context of motor relearning after musculoskeletal rearrangement.
  
  (2) Line 90 - Which disease or condition is studied in Hunter and Eckstein (2009)?
  
  Thank you. We have clarified this in the Introduction; the reference pertains to osteoarthritis.
  
  (3) Line 280 for clarity in text and as a reminder to the readers, please state which muscles are involved in each synergy grouping.
  
  We have updated the text (Results, 'Adaptation occurs through modulating...') to explicitly list the main contributing muscles for each synergy grouping (e.g., Synergy A: FDS and FCU for Monkey A). This provides the requested clarity regarding the functional identity of each synergy while maintaining readability. For the complete, quantitative muscle weight composition including minor contributors, we refer the reader to Figure 7 and Supplementary Table 1.
  
  (4) Line 180 There are differences in the time course for measurements between the behavioral metrics and EMGs. If not recorded at fixed time intervals, the differences in the time courses for the two monkeys should be explained.
  
  We thank the reviewer for this question regarding the time courses of our measurements. We interpret this comment in two ways, both of which we have addressed in the revised manuscript.
  
  First, if the reviewer is asking about the overall recording schedule, they are correct that sessions were not performed at fixed daily intervals, and the specific days sampled differed between monkeys. This non-uniform sampling was due to the practical constraints of long-term behavioral experiments (e.g., animal cooperation, scheduling, weekends) and the aim to capture data during key phases of adaptation. However, within any given session, behavioral (video) and EMG data were always collected concurrently.
  
  Second, if the reviewer is asking whether the set of days included differs between the behavioral plots (e.g., Fig 5) and the EMG/synergy plots (e.g., Figs 6, 9-11), this is a possibility depending on data quality criteria. Our criterion for including a session in the behavioral analysis was a minimum of 20 successful trials. However, for the more demanding synergy analysis, we required a higher minimum of 100 successful trials to ensure robust factorization. It is possible that a few sessions met the behavioral criterion but not the synergy criterion and were thus excluded from the latter analysis, leading to slight differences in the days presented across figures. To ensure full clarity, we have added text to the Methods section explicitly stating: (A) the rationale for the non-uniform daily sampling schedule, and (B) the specific minimum trial count criteria used for including data in the behavioral versus the synergy analyses, noting if this resulted in different sets of days being analyzed for different figures.
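  To illustrate how two different trial-count thresholds can yield different sets of analyzed days, here is a minimal sketch. Only the thresholds (20 successful trials for behavior, 100 for synergy analysis) come from the response; the session IDs, counts, and the helper `included_sessions` are hypothetical.

```python
# Thresholds as stated in the response; the session data below are hypothetical.
BEHAVIOR_MIN_TRIALS = 20
SYNERGY_MIN_TRIALS = 100

def included_sessions(trial_counts, min_trials):
    """Return sorted session IDs whose successful-trial count meets min_trials."""
    return sorted(s for s, n in trial_counts.items() if n >= min_trials)

# Hypothetical successful-trial counts per post-surgery session (not the study's data)
counts = {"S1": 140, "S2": 65, "S3": 210, "S4": 18, "S5": 120}

behavior_sessions = included_sessions(counts, BEHAVIOR_MIN_TRIALS)
synergy_sessions = included_sessions(counts, SYNERGY_MIN_TRIALS)
behavior_only = sorted(set(behavior_sessions) - set(synergy_sessions))
print("behavioral analysis:", behavior_sessions)  # ['S1', 'S2', 'S3', 'S5']
print("synergy analysis:", synergy_sessions)      # ['S1', 'S3', 'S5']
print("behavior only:", behavior_only)            # ['S2']
```

  A session like the hypothetical S2 passes the behavioral criterion but not the stricter synergy criterion, which is exactly how the day sets plotted in behavioral and synergy figures can differ.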
  
  (5) General figure comments - The figures are informative, but they could be better presented, designed, and formatted to explain the important results in the paper. The figures should be able to explain most of the key results without entirely referring to the text to find some of the details. I had a bit of trouble understanding Figures 9 & 10. I would also like to suggest that bringing raw data into some figures (e.g., EMG of different muscle groups), such as showing stability between the synergies, could improve the results and allow the story to flow with more clarity. Likewise, clearly showing the differences between baseline EMG measurements and post-surgery measurements could improve some of the result figures.
  
  We thank the reviewer for these important general comments on data presentation. We agree that the figures are the key to our story and are implementing several revisions based on this and other reviewer feedback to improve their clarity.
  
  General Presentation: We have conducted a thorough review of all figures to improve layout, consistency, and font legibility (addressing R3, 1 and the Reviewing Editor's comments). This includes adjusting the layouts of Figures 3, 4, and 6 for better alignment and clarity.
  
  Figures 9 & 10 (Cross-correlation): The reviewer mentioned having trouble understanding these figures. In our revision, we have substantially rewritten the captions for Figures 9 and 10 to be much more descriptive. We explicitly walk the reader through how to interpret the plots (e.g., "The ‘swap’ is evidenced by the drop in self-correlation... and a concurrent rise in antagonist-correlation...").
  
  Including "Raw Data" (EMG): We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. These figures directly visualize the swap-and-revert pattern in the transferred muscles and their agonists (e.g., EDC, ED23), as well as the diverse and complex adaptations in other non-transferred muscles (e.g., FCR, PL), as requested. To make this clearer, we have added explicit cross-references to Supplementary Figures S1 and S2 within the main Results section to ensure readers are directed to this detailed data.
  
  Showing Differences (Pre vs. Post): To "clearly show the differences between baseline... and post-surgery measurements," we implemented the point-by-point statistical comparison of pre- vs. final-day synergy profiles (as suggested in R1, 2b). This has resulted in a new Supplementary Figure visually highlighting the precise periods in the task where the final profiles still differ significantly from baseline (Fig. S9).
  
  We believe these additions (new figures and improved captions) will make the results much clearer and more self-explanatory, as the reviewer suggested.
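  The caption logic for Figures 9 and 10 (a ‘swap’ appears as a drop in self-correlation and a rise in antagonist-correlation) can be illustrated with a minimal sketch. It assumes Gaussian-shaped synergy profiles over a 0-100% task range with hypothetical peak times, and it uses zero-lag Pearson correlation as a simplification of a full cross-correlation; it is not the analysis used in the manuscript.

```python
import numpy as np

def gaussian_profile(center, width=10.0, n=101):
    """Hypothetical activation profile over a 0-100% normalized task range."""
    x = np.linspace(0.0, 100.0, n)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def zero_lag_corr(a, b):
    """Pearson correlation at zero lag (a simplification of a full cross-correlation)."""
    return float(np.corrcoef(a, b)[0, 1])

pre_A = gaussian_profile(30.0)   # hypothetical: synergy A peaks early in the task
pre_B = gaussian_profile(70.0)   # hypothetical: synergy B peaks late in the task
post_A = gaussian_profile(70.0)  # after a 'swap', A now peaks where B used to

self_corr = zero_lag_corr(post_A, pre_A)        # drops after the swap
antagonist_corr = zero_lag_corr(post_A, pre_B)  # rises after the swap
print(f"self: {self_corr:.2f}, antagonist: {antagonist_corr:.2f}")
```

  With real data the post-surgery profile would only partially resemble the antagonist, so both correlations would move gradually across sessions rather than jumping as they do in this idealized case.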
  
  (6) Figure 1 - A table with all the acronyms would help with identifying all the muscles and their respective synergies (supplemental), especially when describing the muscles in the Results and Discussion sections.
  
  This is an excellent suggestion. We have created a comprehensive table (Supplementary Table 1) listing all muscle abbreviations, full names, primary functional groups, and assigned synergies for both monkeys. We have added a reference to this table in the Figure 1 caption and the Methods section.
  
  (7) Figure 2 - is this mainly from Monkey A? If so, it should be stated.
  
  We thank the reviewer for pointing out this omission. We have updated the caption for Figure 2 to clarify that the example data shown (ultrasound, trajectories, and quantitative plots) are from Monkey A.
  
  (8) Figures 3 and 4 seem unbalanced, perhaps because of the descriptive need to explain Monkey B’s tasks? The figure alignments could be better.
  
  We thank the reviewer for this comment on the visual presentation of Figures 3 and 4. The reviewer’s observation that the figures appeared ‘unbalanced’ was correct. This was a direct consequence of two issues: (1) the different tasks required slightly different schematics (the "descriptive need" the reviewer mentioned), and (2) the original Figure 4 contained an additional kinematic panel (formerly 4E) that was unique to Monkey B, which broke the parallel structure with Figure 3.
  
  To address this and significantly improve the alignment, we have now moved the unique kinematic panel (formerly 4E) to a new Supplementary Figure (Supplementary Figure S8). This change has allowed us to re-arrange the panels in Figures 3 and 4 so that they now follow the exact same order. We have also adjusted the layout to ensure that corresponding panels are of a consistent size. We agree that this creates a much better visual balance and makes the comparison between the two monkeys far more direct and clear, as the reviewer suggested.
  
  (9) Figure 5. It seems like the animals can still perform the task post-surgery, but with high variability. Maybe emphasize the differences in variability between baseline and post-surgery?
  
  We thank the reviewer for this suggestion to emphasize the changes in variability. We have now quantified this using the Coefficient of Variation (CV) for key behavioral metrics across different phases (Pre-surgery, Early, Mid, Late post-surgery). The results confirm the reviewer’s observation of high variability post-surgery, particularly in the early phase. For instance, Monkey A’s grip formation time CV spiked dramatically (Pre: 47% vs Early: 133%), while Monkey B’s remained high (Pre: 82% vs Early: 76%). Interestingly, while Monkey A’s variability returned close to baseline levels in the late phase (Late: 55%), Monkey B’s variability increased further (Late: 97%), suggesting persistent inconsistency despite functional recovery.
  
  We also observed metric-specific changes. Monkey A’s pull time became less variable than baseline later on (Pre: 65% vs Late: 43%), suggesting refinement of that action. Conversely, the variability of Monkey B’s grasp aperture remained consistently low throughout (Pre: 26% vs Late: 19%), indicating relatively precise kinematic control was maintained or quickly regained. We have added a summary of these findings to the Results section to provide a more complete picture of how behavioral variability evolved relative to baseline during the adaptation process.
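  For readers unfamiliar with the metric, a minimal sketch of a per-phase coefficient of variation follows. It assumes CV = sample standard deviation (ddof=1) divided by the mean, expressed in percent; the response does not state which standard-deviation estimator was used, and the grip-formation times below are hypothetical, not the study's data.

```python
import numpy as np

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation (ddof=1) divided by the mean, times 100."""
    arr = np.asarray(values, dtype=float)
    return float(np.std(arr, ddof=1) / np.mean(arr) * 100.0)

# Hypothetical grip-formation times (s) grouped by adaptation phase; not the study's data.
phases = {
    "Pre": [0.20, 0.25, 0.30, 0.22],
    "Early": [0.10, 0.60, 0.25, 0.90],
    "Late": [0.21, 0.35, 0.28, 0.24],
}
for phase, times in phases.items():
    print(f"{phase}: CV = {coefficient_of_variation(times):.0f}%")
```

  Because CV normalizes by the mean, it allows variability to be compared across phases even when the average duration of a movement component changes after surgery.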
  
  (10) Figure 6 is quite a confusing figure and needs to be better presented. The figure legends are hard to see for Monkey A vs Monkey B. At first, I thought Monkey B’s figure legend also represented Monkey A. I would suggest reorganizing the figures for clarity and coherence.
  
  We agree that the original presentation of Figure 6 was dense and potentially confusing. We have completely reorganized the figure to improve clarity and coherence.
  
  (1) Clear Separation: The figure is now structured with a strict separation between Monkey A (Left Panels, A-J) and Monkey B (Right Panels, K-Q), with prominent headers for each subject to prevent ambiguity.
  
  (2) Improved Legends: We have redesigned the legends to be larger and placed them explicitly within their respective subject’s section to ensure it is immediately clear which data they describe.
  
  (3) Visual Consistency: We have standardized the color schemes and axis layouts across this and all other figures to reduce cognitive load and facilitate easier comparison between subjects.
  
  (11) Figure 12 - This figure is incomplete without Monkey A’s results. The videos in the supplemental sections seem clear enough for some kinematic analysis. The story would be better supported by more thorough measurements of the kinematics from both animals to show how they differ over time and by highlighting the two phases. As a minor note, it would be helpful to present the kinematic data together with a schematic of when during the task the data are drawn from, using the % task range scale, since that is the standard throughout the paper.
  
  We thank the reviewer for their suggestions regarding the kinematic analysis. We agree that a parallel kinematic analysis for Monkey A, similar to that in Figure 12, would be ideal. We did attempt this. Unfortunately, while the supplemental videos for Monkey A are sufficient for observing the overall movement trajectory, they are not suitable for the detailed joint angle analysis the reviewer suggests. The videos for Monkey A were recorded at an insufficient frame rate, which did not allow us to reliably extract wrist and finger joint angles during the rapid grasping movement. This is the reason why this detailed kinematic analysis was limited to Monkey B, for which we had high-speed video recorded at 240 fps, allowing for a robust analysis of these fast movements.
  
  We have, however, expanded our kinematic analysis for Monkey B to show the refinement of the tenodesis strategy over the full time course (New Figure 13), which does help to highlight the different adaptive phases for that animal. We have also clarified in the manuscript (e.g., in the caption for Figure 12) that the lack of Monkey A data for this specific analysis was due to the low-resolution and low-frame-rate video available.
  
  We agree that defining the precise timing of the kinematic snapshot relative to our normalized task range is critical for accurate interpretation. In response, we have added a new panel (Figure 12C) that explicitly maps the kinematic snapshot to our standardized task timeline. This schematic clarifies that the joint angle analysis captures the hand configuration during the pre-shaping phase, specifically at 83 ms prior to object contact (which corresponds to -0.02% of the normalized task range). This ensures the kinematic data can be directly interpreted within the same temporal context as the EMG and synergy results presented throughout the paper.
  
  Reviewer #3 (Recommendations for the authors):
  
  First and most major: I found many of the figures much too small and incredibly difficult to read. Possibly the most difficult was Figure 7, where I had to zoom in a great deal to read what muscles corresponded to which bars. I don’t have specific suggestions here other than to make sure that figures are legible.
  
  We thank the reviewer for highlighting this important issue. We have comprehensively revised the figures to ensure they are legible at standard publication sizes. Specific improvements include:
  
  (1) Figure 7: We have significantly increased the font size of the x-axis muscle labels and optimized the bar chart spacing to ensure the muscle identities are readable without excessive zooming.
  
  (2) Global Updates: Across all figures, we have increased font sizes for axis labels and titles, removed unnecessary whitespace to maximize the data-to-ink ratio, and exported all final figures in high-resolution vector formats to ensure clarity.
  
  Second and more minor: I liked the setup of the manuscript, where the authors explained the unique benefits of their experimental methods and the question they were going after (“When confronted with structural changes to the musculoskeletal system, does the CNS adapt by modulating existing synergies, or by shifting toward more fractionated control strategies?”). However, the evolution of the paper made the answer to this question seem very confusing to me as I read it. The results show that monkeys initially modulated existing synergies in phase 1, but then reverted to the original modulation. This, in addition to the way the question was set up initially, made me think the conclusion was going to be that the synergies themselves changed in the second phase, but this paradoxically was not the case--synergies were stable throughout. I was left confused for the back half of the results section, until the discussion on tenodesis and developing compensatory movement strategies. So the answer is that the monkey learns by modulating existing synergies, but using different strategies in different learning phases. I’m not entirely sure how to avoid this confusion, but I wonder if there’s a way to foreshadow this finding earlier on.
  
  We thank the reviewer for this valuable feedback on the manuscript’s narrative structure. We understand how the initial framing (modulation vs. fractionation) followed by the reversion of the initial modulation could lead to confusion before the compensatory strategy is fully introduced. To address this, we have made two key adjustments in the revised manuscript:
  
  (1) In the Introduction, after posing the central question, we have added a sentence to subtly foreshadow that the adaptive process might be complex and multi-phasic, requiring analysis over extended timescales.
  
  (2) In the Results section, at the transition point between describing the reversion of the primary synergy timings and introducing the compensatory tenodesis strategy, we have added a short paragraph to explicitly signal that the reversion was not the complete solution and that a distinct compensatory strategy emerged concurrently.
  
  We believe these changes improve the narrative flow, provide better signposting for the reader, and mitigate the potential for confusion identified by the reviewer, making it clearer that the ultimate solution involved modulating existing synergies but via different strategies across distinct learning phases. We appreciate the reviewer’s help in identifying this area for improvement.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.10.18.618983v4
www.biorxiv.org

Understanding neural circuit principles for representation learning through joint-embedding predictive architectures

1
1. Public_Reviews 15 May 2026
  
  in eLife
  
  Author response:
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  The paper describes a biologically plausible version of JEPA using recurrent neural networks, called RPL for recurrent predictive learning. Given an embedding $z_t$, a recurrent neural network processes these inputs with the form $c_{t+1} = \mathrm{RNN}(c_t, z_t)$. Then the predictive network $f$ predicts the future inputs with the objective $\min \lVert f(c_t) - \mathrm{stopgrad}(z_{t+\Delta t}) \rVert^2$. I understand that a prediction error is defined as $e = z_{t+\Delta t} - f(c_t)$ to model cortical measurements in the oddball task.
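  The objective summarized above can be made concrete with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the authors' implementation: a fixed linear encoder, a tanh RNN as the integrator, a linear predictor f(c) = V c, random inputs, and a single gradient step on V only. The stop-grad is mimicked by never differentiating through the target z_{t+Δt}.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_in, D_z, D_c, dt = 50, 8, 4, 16, 1  # hypothetical sizes and prediction horizon

x = rng.standard_normal((T, D_in))              # hypothetical input sequence
W_enc = rng.standard_normal((D_z, D_in)) * 0.3  # fixed linear encoder (assumption)
W_cc = rng.standard_normal((D_c, D_c)) * 0.1    # recurrent weights (assumption)
W_cz = rng.standard_normal((D_c, D_z)) * 0.3    # input-to-context weights (assumption)
V = rng.standard_normal((D_z, D_c)) * 0.1       # linear predictor f(c) = V c (assumption)

z = x @ W_enc.T  # embeddings z_t

# Recurrent integration following the summary: c_{t+1} = RNN(c_t, z_t)
c = np.zeros((T, D_c))
for t in range(T - 1):
    c[t + 1] = np.tanh(W_cc @ c[t] + W_cz @ z[t])

def prediction_loss(V):
    """Mean squared prediction error; the target is treated as a constant (stop-grad)."""
    target = z[dt:]         # z_{t+dt}
    pred = c[:-dt] @ V.T    # f(c_t)
    e = target - pred       # prediction error e = z_{t+dt} - f(c_t)
    return float(np.mean(np.sum(e ** 2, axis=1))), e

loss0, e = prediction_loss(V)
# Gradient w.r.t. the predictor V only; nothing flows back into the target embeddings
grad_V = -2.0 * e.T @ c[:-dt] / e.shape[0]
V_new = V - 1e-2 * grad_V
loss1, _ = prediction_loss(V_new)
print(f"loss before: {loss0:.4f}, after one step: {loss1:.4f}")
```

  In a trainable encoder, the stop-grad would additionally block gradients from the target branch into the encoder weights, leaving the prediction branch as the only gradient path.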
  
  The RPL model is also shown to build an internal world model with “real-world” data, such as the movement of animals or speech signals. The representation is then compared to V1 data and expected prediction error signals in an oddball setting. In a stacked hierarchy of RNN learning with RPL, the higher layers appear to learn high-level latent variables, although gradients are not propagated downward to the lower layers.
  
  The paper tackles an open question: Self-supervised learning is thought to be a fundamental principle to explain how computation is structured in the brain. Cortical data suggest qualitatively that prediction error is a core principle of representation learning in the brain, but the field is still looking for a simple yet expressive model that would explain how the cortex learns its representations. RPL contributes in that direction by making a useful link between cortical representation learning in RNN models and the JEPA learning algorithm that was demonstrated to scale to large world model learning from video data by LeCun’s group. It is very useful to connect this popular deep learning algorithm to cortical data.
  
  The model formalism is relatively elegant and simple: next-input prediction objectives are conceptually simple but not necessarily trivial to build at scale. There is a clear benefit in comparison with contrastive or IL methods because they are free from dataset-specific data augmentation and negative samples, thereby moving the comp neuro field towards conceptually simpler models of representation in the cortex. Yet predictive-only models (and in particular models that predict in latent space instead of pixel space) are not easy to build in a stable fashion. The JEPA family is essentially intended to solve this problem; it is very nice and timely to bring this to comp neuro.
  
  The methodology combining comp neuro and deep learning makes sense: The conceptual and qualitative analogy with cortical prediction errors is relevant and consistent with what is expected as a model of self-supervised learning in cortical models. The methodology to compare RPL with IL and CL is methodologically meaningful and grounded: showing, for instance, how some of the models fail to represent some latent structure in some toy datasets is interesting.
  
  (1.1) h-RPL: The h-RPL is perhaps the most creative departure from the JEPA model family. It would be interesting to say more about what was particularly difficult to see in the latent variables emerging in the hierarchical model. I often find it magical that layer-wise learning rules of this type are not learning redundant representations. Any insights why this is not the case here would be potentially insightful.
  
  We thank the reviewer for this comment. Regarding representational collapse in h-RPL: each local circuit independently applies the same collapse-preventing strategy as the single-level RPL model: namely, the asymmetric prediction architecture combined with the stop-grad operator. Since this mechanism operates locally within each circuit, it is sufficient to prevent collapse at every level of the hierarchy independently (see also our response to Point P1.3).
  
  The more subtle question is why the circuits learn non-redundant rather than identical representations across the hierarchy. We believe two mechanisms are at play here: First, the hierarchical encoder is a stacked convolutional network, meaning that receptive field sizes grow with depth. This architectural inductive bias naturally encourages successive circuits to operate on increasingly spatially integrated features, creating a structural pressure toward learning complementary rather than redundant representations. Second, the growing expressivity of the network with depth means that higher circuits have access to richer, more abstract inputs from which they can extract higher-level latent structure that is not already captured by lower circuits. Together, these factors (the local collapse-preventing mechanism and the depth-dependent growth in receptive field size and network expressivity) presumably explain why h-RPL builds an increasingly refined and non-redundant representational hierarchy.
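  The claim that receptive fields grow with depth in a stacked convolutional encoder can be checked with the standard receptive-field recurrence. The sketch below assumes square kernels, no dilation, and a hypothetical three-level stack of 3x3 kernels with stride 2; it does not reproduce the h-RPL encoder configuration.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels) after each conv layer.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Uses the recurrence r_l = r_{l-1} + (k_l - 1) * j_{l-1}, where j_{l-1} is the
    product of all earlier strides (dilation is ignored).
    """
    r, jump, out = 1, 1, []
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
        out.append(r)
    return out

# Hypothetical 3-level stack: 3x3 kernels, stride 2 (not the paper's configuration)
print(receptive_fields([(3, 2), (3, 2), (3, 2)]))  # [3, 7, 15]
```

  With strides greater than one, each level sees a roughly doubled spatial extent of the input, which is the structural pressure toward complementary representations described above.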
  
  What we will do: We will expand our discussion on this point in the revised manuscript. We plan to expand our quantification on how abstractions emerge in h-RPL in future work in which we will also study variations with top-down connections.
  
  (1.2) In general, I fully support the type of question and ideas that the paper is putting forward. It is, however, very hard in this research field to gain insight into specific conceptual contributions or specific bits of experimental data that the model puts forward. In pointing to the following weaknesses, I am encouraging the authors to lay out more clearly the unique hypothesis or contribution for which the RPL model should be remembered.
  
  Thank you for the positive feedback and the constructive criticism. We agree that articulating the core contributions more crisply would strengthen the paper.
  
  At its heart, we believe the paper makes two contributions we hope it will be remembered for. First, while prior work has established that invariant representations can be learned via local Hebbian-like learning rules, we show that learning equivariant representations alongside a latent dynamics model requires something qualitatively different: a local circuit with recurrent dynamics and an asymmetric predictive architecture. RPL provides a minimal, concrete instantiation of this principle.
  
  Second, and perhaps more broadly, the model makes a structural prediction about (cortical) neuronal circuit organization: since the encoder, integrator, and predictor each perform functionally distinct computations, the framework implies the existence of corresponding cell types and connectivity patterns one should look for in experimental data.
  
  What we will do: We will sharpen these messages in the revised manuscript so that these contributions are prominently highlighted throughout the paper.
  
  (1.3) Comparison with JEPA variants: JEPA variants integrate different details into the learning algorithm, for instance “masking” of the latent encoder targets, or EMA in the style of BYOL or Siamese networks for the predicted representations. It is great that RPL does not seem to need any of those (next-input prediction is a natural implementation of masking, and EMA does not seem to be used). It is notoriously hard for JEPA models to work without these features. Since such details are sometimes surprisingly crucial for a simulation to work, it would be good to report which other details were key to working without EMA and masking. Is it the difference in learning rate, for instance? Or maybe the tasks considered are simply easy enough for any model to work; if so, it would be useful to acknowledge to what extent this is true.
  
  We thank the reviewer for raising this important point. There are two key mechanisms that ensure stable, non-trivial training in RPL. First, using a higher learning rate for the predictor relative to the encoder is crucial for stable training. This prevents the predictor from collapsing the encoder representations and was already noted empirically by Chen et al. (2021).
  
  Second, and more fundamentally, predicting at the level of the memoryless encoder output, rather than at the level of the recurrent integrator, is essential to prevent a degenerate solution in which the RNN simply learns to generate an internally predictable time series unrelated to the input. By anchoring the prediction target to the encoder, the model is forced to ground its representations in the sensory input. Intuitively, without this anchor the RNN could simply “make up” a predictable time series that satisfies the learning objective but does not yield useful internal representations.
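To make the two mechanisms concrete, here is a minimal scalar sketch of one RPL-style update, assuming a linear encoder, a linear recurrent integrator, and a linear predictor (a hypothetical toy, not the architecture used in the paper). The target is the stop-grad encoder output at the next step, not the integrator state c, and the predictor uses a larger learning rate than the encoder and integrator. For brevity, gradients are truncated to a single time step (the previous integrator state is treated as fixed), which is an additional simplification.

```python
def rpl_toy_step(params, c_prev, x_t, x_next, lr_enc=0.01, lr_pred=0.1):
    """One gradient step of a scalar, hypothetical RPL-style toy model.

    encoder    z_t = a * x_t               (memoryless)
    integrator c_t = w * c_prev + z_t      (linear recurrence, c_prev held fixed)
    predictor  p_t = b * c_t
    loss       L   = 0.5 * (p_t - sg(z_{t+1}))**2, with sg = stop-grad
    """
    a, w, b = params["a"], params["w"], params["b"]
    z_t = a * x_t
    c_t = w * c_prev + z_t
    p_t = b * c_t
    target = a * x_next   # encoder output at t+1, treated as a constant (stop-grad)
    err = p_t - target    # local prediction error e
    grad_a = err * b * x_t       # flows only through the prediction branch
    grad_w = err * b * c_prev
    grad_b = err * c_t
    new_params = {
        "a": a - lr_enc * grad_a,
        "w": w - lr_enc * grad_w,
        "b": b - lr_pred * grad_b,  # predictor learns faster than encoder/integrator
    }
    return new_params, c_t, 0.5 * err ** 2


# Usage on a short hypothetical input stream.
params, c = {"a": 1.0, "w": 0.5, "b": 0.5}, 0.0
stream = [2.0, 3.0, 1.0, 4.0]
for x_t, x_next in zip(stream[:-1], stream[1:]):
    params, c, loss = rpl_toy_step(params, c, x_t, x_next)
    print(f"loss={loss:.4f}", params)
```

Because the target is detached, the encoder weight `a` changes only through its effect on the prediction; without stop-grad, `grad_a` would gain an extra `-err * x_next` term that also pulls the target toward the prediction.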
  
  Beyond these architectural points, previous work from our group (Srinath Halvagal et al., 2023) has shown mathematically that JEPAs without EMA avoid collapse via an implicit variance regularization mechanism, and we believe RPL benefits from the same principle. Indeed, we now have a more complete theoretical understanding of this, including identifiability proofs for the latent dynamical model under relatively mild assumptions (Mikulasch et al., 2026). This work has recently been accepted at ICML. In addition, one has to ensure that representations are not already nearly collapsed at the beginning of training. In this paper, we used normalization layers (batchnorm) in the encoder to ensure this.
  
  Finally, as in all SSL paradigms, the augmentation strength is an important hyperparameter that affects the quality of the learned representations. In the temporal predictive setting, the augmentation strength is fixed by the world itself; the only knob we can adjust is the prediction horizon ∆. While we typically focused on next-time-step prediction (∆ = 1), we saw a clear effect for the speech dataset, where ∆ = 8, but not ∆ = 1, yielded useful representations for the tasks (Fig. 5b).
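In a streaming setting, the prediction horizon ∆ determines how long an input must be held before its target arrives. A minimal sketch, assuming inputs arrive one at a time and that a buffer of the last ∆ items is kept (whether a circuit would buffer raw inputs or encoder outputs is left open here; the function name is hypothetical):

```python
from collections import deque


def streaming_pairs(stream, delta=1):
    """Yield (x_t, x_{t+delta}) from a stream, storing only the last `delta` items."""
    if delta < 1:
        raise ValueError("delta must be >= 1")
    buffer = deque(maxlen=delta)  # oldest item is x_t, waiting for its target
    for x in stream:
        if len(buffer) == delta:
            yield buffer[0], x    # x arrives exactly delta steps after buffer[0]
        buffer.append(x)          # maxlen evicts buffer[0] automatically


print(list(streaming_pairs(range(6), delta=1)))  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(list(streaming_pairs(range(6), delta=3)))  # [(0, 3), (1, 4), (2, 5)]
```

Larger ∆ means both a longer memory requirement and a harder prediction problem, which is one way to read ∆ as the temporal analogue of augmentation strength.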
  
  What we will do: We will discuss these points more prominently in the Discussion so that they are not overlooked in the Methods. Additionally, we will include a plot of the empirical prediction horizon for the speech dataset in the supplementary material for reference.
  
  (1.4) Comparison with IL and CL: On a high level, the comparison with IL and CL algorithms is written as conclusive. I suspect that the failure modes of IL and CL that are described are not due to the algorithms themselves, but rather to the construction of invariance statistics or the choice of negative sample sets (the sets of samples among which VICReg requests a variance of 1). For instance, if the variance (or negative sample set) is taken only across time, the variance across object identity is expected to collapse. Similarly, if the variance is taken across object identity, the variance across time can collapse. So I wonder if the failure of IL and CL is induced by the construction of the variance definition.
  
  We thank the reviewer for this thoughtful point. Both RPL and CL implement an implicit variance regularizer by virtue of being JEPAs (Srinath Halvagal et al., 2023), whereas IL uses an explicit regularizer computed along both the batch and time dimensions to avoid representational and dimensional collapse. The failure modes of IL and CL therefore cannot be entirely attributed to the statistics of the input samples chosen for variance regularization, but are instead primarily determined by the choice of prediction and target representations.
  
  What we will do: We will clarify this in the Methods section of the revised manuscript.
  
  (1.5) Prediction error: When the model is compared to recordings of cortical activity in Figure 7, it is not obvious from the figure which latent space is meant mathematically. Is it the vector z, c, or the prediction error e? This is rather important from a neuroscientific point of view, because the prediction error e is expected to explain the neuronal data. On the other hand, the prediction error e is only used in the learning algorithm to define the loss function; it is not the communication medium between the RNN units c (or with the encoder z).
  
  In the brain, since the measurements are recorded as neural activity, they reflect communication channels between specific units (z or c). It is probably c or z that would already explain the oddball prediction error. I believe that other models, like Forward-forward of Nejad et al., have tried quite hard to address this apparent tension. Whether or not RPL resolves this, I think it would be beneficial to state the problem and clarify how the algorithm addresses or ignores the issue.
  
  Thanks for pointing out the issue with regard to clarity and for raising the important but subtle point about prediction error representation. To answer the immediate question of which vector we use in Figure 7: it is the vector c, corresponding to the integrator representations. We agree this should be stated explicitly and will update the manuscript accordingly.
  
  On the more general point, we agree that the tension between recordable neural activity and the computational role of prediction errors is an important issue. We do already briefly engage with it in the Discussion (subsection “Relation to previous modeling work”), where we note that under RPL “inter-areal communication is dominated by representations rather than error signals”. However, we agree that this point should be surfaced more directly.
  
  To elaborate, under classical predictive coding, prediction errors are the inter-areal communication channel and are therefore expected to be directly observable in neural recordings, e.g., as oddball responses. Under RPL, this is not the case: e is computed locally within a circuit and serves only as a learning signal for synaptic plasticity, not as a signal propagated between circuits or areas. What cortex primarily encodes and communicates in our framework are predictive representations, not reconstruction errors. Accordingly, what should map onto recorded population activity are the representations c (and z), while locally computed prediction errors could in principle remain observable as more circumscribed or transient mismatch-like signals within a circuit.
  
  We would like to push this point further. The reviewer frames this as a tension that RPL needs to resolve, but growing neurophysiological evidence suggests that classical residual-difference prediction errors may not be a dominant mode of cortical encoding in the first place. Furutachi, Franklin, et al. (2024) showed that V1 responses to unexpected visual stimuli do not encode how input deviates from predictions, but instead selectively amplify the representation of the unexpected stimulus itself. Very recently, Furutachi and Hofer (2026) generalized this into a revised framework in which feedforward pathways transmit sensory representations modulated by prediction-error magnitude, rather than residual differences. Vasilevskaya et al. (2026) constrain the space of plausible cortical algorithms via functional-influence experiments, also concluding that no variant of standard predictive processing is consistent with the full pattern of layer 2/3 ↔ layer 5 interactions; they propose a JEPA-based model, citing RPL as a promising candidate. The model by Nejad et al. (2025) similarly shares with RPL the property that representations, rather than residual errors, propagate between circuit elements.
  
  Taken together, the apparent tension may be less a problem RPL needs to resolve than one it is well positioned to explain, remaining consistent with the emerging picture of cortex as encoding amplified sensory features rather than transmitting residual errors across areas.
  
  What we will do: We will add missing information to the main text and sharpen the Discussion with these arguments.
  
  (1.6) Successor representation without value? I believe the term successor representation is historically relevant in a reinforcement learning (RL) setting and has a precise mathematical definition. Without RL, I feel that learning successor representation is conceptually identical to learning a transition matrix (aka, a primitive world model). I therefore wonder if the pitch for high-level framing of the successor representation is appropriately described or trivial.
  
  The reviewer makes a valid point about the concept of successor representations. To answer the immediate question, it is not entirely trivial, as we observe not only the emergence of the transition structure (Fig. 6c) but also the encoding of decaying future (but not past) state occupancy (Fig. 6d,e). We largely adopted the terminology “successor-like representations” from Ekman et al. (2023), and we elaborate here on why we kept it. As the reviewer rightly points out, the term “successor representations” was introduced in the RL literature (Dayan, 1993), but it was later adopted in neuroscience to describe the idea that a neuronal population encodes a predictive representation reflecting the expected occupancy of future states under a given policy. Ekman et al. (2023) use the term “successor-like representations” to describe the phenomenon whereby neural activity in V1 (and hippocampus) represents both current and (discounted) future, but not past, state occupancies in a sequence learning task with no explicitly defined policy or value training. In other words, successor-like representations are simply predictive representations.
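For reference, the standard definition (Dayan, 1993) of the successor representation for a transition matrix T and discount γ is M = Σ_k γ^k T^k = (I − γT)^{-1}. The sketch below evaluates a truncated version of this series for a hypothetical four-state sequence A → B → C → D. Each row assigns discounted weight to the current and future states and zero weight to past states, which is the signature referred to above. It illustrates the definition only and is not the analysis behind Fig. 6.

```python
def successor_representation(T, gamma, n_terms=50):
    """Truncated series M = sum_{k=0}^{n_terms} gamma**k * T**k (Dayan, 1993)."""
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]       # k = 0 term
    power = [row[:] for row in identity]   # current T**k
    for k in range(1, n_terms + 1):
        power = [[sum(power[i][m] * T[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
        for i in range(n):
            for j in range(n):
                M[i][j] += gamma ** k * power[i][j]
    return M


# Hypothetical deterministic sequence A -> B -> C -> D (D is terminal).
T = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
M = successor_representation(T, gamma=0.5)
print(M[0])  # from A: [1.0, 0.5, 0.25, 0.125]
print(M[2])  # from C: [0.0, 0.0, 1.0, 0.5]  (no weight on past states A, B)
```

For sequences with cycles the series does not terminate, and the truncation at `n_terms` only approximates (I − γT)^{-1} for γ < 1.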
  
  What we will do: To address this ambiguity, we will replace “successor-like representations” with the term “predictive representations” in the abstract and clarify this distinction in the Results section of the revised manuscript.
  
  (1.7) Learning in RNN: Learning with recurrent networks appears to be a key in this model presented here (it is in the algorithm name). Yet, this aspect of the model and the literature on biologically plausible learning rules for RNN is not really discussed.
  
  We thank the reviewer for raising this concern. While h-RPL is one step toward more biologically plausible, spatially local learning rules, exploring it further in terms of temporal credit assignment is beyond the scope of the present study and would require a more systematic and in-depth analysis. However, moving toward more biologically plausible learning rules is an interesting research direction that we plan to explore, as we also mention in the Discussion (“Limitations and future research directions”).
  
  We think a viable strategy could be to combine a lightweight spatial credit assignment strategy such as feedback alignment (Nøkland, 2016; Lillicrap et al., 2016) with an online learning rule using eligibility traces for temporal credit assignment, such as SuperSpike (Zenke et al., 2018) or e-prop (Bellec et al., 2020). Similar strategies have given promising results for CLAPP (Illing et al., 2021; Zihan et al., 2026).
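As a rough sketch of the eligibility-trace idea (a generic three-factor rule, not SuperSpike or e-prop themselves), each synapse keeps a low-pass filtered trace of its presynaptic input, and the weight is updated online by multiplying that trace with a learning signal, such as a local prediction error. For a leaky linear unit y_t = λ y_{t-1} + Σ_j w_j x_{t,j} with fixed weights, this trace equals ∂y_t/∂w_j exactly, so no backpropagation through time is needed. The function name and parameters are hypothetical.

```python
def three_factor_update(weights, inputs, learning_signals, lam=0.9, lr=0.01):
    """Online weight updates from eligibility traces times a learning signal.

    inputs[t][j] is presynaptic activity x_{t,j}; learning_signals[t] is a scalar
    error-like signal L_t delivered to the postsynaptic unit at time t.
    """
    w = list(weights)
    traces = [0.0] * len(w)
    for x_t, L_t in zip(inputs, learning_signals):
        for j, x in enumerate(x_t):
            # Filtered presynaptic activity; equals dy_t/dw_j for a leaky linear unit.
            traces[j] = lam * traces[j] + x
            w[j] -= lr * L_t * traces[j]     # local, online update: no BPTT
    return w, traces


w, traces = three_factor_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0],
                                lam=0.5, lr=0.1)
print(w, traces)
```

Only quantities available at the synapse (the trace) and a broadcast signal (L_t) enter the update, which is what makes such rules candidates for temporal credit assignment in recurrent circuits.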
  
  What we will do: Following the suggestion, we will discuss biologically plausible learning rules for RNNs in the Discussion.
  
  Reviewer #2 (Public review):
  
  This is a very interesting manuscript, which proposes a novel idea on how cortical networks may learn useful representations of sensory stimuli. The model implementing this idea is thoroughly tested in multiple experimental paradigms. The manuscript is very clearly written. I feel it may have a significant impact on our understanding of cortical circuitry.
  
  Reviewer #3 (Public review):
  
  This paper presents Recurrent Predictive Learning (RPL), a self-supervised model conceptually similar to Joint-Embedding Predictive Architecture (JEPA) models. RPL sequentially observes dynamic scenes to predict subsequent observations. A central claim of the work is that the model’s trained representations are simultaneously invariant and equivariant to transformations, such as movement properties, and that these properties emerge without explicit supervision. These representational qualities are demonstrated through three experiments utilizing two simulated datasets and one naturalistic dataset. Furthermore, the latent embeddings are qualitatively compared with neural data, showing that the model reproduces the successor representation observed in human V1 and the local/global oddball effect in monkey prefrontal cortex.
  
  The paper addresses a fundamental question relevant to both computational neuroscience and machine vision: how the brain learns representations that are simultaneously invariant and equivariant to transformations. The manuscript is well-written, easy to follow, and supported by clear visualizations.
  
  While JEPA-style models have recently gained significant traction in the artificial intelligence community, this paper nicely bridges the gap to neuroscience. By framing these architectures as a theory for visual learning in the brain, the authors provide valuable insights into how predictive frameworks can explain cortical processing.
  
  The qualitative alignment with V1 and PFC data is a particularly strong contribution, as it offers a potential mechanistic explanation for observed neural phenomena through the lens of self-supervised learning.
  
  (3.1) The central claim, that both invariance and equivariance emerge spontaneously, requires further scrutiny (see Ghaemi et al., NeurIPS, 2025; Garrido et al., arXiv, 2024). In particular, the synthetic “moving animal” dataset used in this paper may be too simple to fully support this claim. In latent space prediction, a model must predict both the scene content and the dynamics of movement. Because movement (whether ego-motion or external) is often highly uncertain (or multi-modal), predictive models in naturalistic settings often “collapse” toward learning purely invariant representations, ignoring the hard-to-predict dynamics. In the provided simulations, the movements are extremely predictable. In more complex scenarios, the model would likely prioritize content (invariance) over dynamics (equivariance) unless aided by action-conditioning or explicit factor estimation (Zhang et al., ICLR, 2026). The authors’ results in Figure 5 using naturalistic video seem to reflect this limitation, given the lower performance on the naturalistic videos compared to the synthetic datasets.
  
  We thank the reviewer for the feedback. We agree that further validation on more complex datasets would strengthen the claims, and we take this point seriously. If the reviewer has any suggestions for a specific alternative dataset, we would welcome any recommendations.
  
  Regarding the mouse video data specifically, we realized that the lower performance reflects a suboptimal benchmark rather than a shortcoming of our method. The likely culprit is that the mice remain largely stationary, leading to a heavily imbalanced velocity distribution peaked near zero (Supplementary Fig. S9). This imbalance makes equivariance evaluation unreliable regardless of the learning algorithm. For example, end-to-end supervised training yields an R² of 0.19, compared to 0.08 ± 0.02 for RPL.
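One generic reason why a target distribution concentrated near zero makes R² unreliable is that R² normalizes the residual error by the variance of the targets. The sketch below uses made-up numbers (not our data): the same constant decoding error of 0.1 gives a high R² for widely spread targets and a negative R² for targets clustered near zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up velocities: same constant decoding error of 0.1 in both cases.
wide = [-1.0, -0.5, 0.0, 0.5, 1.0]       # broad velocity distribution
narrow = [-0.1, -0.05, 0.0, 0.05, 0.1]   # mostly stationary animal
print(round(r_squared(wide, [v + 0.1 for v in wide]), 3))      # 0.98
print(round(r_squared(narrow, [v + 0.1 for v in narrow]), 3))  # -1.0
```

In the narrow case even a small absolute error is large relative to the spread of the targets, so R² drops for any decoder, which is consistent with the low values observed for both supervised training and RPL.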
  
  Regarding the moving animal dataset, we note that the dynamics are not trivial from an SSL perspective: unlike moving MNIST (Srivastava et al., 2015), the dataset includes changes in scale and orientation, both features that invariance-focused SSL models can easily ignore, yet RPL recovers reliably. For example, this discrepancy can be seen in Supplementary Table S1 where we compare to InfoNCE and CPC. That said, we acknowledge the reviewer’s broader concern and will seek to validate RPL on more complex datasets.
  
  While it would be nice to compare to related work by Ghaemi et al. (2024), this study used 3DIEBench (Garrido et al., 2023). Unfortunately, 3DIEBench’s reliance on pair-based representations with annotated but random augmentations (such as rotations or color changes) precludes the possibility of smooth latent traversals that would be required for RPL to learn from the same dataset. We will look into whether it is computationally feasible to adapt or regenerate a similar dataset that meets the requirements for temporal prediction.
  
  Regarding stochasticity, we agree that predictive learning in latent space is most natural in approximately deterministic settings, whereas real-world sensory input often contains non-deterministic elements. While a deeper treatment of such stochastic environments is beyond the scope of the present manuscript, it is the focus of ongoing and future work. Notably, in recent work from our group (Hauri et al., 2026), we have demonstrated that RPL’s core objective can replace the reconstruction loss in Dreamer, achieving competitive performance in complex, stochastic environments. While we did not systematically evaluate equivariance in that study, the results suggest that representation-space predictive learning is viable beyond the deterministic regime.
  
  What we will do: We will point out that the real-world mouse video dataset is a poor benchmark and include the additional R² values to support this. Further, we will try to identify or generate alternative datasets to back the equivariance claims and discuss our findings in light of previous work, e.g., Ghaemi et al. (2024). Moreover, we will sharpen our discussion of our model’s limitations in stochastic settings and highlight notable connections to related work.
  
  (3.2) The framing of the RPL model as an entirely new theory of representation learning is slightly overstated. The focus on prediction in representation space rather than input space is the defining characteristic of JEPA and various other Self-Supervised Learning (SSL) models, including sequential prediction. While this paper clarifies the connection between these AI frameworks and cortical circuits, the work would be strengthened by more explicitly positioning RPL within the context of existing JEPA-style models and prior SSL theories of the visual system.
  
  Thanks for raising this point. We are unsure what the reviewer refers to. We did not frame our work as “an entirely new theory of representation learning,” as the reviewer suggests. In fact, our title already highlights quite the opposite: “Understanding neural circuit principles for representation learning through joint-embedding predictive architectures.” We do not claim novelty over JEPA as an ML paradigm; we adopt it precisely because it provides a principled, non-generative framework for predictive representation learning, and our goal is to develop a circuit-level instantiation that accounts for neural circuit computation. We already discuss a body of previous work on self-supervised learning and JEPAs at length. Since the reviewer did not specify what they are missing, we briefly reiterate what is already there.
  
  Our contribution is a theory of representation learning in the brain, built on JEPAs as the underlying ML framework. The Title and Introduction already position our work quite explicitly this way. Specifically, we mention prior work on JEPAs (CPC, BYOL, SimSiam, I-JEPA, seq-JEPA, V-JEPA, V-JEPA 2), while noting that “most JEPAs developed in machine learning are poor models of cortical computation” because of their reliance on negative sampling, transformers, masking, static images, and/or known parametrized transformations, and motivate RPL as the minimal candidate that “must instead rely on recurrent neural dynamics, learn from streaming sensory input without masking, support both invariant and equivariant representations, and reproduce key neurophysiological observations.”
  
  The Discussion (“Relation to previous modeling work”) further details the specific novelties of RPL relative to existing sequential JEPA-style and SSL models like CPC (Oord et al., 2018), V-JEPA (Bardes et al., 2024), V-JEPA 2 (Assran et al., 2025), seq-JEPA (Ghaemi et al., 2024). In brief:
  
  RPL is a recurrent JEPA based on RNN dynamics, not transformers, and learns from streaming sensory input without masking or random negative sampling;
  
  It explicitly compares three prediction-error topologies (RPL vs. invariance learning vs. context prediction; Fig. 2, Suppl. Figs. S2 and S6) and shows that asymmetric recurrent prediction is essential for jointly learning invariant and equivariant representations;
  
  Importantly, it does so via pure temporal prediction without access to underlying transformations, a property shared by very few JEPAs. The closest exception is VJ-VCR (Drozdov et al., 2024), which uses an explicit variance-covariance regularization (VCReg) in a JEPA, and which we will cite in the revised manuscript;
  
  It provides the first hierarchical JEPA optimizing local prediction errors at multiple levels (h-RPL, Fig. 8), as envisioned by LeCun (2022) but not previously implemented;
  
  It connects directly to neurophysiological data: successor-like representations in human V1 and abstract sequence representations in macaque PFC, which provides qualitative correspondence between JEPA components and cortical activity that the existing JEPA literature, focused on ML benchmarks, does not address.
  
  Finally, our article already includes a discussion paragraph on recent self-supervised learning models in the context of the brain, where we discuss work by Nejad et al. (2025) and Asabuki et al. (2025). Most other SSL theories of the visual system rely on static images and recognition tasks (Yerxa et al., 2024; Margalit et al., 2024). However, two studies include temporal prediction objectives and are worth mentioning in more detail. First, Bakhtiari et al. (2021) show that representations similar to the ventral and dorsal pathways of the visual system can emerge in a two-pathway encoder architecture within the CPC model. Second, Niu et al. (2024) use a “straightening” objective together with VCReg as a practical model of the perceptual straightening hypothesis (Hénaff et al., 2019). Though not a JEPA (i.e., it has no predictor network), it can decode equivariant factors in a sequential MNIST dataset where only single factors change throughout a video.
  
  What we will do: We will carefully review our discussion of previous work and further discuss Drozdov et al. (2024), Bakhtiari et al. (2021), and Niu et al. (2024) in the revised manuscript.
  
  (3.3) A significant challenge in latent-space SSL is avoiding “representational collapse” (where the model provides a trivial constant output). While the paper alludes to JEPA-like solutions, it lacks a detailed explanation (in both the text and the architectural schematics) of the specific technique used to prevent collapse. Consequently, it is difficult to evaluate the authors’ claim of “biological plausibility,” as the biological equivalents of common machine learning techniques (such as stop gradient) are not discussed.
  
  Thanks for pointing this out. Our model avoids collapse through the asymmetric stop-grad/predictor architecture. It does not require an EMA as long as the predictor learns with a faster learning rate than the rest of the network (see also our response to Point P1.3).
  
  The use of stop-grad suggests that a circuit learning with RPL needs to compute a vector-valued instructive learning signal. While we do not explicitly model the circuit-level mechanisms by which this could be implemented in the brain, excitation-inhibition balance is one possibility (Rossbroich et al., 2025). Finally, differences in learning rate could be implemented either structurally or functionally in the brain (see, for instance, Liu et al. (2025)), and activity normalization has been proposed as a canonical computation in biological neural circuits (Carandini et al., 2012).
  
  What we will do: We will make sure to discuss these putative biological mechanisms in the revised manuscript.
  
  (3.4) Recent work has shown that the capacity (size) of the predictor significantly influences the learned representations in a JEPA-type world model (Garrido et al., 2024). In simpler scenarios, a large enough predictor can allow a model to “memorize” dynamics rather than learn generalized equivariant features. It would be beneficial to see how the ratio of predictor size to encoder size affects the emergence of these features.
  
  Thanks for raising this concern. We do not observe a noticeable difference in position and velocity decoding when changing the width or depth of the MLP predictor on the moving animals data. However, performance on rotation speed and orientation decoding scales with the width, but not the depth, of the predictor. This analysis excludes the effect of the integrator’s capacity, since the integrator size directly affects the dimensionality of the representations, even though the integrator also effectively contributes to the prediction computation in RPL.
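To report predictor capacity relative to the encoder, as the reviewer suggests, a sweep can log the parameter count of each MLP predictor alongside its width and depth. A minimal sketch, assuming a fully connected predictor with biases; the input/output dimension of 64 and the encoder size are hypothetical placeholders, not values from the paper.

```python
def mlp_param_count(in_dim, width, depth, out_dim):
    """Weights + biases of an MLP with `depth` hidden layers of size `width`."""
    dims = [in_dim] + [width] * depth + [out_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


ENCODER_PARAMS = 250_000  # hypothetical encoder size, for illustration only
for width in (64, 128, 256):
    for depth in (1, 2, 3):
        n = mlp_param_count(64, width, depth, 64)
        print(f"width={width:3d} depth={depth} params={n:6d} "
              f"predictor/encoder={n / ENCODER_PARAMS:.3f}")
```

In this hypothetical setting, doubling the width from 128 to 256 and adding a second hidden layer at width 128 both yield 33,088 parameters, so reporting width and depth separately, rather than only total size, keeps the two axes of the sweep distinguishable.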
  
  What we will do: We will include a figure showing how task performance varies with the predictor’s width and depth.
  
  Methodological Clarifications
  
  (3.5) The authors mention a contrastive learning comparison but provide few details. Since contrastive learning is primarily a technique to avoid collapse, it would be a more rigorous baseline if implemented within the same architecture as RPL to isolate the effect of the predictive objective.
  
  Thanks for the question. We already use the same network model as in RPL for the contrastive predictive learning (InfoNCE) baseline in Supplementary Table S1 and mentioned in the main text (l.164).
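For readers less familiar with the baseline, here is a minimal sketch of the InfoNCE objective (Oord et al., 2018) for a single prediction: the loss is the cross-entropy of identifying the true future representation among a set of candidates. The dot-product score (a common simplification of the original bilinear score), the temperature, and how candidates are drawn are assumptions here and may differ from the baseline reported in Supplementary Table S1.

```python
import math


def info_nce(prediction, candidates, positive_index, temperature=1.0):
    """InfoNCE loss for one prediction: -log softmax(scores)[positive_index]."""
    scores = [sum(p * c for p, c in zip(prediction, cand)) / temperature
              for cand in candidates]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]


# The true future (index 0) is well aligned with the prediction; others are distractors.
pred = [1.0, 0.0]
cands = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(info_nce(pred, cands, positive_index=0, temperature=0.1))
```

Unlike RPL, this objective needs the negative candidates to avoid collapse, which is why the two baselines differ even when the network model is shared.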
  
  What we will do: We will describe the architecture of the non-linear predictor used for the InfoNCE baseline more explicitly in the Methods.
  
  (3.6) In the PFC data comparison (Figure 7f), there appears to be a discrepancy where the local and global conditions show nearly identical results in PFC, while different dynamics in the model. It is unclear if this is a visualization error or a genuine model deviation.
  
  Thanks for picking up on this subtlety in the experimental results. To clarify, it is a model deviation but an interesting one. The local and global responses do look quite similar in the original PFC data. They differ in that the global oddball (xY|xx and xx|xY) response has a secondary peak that encodes the presence of the global oddball, whereas the initial response is actually dominated by local oddball encoding (xY vs xx). Concretely, this results in the response to the xx|xY condition only showing up weakly in the data and at a time lag with respect to the initial local oddball response. Our model, however, does not show the transient initial response to local oddballs in the decoding direction for global oddballs. In a sense, the network model encodes the global oddball concept more robustly than is seen in the PFC data. That said, whether this indicates a genuine difference in representational strategies that needs to be further accounted for, or whether it is an issue stemming from limited sub-sampling of PFC neurons, remains unclear.
  
  (3.7) The criteria for selecting specific model variables for comparison with V1 versus PFC are not explicitly defined. Clarification is needed on whether the same latent variables were used for both brain regions or if different layers were selected.
  
  To clarify, the successor-like representations in human V1 and the abstract representations in macaque PFC come from two different experiments, so each has different latent variables requiring a different RPL model. The architecture used for each experiment is detailed in the Methods; in each case, we chose the simplest architecture that should work given the task complexity. Throughout the paper, all representation analyses are done on the output of the integrator (c) unless stated otherwise. We hope this resolves the confusion.
  
  References
  
  Chen, Xinlei et al. (2021). “Exploring simple siamese representation learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758.
  
  Srinath Halvagal, Manu et al. (2023). “Implicit variance regularization in non-contrastive SSL”. In: Advances in Neural Information Processing Systems 36, pp. 63409–63436.
  
  Mikulasch, Fabian A et al. (2026). “Understanding Self-Supervised Learning via Latent Distribution Matching”. arXiv: 2605.03517 [cs.LG].
  
  Furutachi, Shohei, Alexis D. Franklin, et al. (Sept. 2024). “Cooperative thalamocortical circuit mechanism for sensory prediction errors”. In: Nature 633.8029, pp. 398–406. issn: 1476-4687. doi: 10.1038/s41586-024-07851-w.
  
  Furutachi, Shohei and Sonja B Hofer (2026). “Rethinking Predictive Processing”. In: Annual Review of Neuroscience 49.
  
  Vasilevskaya, Anna et al. (2026). “A functional influence based circuit motif that constrains the set of plausible algorithms of cortical function”. In: bioRxiv. doi: 10.64898/2026.01.29.702557. eprint: https://www.biorxiv.org/content/early/2026/01/29/2026.01.29.702557.full.pdf.
  
  Nejad, Kevin Kermani et al. (July 2025). “Self-supervised predictive learning accounts for cortical layer-specificity”. In: Nat Commun 16.1, p. 6178. issn: 2041-1723. doi: 10.1038/s41467-025-61399-5.
  
  Ekman, Matthias et al. (Feb. 2023). “Successor-like representation guides the prediction of future events in human visual cortex and hippocampus”. In: eLife 12. Ed. by Morgan Barense et al., e78904. issn: 2050-084X. doi: 10.7554/eLife.78904.
  
  Dayan, Peter (1993). “Improving generalization for temporal difference learning: The successor representation”. In: Neural computation 5.4, pp. 613–624.
  
  Nøkland, Arild (2016). “Direct feedback alignment provides learning in deep neural networks”. In: Advances in neural information processing systems 29.
  
  Lillicrap, Timothy P et al. (2016). “Random synaptic feedback weights support error backpropagation for deep learning”. In: Nature communications 7.1, p. 13276.
  
  Zenke, Friedemann et al. (2018). “Superspike: Supervised learning in multilayer spiking neural networks”. In: Neural computation 30.6, pp. 1514–1541.
  
  Bellec, Guillaume et al. (2020). “A solution to the learning dilemma for recurrent networks of spiking neurons”. In: Nature communications 11.1, p. 3625.
  
  Illing, Bernd et al. (2021). “Local plasticity rules can learn deep representations using self-supervised contrastive predictions”. In: Advances in Neural Information Processing Systems 34.
  
  Zihan, Wu S et al. (2026). “Can Local Learning Match Self-Supervised Backpropagation?” In: arXiv preprint arXiv:2601.21683.
  
  Srivastava, Nitish et al. (2015). “Unsupervised learning of video representations using lstms”. In: International conference on machine learning. PMLR, pp. 843–852.
  
  Ghaemi, Hafez et al. (2024). “Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models”. In: NeurIPS 2024 Workshop: Self-Supervised Learning - Theory and Practice.
  
  Garrido, Quentin et al. (2023). “Self-supervised learning of split invariant equivariant representations”. In: arXiv preprint arXiv:2302.10283.
  
  Hauri, Michael et al. (2026). “Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction”. In: arXiv preprint arXiv:2603.07083.
  
  Oord, Aaron van den et al. (July 2018). “Representation Learning with Contrastive Predictive Coding”. In: arXiv:1807.03748 [cs, stat]. arXiv: 1807.03748.
  
  Bardes, Adrien et al. (2024). “V-JEPA: Latent Video Prediction for Visual Representation Learning”.
  
  Assran, Mido et al. (2025). “V-jepa 2: Self-supervised video models enable understanding, prediction and planning”. In: arXiv preprint arXiv:2506.09985.
  
  Drozdov, Katrina et al. (2024). “Video representation learning with joint-embedding predictive architectures”. In: arXiv preprint arXiv:2412.10925.
  
  LeCun, Yann (2022). “A Path Towards Autonomous Machine Intelligence”. Version 0.9.2, 2022-06-27.
  
  Asabuki, Toshitake et al. (2025). “Learning predictive signals within a local recurrent circuit”. In: Proceedings of the National Academy of Sciences 122.27, e2414674122. doi: 10.1073/pnas.2414674122. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.2414674122.
  
  Yerxa, Thomas et al. (2024). “Contrastive-equivariant self-supervised learning improves alignment with primate visual area it”. In: Advances in neural information processing systems 37, pp. 96045–96070.
  
  Margalit, Eshed et al. (2024). “A unifying framework for functional organization in early and higher ventral visual cortex”. In: Neuron 112.14, pp. 2435–2451.
  
  Bakhtiari, Shahab et al. (2021). “The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning”. In: Advances in Neural Information Processing Systems. Ed. by M. Ranzato et al. Vol. 34. Curran Associates, Inc., pp. 25164–25178.
  
  Niu, Julie Xueyan et al. (2024). “Learning predictable and robust neural representations by straightening image sequences”. In: Advances in Neural Information Processing Systems 37, pp. 40316–40335.
  
  Hénaff, Olivier J et al. (2019). “Perceptual straightening of natural videos”. In: Nature neuroscience 22.6, pp. 984–991.
  
  Rossbroich, Julian et al. (2025). “Breaking Balance: Encoding local error signals in perturbations of excitation-inhibition balance”. In: bioRxiv, pp. 2025–05.
  
  Liu, Peng et al. (2025). “Layer-specific changes in sensory cortex across the lifespan in mice and humans”. In: Nature neuroscience 28.9, pp. 1978–1989.
  
  Carandini, Matteo et al. (2012). “Normalization as a canonical neural computation”. In: Nature reviews neuroscience 13.1, pp. 51–62.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.11.25.690220v2
www.biorxiv.org

Conservation Blind Spot: The Critical Role of Larval Stage in Assessing Extinction Risk

1. Public_Reviews 15 May 2026
  
  in eLife
  
  Reviewer #1 (Public review):
  
  The manuscript shows that different traits of adults and larvae correlate with Red List status. The authors argue that this shows a big gap in the conservation of amphibians and that the traits of all life stages should be taken into account in amphibian conservation. Specifically, amphibian conservation should do more for the habitats where the larvae live.
  
  The manuscript is well written and easy to understand. The methods are sound.
  
  While the study will make an interesting contribution to conservation science, there are many things that I disagree with.
  
  I don't think that amphibian larvae and their requirements are a "blind spot" as the title suggests. When reading the manuscript, I didn't learn how conservation practice should change in response to the results.
  
  I wonder whether the relationship between species traits and extinction risk is of great importance for conservation. If a species is Data Deficient on the IUCN Red List, then species traits could be used to predict its Red List category. However, for other conservation projects, I don't see how this would work. How would traits be linked to captive breeding, conservation translocation, pond construction or habitat management in general? In some cases, I can envision a link between species traits and pond hydroperiod.
  
  The species traits analysed are body size and morphological traits, which makes sense. However, one of the species traits was microhabitat. I find it far-fetched to call habitat a species trait; this is standard habitat ecology. It is well known that habitats matter and that different habitat types face different threats, and consequently so do the species that live in those habitats. Furthermore, habitat and morphology may be confounded. For example, tadpoles in lentic and lotic habitats have very different morphologies. So is it habitat or morphology?
  
  I don't know how the threat status of Chinese amphibians is determined. The IUCN has multiple criteria under which a species can be Red Listed. One is range size, and another is population decline. Personally, I don't think they should be pooled in an analysis because they are fundamentally different reasons why a species has a high extinction risk. A reduction in population size of 30% or more over 10 years or three generations (whichever is longer) is not the same thing as a small distribution range. Another issue is that the IUCN has developed the Green Status of Species, which shows that even a species listed as Least Concern (LC) on the Red List may be significantly depleted.
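  As a minimal sketch of the distinction drawn above, the Python below computes the decline-based quantity used under IUCN criterion A (percentage reduction over the longer of 10 years or three generations), which does not involve range size (criterion B) at all. The function names and example numbers are hypothetical. The 30% figure is the Vulnerable threshold for criteria A2 to A4 (A1 uses 50%), and a real assessment involves further subcriteria not modelled here.

```python
def criterion_a_window(generation_length_years: float) -> float:
    """Length (years) of the criterion A window: the longer of 10 years or 3 generations."""
    return max(10.0, 3.0 * generation_length_years)


def percent_reduction(n_start: float, n_end: float) -> float:
    """Percentage decline in population size between two censuses."""
    if n_start <= 0:
        raise ValueError("starting population must be positive")
    return 100.0 * (n_start - n_end) / n_start


def meets_vu_threshold_a2(reduction_pct: float) -> bool:
    """True if the decline reaches the Vulnerable threshold (>= 30%) used under A2-A4.

    Note that range size plays no role in this test.
    """
    return reduction_pct >= 30.0


# Hypothetical species with a 4-year generation time, so the window is 12 years.
window = criterion_a_window(4.0)
decline = percent_reduction(1000, 650)  # a 35% decline over that window
print(window, decline, meets_vu_threshold_a2(decline))
```

  A species can therefore qualify through a steep decline while still having a large range, and vice versa, which is why pooling the two reasons for listing may blur the trait signal.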
  
  The species traits in Table 1 are mostly functional/morphological and body size related (plus microhabitat). While there may be correlations between traits and Red List status, it is unknown whether these correlations reflect causation. In addition, it is difficult to know which conservation interventions may be necessary now that we know that relative head width and Red List status are correlated.
  
  In the discussion, the authors explain why body size and other traits may affect extinction risk and whether there is a causal relationship. I agree that body size may have a direct effect because larger species are harvested more frequently (it was interesting to learn that tadpoles are harvested as well). However, as macroecological studies show, smaller species often have larger populations than larger species. Abundance may matter.
  
  I found it much harder to understand why relative head length and tympanum size correlated with Red List status. I wasn't convinced by the arguments in the discussion. Tympanum size may be related to hearing and hence to anthropogenic noise. Several studies are cited which show that frogs alter their calling behaviour in response to noise. Crucially, however, they describe changes in behaviour or in properties of the advertisement call, yet none show that noise has effects on population viability. If an anthropogenic stressor affects individuals, this does not mean that it will cause a population decline. When the IUCN published the second global amphibian assessment, did it list noise as a major threat to amphibians?
  
  There are statements that the tadpole stage is the most important stage: "a critical period for amphibian survival" (lines 78-79). While there is high mortality in the tadpole stage, tadpole survival is rather unlikely to affect population survival. Many population models show this; see, for example, Biek et al. 2002 in Conservation Biology. Other papers have argued that the postmetamorphic juvenile stage is most important (Petrovan and Schmidt 2009 Biological Conservation).
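  The population-model argument can be illustrated with an elasticity analysis of a stage-structured projection matrix, where an elasticity is the proportional change in the population growth rate (lambda) per proportional change in one vital rate. The pure-Python sketch below is hypothetical: the three stages (juvenile after metamorphosis, subadult, adult) and all vital rates are invented and are not taken from Biek et al. (2002) or from the manuscript. Larval survival is folded into the fecundity term F, so its elasticity equals that of F.

```python
def dominant_eigen(M, iters=10_000, tol=1e-13):
    """Power iteration for a non-negative, primitive matrix M (list of rows).

    Returns (dominant eigenvalue, eigenvector scaled to sum to 1).
    """
    n = len(M)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(y)  # x sums to 1, so s converges to the dominant eigenvalue
        y = [yi / s for yi in y]
        converged = abs(s - lam) < tol and max(abs(a - b) for a, b in zip(x, y)) < tol
        x, lam = y, s
        if converged:
            break
    return lam, x


def elasticities(A):
    """Elasticity matrix e_ij = (a_ij / lambda) * v_i * w_j / <v, w>."""
    n = len(A)
    lam, w = dominant_eigen(A)  # right eigenvector: stable stage structure
    transpose = [[A[j][i] for j in range(n)] for i in range(n)]
    _, v = dominant_eigen(transpose)  # left eigenvector: reproductive values
    vw = sum(vi * wi for vi, wi in zip(v, w))
    return lam, [[A[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)] for i in range(n)]


# Hypothetical vital rates (not from any study):
F = 5.0    # juveniles recruited per adult per year (fecundity x larval survival)
s_j = 0.2  # juvenile survival after metamorphosis
s_s = 0.4  # subadult survival
s_a = 0.6  # adult survival
A = [[0.0, 0.0, F],
     [s_j, 0.0, 0.0],
     [0.0, s_s, s_a]]

lam, E = elasticities(A)
print(round(lam, 6))                           # 1.0 for these rates
print([round(E[0][2], 3), round(E[2][2], 3)])  # [0.222, 0.333]: fecundity vs adult survival
```

  With these made-up rates, a given proportional change in adult survival shifts lambda more than the same proportional change in larval survival; other vital rates could give a different ranking, which is why the cited empirical models are the relevant evidence.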
  
  The authors repeatedly make the statement that amphibian conservation should focus more on the tadpole stage. I don't understand why this statement is made. For example, a major activity in amphibian conservation is the restoration and de novo construction of ponds (see Calhoun et al. 2014 PNAS, Moor et al. 2022 PNAS). Ponds are habitats for tadpoles. Others removed fish from amphibian breeding sites because fish prey on tadpoles (and adults; see Vredenburg 2004 PNAS). Semlitsch (2002 in Conservation Biology) argued that the management of pond hydroperiod is a critical element of amphibian recovery plans. Ponds should be temporary because this effectively removes predators that consume tadpoles. Clearly, the tadpole stage is not a neglected stage in amphibian conservation.
  
  Review 1

Tags

Review 1

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.03.17.712346v2
www.biorxiv.org

Brawn before bite in endemic Asian mammals after the end-Cretaceous extinction

1. Public_Reviews 14 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  eLife Assessment
  
  This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. However, the findings are incomplete because limitations in sampling design - such as the use of worn or damaged teeth, the pooling of different tooth positions, and the lack of independence among teeth from the same individuals - introduce uncertainties that weaken support for the reported disparity patterns. The taxonomic focus on predominantly herbivorous clades also narrows the ecological scope of the results. Clarifying methodological choices, expanding the ecological context, and tempering evolutionary interpretations would substantially strengthen the study.
  
  We have now thoroughly revised our manuscript in response to the editor’s and reviewers’ comments, in particular with regard to:
  
  (1) Sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentences around line 690:
  
  “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”
  
  (2) Pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).
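  A minimal sketch of this exclusion step, assuming a hypothetical list of (specimen, tooth position, time bin) records and a rule requiring at least five distinct specimens in every time bin. The authors' exact rule may differ (the p2 exclusion is described in terms of totals across intervals); EP, MP and LP follow the early, middle and late Paleocene abbreviations used in this response.

```python
from collections import defaultdict


def eligible_positions(records, bins=("EP", "MP", "LP"), min_n=5):
    """Tooth positions with at least min_n distinct specimens in every time bin.

    records: iterable of (specimen_id, tooth_position, time_bin) tuples.
    """
    specimens = defaultdict(set)  # (position, bin) -> distinct specimen ids
    positions = set()
    for specimen_id, position, time_bin in records:
        specimens[(position, time_bin)].add(specimen_id)
        positions.add(position)
    return sorted(
        pos for pos in positions
        if all(len(specimens[(pos, b)]) >= min_n for b in bins)
    )


# Hypothetical records: m1 has six specimens per bin; p3 has two, all in the MP.
records = [(f"{b}-{i}", "m1", b) for b in ("EP", "MP", "LP") for i in range(6)]
records += [(f"MP-p3-{i}", "p3", "MP") for i in range(2)]
print(eligible_positions(records))  # ['m1']
```

  Counting distinct specimen identifiers, rather than rows, keeps a specimen with duplicated entries from pushing a partition over the threshold.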
  
  For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.
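  The partitions listed above number nine (upper M1-3, lower m1-3, upper P3-P4, lower p4). If each reported percentage is a share of those nine partitions (an assumption), one partition corresponds to about 11%, so 7/9 ≈ 78%, 2/9 ≈ 22% and 5/9 ≈ 56%, consistent with the reported ranges. The sketch below computes such a support share; the criterion (same sign as the pooled trend and p < 0.05) and all values are hypothetical.

```python
def support_fraction(pooled_sign, per_tooth, alpha=0.05):
    """Share of per-tooth-position tests that replicate the pooled trend.

    per_tooth maps a tooth position to (sign, p_value), where sign is +1 or -1.
    A position counts as support if its sign matches the pooled sign and
    p_value < alpha (an assumed criterion for illustration).
    """
    if not per_tooth:
        raise ValueError("no per-tooth results supplied")
    hits = sum(1 for sign, p in per_tooth.values() if sign == pooled_sign and p < alpha)
    return hits / len(per_tooth)


# Hypothetical results for the nine partitions (values are invented).
per_tooth = {
    "M1": (+1, 0.01), "M2": (+1, 0.03), "M3": (+1, 0.20),
    "m1": (+1, 0.001), "m2": (+1, 0.02), "m3": (-1, 0.40),
    "P3": (+1, 0.04), "P4": (+1, 0.01), "p4": (+1, 0.001),
}
print(f"{support_fraction(+1, per_tooth):.0%}")  # 78%
```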
  
  For tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance between the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.
  
  For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S9-11, only Fig. S10 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower m1 and m2 signals appear to be stronger than those for lower m3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. Because of the limited sample sizes, it is not possible to conclude that the other tooth positions display different patterns.
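  A sketch of a per-tooth, per-interval DTA-FEA correlation that returns nothing when a partition is too small, assuming Spearman's rank correlation and a minimum of five specimens; the manuscript's actual correlation method and thresholds may differ. It is written in pure Python so that it runs without SciPy, and the example values are hypothetical.

```python
from math import nan, sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return nan if sx == 0 or sy == 0 else cov / (sx * sy)


def correlations_by_bin(pairs_by_bin, min_n=5):
    """pairs_by_bin: {time_bin: [(dta, fea), ...]} for one tooth position.

    Returns {time_bin: rho, or None when the bin has fewer than min_n pairs}.
    """
    out = {}
    for time_bin, pairs in pairs_by_bin.items():
        if len(pairs) < min_n:
            out[time_bin] = None  # too few specimens for a stable estimate
        else:
            dta, fea = zip(*pairs)
            out[time_bin] = spearman(list(dta), list(fea))
    return out


# Hypothetical lower m1 data: five EP specimens, only two MP specimens.
m1 = {"EP": [(1, 10), (2, 12), (3, 11), (4, 15), (5, 16)], "MP": [(2, 5), (3, 6)]}
print(correlations_by_bin(m1))  # EP close to 0.9, MP None
```

  Returning None rather than a number for small partitions keeps unstable estimates, such as those for the premolars, from being read as evidence of a different pattern.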
  
  (3) Ecological scope of the study: although carnivorans and mesonychids are recorded from some of the time intervals examined in this study, our sampling choice of pantodonts and anagalids reflects the high abundance of available dental specimens in those clades, permitting us to make the strongest statistical inference given the incomplete fossil record. Additionally, all sampled taxa come from archaic clades that have not been determined to be specifically herbivorous; we included an additional paragraph in the introduction to explain this:
  
  “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leave some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This work provides valuable new insights into the Paleocene Asian mammal recovery and diversification dynamics during the first ten million years post-dinosaur extinction. Studies that have examined the mammalian recovery and diversification post-dinosaur extinction have primarily focused on the North American mammal fossil record, and it's unclear if patterns documented in North America are characteristic of global patterns. This study examines dietary metrics of Paleocene Asian mammals and found that there is a body size disparity increase before dietary niche expansion and that dietary metrics track climatic and paleobotanical trends of Asia during the first 10 million years after the dinosaur extinction.
  
  Strengths:
  
  The Asian Paleocene mammal fossil record is greatly understudied, and this work begins to fill important gaps. In particular, the use of interdisciplinary data (i.e., climatic and paleobotanical) is really interesting in conjunction with observed dietary metric trends.
  
  Weaknesses:
  
  While this work has the potential to be exciting and contribute greatly to our understanding of mammalian evolution during the first 10 million years post-dinosaur extinction, the major weakness is in the dental topographic analysis (DTA) dataset.
  
  There are several specimens in Figure 1 that have broken cusps, deep wear facets, and general abrasion. Thus, any values generated from DTA are not accurate and cannot be used to support their claims. Furthermore, the authors analyze all tooth positions at once, which makes this study seem comprehensive (200 individual teeth), but it's unclear what sort of noise this introduces to the study. Typically, DTA studies will analyze a singular tooth position (e.g., Pampush et al. 2018 Biol. J. Linn. Soc.), allowing for more meaningful comparisons and an understanding of what value differences mean. Even so, the dataset consists of only 48 specimens. This means that even if all the specimens were pristinely preserved and generated DTA values could be trusted, it's still only 48 specimens (representing 4 different clades) to capture patterns across 10 million years. For example, the authors note that their results show an increase in OPCR and DNE values from the middle to the late Paleocene in pantodonts. However, if a singular tooth position is analyzed, such as the lower second molar, the middle and late Paleocene partitions are only represented by a singular specimen each. With a sample size this small, it's unlikely that the authors are capturing real trends, which makes the claims of this study highly questionable.
  
  With regard to sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentences around line 690:
  
  “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”
  
  With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).
  
  For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.
  
  For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance between the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.
  
  For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower m1 and m2 signals appear to be stronger than those for lower m3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. Because of the limited sample sizes, it is not possible to conclude that the other tooth positions display different patterns.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis - mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.
  
  Strengths:
  
  This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper, and I think the results will be of interest to a broad audience.
  
  Weaknesses:
  
  I have four major concerns with the study, especially related to the sampling of teeth and taxa, that I discuss in more detail below. Due to these issues, I believe that the study is incomplete in its support of the 'brawn before bite' hypothesis. Although my concerns are significant, many of them can be addressed with some simple updates/revisions to analyses or text, and I try to provide constructive advice throughout my review.
  
  (1) If I understand correctly, teeth of different tooth positions (e.g., premolars and molars), and those from the same specimen, are lumped into the same analyses. And unless I missed it, no justification is given for these methodological choices (besides testing for differences in proportions of tooth positions per time bin; L902). I think this creates some major statistical concerns. For example, DTA values for premolars and molars aren't directly comparable (I don't think?) because they have different functions (e.g., greater grinding function for molars). My recommendation is to perform different disparity-through-time analyses for each tooth position, assuming the sample sizes are big enough per time bin. Or, if the authors maintain their current methods/results, they should provide justification in the main text for that choice.
  
  With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).
  
  For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.
  
  For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance between the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.
  
  For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower m1 and m2 signals appear to be stronger than those for lower m3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. Because of the limited sample sizes, it is not possible to conclude that the other tooth positions display different patterns.
  
  Also, I think lumping teeth from the same specimen into your analyses creates a major statistical concern because the observations aren't independent. In other words, the teeth of the same individual should have relatively similar DTA values, which can greatly bias your results. This is essentially the same issue as phylogenetic non-independence, but taken to a much greater extreme.
  
  It seems like it'd be much more appropriate to perform specimen-level analyses (e.g., Wilson 2013) or species-level analyses (e.g., Grossnickle & Newham 2016) and report those results in the main text. If the authors believe that their methods are justified, then they should explain this in the text.
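  One way to implement the specimen-level aggregation suggested here is to average tooth-level values within each specimen before any through-time comparison, so that each specimen contributes a single observation. A minimal sketch, assuming a hypothetical (specimen, time bin, value) record layout; averaging across tooth positions only makes sense for a trait measured on comparable teeth, which ties this concern to point (1).

```python
from collections import defaultdict
from statistics import mean


def specimen_means(teeth):
    """Collapse tooth-level measurements to one mean value per specimen.

    teeth: iterable of (specimen_id, time_bin, value); each specimen is assumed
    to come from a single time bin. Returns {time_bin: [one mean per specimen]}.
    """
    values = defaultdict(list)
    bin_of = {}
    for specimen_id, time_bin, value in teeth:
        values[specimen_id].append(value)
        bin_of[specimen_id] = time_bin
    by_bin = defaultdict(list)
    for specimen_id, vals in values.items():
        by_bin[bin_of[specimen_id]].append(mean(vals))
    return dict(by_bin)


# Hypothetical example: specimen A contributes two teeth but only one observation.
teeth = [("A", "EP", 1.0), ("A", "EP", 3.0), ("B", "EP", 5.0), ("C", "MP", 2.0)]
print(specimen_means(teeth))  # {'EP': [2.0, 5.0], 'MP': [2.0]}
```

  Downstream disparity or correlation tests then operate on specimens rather than teeth, so the effective sample size is not inflated by multiple teeth from one individual.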
  
  Based on the per-tooth partition analyses we performed and reported above, the results now show that the overall trends described in the previous draft of the study are a composite of signals from different regions of the dentition. For example, the OPCR, DNE, and FEA trends persist across most tooth positions, whereas the Slope and RFI trends are mainly driven by lower fourth premolar patterns. The tooth size results are also mainly driven by lower fourth premolar patterns, but tooth disparity trends are broadly supported across tooth positions. These observations indicate that the overall trends remain valid, but there are nuances as to which tooth positions are driving which components of the trends. As such, we deem the overall results to be valid, and have focused our revision on providing these nuances so readers can assess through-time patterns in more detail than in the previous version of the study.
  
  (2) Maybe I misunderstood, but it sounds like the sampling is almost exclusively clades that are primarily herbivorous/omnivorous (Pantodonta, Arctostylopida, Anagalida, and maybe Tillodonta), which means that the full ecomorphological diversity of the time bins is not being sampled (e.g., insectivores aren't fully sampled). Similarly, the authors say that they "focused sampling" on those major clades and "Additional data were collected on other clades ... opportunistically" (L628). If they favored sampling of specific clades, then doesn't that also bias their results?
  
  If the study is primarily focused on a few herbivorous clades, then the Introduction should be reframed to reflect this. You could explain that you're specifically tracking herbivore patterns after the K-Pg.
  
  We appreciate the reviewer’s suggestion that our sampling may have focused on putative herbivorous clades more than others. However, at the early stage of placental evolution during the Paleocene, and in particular among the endemic forms we studied from south China, it is unclear to us that such clear-cut ecomorphological categories were present amongst the fossil mammals. Thus, we take a more agnostic approach and do not define the dietary categories of the sampled taxa (and by extension, those of the unsampled taxa). Although we recognize that representatives of certain clades, such as Carnivora, may be more reasonably interpreted as carnivores/insectivores/omnivores and, in the current context, remain unsampled, we note that including tooth samples from rare taxa such as carnivores likely would have biased the analyses temporally. Chinese Paleocene carnivores are known only from one of the three time intervals analyzed (representing only a handful of specimens), and so would potentially inflate the disparity in that time interval relative to the others (if dentitions specialized for carnivory are assumed to be present in the Paleocene). To clarify this point, we added a paragraph in the introduction:
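  The temporal-bias argument can be illustrated with a toy example, assuming disparity is measured as the variance of a single trait; the trait values below are invented and do not represent any sampled taxa.

```python
from statistics import pvariance

# Hypothetical values of one dental trait for a single time interval.
middle_paleocene = [2.0, 2.2, 2.4, 2.6, 2.8]
# Two hypothetical, morphologically distinct specimens known only from this interval.
rare_specialists = [4.0, 4.2]

before = pvariance(middle_paleocene)
after = pvariance(middle_paleocene + rare_specialists)
print(f"disparity without rare taxa: {before:.3f}, with: {after:.3f}")
```

  Because the other intervals would lack comparable specimens, such an increase would read as a temporal signal rather than as a sampling artefact.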
  
  “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leave some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”
  
  (3) There are a lot of topics lacking background information, which makes the paper challenging to read for non-experts. Maybe the authors are hindered by a short word limit. But if they can expand their main text, then I strongly recommend the following:
  
  a) The authors should discuss diets. Much of the data are diet correlates (DTA values), but diets are almost never mentioned, except in the Methods. For example, the authors say: "An overall shift towards increased dental topographic trait magnitudes ..." (L137). Does that mean there was a shift toward increased herbivory? If so, why not mention the dietary shift? And if most of the sampled taxa are herbivores (see above comment), then shouldn't herbivory be a focal point of the paper?
  
  We edited the introduction to say that “We used dental topographical traits as indicators of ecomorphological diversity[28] and examined temporal shifts in tooth crown complexity, curvature, and height and their association with tooth performance in terms of deformation resistance using topographic and simulation analyses.” We also added the following to the methods section to clarify that we are using DTA as a general ecomorphological proxy, not a direct dietary proxy.
  
  “Overall, we use these DTA traits as indicators of ecomorphological capacity, but do not link them explicitly to dietary categories. The craniodental morphology of archaic placental clades in general has not been demonstrated to share the same structure-function linkages as crown mammals, so the aforementioned linkages between DTA and dietary ecology in extant species serve only as evidence that DTA is a potentially useful ecomorphological proxy, without the application of those DTA-diet relationships to the Paleocene fossil mammal dataset.”
  
  b) The authors should expand on "we used dentitions as ecological indicators" (L75). For non-experts, how/why are dentitions linked to ecology? And, again, why not mention diet? A strong link between tooth shape and diet is a critical assumption here (and one I'm sure that all mammalogists agree with), but the authors don't provide justification (at least in the Introduction) for that assumption. Many relevant papers cited later in the Methods could be cited in the Introduction (e.g., Evans et al. 2007).
  
  We added the following sentence to clarify our usage of tooth crowns as ecomorphological proxies: “Teeth are among the most well-preserved parts of fossil mammals, and the fact that they interface directly with the environment through mastication makes them suitable elements for studying potential ecology-morphology linkages.”
  
  c) Include a better introduction of the sample, such as explicitly stating that your sample only includes placentals (assuming that's the case) and is focused on three major clades. Are non-placentals like multituberculates or stem placentals/eutherians found at Chinese Paleocene fossil localities and not sampled in the study, or are they absent in the sampled area?
  
  We modified the following sentence to indicate our sampling focus on placentals: “Our analyses focused on placental mammals from three of the most fossiliferous and biogeographically isolated Paleocene sedimentary sequences in paleotropical Asia: the Nanxiong, Qianshan, and Chijiang Basins in present-day south China[23–27] (Fig. S1).”
  
  d) The way in which "integration" is being used should be defined. That is a loaded term which has been defined in different ways. I also recommend providing more explanation on the integration analyses and what the results mean.
  
  If the authors don't have space to expand the main text, then they should at least expand on the topics in the supplement, with appropriate citations to the supplement in the main text.
  
  We replaced all mentions of “integration” with “covariation” to avoid using the loaded terminology. Covariation more accurately reflects the correlation between two sets of traits (DTA vs FEA) without invoking developmental mechanisms implied by modularity/integration.
  
  (4) Finally, I'm not convinced that the results fully support the 'brawn before bite' hypothesis. I like the hypothesis. However, the 'brawn before ...' part of the hypothesis assumes that body size disparity (L63) increased first, and I don't think that pattern is ever shown. First, body size disparity is never reported or plotted (at least that I could find) - the authors just show the violin plots of the body sizes (Figures 1B, S6A). Second, the authors don't show evidence of an actual increase in body size disparity. Instead, they seem to assume that there was a rapid diversification in the earliest Paleocene, and thus the early Paleocene bin has already "reached maximum saturation" (L148). But what if the body size disparity in the latest Cretaceous was the same as that in the Paleocene? (Although that's unlikely, note that papers like Clauset & Redner 2009 and Grossnickle & Newham 2016 found evidence of greater body size disparity in the latest Cretaceous than is commonly recognized.) Similarly, what if body size disparity increased rapidly in the Eocene? Wouldn't that suggest a 'BITE before brawn' hypothesis? So, without showing when an increase in body size diversity occurred, I don't think that the authors can make a strong argument for 'brawn before [insert any trait]".
  
  Although it's probably well beyond the scope of the study to add Cretaceous or Eocene data, the authors could at least review literature on body size patterns during those times to provide greater evidence for an earliest Paleocene increase in size disparity.
  
  We added a sentence in the discussion of body size during the Paleocene to note that the largest late Cretaceous fossil mammals in China are shrew- to gopher-sized, whereas the largest early Paleocene Chinese Endemic Pantodonts are dog-sized:
  
  “Dog-sized CEPs such as Bemalambda reached sizes not seen in late Cretaceous mammals from China such as Zhangolestes and Kryptobaatar, which are shrew- to gopher-sized [Meng 2014].”
  
  Reference: Meng, J. (2014). Mesozoic mammals of China: implications for phylogeny and early evolution of mammals. Natl. Sci. Rev. 1, 521–542. 10.1093/nsr/nwu070.
  
  Furthermore, we tempered our discussion to restrict the “brawn before bite” hypothesis to post-K-Pg recovery in the Paleocene. Body size patterns shifted in the Eocene as crown clades replaced the archaic endemic clades analyzed in our study, and much larger taxa began to appear after the PETM. Such body size shifts involve different clades and likely different dynamics compared to the 10-million-year interval examined in our study, so we refrain from commenting on post-Paleocene times.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) In regard to the DTA dataset: Was there a method used to 'fix' these teeth before dental topographic analyses were implemented? If so, this should be explicitly stated. If not, the authors should explain why broken, worn, or abraded teeth were used.
  
  We excluded the incomplete teeth from our analyses. We added the following sentence for clarification: “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”
  
  (2) The authors should explicitly explain why all tooth positions were analyzed together. Again, this is not something that is typically done, and some explanation would be helpful for readers.
  
  We added a paragraph in the methods section to explain both our pooled sampling approach, as well as the per-tooth analyses added in this revised manuscript:
  
  “Given the rarity of Paleocene fossil material from China, we combined data from different tooth positions into three pooled samples, one for each of the time intervals examined (early, middle, late Paleocene). We treated the pooled samples as representative of the range of dental topographic features and bite performance traits available to the mammal taxa under study. In this way, the variance estimates are interpreted as measures of the morphological and performance heterogeneity present in each time interval dataset. To further tease out the possibility of specific tooth positions driving the overall trends observed in the pooled samples, we also performed the DTA, FEA, DTA-FEA correlation, and tooth size through-time analyses using per-tooth data partitions.”
  
  (3) I think the authors should hedge their claims a bit more and recognize the limitations of their study (e.g., sample size and tooth preservation).
  
  We thank the reviewer for raising this important point. We carefully read through the main text and further tempered our interpretations based on the limitations of our data. Additionally, we added a paragraph in the supplemental text to summarize the major sources of uncertainty in the sample:
  
  “Sample and methodological limitations
  
  The highly fragmentary nature of early Cenozoic mammal fossils in Asia means that even the best-preserved faunas studied herein contain much missing information. First, the absence of a high-resolution chronological framework prevents the fossil data from being analyzed on a continuous time axis; the binning of the samples into three main intervals within a 10-million-year period limits tests of additional hypotheses about the environmental and climatic correlates of the dental structure-performance results presented. Second, the uneven sampling of the available mammalian assemblage throughout the Paleocene sites in China limits the breadth of ecomorphological categories included in the analyses; rarer taxa representing more specialized carnivore, insectivore, or herbivore forms were not included in our sampling. Third, the spatial discontinuity of stratigraphically younger (Eocene) and older (Cretaceous) mammal assemblages means that body size and ecomorphological shifts bracketing the Paleocene cannot currently be analyzed alongside the dataset presented. These limitations should be taken into account when considering the interpretations made in the main text.”
  
  Reviewer #2 (Recommendations for the authors):
  
  I'm including my Line Comments here as recommendations for the authors. But note that many of my recommendations are also in my Public Review.
  
  L22: "3% of sites"? Do you mean 3% of global sites?
  
  Yes, we revised the sentence to indicate 3% of global sites. Thank you for this suggestion.
  
  L35: This is nitpicky because it's not crucial to your study, but I can't help but point out that the Long Fuse, etc, hypotheses are specifically about the DIVERGENCE TIMES for Placentalia and major subclades, NOT the 'adaptive radiation' of placentals like you imply in your text. Adaptive radiations include ecomorphological diversification and are driven by ecological opportunity (e.g., Schluter 2000). (Emphasis on 'ecological.') The long fuse, short fuse, and explosive models do not include an ecological component - i.e., the diversifications could have occurred without ecological diversification. Instead, for hypotheses that are specifically on the adaptive/ecological radiation of mammals, see the Early Rise, Suppression (or Dinosaur Incumbency; Benevento et al. 2023 Palaeontology), and Late Rise hypotheses (Grossnickle et al. 2019 TREE). These hypotheses apply broadly to all mammals, not just placentals (see Box 1's figure in Grossnickle et al. 2019), but they can still be applied to mammalian subclades like eutherians/placentals (e.g., see Thomas Halliday papers).
  
  Thank you for helping to clarify the adaptive radiation vs. divergence time concepts. We edited this sentence to mention the adaptive radiation hypotheses instead, adding in the references provided by the reviewer.
  
  L39-40: I think your comment is probably accurate. But keep in mind that advocates of the Early Rise and Delayed Rise hypotheses (see citations within Grossnickle et al. 2019) might argue that other time periods, other than the Paleocene, are equally or more important.
  
  We added a reference to Grossnickle et al. 2019 to bring attention to potential arguments otherwise. Thank you for the suggestion.
  
  L48: I think the inclusion of "at higher latitudes" is a little distracting or misleading and should be erased. It implies that the taxonomic diversification was ONLY rapid at higher latitudes. But many of the references that you cite include analyses at the global or continental scale (e.g., Alroy 1999, Grossnickle & Newham 2016) and don't distinguish patterns at different latitudes. If you want to keep the point about latitudes, then I recommend inserting a separate sentence on that point.
  
  We removed “at higher latitudes”.
  
  L50: Isn't "stem lineages and those with no living relatives" somewhat redundant? Or do you mean something like "stem placental/eutherian lineages and extinct placental subgroups"?
  
  Yes, we adopted the suggested phrasing. Thank you.
  
  L53: I recommend starting a new paragraph around here (maybe starting with "Distinct from ...") that focuses specifically on introducing the 'brawn before [ecomorphological trait]' hypothesis.
  
  Done.
  
  L56: "large herbivores and their predators"? Are you just referring to mammals? Wilson (2013), which you cite, and Grossnickle & Newham (2016) argued that dietary specialists were targeted at the K-Pg, but none of the herbivores were "large" (at least relative to Cenozoic herbivores). And most faunivorous mammals at the time were probably insectivorous and not preying on herbivorous mammals, besides maybe a few outlying taxa (e.g., Altacreodus, Nanocuris). I'd revise your sentence for clarity.
  
  We removed “disproportionately impacting large herbivores and their predators” for clarity.
  
  L63: I'd replace "ecometric" with "ecomorphological". Ecometrics commonly refers to using fossil traits to infer paleo environments/climate (e.g., see papers by David Polly, Michelle Lawing, etc), which I don't think is what you're referring to here. (E.g., I don't think that brain size or jaw shape patterns were/are used to infer paleo environments.)
  
  Revised. Thank you.
  
  L85: I strongly advise against making conclusions like this: "Dental height and sharpness variability ... [spiked] in the middle Paleocene corresponding to a short-lived negative excursion in global temperature." That implies that the change in dentitions is linked to global temperature changes, which I don't think your results support. Later in the text you highlight the temporal uncertainty of your time bin ages (L650) and say that the middle Paleocene bin could be as old as ~62 Ma (L646), which is well before the negative excursion (and looks to be more in line with a positive excursion!), at least according to the Figure 1 time scale (see comment below). So, I don't think that your results even support your statement.
  
  We reworded this sentence to say “Dental height and sharpness variability were low in the beginning and end of the time interval, with a peak in the middle Paleocene. This pattern is observed both when dentitions are considered holistically and by tooth position in the lower dentition (Fig. S5; upper teeth display the opposite pattern).”
  
  L144: Using variance for disparity seems fine. But keep in mind that other disparity metrics, such as range (or sum-of-ranges for multivariate data), might produce different results. For instance, variance of RFI and Slope spike in the middle Paleocene, like you point out, but based on the values in Figure 1A, it looks like the ranges stay relatively constant through the Paleocene (although I realize that the ranges might change with bootstrapping). So, your choice of disparity metric might have a big influence on your conclusions. Alternatively, you could calculate disparity using multiple metrics (e.g., Brusatte et al. 2012 Nature Communications; Grossnickle & Newham 2016 supplemental analyses), even if it's just for supplemental analyses.
  
  Thank you for bringing the choice of disparity measures to our attention. We conducted a parallel set of bootstrapped disparity calculations and comparisons using range lengths (maximum trait value minus minimum trait value for a given trait) and summarized the through-time trends as for the variance-based results (Fig. S5). Overall, very similar trends are observed, supporting the variance-based interpretation presented in the main text. We added an explanation of this additional sensitivity testing to both the main text and the supplemental text.
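  The variance-versus-range comparison described in this response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes each time bin is resampled with replacement to its own sample size, that one trait is analyzed at a time, and that replicates are summarized by a median and a 2.5 to 97.5 percentile interval. The function names and the toy trait values are hypothetical and are not data from the study.

```python
import random
import statistics


def variance_disparity(values):
    """Sample variance of one trait within a time bin."""
    return statistics.variance(values)


def range_disparity(values):
    """Range length: maximum trait value minus minimum trait value."""
    return max(values) - min(values)


def bootstrap_disparity(values, metric, n_reps=1000, seed=0):
    """Resample one time bin with replacement and apply `metric` to each replicate."""
    rng = random.Random(seed)  # seeded so replicate sets are reproducible
    n = len(values)
    return [metric([rng.choice(values) for _ in range(n)]) for _ in range(n_reps)]


def summarize(reps):
    """Median plus the 2.5th and 97.5th percentile cut points of the replicates."""
    cuts = statistics.quantiles(reps, n=40)  # 39 cut points at 2.5% steps
    return statistics.median(reps), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical toy trait values for three bins; NOT data from the study.
    bins = {
        "early": [0.42, 0.47, 0.45, 0.50, 0.44],
        "middle": [0.35, 0.52, 0.61, 0.40, 0.58],
        "late": [0.46, 0.48, 0.43, 0.49, 0.47],
    }
    for name, values in bins.items():
        for label, metric in (("variance", variance_disparity), ("range", range_disparity)):
            med, lo, hi = summarize(bootstrap_disparity(values, metric))
            print(f"{name:>6} {label:>8}: median={med:.4f} interval=({lo:.4f}, {hi:.4f})")
```

  Range depends only on the two most extreme values in each resample, so it is generally more sensitive to sample size and outliers than variance; computing both on the same resamples is one simple way to check whether a through-time trend depends on the choice of metric.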
  
  L147: "body size disparity ... (Fig. 1B, S6A, Table 1, Data S5)." But I don't see disparity calculated or plotted in any of the figures/tables that you cite. You test for differences in disparity between time bins (Table 1), but that doesn't provide the actual disparity patterns.
  
  We generated a new figure (Fig. S8) to show tooth size variance and range across time and data partitions, and modified this sentence to say that “Over the same time interval examined, body size disparity and mean were higher in the early Paleocene than in subsequent time intervals (Fig. S8, Table S3; also supported by premolar 4 and upper molar partition analyses), indicating that substantial increases in the disparity of dental complexity, curvature, and height lagged behind maximum tooth size disparity during the Paleocene.”
  
  L151-153: Maybe. But you're basing this on a much narrower temporal range (Paleocene) than the brain and jaw studies, and I think those studies observed big increases in brain/jaw disparity in the Eocene, which you don't sample. And as I explained elsewhere, I'm not convinced that your results strongly support the same pattern. At a minimum, I recommend tempering your conclusions to better reflect the uncertainty of your results.
  
  We tempered our statements here to say that “This suggests a ‘brawn before bite’ pattern in endemic Asian mammals, partially mirroring the endocranial and jaw functional morphology patterns identified in their North American and European counterparts [21,22]. These findings raise the possibility that an initial size-driven post-K-Pg recovery followed by ecomorphological radiation was a global phenomenon, even as regional tectonic events such as the initial collision of the Indian subcontinent with Asia and Deccan Traps volcanism influenced local mammal evolution.”
  
  L170: I'm not well-versed in integration (and modularity) studies, so maybe this reflects my ignorance, but I had trouble understanding sentences like this: "These findings indicate that form-function malleability, the coexistence of distinct topography-performance relationships in each time and taxon partition while overall integration between the two trait groups increases between time bins, was present throughout the Paleocene." If there is space, I recommend revising and/or breaking apart long, jargon-y sentences like that (throughout the paper) so that they're more digestible for readers.
  
  We simplified complex sentences such as the one the reviewer noted, in order to communicate our findings and interpretations more clearly. Thank you for the suggestion.
  
  L183: It's probably fine to assume most placental orders arose in the Paleocene based on fossil evidence. But keep in mind that molecular studies often argue that many orders arose in the Late Cretaceous.
  
  We revised the statement to indicate a “Cretaceous/Paleocene” origin of many modern mammal orders.
  
  L200-207: Again, this might just reflect my ignorance concerning integration analyses, but I recommend expanding on this text to better explain how your integration results support this conclusion. It seems really interesting, and I like the Garden of Eden hypothesis. It's just not immediately clear to me how your results support that hypothesis. A little more background on how to interpret the integration results would be helpful.
  
  We expanded the discussion here to say that “Such flexibility in dental form-function linkage permits ‘mix and match’ trait combinations rather than evolutionary change as a single unit, potentially enhancing the evolvability of feeding ecological traits as new environmental conditions arose [Goswami et al. 2015]”
  
  Reference: Goswami, A., Binder, W.J., Meachen, J., and O’Keefe, F.R. (2015). The fossil record of phenotypic integration and modularity: A deep-time perspective on developmental and evolutionary dynamics. Proc. Natl. Acad. Sci. 112, 4891–4896. 10.1073/pnas.1403667112.
  
  L218: "reached maximum tooth size disparity early". Again, I don't see size disparity plotted or reported. And without baseline comparisons (Late K or Eocene), it's hard to interpret your results and evaluate what 'maximum' means (Figure 1B).
  
  We revised the sentence to now say “In response, Paleocene mammal clades in south China between dental topography and bite performance later, all the while maintaining high levels of variability in dental complexity and convexity (Fig. 1).”
  
  Figure 1A: The time scale in the top left of the figure looks off. Shouldn't the K-Pg be at 66 Ma (not 65 Ma) and the P-E boundary at 56 Ma (not ~54 or 55)?
  
  We revised Fig. 1 to fix the time scale so that K-Pg is at 65.5 Ma and the P-E boundary at 56 Ma. Thank you for catching this.
  
  Figure 1A: Is there a different y-axis scale for the variance (red line) results?
  
  Yes, the y axes for the variance curves were missing. We added them back in. Thank you.
  
  L628-629: As I explained above, it feels like you focused your sampling just on herbivorous/omnivorous groups, and, if true, this is an important point that should be discussed at the forefront of the paper. Does your sample truly represent the total ecological diversity of the mammalian faunas at the time?
  
  We agree with the reviewer about the potential partial sampling of the range of ecomorphological diversity when only the most abundant clades are included in the analyses. However, we refrain from interpreting the dietary groupings represented in the dataset using an assumption of functional morphology from crown/extant clades. We added a paragraph in the introduction to bring attention to the inherent uncertainty in the ecological diversity of the dataset:
  
  “A major challenge with expanding analyses of post-K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the stratigraphically limited nature of early Cenozoic sequences that produce fossil mammals. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic placental clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post-K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having uncharacterized diets rather than categorizing them as insectivores, herbivores, or carnivores.”
  
  L653: Sorry if this is mentioned elsewhere, but did you avoid using teeth with especially worn or broken cusps? You might expand on how you chose teeth for your sample.
  
  We left out this detail in the original submission. Thank you for pointing this out. We had to exclude a third of the teeth because they were too worn or broken. We added the following explanation to the methods section:
  
  “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”
  
  L654: "specimens" should be "teeth", correct? In the preceding sentence, you say that there are 200 teeth from only 48 specimens.
  
  Corrected.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.09.24.678280v2
www.biorxiv.org

Uncoupling the TFIIH Core and Kinase Modules Leads To Misregulated RNA Polymerase II CTD Serine 5 Phosphorylation

1
1. Public_Reviews 14 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  eLife Assessment
  
  This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.
  
  After consultation with the referees, we would like to suggest that you insert text into the RESULTS section acknowledging two limitations of your findings remaining in the revised manuscript, as follows:
  
  (i) It remains possible that Kin28 abundance was reduced by splitting Tfb3, which could be a factor in reducing its occupancies at gene promoters.
  
  In response, the paper now contains the following sentence:
  
  “Kin28 levels in extracts were below the limit of detection for our antibody, so we cannot rule out that the drop in ChIP signal is partly due to reduced Kin28 levels in the split Tfb3 strains. However, the viability of the cells (Figure 2) and the Tfb3-TAP purifications (Figure 3) argue against a complete loss of Kin28.”
  
  (ii) Lower than wild-type expression of the Tfb3 truncations might contribute to their mutant phenotypes shown in Figs. 2 & 5.
  
  In response, the paper now contains the following sentence:
  
  “There was some variation in protein expression levels (Figure 3A, left panel, lanes 1-4), and reduced levels of the split Tfb3 may contribute to the slow growth phenotypes.”
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.
  
  Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript and offer a few minor comments below that may help to further strengthen the study.
  
  We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.
  
  Page 4 PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Fig. 1A).
  
  The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not other fully-engaged PIC structures.
  
  Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:
  
  “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”
  
  Page 8 Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as free Core Module, or the Kinase Module must dissociate soon after.
  
  To my knowledge, this is still controversial in the NER field. I note that the potential function of the kinase module is likely attributable to the N-terminal region of Tfb3 through its binding to Rad3.
  
  We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. The paper usually cited is a 2008 study from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024), which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with the XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al., 2025). We looked at the multiple recently published structures of various TC-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryo-EM structures. They tell us that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.
  
  Because the yeast strains used in Fig. 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.
  
  We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.
  
  Page 11. Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.
  
  Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).
  
  That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest that capture of the kinase module by Mediator might trigger the conformational changes in the linker. In the second half (beginning “Alternatively….”), we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make clear that these are two distinct models.
  
  Comments on revisions:
  
  Revised ms clarified all my points, including those I previously misunderstood.
  
  Thanks again for helping us improve the manuscript.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.
  
  Strengths:
  
  The experiments presented are straightforward and the model for coupling initiation and CTD phosphorylation and for evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.
  
  Comments on revisions:
  
  The revised version with revisions to figures, text and new data has addressed all of our prior comments.
  
  We thank the reviewer for helping us improve the paper.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.
  
  The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module, and of Ser5 phosphorylation on the CTD of Pol II, is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.
  
  Strengths:
  
  Experiments involving expression of Tfb3 domains in yeast are well-controlled and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.
  
  We appreciate that the reviewer finds that our main conclusions are convincing.
  
  Weaknesses:
  
  The work is limited in scope and does not provide major insights into the mechanism of transcription. The main addition to current models of transcription is that tethering of Kin28 to Tfb3 may limit kinase action from occurring downstream from the initiation site.
  
  The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3 is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript, although the experiment apparently motivated the subsequent studies reported here.
  
  We elected not to do this control experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformational change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but would then also need to express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused it to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we chose not to pursue it further at this time.
  
  Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. It will be interesting to have this idea tested more thoroughly as more molecular evolutionary data becomes available.
  
  Comments on revisions:
  
  For the most part, the authors have satisfactorily addressed my previous critique. In particular, they have added to their discussion of evolutionary implications, and performed an experiment casting doubt on the assertion of a dominant negative effect, and as a consequence removed this claim from the manuscript. I also pointed out that the fusion experiments that lead off the Results section are missing the crucial control of including a Tfb3-Kin28 fusion. The authors have elected not to perform this control experiment, pointing out that even this control would be imperfect in some respects, and agreeing that this experiment is somewhat disconnected from the rest of the paper. The reason for including it, in spite of its somewhat tangential nature, is that it provides something of a rationale for the experiments that follow. I don't so much mind their retaining the experiment, as the absence of this control (and indeed, the results) does not so much impact the later results. However, I think if it is to be included, this shortcoming should be explicitly recognized, especially as a service to younger scientists who could benefit from an exposition that includes a thorough consideration of potential control experiments.
  
  We thank the reviewer for helping us improve the paper.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2023.09.11.557269v5
arxiv.org arxiv.org

Modulating task outcome value to mitigate real-world procrastination via noninvasive brain stimulation

1
1. Public_Reviews 13 May 2026
  
  in eLife (unscoped)
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.
  
  Strengths:
  
  The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.
  
  Weaknesses:
  
  There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.
  
  We truly appreciate the time and effort you dedicated to reviewing our manuscript. We agree that the weaknesses you raised are well founded, and we accept almost all of the suggestions detailed below, particularly regarding clarification of the statistics and the sample size determination. Please see our specific responses below.
  
  Major Comments
  
  (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.
  
  Thank you for this very practical suggestion. We agree that the current statements underlining the importance of procrastination are somewhat overreaching. In the revision, we have toned down such claims overall by explicitly describing them as “associative evidence”, and have rewritten several passages in a more modest and balanced style. Please see the specific revisions in the main text below:
  
  Introduction Section (Page 5, Line 64-81)
  
  “Procrastination is increasingly becoming a prevalent behavioral problem around the world, which reflects the irrational voluntary postponement of scheduled tasks despite being worse off for such delays (Blake, 2019; Steel, 2007). In epidemiological investigations, more than 15% of adults were identified as having chronic procrastination problems, and the situation for students was worse, as 70-80% of undergraduates engaged in procrastination (American College Health Association, 2022; Ferrari et al., 2005). Moreover, behavioral genetic evidence indicates a certain heritability of procrastination in human beings as well (Gustavson et al., 2017; Gustavson et al., 2014, 2015). In addition to its prevalence, the undesirable associations between procrastination behavior and health also warrant caution. Cumulative evidence shows close associations between procrastination behavior and working performance, financial status, interpersonal relationships, and subjective well-being (Ferrari, 1994; Pychyl & Sirois, 2016; Steel et al., 2021). Further, as prospective cohort studies indicate, many mental health problems emerge alongside procrastination, particularly sleep problems, depression, and anxiety (Hairston & Shpitalni, 2016; Johansson et al., 2023). Even worse, chronic procrastination behavior has been observed to impair general health, as manifested by its associations with close system disruption, gastrointestinal disturbance, as well as a high risk of hypertension and cardiovascular disease (Sirois, 2015; Sirois, 2016). ... ”
  
  (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).
  
  We regret that a serious technical problem has made our preregistration invisible and inaccessible. OSF has disabled my account, stating that it detected “suspicious user’s activities” (please see the screenshot below). As a result, all materials already deposited in this OSF account, including this preregistration, are no longer accessible. We have contacted the OSF team, but received no workable technical solution to recover the preregistered report. We suspect that this may have been triggered by my affiliation change to the Third Military Medical University of the People’s Liberation Army (PLA).
  
  To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text, and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because the original preregistration can no longer be verified, and in keeping with best practices, we have also removed statements about preregistration elsewhere throughout the revised manuscript, apart from this transparent report of the case. Furthermore, we fully understand that the inadequate methodological detail in our reporting made the statistics of this study difficult to evaluate. We have therefore reported extensive details in the Methods section to clarify how the analyses were conducted, to facilitate evaluation of our conclusions. Please see the additions in our responses to Comments #4-9 below.
  
  Methods Section (Page 5, Line 186-191)
  
  “This study fully adhered to CONSORT reporting guidelines, and was originally preregistered in the OSF repository (10.17605/OSF.IO/Y3EDT). However, due to a technical constraint related to the OSF account service (see SM), this OSF page is no longer accessible. For transparency and in line with best practices of open science, a preregistration statement has been reconstructed from the original protocol documentation to clarify the a priori hypotheses, sample size determination, and analysis plans for this study (Table S1).”
  
  (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.
  
  Again, we apologize for the confusion caused by inadequate statistical and methodological details. This manuscript was previously reviewed at Nature Human Behaviour, whose editorial constraints on paper length required us to omit a substantial number of details. As you kindly suggested, we have now added extensive descriptions to clarify how we carried out the statistical analyses in the present study. Please see specific instances below.
  
  (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).
  
  Thank you for raising this crucial question. We have determined this a priori effect size based on the existing work we published previously (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In our pilot study (Xu et al., 2023), we identified a significant interaction effect between the single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in the laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori. To clarify, we have explicitly justified the selection of this effect size in the Methods section.
  
  Methods Section (Page 5, Line 206-215)
  
  “A full randomized block design was used to assign participants to both groups (active neuromodulation group, NM; sham-control group, SC) (see Fig. 2C). As the pilot study probing into the effect of single-session tDCS stimulation to change procrastination willingness indicated (t = 2.38, p = .02, 95% CI [0.14, 1.49]; Xu et al., 2023), statistical power was predetermined by G*Power at a relatively medium effect size (1-β err prob = 0.80, f = 0.25), yielding the total sample size at 18 to reach acceptable power (see SM Methods and Fig. S1)....”
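  For reference, the effect size f entered in G*Power is Cohen's f, which relates to partial eta squared as shown below; the assumed f = 0.25 (Cohen's conventional "medium" value) therefore corresponds to a partial eta squared of roughly 0.06. The required total N also depends on design inputs not reported in this excerpt (for example, the number of repeated measurements and the assumed correlation among them), so N = 18 cannot be rederived from f alone.

```latex
f = \sqrt{\frac{\eta_p^{2}}{1 - \eta_p^{2}}}
\quad\Longleftrightarrow\quad
\eta_p^{2} = \frac{f^{2}}{1 + f^{2}},
\qquad
f = 0.25 \;\Rightarrow\; \eta_p^{2} = \frac{0.0625}{1.0625} \approx 0.059
```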
  
  We fully understand that this sample size for detecting a medium effect seems low, and that the 18 participants for each group are limited in any case. Upon double-checking the power analysis, we confirmed that this sample size requirement is indeed correct. Please see the G*Power outputs in Author response image 1.
  
  Author response image 1.
  
  Although the power analysis itself contains no computational errors, we are aware that this limited sample size may hamper statistical robustness. To address this weakness, we have explicitly stated this caution in the Limitations section:
  
  Limitations Section (Page 12, Line 637-640)
  
  “... In addition to technical limitations, given the apparently limited size of the sample (total N = 46), it warrants caution in generalizing these findings elsewhere, and necessitates further validations in a large-scale cohort.”
  
  (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?
  
  Yes, we fully followed the CONSORT pipeline to carry out this double-blind trial, and thus confirmed that all the participants in both groups had the same number of stimulation sessions in our lab. That is to say, except for the stimulation type (verum vs sham), all the procedures, equipment and even the room were identical for all the participants. For clarification, we have clearly stated this in the main text:
  
  Results Section (Page 9, Line 419-423)
  
  “In both groups, almost all participants (93.2%, 41/44) reported perceiving acceptable pain stemming from current stimulation, and believed they were receiving treatment (91.30% (21/23) for active neuromodulation group (NM), 86.95% (20/23) for sham control group (SC), χ² = 0.224, p = .636). All participants underwent identical experimental procedures except for the stimulation type (active vs sham). ...”
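  The reported blinding statistic can be reproduced as a quick arithmetic check, assuming it is an uncorrected Pearson χ² (1 df, no Yates correction) on the 2×2 table of participants who did or did not believe they were receiving treatment (21/23 in NM, 20/23 in SC). The helper name below is hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Uncorrected Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, upper-tail p). For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: NM, SC; columns: believed treated (yes, no).
chi2, p = pearson_chi2_2x2(21, 2, 20, 3)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # prints chi2 = 0.224, p = 0.636
```

  Two expected counts here are 2.5, below the conventional threshold of 5 at which a Fisher exact test (also mentioned in the Methods) is often preferred.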
  
  (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.
  
  We apologize for the inadequate details, which hindered a precise understanding of the TDM and the hyperbolic discounting model. The Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021). It conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now in pursuit of the task reward) and task aversiveness (i.e., the motivation to avoid acting now in order to escape negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this theoretical model is that task aversiveness is hyperbolically discounted when approaching the deadline: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). Considering the nonlinear dynamics inherent in this hyperbolic discounting, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) to improve curve-fitting performance (please see the schematic diagram at https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time):
  
  Specifically, based on the log-spaced temporal sampling rule, five time points were first selected to fulfill the statistical prerequisites for hyperbolic model fitting, with sampling density increasing toward the deadline (e.g., for a task due at 20:00, sampling occurred at 10:00, 16:00, 18:00, 19:30, and 20:00). At each time point, participants reported task aversiveness (A) on a 0–100 Visual Analog Scale (VAS). Task aversiveness discounting was then calculated as 1 - A(t) / A(earliest), where earliest denotes the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using GraphPad Prism (v9, 525), we estimated the AUC from these five data points based on the Myerson algorithm (Myerson et al., 2001), computed as the trapezoidal integration of task aversiveness discounting over time. With this method, a higher AUC reflects stronger temporal discounting of task aversiveness: participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. That is, a participant who shows greater discounting of task aversiveness, as reflected by a higher AUC, experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination. As you kindly suggested, we have added these details to explicitly clarify how the hyperbolic discounting approach was used to determine sampling time points and to calculate the AUC of task aversiveness discounting.
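  The AUC computation described above can be sketched as follows. This is a minimal illustration, not the GraphPad Prism procedure itself; it assumes that, as in Myerson et al. (2001), elapsed time is normalized to the range 0 to 1 before trapezoidal integration. The function name and the example ratings are hypothetical.

```python
from datetime import datetime
from typing import Sequence


def discounting_auc(times: Sequence[str], aversiveness: Sequence[float]) -> float:
    """Trapezoidal AUC of aversiveness discounting, 1 - A(t) / A(earliest).

    times: clock times "HH:MM" in ascending order, earliest first.
    aversiveness: VAS ratings (0-100) at those times.
    Assumption: elapsed time is rescaled to [0, 1] (earliest = 0, deadline = 1).
    """
    if len(times) != len(aversiveness) or len(times) < 2:
        raise ValueError("need matching sequences with at least two points")
    clock = [datetime.strptime(t, "%H:%M") for t in times]
    hours = [(c - clock[0]).total_seconds() / 3600 for c in clock]
    x = [h / hours[-1] for h in hours]  # normalized elapsed time
    a0 = aversiveness[0]
    if a0 <= 0:
        raise ValueError("earliest aversiveness rating must be positive")
    y = [1 - a / a0 for a in aversiveness]  # discounting relative to earliest point
    # Sum of trapezoid areas between consecutive sampling points.
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


# Hypothetical ratings for a task due at 20:00.
auc = discounting_auc(["10:00", "16:00", "18:00", "19:30", "20:00"], [80, 60, 40, 30, 20])
print(round(auc, 5))  # 0.26875
```

  Note that the discounting value at the earliest point is always 0 by construction, and it becomes negative if aversiveness rises above its initial level.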
  
  Methods Section (Page 6, Line 268-283)
  
  “On the Task day, we developed a mobile app to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefit of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted when approaching the deadline, being discounted sharply when far from the deadline but slowly once nearing it (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, five ESM recording moments were selected a priori per task using a log-spaced temporal sampling scheme (Myerson et al., 2001), with sampling density increasing toward the deadline, such as 10:00 (earliest), 16:00, 18:00, 19:30, and 20:00 (deadline). Five sampling points meet the statistical prerequisite for hyperbolic model fitting, which requires ≥ 4 points (Green & Myerson, 2004). Accordingly, recording moments were individually tailored for each task per participant in this ESM procedure.”
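  One way to generate such a schedule is sketched below, assuming "log-spaced" means that the time remaining until the deadline shrinks by a constant factor between consecutive sampling moments, with a final sample at the deadline itself. The helper name, the 30-minute minimum lead time, and the geometric rule are illustrative assumptions; the example times in the text (16:00, 18:00) appear to be hand-tailored and do not follow this formula exactly.

```python
from datetime import datetime, timedelta
from typing import List


def log_spaced_schedule(start: datetime, deadline: datetime, n_points: int = 5,
                        min_lead: timedelta = timedelta(minutes=30)) -> List[datetime]:
    """Hypothetical helper: sampling times whose lead to the deadline shrinks geometrically.

    The first point is `start`, the second-to-last is `min_lead` before the
    deadline, and the last point is the deadline itself.
    """
    total = deadline - start
    if n_points < 3 or not (timedelta(0) < min_lead < total):
        raise ValueError("need n_points >= 3 and 0 < min_lead < deadline - start")
    steps = n_points - 2                      # geometric steps from `total` down to `min_lead`
    ratio = (min_lead / total) ** (1 / steps)  # constant shrink factor of the lead time
    leads = [total * ratio ** i for i in range(n_points - 1)]
    return [deadline - lead for lead in leads] + [deadline]


schedule = log_spaced_schedule(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 20, 0))
print([t.strftime("%H:%M") for t in schedule])
```

  For a 10:00 start and a 20:00 deadline, this places the intermediate points at about 16:20, 18:40, and 19:30, so the gaps between samples shrink toward the deadline.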
  
  Methods Section (Page 7, Line 318-334)
  
  “... As articulated by the temporal decision model above, the task aversiveness evoked by executing a task is temporally dynamic, following a hyperbolic discounting pattern: it is discounted sharply when far from the deadline but slowly when nearing it (Zhang & Feng, 2020). To quantitatively characterize task aversiveness while accounting for these dynamics, the model-free area under the curve (AUC) was calculated. Specifically, based on the log-spaced temporal sampling rule, task aversiveness (A) was measured on a 100-point visual analog scale at the five sampling moments. Then, task aversiveness discounting was calculated as 1 - A(t) / A(earliest), where earliest denotes the earliest sampling point, serving as the reference for immediate execution. Subsequently, using GraphPad Prism (v9, 525), the AUC was computed as the trapezoidal integration of task aversiveness discounting over time across the five data points, based on the Myerson algorithm (Myerson et al., 2001). Accordingly, a higher AUC reflects stronger temporal discounting of task aversiveness as the deadline nears, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. As for task outcome value, it was theoretically posited as a relatively stable evaluation of the task (Zhang & Feng, 2020; Zhang et al., 2021).”
  
  References
  
  Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

  Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

  Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome value and task aversiveness impact task procrastination through separate neural pathways. Cerebral Cortex, 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

  Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley Interdisciplinary Reviews: Cognitive Science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

  Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of Experimental Psychology: General, 149(2), 311–322. https://doi.org/10.1037/xge0000643
  
  (7) Coming back to the point about the statistical analyses not being described in enough detail: one important example is that it is unclear whether random slopes were included in their mixed-effects model. This is highly relevant because omitting random slopes has repeatedly been shown to lead to extremely inflated Type 1 errors (e.g., inflating Type 1 errors by a factor of ten, such that a significant p value of .05 might be obtained when the true p value is .5). Thus, if random slopes have indeed been omitted, significant effects may be significant only because of inflated Type 1 error. Without more information about the models, this cannot be ruled out.
  
  Thank you for raising this very timely and crucial comment. After careful scrutiny, we confirmed the statistical flaw you pointed out: participants had been modeled with random intercepts only, not random slopes. As you kindly suggested, we have reanalyzed all the statistics with random slopes added (i.e., (1 + day|SubjectID)). Results showed a statistically significant interaction effect for both procrastination willingness (β = -7.8, SE = 1.8, DF = 45.6, p < .001) and actual procrastination rates (β = -7.4, SE = 2.4, DF = 46.6, p = .004), indicating the effectiveness of multi-session neuromodulation in mitigating procrastination. In the post-hoc simple effect analyses, participants who received active neuromodulation (NM) showed a significant increase in task-execution willingness (i.e., decreased procrastination willingness; NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction) and a decrease in actual procrastination rates (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), whereas no such effects were identified for participants in the sham control group (for willingness, SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction; for actual procrastination, SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction). We appreciate your pointing out this crucial statistical weakness, and we have confirmed that our findings remain significant after adding random slopes to guard against Type 1 error inflation. Moreover, as you kindly suggested, we have incorporated these statistical details, particularly those concerning the GLMM, into the main text to facilitate your evaluation. Please see the specific revisions below:
  
  Methods Section (Page 8, Line 381-401)
  
  “To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before the deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the classification issued by the National Bureau of Statistics (China) (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers from 1 (poor) to 7 (rich) on the basis of per capita annual household income. To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The average of these two self-rated emotion scores was entered into the GLMM as a covariate of no interest, yielding the final model expression “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and after the last intervention, using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and the random-effects structure. To validate statistical robustness, in addition to the parametric tests on continuous outcomes, we also conducted a between-group comparison of the number of tasks on which procrastination emerged, using the nonparametric χ² test with φ correction or Fisher’s exact test....”
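  The model expression above uses R's lme4 syntax. As a rough Python analogue, the sketch below fits a linear mixed model with the same fixed effects and a per-subject random intercept plus random slope for Treatment_Day using statsmodels' MixedLM. All data are simulated: the 10-day design, the covariate distributions, and the assumed interaction effect of +4 points per day are hypothetical. statsmodels reports Wald z-tests for fixed effects (fitted by REML by default), so its p-values will not match lmerTest's Satterthwaite t-tests exactly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_panel(n_subjects=46, n_days=10, seed=0):
    """Simulate a hypothetical long-format data set (one row per subject per day)."""
    rng = np.random.default_rng(seed)
    rows = []
    for sid in range(n_subjects):
        group = sid % 2                       # 1 = active neuromodulation (NM), 0 = sham (SC)
        age = int(rng.integers(18, 30))
        gender = int(rng.integers(0, 2))
        ses = int(rng.integers(1, 8))         # SES tier 1 (poor) .. 7 (rich)
        intercept = 40 + rng.normal(0, 10)    # subject-specific baseline
        slope = rng.normal(0, 1.5)            # subject-specific day slope
        for day in range(n_days):
            emotions = rng.uniform(30, 90)    # stand-in for the mean of two mood ratings
            # Assumed true effect: +4 points per day in the NM group only.
            y = intercept + (slope + 4.0 * group) * day + rng.normal(0, 5)
            rows.append(dict(SubjectID=sid, Group=group, Treatment_Day=day,
                             Age=age, Gender=gender, SES=ses,
                             Emotions=emotions, outcome=y))
    return pd.DataFrame(rows)


data = simulate_panel()
# Fixed effects mirror "outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions".
# re_formula="~Treatment_Day" gives each subject a random intercept AND a random
# slope for Treatment_Day, analogous to lme4's (1 + Treatment_Day | SubjectID).
model = smf.mixedlm(
    "outcome ~ Group * Treatment_Day + Age + Gender + SES + Emotions",
    data,
    groups=data["SubjectID"],
    re_formula="~Treatment_Day",
)
result = model.fit()
print(result.fe_params["Group:Treatment_Day"])  # estimated Group x Day interaction
```

  Dropping `re_formula` would reduce the model to random intercepts only, which is the specification the reviewer warned can inflate Type 1 error for within-subject effects.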
  
  Results Section (Page 9, Line 428-449)
  
  “To identify whether ms-tDCS targeting the left DLPFC can alleviate subjective procrastination willingness and actual procrastination behavior, a generalized linear mixed-effects model with the Satterthwaite approximation for degrees of freedom was built, with task-execution willingness and actual procrastination rates (PR) as the respective primary outcomes. For procrastination willingness, results showed a statistically significant interaction effect between multi-session neuromodulation and group (β = -7.8, SE = 1.8, DF = 45.6, p < .001; Fig. 3A). Post-hoc simple-effect analysis revealed significantly increased task-execution willingness (i.e., decreased procrastination willingness) after neuromodulation in the active neuromodulation group (NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction), but no such effect in the sham control group (SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction) (Fig. 3B-C). A linear uptrend in task-execution willingness was further observed across sessions in the active NM group, indicating gradually increasing neuromodulation effects (Fig. 3D; p < .01, Mann-Kendall test). For actual procrastination behavior, changes in actual procrastination rates across all sessions are shown in Fig. 3E. Similarly, a statistically significant interaction effect was identified (β = -7.4, SE = 2.4, DF = 46.6, p = .004), and simple-effect analysis further revealed decreased actual procrastination rates after ms-tDCS in the active neuromodulation group (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), whereas no such change was found in the sham control group (SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction) (Fig. 3F-G). A significant downtrend in procrastination rates across sessions was also identified in the active NM group (Fig. 3H; p < .01, Mann-Kendall test).”
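  
  The Mann-Kendall test cited above asks whether a time-ordered series shows a monotonic trend. A minimal Python sketch of the two-sided test, with the usual tie-corrected variance and normal approximation, follows. It assumes one summary value per session (e.g., a group mean); the session values are hypothetical rather than the study's data, and with only seven sessions the normal approximation is rough, so exact tables may be preferable. It is an illustration, not the authors' analysis code.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation).

    Returns (S, Z, p). S < 0 suggests a downward trend, S > 0 an upward one.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S sums the signs of all later-minus-earlier pairwise differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under H0, reduced for each group of tied values.
    var_s = n * (n - 1) * (2 * n + 5)
    for t in Counter(values).values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0
    # Continuity-corrected Z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical per-session mean procrastination rates (%), sessions 1-7.
sessions = [43.0, 30.5, 21.0, 12.4, 6.1, 0.0, 0.0]
s, z, p = mann_kendall(sessions)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")

# Robustness-style check: drop sessions 6 and 7 and re-test.
s5, z5, p5 = mann_kendall(sessions[:5])
print(f"sessions 1-5: S = {s5}, Z = {z5:.2f}, p = {p5:.4f}")
```

  Dropping the last two sessions, as in the second call, mirrors the spirit of the robustness check described in the response to comment (12), but only for the trend test, not for the GLMM.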
  
  (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.
  
  Thank you for this timely and helpful comment. As you correctly pointed out, we did not include random slopes in the original GLMM, which risked inflating the false-positive (Type I error) rate. After adding random slopes, we reanalyzed all GLMM statistics and confirmed that all findings remain reliable in the new models. Again, thank you for this crucial statistical advice; please see our response above for full details of the revisions made to address this comment.
  
  (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?
  
  We thank you for this practical and helpful comment. In the original manuscript, we incorporated a joint model of longitudinal and survival data (JM-LSD), together with machine learning algorithms, to strengthen the robustness of our statistical findings. Nevertheless, we agree that there is no need to complicate the analyses by repeatedly probing the same research question for methodological robustness at the expense of readability and intelligibility for a broad audience. As you suggested, we have removed these complicated statistical methods and retained only the primary analyses (the GLMM and the χ² cross-tab test) and one complementary analysis (the Mann-Kendall trend test). As a result, we have rewritten almost the entire Results section. Please see specific instances below:
  
  Results Section (Page 9, Line 468-485)
  
  “Ms-tDCS changes task aversiveness and task-outcome value
  
  Both task aversiveness and task outcome value serve as key pathways determining whether one procrastinates. We therefore used a generalized linear mixed-effects model to examine the effects of ms-tDCS on task aversiveness and task outcome value. Changes in task aversiveness across all sessions are shown in Fig. 4A and 4C. We found a statistically significant decrease in task aversiveness and increase in task outcome value via ms-tDCS in the neuromodulation group (Task aversiveness: interaction effect, β = -0.12, SE = 0.04, DF = 46.7, p = .002; simple effect, NM-before <sub>(AUC)</sub>: 1.13 ± 0.53, NM-after <sub>(AUC)</sub>: 1.95 ± 0.85, t.ratio = 4.5, p < .001, Tukey correction; Outcome value: interaction effect, β = -6.8, SE = 1.74, DF = 46.2, p < .001; simple effect, NM-before: 35.86 ± 27.82, NM-after: 73.08 ± 23.33, t.ratio = 5.0, p < .001, Tukey correction; see Fig. 4B), but not in the sham control group (Task aversiveness: SC-before <sub>(AUC)</sub>: 1.07 ± 0.51, SC-after <sub>(AUC)</sub>: 1.28 ± 0.46, t.ratio = 1.3, p = .20, Tukey correction; Outcome value: SC-before: 34.00 ± 25.17, SC-after: 40.13 ± 28.94, t.ratio = 0.8, p = .41, Tukey correction; see Fig. 4D). In the neuromodulation (NM) group, task aversiveness steadily decreased with the cumulative number of stimulation sessions, while perceived task outcome value increased significantly (see Fig. 4E-F, p < .05, Mann-Kendall test). These results provide causal evidence that neuromodulation of the left DLPFC simultaneously reduces task aversiveness and enhances task-outcome value.”
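  
  The task-aversiveness AUC values above summarize each task's rating curve across the ESM sampling points. A minimal sketch of a trapezoidal area under the curve in the style of Myerson et al. (2001) follows, assuming time-to-deadline is rescaled by its maximum and ratings by the top of the 0-100 scale, so the result lies in [0, 1]. The scaling actually used for the reported AUC values (which exceed 1) is not specified in this response, so this sketch will not reproduce those numbers; the sample times and ratings are hypothetical.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length >= 2")
    pts = sorted(zip(x, y))  # integrate from left to right along x
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def normalized_auc(minutes_to_deadline, ratings, scale_max=100.0):
    """Myerson-style AUC: time scaled by its maximum, ratings by scale_max.

    The result lies in [0, 1] when ratings lie in [0, scale_max].
    """
    t_max = max(minutes_to_deadline)
    xs = [t / t_max for t in minutes_to_deadline]
    ys = [r / scale_max for r in ratings]
    return trapezoid_auc(xs, ys)


# Hypothetical ESM samples for one task due at 20:00
# (10:00, 16:00, 18:00, 19:30, 20:00 -> 600, 240, 120, 30, 0 minutes left).
minutes_left = [600, 240, 120, 30, 0]
aversiveness = [40, 55, 65, 75, 80]
print(round(normalized_auc(minutes_left, aversiveness), 4))
```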
  
  Results Section (Page 10, Line 525-542)
  
  “Long-term effects of ms-tDCS
  
  We also conducted a follow-up investigation to test the long-term retention of ms-tDCS effects on actual procrastination. All participants except one in the neuromodulation group completed the follow-up 6 months after the last neuromodulation session (N<sub>NM</sub> = 22, N<sub>SC</sub> = 23). A GLMM was therefore constructed with time point (PR before the first neuromodulation session vs. PR 6 months after the last session) as the factor of interest. Results showed a statistically significant group*time interaction effect (β = 16.5, SE = 9.9, p = .049). Simple-effect analysis demonstrated a decrease in actual procrastination rates in the active neuromodulation group 6 months after the last stimulation compared to baseline (β = -22.05, SE = 10.0, p = .038, Tukey correction; NM-before: 40.68 ± 37.96, NM-after<sub>6-months</sub>: 18.63 ± 29.80), and null effects in the SC group (β = 1.26, SE = 9.78, p = .99, Tukey correction; SC-before: 46.47 ± 40.75, SC-after<sub>6-months</sub>: 47.73 ± 39.18) (see Fig. 6). Furthermore, using a nonparametric χ² test to compare the number of procrastinated tasks, we still found a statistically significant reduction in procrastination frequency in the NM group 6 months after neuromodulation compared to baseline (χ² = 3.30, p = .035, NM-before: 68.18% (15/22), NM-after<sub>6-months</sub>: 40.91% (9/22)), while no significant change was observed in the SC group (χ² = 0.11, p = .74, SC-before: 69.57% (16/23), SC-after<sub>6-months</sub>: 73.91% (17/23)). Therefore, beyond its short-term effects, ms-tDCS neuromodulation showed long-term retention of its benefits in reducing procrastination.”
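  
  For a 2 × 2 cross-tab, the Pearson χ² statistic can be checked directly from the counts above. A minimal sketch without continuity correction follows; it treats the before and after counts as two independent rows, as a standard cross-tab test does, and uses the df = 1 identity P(χ² > x) = erfc(√(x/2)) for the upper-tail p-value. The function name is hypothetical, and the sidedness of the p-values reported above is not stated in the response.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]], no Yates correction.

    Returns (statistic, upper-tail p-value with 1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # For df = 1, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: before / 6 months after; columns: procrastinated / did not.
nm = [[15, 22 - 15], [9, 22 - 9]]    # NM group, N = 22
sc = [[16, 23 - 16], [17, 23 - 17]]  # SC group, N = 23
for name, tab in (("NM", nm), ("SC", sc)):
    stat, p = chi_square_2x2(tab)
    print(f"{name}: chi2 = {stat:.2f}, p = {p:.3f}")
```

  With these counts the statistic comes out at 3.30 for NM and 0.11 for SC, matching the χ² values reported in the passage.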
  
  (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.
  
  We are grateful to you for noticing the unbalanced presentation of the literature on the left DLPFC. As you suggested, we have added literature supporting the association between self-control and the right DLPFC. Please see our revisions below:
  
  Introduction Section (Page 4, Line 137-143)
  
  “...Beyond the left DLPFC, there is also solid evidence linking self-control to the right DLPFC, a region implicated in top-down regulation, representation of future self-continuity, and social decision-making (Huang et al., 2025; Lin & Feng, 2024; Knoch & Fehr, 2007). Nevertheless, Xu and colleagues found null effects of anodal stimulation over the right DLPFC on either value evaluation or emotional regulation in changing procrastination willingness (Xu et al., 2023).”
  
  (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but - unless I missed it - the authors do not have any direct evidence (such as measures or specific task measures) that actually capture self-control. Thus, that self-control is involved seems speculation, but there is no empirical evidence for this; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but it is still in the current study not empirically tested) that self-control plays any role in the reported results.
  
  We truly appreciate your pointing out this conceptual weakness. You are correct: we conceptually hypothesized that HD-tDCS over the left DLPFC changes procrastination by engaging self-control, but we did not empirically validate this link in the chain brain stimulation→increased self-control→increased task outcome value→decreased procrastination. Specifically, we did not collect data directly measuring self-control at either baseline or post-neuromodulation. We therefore agree that this should be stated explicitly in the main text. Following your advice, we have rewritten part of our conclusions to state clearly the hypothesis-generating role of self-control in mitigating procrastination, and have acknowledged this in the Limitations section:
  
  Abstract Section (Page 2, Line 55-57)
  
  “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized role of self-control in procrastination, and offers a validated, theory-driven strategy for interventions.”
  
  Results Section (Page 10, Line 489-492 and 520-522)
  
  “Given the dual neurocognitive pathways identified above (reduced task aversiveness and increased task-outcome value), we proposed that these changes, conceptually driven by enhanced self-control via ms-tDCS over the left DLPFC, account for how neuromodulation reduces procrastination. ...”
  
  “In summary, these findings demonstrated a mechanistic pathway underlying procrastination: self-control, conceptualized as governed by the left DLPFC, may mitigate procrastination, plausibly by increasing task-outcome value.”
  
  Discussion Section (Page 13, Line 642-645)
  
  “Moreover, this study did not collect data for assessing participants’ self-control at either baseline or post-neuromodulation, thereby limiting our ability to determine whether the effects on procrastination were uniquely attributable to neuromodulation-induced changes in self-control. ...”
  
  (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. This seems surprising and, to be honest, rather unlikely that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons that this is such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.
  
  Thank you for raising this highly important and helpful comment. We fully understand that this result is extraordinary; it was equally striking to us when the data were unblinded. After carefully scrutinizing the data and statistics, we can confirm that this pattern is genuine. Consistent with this observation, we received numerous thank-you letters from participants in the active neuromodulation group, who reported substantially reduced procrastination in real-life activities after completing the trial. Although this does not constitute formal scientific evidence, we are glad to see the benefits of this neuromodulation for these procrastinators.
  
  Two factors could account for this pattern. The first is what we term “scalar inflation”, i.e., a floor effect of the measure. In the present study, the procrastination rate was calculated as 1 minus the task-completion rate (e.g., 80%, 60%, 40%) at the deadline. At sessions #6 and #7, all participants completed their real-life tasks before the deadline, yielding a 0% procrastination rate (1 minus a 100% completion rate) with no between-individual variation. Thus, rather than indicating an absence of individual differences in procrastination, this pattern reflects that the measure itself, the procrastination rate, is too coarse to capture subtle differences. For instance, although participants #1 and #2 both showed a 0% procrastination rate, meaning that both completed their tasks before the deadline, Participant #1 might have finished 3 hours before the deadline, whereas Participant #2 might have finished only 10 minutes before. In this case, “scalar inflation” makes the two participants appear to procrastinate equally, although participant #2 may have a higher level of procrastination than #1. As conventionally defined in the field, procrastination is operationalized as “not completing a task before the deadline”. Thus, a task completed before the deadline counts as “no procrastination”, regardless of how far in advance it was finished. In the present study, the primary outcome is whether a participant procrastinated on a real-life task in real-world settings, irrespective of when she/he completed it. The procrastination rate therefore fits our conceptualization of procrastination.
  
  The second is the potential cumulative effect of sequential multi-session tDCS. As shown by the Mann-Kendall trend tests, procrastination rates show a significant downtrend in the active neuromodulation group across sessions, even after removing sessions #6 and #7. This indicates that reductions in procrastination may accumulate as the number of sessions increases, implying a potential “dose-dependent effect”. Although this interpretation is speculative, such dose-dependent effects of neuromodulation have been well documented in previous studies, which show a robust, linear association between the number of sessions and effectiveness (cf. Cole et al., 2020; Hutton et al., 2023; Sabé et al., 2024; Schulze et al., 2018). Therefore, although this pattern is extreme compared with previous observations, it is plausible.
  
  We agree that a robustness check removing session #6, session #7, or both is a valuable way to guard against potential bias from extreme cells. In this analysis, all group*treatment_day interaction effects remained significant when removing session #6, session #7, or both (all p-values < .05), indicating high statistical robustness. Please see Supplementary Tables S3 and S4.
  
  Taken together, although these findings are extraordinary, we confirmed that they are statistically robust to these extreme cells. As you suggested, we have added the results of this robustness check to the revised Supplemental Materials.
  
  References
  
  Cole, E. J., Stimpson, K. H., Bentzley, B. S., Gulser, M., Cherian, K., Tischler, C., Nejad, R., Pankow, H., Choi, E., Aaron, H., Espil, F. M., Pannu, J., Xiao, X., Duvio, D., Solvason, H. B., Hawkins, J., Guerra, A., Jo, B., Raj, K. S., Phillips, A. L., … Williams, N. R. (2020). Stanford Accelerated Intelligent Neuromodulation Therapy for Treatment-Resistant Depression. The American Journal of Psychiatry, 177(8), 716–726. https://doi.org/10.1176/appi.ajp.2019.19070720
  
  Hutton, T. M., Aaronson, S. T., Carpenter, L. L., Pages, K., Krantz, D., Lucas, L., Chen, B., & Sackeim, H. A. (2023). Dosing transcranial magnetic stimulation in major depressive disorder: Relations between number of treatment sessions and effectiveness in a large patient registry. Brain Stimulation, 16(5), 1510–1521. https://doi.org/10.1016/j.brs.2023.10.001
  
  Sabé, M., Hyde, J., Cramer, C., Eberhard, A., Crippa, A., Brunoni, A. R., Aleman, A., Kaiser, S., Baldwin, D. S., Garner, M., Sentissi, O., Fiedorowicz, J. G., Brandt, V., Cortese, S., & Solmi, M. (2024). Transcranial Magnetic Stimulation and Transcranial Direct Current Stimulation Across Mental Disorders: A Systematic Review and Dose-Response Meta-Analysis. JAMA Network Open, 7(5), e2412616. https://doi.org/10.1001/jamanetworkopen.2024.12616
  
  Schulze, L., Feffer, K., Lozano, C., Giacobbe, P., Daskalakis, Z. J., Blumberger, D. M., & Downar, J. (2018). Number of pulses or number of sessions? An open-label study of trajectories of improvement for once- vs. twice-daily dorsomedial prefrontal rTMS in major depression. Brain Stimulation, 11(2), 327–336. https://doi.org/10.1016/j.brs.2017.11.002
  
  (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.
  
  We apologize that the supplemental materials (SM) in the original submission were uninformative. As you suggested, we have added substantial detail on how the data analyses were conducted to the main text, and have tightened the entire SM to improve readability and comprehensibility. We hope that the revised manuscript now offers clear and adequate information on the methods and statistics for a broad readership.
  
  In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.
  
  Thank you for this very helpful summary. As you suggested, we have overhauled the manuscript to address the points listed here: we added relevant literature to balance our claims, reported the statistics in substantially greater detail for transparency, conducted a robustness check confirming that our findings are robust to the extreme patterns at sessions #6 and #7, and justified how the sample size was determined a priori to achieve adequate statistical power for a medium effect. Please see above for full details of how we addressed each comment, point by point.
  
  Reviewer #2 (Public Review):
  
  Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC in procrastination, concerns about the validity of the procrastination measures moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.
  
  Strengths:
  
  (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.
  
  (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.
  
  Thank you for taking the time to review our manuscript, for your kind appraisal of the research design and of our approach to scaling task aversiveness, and for your helpful and insightful evaluations. As you correctly pointed out, the original manuscript lacked detailed, clear, and understandable reporting of measures (e.g., real-world procrastination), statistics, and methods. Following all your suggestions, we have thoroughly revised the manuscript to address your comments point by point. Please see the full response below.
  
  Weaknesses:
  
  (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.
  
  We genuinely appreciate your raising this crucial comment. We apologize for omitting many methodological details to comply with the editorial requirements on manuscript length, which hampered understanding of how we measured “real-life tasks” and “real-world procrastination”.
  
  As shown in the schematic diagram of the experimental procedure (Fig. 1), the experimental protocol alternated between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On each Neuromodulation Day, participants received either active or sham HD-tDCS and, critically, before stimulation were instructed to specify a real-life task they were required to complete the following day, with a deadline between 18:00 and 24:00. This ensured ≥24 hours between neuromodulation and task execution, isolating offline after-effects. For instance, on Day #2 (a Neuromodulation Day), before stimulation, participants were asked to report a real-life task with a deadline between 18:00 and 24:00 on the next day's Task Day (Day #3) (please see the schematic diagram in Author response image 2).
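  
  The alternating schedule and the ≥24-hour separation can be checked mechanically. A minimal sketch, assuming that stimulation ends no later than 18:00 on a Neuromodulation Day (the 09:00–18:00 administration window stated in the Methods excerpt) and that the deadline falls no earlier than 18:00 on the next day; the calendar start date and the helper name `at` are hypothetical.

```python
from datetime import datetime, time, timedelta

# Arbitrary (hypothetical) study start date; Day 1 is a Task Day.
START = datetime(2024, 1, 1)
N_DAYS = 15

neuromod_days = [d for d in range(1, N_DAYS + 1) if d % 2 == 0]  # 2, 4, ..., 14
task_days = [d for d in range(1, N_DAYS + 1) if d % 2 == 1]      # 1, 3, ..., 15


def at(day, clock):
    """Datetime for a 1-based study day at a given clock time."""
    return datetime.combine((START + timedelta(days=day - 1)).date(), clock)


# Worst case: stimulation ends at 18:00 and the task is due at 18:00 the next day.
latest_stim_end = time(18, 0)
earliest_deadline = time(18, 0)
gaps = [at(d + 1, earliest_deadline) - at(d, latest_stim_end) for d in neuromod_days]

print(len(neuromod_days), "sessions,", len(task_days), "task days")
print("shortest stimulation-to-deadline gap:", min(gaps))
```

  Because the latest stimulation end and the earliest deadline are both 18:00, the shortest possible gap is exactly 24 hours; any earlier stimulation or later deadline only lengthens it.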
  
  Author response image 2.
  
  Examples of real-life tasks reported by participants include “Complete and submit a homework assignment”, “Complete a standardized English proficiency test”, “Complete an online course module required for applying a Class C driver’s license”, “Prepare slides for a seminar presentation”, “Practice guitar”, “Practice Chinese calligraphy”, and “Do the laundry”. Reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating considerable task diversity.
  
  On each “task day”, participants engaged in an intensive Experience Sampling Method (iESM) protocol via a custom-built mobile app. Using this app, participants reported a subjective task-execution willingness score (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”; procrastination willingness = 100 – the task-execution willingness score), subjective task aversiveness (a one-item 100-point visual analog scale), subjective task outcome value (a one-item 100-point visual analog scale), and the objective procrastination rate.
  
  For the objective quantification of real-world procrastination behavior, rather than relying on one-item visual analog scales, we asked participants to report their actual task completion rate. Specifically, at the deadline, each participant was asked to report whether she/he had completed the task. If she/he had not yet completed it (i.e., procrastination occurred), she/he was further required to report the percentage of the task completed (1% - 99%), defined as the task completion rate. The real-world procrastination rate for the task was then calculated as 1 – the task completion rate. For instance, if a participant did not complete her/his real-life task before the deadline (i.e., she/he procrastinated on this task) and reported having completed 75% of it at the deadline, her/his real-world procrastination rate was 25% (1 - 75%) (please see the schematic diagram in Author response image 3).
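  
  The scoring rules above reduce to a few lines. A minimal sketch, assuming completion is recorded as a yes/no flag plus, when incomplete, a percentage from 1 to 99, and that willingness is on the 0-100 visual analog scale; the function names are hypothetical.

```python
def procrastination_rate(completed, percent_done=None):
    """Procrastination rate (PR) = 1 - task completion rate (CR), as a fraction.

    completed: True if the task was finished by the deadline (CR = 100%).
    percent_done: reported completion (1-99) when the task was not finished.
    """
    if completed:
        return 0.0
    if percent_done is None or not 1 <= percent_done <= 99:
        raise ValueError("incomplete tasks need a completion percentage in 1-99")
    return 1.0 - percent_done / 100.0


def procrastination_willingness(task_execution_willingness):
    """Procrastination willingness = 100 - task-execution willingness (0-100)."""
    if not 0 <= task_execution_willingness <= 100:
        raise ValueError("willingness must be on the 0-100 scale")
    return 100 - task_execution_willingness


# Worked example from the text: 75% done at the deadline gives PR = 25%.
print(procrastination_rate(False, 75))
print(procrastination_rate(True))
print(procrastination_willingness(80))
```

  Every completed task maps to PR = 0 regardless of how early it was finished, which is the floor discussed as “scalar inflation” in the response to comment (12).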
  
  Moreover, to validate the self-reported task completion rate, each participant was also asked to upload proof (e.g., screenshots of submitted assignments, photos of printed documents, system timestamps) to the ESM system.
  
  Author response image 3.
  
  To determine the sampling time points for the mobile app in the ESM, we drew on both the conceptual Temporal Decision Model and the sampling approach of Myerson et al. (2001). Specifically, the Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now to pursue the task reward) and task aversiveness (i.e., the motivation to avoid acting now to avoid negative experiences). When task aversiveness outweighs task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when far from the deadline but slowly when near the deadline (Zhang et al., 2019). To maximize statistical power for fitting these dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) (please see the schematic diagram at https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time).
  
  Using this scheme (Myerson et al., 2001), five time points were selected to satisfy the statistical prerequisites for hyperbolic model fitting, with sampling density increasing toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00). Once the five task-specific sampling time points were determined for each participant, the mobile app sent a notification at each time point asking her/him to report task aversiveness and task outcome value immediately. As primary outcomes, the procrastination rate (i.e., 1 – the task completion rate) and procrastination willingness were sampled at the deadline.
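  
  A log-spaced scheme places prompts at geometrically shrinking lead times before the deadline, so sampling grows denser as the deadline nears. A minimal sketch follows, assuming a fixed earliest prompt, a minimum lead of 30 minutes, four geometric lead times plus the deadline itself, and rounding to 5 minutes. These choices are illustrative rather than the authors' rule, so the output (10:00, 16:20, 18:40, 19:30, 20:00) differs slightly from the example times quoted above.

```python
from datetime import datetime, timedelta


def log_spaced_prompts(first, deadline, n_points=5, min_lead_min=30, round_to=5):
    """Prompt times with geometrically shrinking lead times before the deadline.

    The first n_points - 1 prompts have lead times (minutes before deadline)
    spaced evenly on a log scale from the full window down to min_lead_min;
    the last prompt is the deadline itself.
    """
    total = (deadline - first).total_seconds() / 60.0
    if n_points < 3 or not 0 < min_lead_min < total:
        raise ValueError("need n_points >= 3 and 0 < min_lead_min < window")
    steps = n_points - 2
    ratio = (min_lead_min / total) ** (1.0 / steps)
    leads = [total * ratio ** i for i in range(n_points - 1)] + [0.0]
    # Round each lead to the nearest round_to minutes for readable prompt times.
    leads = [round_to * round(lead / round_to) for lead in leads]
    return [deadline - timedelta(minutes=lead) for lead in leads]


first = datetime(2024, 1, 1, 10, 0)
deadline = datetime(2024, 1, 1, 20, 0)
for t in log_spaced_prompts(first, deadline):
    print(t.strftime("%H:%M"))
```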
  
  We also fully agree that transparency about task diversity strengthens the generalizability of our findings. In response, we have tabulated the real-life tasks reported in this experiment in a separate Appendix 1, with automatic translations from Chinese to English via Qwen GPT. Please see below for what we have added to the main text:
  
  Methods Section (Page 6-7, Line 238-308)
  
  “Nested cross-sectional longitudinal design
  
  This study used a nested cross-sectional longitudinal design to investigate whether multiple-session anodal HD-tDCS targeting the left DLPFC could reduce actual procrastination behavior and to probe how this effect manifests. To assess procrastination in daily life, we implemented a 15-day protocol alternating between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On Neuromodulation Days, 20-min anodal HD-tDCS targeting the left DLPFC was delivered to the active group every 2 days, while the sham-control group received sham HD-tDCS. Stimulation was repeated for a total of seven sessions over 15 days (see Fig. 1a). Crucially, to capture procrastination in ecologically valid contexts, before receiving either active or sham HD-tDCS (administered between 09:00 and 18:00), participants were instructed to specify a real-life task they were personally obligated to complete the following day, with a self-defined deadline strictly constrained to 18:00–24:00 to ensure ≥24 hours between stimulation offset and task deadline, thereby isolating offline after-effects. The task had to meet three criteria: (a) it was already assigned in real-world settings; (b) its deadline fell between 18:00 and 24:00 (see above); and (c) it was likely to induce procrastination. In total, more than 300 real-life tasks were collected, spanning academic (e.g., “submit a statistics homework assignment”), occupational (e.g., “draft and email a project proposal”), administrative (e.g., “complete online application for Class C driver’s license”), self-improvement (e.g., “practice guitar for ≥30 minutes”), domestic (e.g., “do laundry”), and health-related (e.g., “running 2,000m for exercise”) domains. The full task list is tabulated in Appendix 1.
  As primary outcomes, all participants were required to report task-execution willingness (TEW) (Zhang & Feng, 2020; Zhang, Liu, et al., 2019) for a real-life task 24 hours post-neuromodulation. Procrastination willingness was then quantified as 100 – TEW (see below for details). Furthermore, we asked participants to report the actual task completion rate (CR) at the deadline (e.g., a participant who had finished 90% of a homework assignment at the deadline reported this at the deadline). The actual procrastination rate (PR) was then quantified as 1 – CR.
  
  On the task day, we used a purpose-built mobile app to implement the experience sampling method (ESM) for tracking participants’ real-time evaluations of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefit of the task outcome obtained by completing the task before the deadline (Zhang & Feng, 2020). As conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when the deadline is far away but slowly once the deadline is near (Zhang & Feng, 2020; Zhang et al., 2021). To capture the nonlinear dynamics of this hyperbolic discounting, the five ESM recording moments for each task were selected a priori using a log-spaced temporal sampling scheme (Myerson et al., 2001), with sampling density increasing toward the deadline (e.g., 10:00 [earliest], 16:00, 18:00, 19:30, and 20:00 [deadline]). Five sampling points meet the statistical prerequisite for hyperbolic model fitting (≥ 4 points; Green & Myerson, 2004). Recording moments were therefore individually tailored to each task of each participant. To rule out confounding effects of daily emotions on task aversiveness ratings, we used the average PANAS score at 10:00 (morning) and 16:00 (afternoon), collected with the same ESM app, to quantify each participant’s daily emotions. Before each HD-tDCS session, each participant was required to report a real-life task due the following day. To capture the long-term effect of HD-tDCS (i.e., an interval of at least 24 hours between HD-tDCS and task completion), the reported task deadline was required to fall between 18:00 and 24:00. Once a sampling time was reached, the app sent a notification prompting participants to complete an online form for data collection.
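The log-spaced sampling rule described above can be sketched as follows. This is a hypothetical helper (`log_spaced_schedule` is not part of the authors' app), assuming that the offsets before the deadline shrink geometrically (evenly spaced in log time-to-deadline) down to an assumed 30-minute minimum, that the deadline itself is the final sample, and that times are snapped to a 30-minute grid. It illustrates the idea only and is not a reconstruction of how the published schedules were chosen.

```python
from datetime import datetime, timedelta


def log_spaced_schedule(earliest, deadline, n_points=5, min_offset_min=30, round_to_min=30):
    """Hypothetical sketch of a log-spaced ESM schedule.

    Offsets before the deadline shrink geometrically from the full window down to
    `min_offset_min`, so samples get denser near the deadline; the deadline itself
    is appended as the last sample. All defaults are illustrative assumptions.
    """
    window = (deadline - earliest).total_seconds() / 60.0  # window length in minutes
    if n_points < 3 or window <= min_offset_min:
        raise ValueError("need >= 3 points and a window longer than min_offset_min")
    k = n_points - 1  # samples strictly before the deadline
    ratio = (min_offset_min / window) ** (1.0 / (k - 1))  # constant ratio between offsets
    schedule = []
    for i in range(k):
        offset = window * ratio ** i                          # minutes before the deadline
        offset = round(offset / round_to_min) * round_to_min  # snap to a practical grid
        schedule.append(deadline - timedelta(minutes=offset))
    schedule.append(deadline)
    return schedule
```

For a 10:00 to 20:00 window this rule gives 10:00, 16:30, 18:30, 19:30 and 20:00, close to but not identical with the example schedule in the text. For short windows, rounding can produce duplicate times, which a real app would need to handle.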
  
  Quantification of variables of interest
  
  The outcome variables of this study were twofold: task-execution willingness and procrastination rate (PR). Task-execution willingness evaluates one’s subjective inclination to avoid procrastination (Zhang & Feng, 2020). Participants reported their task-execution willingness on a 100-point scale (0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). This metric was recorded 24 hours after neuromodulation to examine long-term effects. PR quantifies the extent to which a task was procrastinated and was calculated as 1 - CR (task completion rate). Critically, at the exact deadline, the app prompted participants to (a) indicate task completion status (yes/no) and, if the task was incomplete, (b) report the percentage completed (1–99%), defined as the task CR, while simultaneously uploading objective evidence (e.g., screenshots of submitted files, photos of physical outputs, system-generated logs, or app-exported records). If the task had been completed before the deadline, CR was 100% and PR was therefore 0% (1 - CR). PR was recorded at the actual task deadline for each participant. We also re-assessed actual procrastination (PR) 6 months after the last neuromodulation session to test the long-term retention of the neuromodulation effect.”
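The two outcome definitions reduce to simple arithmetic. A minimal sketch, assuming TEW is stored on its 0-100 scale and CR as a proportion; the function names are hypothetical:

```python
def procrastination_willingness(tew):
    """Procrastination willingness = 100 - TEW, with TEW on a 0-100 scale."""
    if not 0 <= tew <= 100:
        raise ValueError("TEW must lie on the 0-100 scale")
    return 100 - tew


def procrastination_rate(completed, percent_completed=None):
    """PR = 1 - CR.

    A task completed by the deadline has CR = 1.0 (so PR = 0); otherwise CR is
    the self-reported percentage completed (1-99%) expressed as a proportion.
    """
    if completed:
        cr = 1.0
    else:
        if percent_completed is None or not 1 <= percent_completed <= 99:
            raise ValueError("incomplete tasks need a completion percentage of 1-99")
        cr = percent_completed / 100.0
    return 1.0 - cr
```

For participant A in the example above (90% completed at the deadline), the sketch gives PR = 0.1 (up to floating-point rounding).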
  
  References
  
  Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235
  
  Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312
  
  Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome value and task aversiveness impact task procrastination through separate neural pathways. Cerebral Cortex, 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053
  
  Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley Interdisciplinary Reviews: Cognitive Science, 10(4), e1492. https://doi.org/10.1002/wcs.1492
  
  Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of Experimental Psychology: General, 149(2), 311–322. https://doi.org/10.1037/xge0000643
  
  (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.
  
  Thank you for raising this crucial point. We agree that the reported effects may vary with task difficulty and task-execution proficiency, which could confound the effects of stimulation on procrastination. On the one hand, as you correctly note, because we did not collect data on task difficulty or other relevant task characteristics, we cannot completely rule out this confound when interpreting our findings. We have therefore explicitly acknowledged this limitation in the Discussion section.
  
  On the other hand, despite the lack of quantitative evidence, the risk of confounding the main effects with differences in task characteristics was controlled experimentally. As reported above, all reported tasks had to meet three criteria: (a) they had already been assigned in real-world settings; (b) the deadline fell between 18:00 and 24:00; and (c) they were likely to be procrastinated. To this end, each participant was explicitly instructed to report a real-life task that they were likely to procrastinate in real-world settings, and was not allowed to report easy, readily achievable, or low-cost tasks. Consistent with this, the reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥ 30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000 m for exercise), indicating plausible diversity and difficulty. Tasks were also highly homogeneous within participants; for instance, Participant #5 reported almost exclusively academic tasks across all sessions. Therefore, as the task list shows (see Appendix 1), the self-reported tasks were plausibly consistent across sessions and similar in difficulty within each participant.
  
  In addition, our blinding check showed that almost all participants reported that they were receiving treatment: 91.30% (21/23) in the active neuromodulation (NM) group and 86.96% (20/23) in the sham control (SC) group (χ² = 0.224, p = .636), indicating that double-blinding was effective. If participants had learned across sessions to report more achievable or less aversive tasks, their procrastination willingness and procrastination rates would have decreased progressively across sessions, irrespective of whether they were in the active or the sham group. However, no such progressive decrease was found in the sham control group (Mann-Kendall test; procrastination willingness: tau = 0.60, p = .13; procrastination rate: tau = 0.61, p = .13), indicating no statistically significant learning or strategic effect on task performance. Again, thank you for this crucial comment; we hope these clarifications address it.
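Both checks in this paragraph rest on standard formulas. The sketch below assumes the blinding χ² is Pearson's test on the 2×2 table (reported receiving treatment vs not, by group) without continuity correction, which reproduces the reported χ² = 0.224, p = .636 for 21/23 vs 20/23. The Mann-Kendall function uses the no-ties variance and a normal approximation; software that applies tie corrections may give slightly different values, and any session series passed to it here is hypothetical, not study data.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]], df = 1."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


def mann_kendall(x):
    """Mann-Kendall trend test without tie correction; returns (S, tau, two-sided p)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s += (x[j] > x[i]) - (x[j] < x[i])  # +1 for an increase, -1 for a decrease
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity-corrected z score
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, tau, p
```

Calling `pearson_chi2_2x2(21, 2, 20, 3)` returns approximately (0.224, 0.636). A positive tau indicates an increasing trend, so a non-significant positive tau in the sham group is consistent with the absence of a progressive decrease.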
  
  Limitations Section (Page 12, Line 637-640)
  
  “In addition, although participants were instructed to report valid real-life tasks with a high likelihood of being procrastinated, we did not measure task difficulty or its consistency across sessions for each participant. Consequently, interpreting the effects of neuromodulation on procrastination as “unique contributions” warrants caution. ...”
  
  (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.
  
  We appreciate your highlighting the importance of clarifying how procrastination was measured, which substantially helps readers interpret these findings. As reported above, the primary outcomes of this experiment were subjective procrastination willingness and the objective actual procrastination rate. For subjective procrastination willingness, participants used the purpose-built mobile app to report a task-execution willingness score (a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). Procrastination willingness was then computed as 100 minus the task-execution willingness score. For the objective procrastination rate, rather than relying on a one-item visual analog scale, we asked participants to report their actual task completion rate (from 1% to 99%) to quantify real-world procrastination behavior objectively. Full details can be found in Response #1.
  
  To determine the sampling time points for the AUC, we drew on both the conceptual Temporal Decision Model and the Myerson algorithm. The Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now to obtain the task reward) and task aversiveness (i.e., the motivation to avoid acting now to escape negative experiences). When task aversiveness outweighs the pursuit of task outcome value, procrastination emerges. A central hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when the deadline is far away but slowly when the deadline is near (Zhang et al., 2019). To maximize statistical power for fitting these dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001). Following this approach, five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with sampling density increasing toward the deadline (e.g., for a task due at 20:00, sampling at 10:00, 16:00, 18:00, 19:30, and 20:00).
  
  Once the five task-specific sampling time points had been determined for each participant, the mobile app sent a notification at each time point asking the participant to immediately report the current task aversiveness and task outcome value. From the five task-aversiveness ratings, task aversiveness discounting was calculated as 1 - A(t)/A(t_earliest), where A(t) is the task aversiveness reported at time t and t_earliest is the earliest sampling point (e.g., 10:00), which serves as the reference for immediate execution. Using GraphPad Prism (v9, 525), we then estimated the AUC from these five data points following the Myerson algorithm (Myerson et al., 2001), i.e., by trapezoidal integration of task aversiveness discounting over time. Under this approach, a higher AUC reflects stronger temporal discounting of task aversiveness: the participant experiences a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness, reduced avoidance behavior and, plausibly, less procrastination.
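A minimal sketch of the discounting-and-AUC computation described above. It assumes, following the normalization used by Myerson et al. (2001), that time is rescaled to the proportion of the sampling window elapsed (0 at the earliest sample, 1 at the deadline), so the AUC stays between 0 and 1 whenever discounting does. The function name and the example ratings are hypothetical, not study data.

```python
def aversiveness_auc(times_h, aversiveness):
    """Trapezoidal AUC of task-aversiveness discounting, 1 - A(t)/A(t_earliest),
    over time normalized to [0, 1] (0 = earliest sample, 1 = deadline)."""
    if len(times_h) != len(aversiveness) or len(times_h) < 2:
        raise ValueError("need matching time and rating lists with >= 2 points")
    t0, t_end = times_h[0], times_h[-1]
    a0 = aversiveness[0]
    if a0 == 0 or t_end <= t0:
        raise ValueError("earliest rating must be non-zero and times must increase")
    x = [(t - t0) / (t_end - t0) for t in times_h]  # normalized time
    y = [1.0 - a / a0 for a in aversiveness]        # discounting; 0 at the earliest point
    # Sum of trapezoids between consecutive samples.
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))
```

With hypothetical ratings of 80, 60, 50, 45 and 40 at 10:00, 16:00, 18:00, 19:30 and 20:00 (times passed as decimal hours), the sketch returns an AUC of 0.221875; a participant whose aversiveness dropped faster would obtain a larger value.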
  
  Taken together, following your suggestion, we have added substantial detail to the revised manuscript clarifying how procrastination was measured, when data were sampled, and how the AUC was estimated. Please see Response #1.
  
  (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.
  
  We fully agree that the conclusions in the main text should be toned down, given the inherent shortcomings mentioned above. As you helpfully suggested, we have moderated our overall claims regarding the effects of multi-session neuromodulation in alleviating chronic procrastination. Specific instances are given below:
  
  Abstract Section (Page 2, Line 55-57)
  
  “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized role of self-control in procrastination, and potentially offers a validated, theory-driven strategy for interventions.”
  
  Conclusion Section (Page 13, Line 657-664)
  
  “In conclusion, this study potentially provides an effective way to reduce both procrastination willingness and actual procrastination behavior by applying neuromodulation to the left DLPFC. Furthermore, these effects were observed as long-term after-effects at a 2-day interval and were partly retained at the 6-month follow-up. More importantly, this study found that ms-tDCS neuromodulation could decrease task aversiveness and increase task outcome value, and further showed that the increased task outcome value predicted decreased procrastination, a relationship conceptually driven by enhanced self-control. In this vein, the current study enriches our understanding of the neurocognitive mechanisms of procrastination by highlighting the prominent role of increased task outcome value in reducing procrastination. It may also provide an effective method for intervening in human procrastination.”
  
  Moreover, as clarified above, in addition to the objective measure of procrastination behavior, we also used a one-item 100-point visual analog scale (“How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”) to measure subjective procrastination willingness. Subjective procrastination willingness decreased significantly across neuromodulation sessions in the active group but not in the sham control group, consistent with the reduction observed in the objective procrastination measure. We also agree that it is crucial to note that we cannot conclude that the effects of neuromodulation on procrastination are driven uniquely by increased task outcome value. Because we did not measure other factors, such as working memory and attention, we cannot rule out other neurocognitive pathways. To address this point, we have removed or rephrased such statements throughout the revised manuscript, and have explicitly restricted our interpretation of this neurocognitive mechanism (i.e., increased task outcome value) to the theory-driven framework of the temporal decision model.
  
  Reviewer #3 (Public review):
  
  This manuscript explores whether high-definition transcranial direct current stimulation (HD-tDCS) of the left DLPFC can reduce real-world procrastination, as predicted by the Temporal Decision Model (TDM). The research question is interesting, and the topic - neuromodulation of self-regulatory behavior - is timely.
  
  Many thanks for kindly dedicating time to review our manuscript, and for the helpful comments detailed below. Thank you for appreciating the novelty of this study.
  
  However, the study also suffers from a limited sample size, and sometimes it was difficult to follow the statistics.
  
  Thank you for pointing out these crucial concerns. As you correctly note, the sample size is somewhat small; however, we confirm that it provides adequate statistical power to detect a medium effect size.
  
  To estimate the sample size, we determined the a priori effect size from our previously published work (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In that pilot study, we found a significant interaction between single-session tDCS (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) on procrastination willingness in laboratory settings, indicating a medium effect size. This pilot study therefore provides supporting evidence for specifying the effect size a priori.
  
  Using G*Power with an assumed medium effect size, we determined that a total sample size of N_total = 34 would provide adequate statistical power. Please see the G*Power output in Author response image 1.
  
  Regarding the statistics, we acknowledge that the vague methodological descriptions and complex algorithms made the methods and statistics difficult to follow. To address this, and echoing the comment raised by Reviewer #1, we have removed the complicated statistics and methods and further clarified how we used the generalized linear mixed-effects model (GLMM) for statistical analysis. Please see the specific revisions below:
  
  Methods Section (Page 8, Line 378-403)
  
  “Statistics
  
  All statistical analyses were performed in R (https://www.rstudio.com/) using R packages.
  
  To test whether multi-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) with a full factorial design was constructed for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task completion rate before the deadline). Sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the classification issued by the National Bureau of Statistics of China (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers based on per capita annual household income, from 1 (poor) to 7 (rich). To avoid subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The average of these two ratings was entered into the GLMM as a covariate of no interest, yielding the final model “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and after the last intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and the random-effects structure. To check robustness using counts rather than continuous outcomes, we also compared between groups the number of tasks on which procrastination occurred, using the nonparametric χ² test with φ correction or Fisher’s exact test. For the 6-month follow-up, the same GLMM was used to examine the long-term retention of the neuromodulation effect on actual procrastination.”
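For the count-based robustness check in the Methods text above, Fisher's exact test on a 2×2 table (group by procrastinated vs not) can be computed exactly from the hypergeometric distribution. A minimal sketch using exact fractions; the two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one, the rule R's `fisher.test` uses for two-sided tests. The function name is illustrative and any counts passed to it here are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that cell [0][0] equals x, all margins fixed.
        return Fraction(comb(row1, x) * comb(n - row1, col1 - x), denom)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)  # feasible values of cell [0][0]
    p = sum((prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs), Fraction(0))
    return float(p)
```

Using exact fractions avoids the floating-point tolerance that is otherwise needed when comparing table probabilities for ties.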
  
  The preregistration and ecological design (ESM) are commendable, but I was not able to find the preregistration, as reported in the paper.
  
  We regret that a serious technical problem has rendered our preregistration invisible and inaccessible. OSF has disabled my account, stating that it had detected “suspicious user’s activities”. This has prevented access to all materials deposited in this OSF account, including the preregistration. We contacted the OSF team but received no workable technical solution to recover the preregistered report (please see the screenshot below). We suspect that this may be related to my change of affiliation to the Third Military Medical University of the People’s Liberation Army (PLA).
  
  To address this unexpected circumstance and ensure transparency, we have explicitly reported it in the main text and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because this no longer meets best practices for preregistration, in addition to transparently reporting the case, we have removed statements regarding preregistration elsewhere throughout the revised manuscript.
  
  Overall, the paper requires substantial clarification and tightening.
  
  We are grateful for your evaluation and fully agree. In response, we have added substantial detail clarifying how procrastination was measured, how the statistical analyses were conducted, and how real-life tasks were collected, as well as other experimental materials. Please see the revisions in the Methods section of the revised manuscript. Again, thank you for these helpful suggestions.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) In the Supplemental Materials, page 4, lines 163 to 167 seem to be from a different manuscript (as the section talks about neural markers, significant clusters, and brain networks).
  
  We are sorry for erroneously embedding this irrelevant section here. We have removed it, and have double-checked the document to avoid such mistakes.
  
  (2) I'm no expert here, but some of the trace and density plots in the SOM look problematic (e.g., Figure S5 top panel). But it's not made clear to which model/analysis these plots belong, so they are not very helpful without that information.
  
  Thank you for bringing these potentially problematic plots to our attention. Following your suggestion, these results have been removed from the SM to improve readability and comprehensibility.
  
  (3) Table S1 reports side effects "from the neurostimulation" (this is also the language used in the main manuscript), but having the flu is rather unlikely to be a side effect from the stimulation, isn't it? Thus, this language is highly confusing, and when reading the main text, it's not clear that these are just life events that are most likely unrelated to the stimulation, but have the potential to affect the measured variables (i.e., ultimately, they seem a source of noise).
  
  We apologize for this confusing wording. Here, “side effects” referred to confounding effects of unexpected life events that uncontrollably disrupt task execution and task performance, such as “having the flu” or “an unexpected mandatory CCP (Communist Party of China) meeting assignment”. To avoid misunderstanding, we have rephrased “side effects” as “unexpected life events disrupting task execution” in both the main text and the SM.
  
  (4) The use of the English language could be improved.
  
  Thank you for your very practical suggestion. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) It would be helpful to include greater detail about the ESM procedure and details of the self-reported tasks. This would help rule out potential confounds of difficulty or learning (e.g., participants may have learned to identify more achievable and less difficult tasks across the sessions, which would mean they are learning to perform the task better rather than to procrastinate less). Further elaboration on the quantification of procrastination measures would help clarify the mechanism underlying this behavior, which is important for clarifying how these effects arise and what aspect of procrastination behavior is being targeted by the tDCS intervention (and rule out alternative explanations).
  
  We wholeheartedly appreciate this crucial recommendation. As mentioned above, we have fully followed your helpful suggestions, particularly by adding substantial detail to the revised manuscript on how real-life tasks were collected (with consistent and plausible difficulty across sessions), how sampling time points were determined, and how the metrics were quantified (e.g., subjective procrastination willingness score, objective procrastination rate, AUC of task aversiveness, and task outcome value). We believe these revisions and clarifications were necessary and have substantially improved the readability and clarity of the manuscript. Please see the specific revisions and clarifications above.
  
  (2) It would be helpful to proofread for grammatical and spelling typos (e.g., DLPFC is spelled incorrectly in line 140, Satterwaite is spelled incorrectly in Line 415).
  
  Thank you for your kind suggestion. Both spelling typos have been corrected, and we have double-checked the revised manuscript to ensure no such typos remain. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.
  
  (3) Please clarify in Figure 4 that a higher AUC is associated with lower task aversiveness (which is stated in the methods but not clearly in the figure).
  
  Many thanks for your helpful suggestion. As you suggested, we have clarified this in the figure legend.
  
  Reviewer #3 (Recommendations for the authors):
  
  I want to see the preregistration.
  
  Thank you for your helpful recommendation. As we replied above, a serious technical issue on OSF has made our preregistration invisible and inaccessible. OSF has disabled my account, claiming to have detected “suspicious user’s activities”. As a result, we no longer have access to any of the materials deposited in this OSF account, including the preregistration. We have reconstructed the preregistration from archived documents and reported it in the SM. As noted above, although this partially addresses the problem, it no longer fulfills best practices for preregistration. Consequently, in addition to transparently reporting this case, we have removed all preregistration statements throughout the revised manuscript.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

arxiv.org/abs/2506.21000v2
www.biorxiv.org www.biorxiv.org

Are interphylum spiralian relationships resolvable?

1
1. Public_Reviews 13 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  This interesting paper probes the problematic relationships between the classical "spiralian" taxa, i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans, and shows that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has important implications for how we view the origin of the animal phyla.
  
  Strengths:
  
  A very careful analysis of a famous old problem with quite significant results. The results seem to be robust and support their conclusions.
  
  It often goes unremarked that many different trees are published about animal relationships, yet some parts of the tree seem extremely difficult to resolve; the spiralians are perhaps the most difficult case. More recently, problems about sponges or ctenophores as sister groups to the rest of the animals have alerted us to major areas of uncertainty in large-scale phylogenetic reconstruction; this paper is a welcome reminder that other, perhaps even harder, problems exist which may be difficult to ever resolve with the (molecular) data we have.
  
  Weaknesses:
  
  The paper could have perhaps drawn out some of the implications of its results in a clearer manner.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  The relationships among the phyla making up Spiralia - a major clade of animals including molluscs, annelids, flatworms, nemerteans and brachiopods - have been challenging from a phylogenomic perspective despite decades of molecular phylogenetic effort. Every topology uniting subsets of these phyla has been recovered with apparent support in at least one study, yet no consensus has emerged even from large-scale genomic datasets. Serra Silva and Telford set out to determine whether this instability reflects a genuine biological signal being obscured by analytical limitations, or whether it reflects a rapid, near-simultaneous origin of these phyla that has left behind in modern genomes far too little phylogenetic information to resolve. They focused deliberately on five phyla, reducing the problem to a tractable set of 15 unrooted and 105 rooted topologies, and applied a suite of complementary approaches across two independent datasets and multiple substitution models to test whether any topology is significantly preferred over alternatives.
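The 15 and 105 figures follow from the standard counts of labelled binary trees: (2n - 5)!! unrooted and (2n - 3)!! rooted topologies for n taxa. The sketch below computes both and explicitly enumerates the unrooted set for the five focal phyla. The function names are illustrative and not taken from the authors' scripts.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n):
    """Unrooted binary topologies for n >= 3 labelled taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def n_rooted(n):
    """Rooted binary topologies for n >= 2 labelled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


def five_taxon_unrooted(taxa):
    """Enumerate all unrooted topologies for exactly five taxa as Newick strings.

    Every such tree has the shape ((A,B),C,(D,E)): choose the central taxon C,
    then split the other four into two cherries (3 ways), giving 5 * 3 = 15.
    """
    if len(set(taxa)) != 5:
        raise ValueError("expected five distinct taxa")
    trees = []
    for centre in taxa:
        rest = [t for t in taxa if t != centre]
        anchor = rest[0]
        for partner in rest[1:]:  # pairing `anchor` with each partner: 3 splits
            other = [t for t in rest if t not in (anchor, partner)]
            trees.append(f"(({anchor},{partner}),{centre},({other[0]},{other[1]}));")
    return trees


phyla = ["Annelida", "Mollusca", "Brachiopoda", "Platyhelminthes", "Nemertea"]
```

Each unrooted five-taxon tree has 2n - 3 = 7 edges on which a root can be placed, which is another way to see that 15 × 7 = 105 rooted topologies.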
  
  Strengths:
  
  (1) The conceptual framing of the problem is excellent, and the study makes a convincing case across several lines of evidence. By enumerating all possible topologies and demonstrating empirically that every one of the 15 unrooted arrangements has been recovered as the preferred solution in at least one published study, the authors make a strong argument about the state of the field. The use of two entirely independent datasets as a consistency check is great, and convergence between them, where it occurs, substantially strengthens confidence in the conclusions.
  
  (2) It is my view that the simulation framework is a particular strength. Generating data on a fully unresolved star tree and scoring those data under both correctly-specified and misspecified substitution models provides convincing evidence that the strong preference for rooting Spiralia on the flatworm branch is, at least partly, an analytical artefact driven by the exceptionally long branch in combination with compositional heterogeneity across sites. This is an important methodological demonstration with implications beyond spiralian phylogenetics, as the same issue is likely to affect other deep, long-branched lineages in the animal tree of life.
  
  (3) The randomised taxon-jackknifing approach is a very nice addition here. The demonstration that preferred topologies shift depending on which species happen to be sampled (even within the same phylum) is a convincing indicator of weak signal, and provides a practical caution for future studies that may report strong support for a particular spiralian arrangement based on a fixed taxon sample.
  
  (4) The branch-length analyses, benchmarking internal interphylum branches against the already disputed and extremely short branch uniting deuterostomes (work also by this group), are well-conceived and solid.
  
  (5) I think it is worth highlighting the notable intellectual honesty throughout the paper: the authors do not overstate their results, correctly acknowledging that while the unrooted topology grouping molluscs with brachiopods and flatworms with nemerteans emerges most consistently, this preference is not statistically significant under more adequate substitution models and may itself carry some artefactual component.
  
  Weaknesses:
  
  (1) The restriction to five phyla is the most significant limitation. The authors acknowledge this and give a clear computational justification, but readers should be aware that the paper's convincing conclusions apply specifically to the five focal phyla and that the evidence remains incomplete with respect to spiralian phylogeny as a whole.
  
  (2) The treatment of substitution model adequacy, while commendably thorough for site-heterogeneous models, is necessarily bounded. The authors note that models accounting for non-stationarity, across-lineage compositional heterogeneity, or mixtures of tree histories might yield different results, and that even the most sophisticated currently available approaches have not produced consistent spiralian topologies across studies. This is not a criticism of what has been done here - the analytical scope is reasonable and well-implemented - but it means the paper cannot be read as a definitive demonstration that no model will ever resolve these relationships. The distinction between a true hard polytomy and a radiation that is effectively unresolvable given current data and methods could be drawn more sharply in the discussion.
  
  (3) The reticulation-aware coalescent analyses are presented somewhat briefly relative to the likelihood-based topology scoring. The finding that flatworms are recovered within a paraphyletic jaw-bearing animal clade in both summary trees - interpreted as long-branch attraction - is striking, and its implications for gene-tree-based approaches to spiralian rooting deserve more discussion than they currently receive.
  
  (4) The central conclusions - that interphylum branches in Spiralia are extraordinarily short, that topological preferences are strongly model-dependent and taxon-sampling-sensitive, and that an ancient rapid radiation is the most parsimonious explanation - are convincingly supported by the evidence presented. The identification of flatworm long-branch attraction as an important confounding factor in rooting analyses is itself an important and well-demonstrated result.
  
  Conclusion:
  
  This paper clearly makes an important contribution to the ongoing debate about spiralian relationships and, more broadly, to methodological discussions about how to handle anciently diversified clades where phylogenetic signal is genuinely limited. The exhaustive topology-scoring framework combined with taxon-jackknifing and simulation under unresolved trees is a valuable methodological template that could usefully be applied to other notoriously difficult nodes in the animal tree. I thoroughly enjoyed the discussion of the implications of these findings for interpreting Cambrian fossils and the evolutionary history of shells, segmentation, larval types and other characters - it is both thoughtful and thought-provoking and will be of broad interest well beyond the phylogenomics and zoology communities. From a very practical perspective, the data and scripts provided make the work useful to researchers wishing to apply similar approaches to other groups.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This paper addresses the controversial internal relationships within the Spiralia, a major clade of invertebrate animals including molluscs, annelids, brachiopods and flatworms.
  
  Strengths:
  
  Performs a range of empirical analyses and simulations that address the core question. Although a favoured unrooted topology finds some support, this is not strongly endorsed in the paper.
  
  Weaknesses:
  
  (1) Only considers a subset of relevant phyla (e.g. gastrotrichs are relevant to the phylogenetic position of Platyhelminthes), although how this would change the scale of the analyses (i.e. number of topologies) is addressed in the paper.
  
  (2) Discussion of Spiralia evolution and broader context, particularly the relevance for the fossil record. Line 448: our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction, which have unusual character combinations, have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.
  
  (3) This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but also a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense; see Guo et al. 2023). However, they sit on the lophophorate stem lineage, a synapomorphy-rich group whose monophyly the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation, and convergent evolution of shells in molluscs and brachiopods.
  
  We thank the reviewers for their kind comments. Please see below for detailed responses to all identified weaknesses.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  Some minor comments that might help improve the paper:
  
  (1) Abstract L17. "Most analyses on the 15 unrooted trees showed a preference for the same topology but the support over other solutions was non significant" - I don't really understand this sentence in the context of the paper; it makes it sound as if the tree is, after all, well resolved! Does "non significant" mean non-significant, or not significantly better?
  
  Having read the rest of the paper I see what this refers to (uT4), but still I don't understand the second clause.
  
  Re-written to clarify.
  
  (2) Introduction L31. This makes it sound as if phoronids are actually part of brachiopods, and while that was recovered by Cohen and Weydmann 2005, I'm not sure if it's really a general result. In addition, rather than using "brachiopods plus phoronids" everywhere, you could use "Brachiozoa" (Cavalier-Smith 1998, Biol. Rev).
  
  We have updated our text and figures to use Brachiozoa.
  
  (3) L36-37. Yes, but the presence of Chaetognatha in this clade suggests that their primitive body size is not small.
  
  Have made clear that chaetognaths are not all tiny.
  
  (4) L85. Kumar et al. may have claimed that Spiralia are as old as 670 Ma, but many other analyses would suggest a range of different results. Why choose just this one? In addition, this age seems rather incompatible with your results.
  
  We agree this maximum age is highly improbable (the principal point remains the deep age of the protostomes). We have used a different reference and refer to a generally acceptable minimum age only.
  
  (5) L88. The key part of this sentence, "proving a hard polytomy", comes at the end of a long set of references that makes it hard to connect to the lead-in "given the age of", so I would suggest rephrasing.
  
  Rephrased for clarity.
  
  (6) L109. It is unclear what this means in the context: "and even support multiple topologies".
  
  Re-worded for clarity.
  
  (7) Figure 1. Why did you choose to indicate brachiopods plus phoronids as a larval form, unlike the other clades? Perhaps it's because we don't know what the last common ancestor of the two looked like (unless P is an ingroup of B), but that's arguably true for some of the other clades as well!
  
  Apologies, this was laziness as we already had a line drawing of an actinotroch larva. Have improved the images in figures 1 and 5 where required.
  
  (8) L164. Reticulation-aware analyses. As I understand it, this would include introgression, hybridization, etc. However, incomplete lineage sorting has also been invoked, not just for Cambrian-explosion age events but also for other major radiations, such as for angiosperms and birds. How significant might ILS be for generating the results you get?
  
  Section title amended. Results section updated to reflect this. We now explicitly mention the potential impact of ILS and introgression on spiralian relationships in our discussion.
  
  Unrooted trees analysis:
  
  (9) L405 on. Maybe it would be worth including a figure showing the relative branch lengths of uT4. All the images of trees show similar-length branches, which gives off the wrong impression within the context of the paper!
  
  We understand the motivation, but we worry that showing uT4 as the sole phylogram may end up with this being interpreted by a casual reader as being the main result of the paper. Hopefully the figures with branch lengths encompass this information well enough and with no danger of misinterpretation.
  
  (10) L430 on. Why is this a "conservative" interpretation?
  
  Yes, agreed, this was not clear. We have changed it to “We interpret our results as showing that…”
  
  (11) You mention synapomorphy accumulation time and implicitly equate shortness of branches with shortness of time. However, other options are available under varying diversification rate models (e.g. ClaDS, Barido-Sottani et al. 2023, Syst. Biol.; CET, Budd and Mann 2025, Syst. Biol.). In particular, the latter paper shows that when unusually large clades are selected for study (as is arguably the case here), those clades are likely to have started with a very high "evolutionary tempo", which speeds up all aspects of evolution, including diversification rates.
  
  In the Budd and Mann scenario large clades begin with high tempo of cladogenesis, high substitution rate and high diversification rate (rapid origin of new characters). This would suggest that the period of the radiation was extra rapid (even less time than in a ‘normal’ period during which smaller clades emerge) so we feel the point stands.
  
  (12) L449. Maybe refer to the Song et al. paper again here on scaphopods plus bivalves, as it makes the same sort of points, albeit in a slightly different context.
  
  We thank the reviewer for the suggestion and have added the citation where relevant.
  
  (13) Finally, to return to L20. You mention implications for the Cambrian fossil record, but then fail to deliver any!
  
  We have hopefully addressed this remark in the discussion better (at least to the extent we are qualified to).
  
  Yet if you are correct, then synapomorphy accumulation would unite groups of phyla, and would surely lead to a scenario highly incompatible with clock models suggesting deep origins of clades (as they would all be more fossilisable).
  
  Apologies but we don’t completely understand this point as ‘synapomorphy accumulation would unite groups of phyla’ is a little ambiguous. Of course, this is generally true, but our results suggest there was little opportunity to accumulate identifiable synapomorphies linking pairs, triplets or quartets of our 5 spiralian phyla.
  
  In addition, clock results suggest rather long periods of time leading to the phyla, which would imply that there would have to be extremely slow rates of molecular evolution to yield the short early branches here. Also, it might be worth referring to papers compatible with this view, such as Wernström, J.V. et al., EvoDevo 13, 17 (2022). https://doi.org/10.1186/s13227-022-00202-8 or some of the palaeo literature, such as Budd and Jackson 2016, Phil Trans.
  
  The referee refers to clock results suggesting a (deep) Ediacaran origin of Lophotrochozoa/Spiralia. We interpret the spiralian radiation itself as rapid but, in the absence of a clock analysis, we cannot comment on when it took place.
  
  Reviewer #2 (Recommendations for the authors):
  
  (My not very) Major points - as I feel this is an excellent paper.
  
  (1) The coalescent-based summary tree analyses warrant expansion. The recovery of flatworms within a paraphyletic jaw-bearing animal clade in both summary trees is a striking result attributed to long-branch attraction, but this interpretation would be strengthened by examining whether pruning or downweighting the longest-branching taxa within those groups affects the outcome, or by reporting per-node quartet scores more fully. This would make the reticulation-aware results more directly informative and would bring this section into better balance with the detailed likelihood-based analyses.
  
  We thank the reviewer for the suggestion of the expanded analyses. We have now done these, and they yielded essentially the same results as the unpruned analyses. Additionally, while not discussed, we ran the Astral analyses on the subset of gene-trees where all groups of interest (spiralian phyla and superphyletic Ecdysozoa, Deuterostomia, etc.) were monophyletic and found no changes to interphylum quartet scores beyond those due to enforced (super)phylum monophyly, with Platyhelminths still recovered within Gnathifera.
  
  We have expanded our description of the results slightly as well as our discussion. Location of the tables with detailed quartet scores and local posterior probabilities has been added to Fig. S1’s legend.
  
  (2) It would strengthen the paper to include at least a brief analysis or explicit discussion of whether any currently available models accounting for non-stationary or across-lineage compositional heterogeneity show any change in the pattern of support, even if only tested on a subset of topologies. A null result here would itself be informative and would make the conclusions more robust to the concern that unexamined model classes might behave differently.
  
  We thank the reviewer for the suggestion, but this represents a considerable amount of new work and we think it falls outside the scope of the present work. We have, as suggested, included this as a discussion point.
  
  (3) The authors note that topologies grouping flatworms with ribbon worms appear among the higher-scoring arrangements even under model misspecification in simulations. It would be helpful to comment explicitly on whether the apparent signal for this grouping should therefore be regarded with particular scepticism, or whether it survives artefact correction in any of the analyses, as this is a grouping that has appeared repeatedly in the literature and readers will want guidance on how to interpret it.
  
  We do state that the nemertean+platyhelminth grouping seems likely to be at the least emphasised by an artefact (as the referee points out it is common to the higher scoring trees in the star tree simulations). We state that this suggests “…that this grouping derives some support from systematic errors.” We now return briefly to this in the discussion.
  
  Writing and presentation
  
  (1) The abstract states that rooting Spiralia on the flatworm branch "is a long-branch artefact" - this is slightly stronger than the language used in the body of the paper, where the authors correctly write that this preference is "at least enhanced by" the artefact. The abstract phrasing should be softened to reflect the more nuanced conclusion in the text.
  
  Good point. Done.
  
  (2) A brief signposting sentence near the start of the Results, setting out the overall analytical logic before the individual sections begin, would help orient readers. The strategy - score all topologies, test robustness to model choice and taxon sampling, then use simulation to identify artefactual signals - is clear in retrospect but would benefit from being made explicit upfront.
  
  We have taken this suggestion on board. The summary seemed in the end better placed as the final part of the introduction.
  
  (3) Figure 3 is complex and would be easier to interpret with a brief explanatory note in the legend clarifying what a wide versus narrow range of log-likelihood scores across topologies means in practical terms for statistical resolution between trees.
  
  Added sentence to legend.
  
  Minor Corrections:
  
  (1) The Figure 2 legend contains a typographical error: "shorter than the short, disputed deuterostome branch" should read "shorter than."
  
  Done
  
  (2) At least one reference appears to carry a future publication year (Ishii et al., 2026) and should be verified for accuracy before final submission.
  
  This reference is correct per the journal’s website. We did find Google Scholar to list it as being from 2025.
  
  Reviewer #3 (Recommendations for the authors):
  
  (1) Abstract/SI definitions of Spiralia/Lophotrochozoa
  
  While I don't have strong feelings about this, if Spiralia is being used as an apomorphy-based name, then it still might be equivalent to Lophotrochozoa, as spiral cleavage in Gnathostomula jenneri was illustrated by Riedl (1969). Although no other studies have replicated this observation, this should at least be mentioned.
  
  Sorry, this reference to gnathostomulid spiral cleavage was included in a longer version of the discussion of nomenclature. This was first reduced in length (which was when the mention of gnathostomulid spiral cleavage was dropped) and then finally moved to the supplementary material. We have now re-included mention of this in the discussion in the supplementary information.
  
  The SI text suggests that the name Lophotrochozoa, as used in its original form by Halanych et al. (1995), was a node-based definition, and that this name is for the sister group of Ecdysozoa. However, in that paper, the name is actually defined "as the last common ancestor of the three traditional lophophorate taxa, the molluscs, and the annelids, and all of the descendants of that common ancestor". This definition would exclude Gnathifera and, depending on the internal relationships of the non-gnathiferan phyla, may (or may not) be equivalent to the usage of the name Spiralia adopted in the present paper. The perils of mixing node- and apomorphy-based definitions of clades are clear; the situation is less straightforward than the paper suggests and (somewhat unhelpfully, given the subject of the paper) may only become clearer if the relationships of non-ecdysozoan protostomes are resolved.
  
  We believe that the community universally understood the definition of Lophotrochozoa following the 1997 paper (by the authors who also provided the original 1995 definition). This 1997 definition included both chaetognaths and rotifers as examples of the Gnathifera. The Spiralia, in contrast, began life not even as a name for a clade but a description of a character shared by some apparently unrelated taxa – similar to a grouping of ‘carnivores’. The introduction of a new name was, we suggest, unhelpful. We hope that by defining our terms up front the meaning in the current paper is clear.
  
  (2) Introduction
  
  Line 76. Some references needed regarding claims that there was a polymeric brachiopod ancestor, e.g. Gutman (1978), Temereva and Malakhov (2011), Guo et al. (2023). Likewise for the chaetae of brachiopods, annelids and molluscs, e.g. Schiemann (2017), as it's key to trace where these ideas originated.
  
  Added
  
  Figure 1. This is a nice illustration of the uncertainty in the relationships of these groups. However, I kept checking which thumbnail image was which for nemerteans and annelids. A minor suggestion, but perhaps a polychaete instead for the annelid?
  
  We have replaced the rather poor image of an earthworm with a polychaete and also now include labels. We hope the improved images are more helpful. Good point.
  
  (3) Results
  
  Branch length comparison. I understand why the deuterostome stem was chosen as the branch for comparison from the point of view of phylogenetic uncertainty. However, what about the branch leading to Ecdysozoa, or the branch subtending Lophotrochozoa and/or Gnathifera? Given that the short internodes are used as an argument underpinning uncertain relationships, can we be sure that Gnathifera is not nested within the group of interest, especially given that Gnathifera contains many long-branched taxa and the root may be misplaced within the group?
  
  We have added the Lophotrochozoa and Ecdysozoa median lengths to our plots and now discuss both the lophotrochozoan and ecdysozoan branches in our results.
  
  Line 249. Given that Spiralia is the group of interest, why were the Gnathiferans also chosen at random?
  
  The point of the experiment was to see the effect of taxon sampling on the consistency of the resulting topology. Random sampling across the tree seems helpful in this context. We chose Gnathifera as one group to sample from as this ensured they would be present in all trees. This seems appropriate as they are the sister group of the clade of interest and as such their inclusion reflects a choice a typical investigator might make when choosing which species to include. Additionally, as noted in the reviewer’s earlier comment, Gnathifera includes many long-branched taxa and we wanted to ensure our root-placement results were robust to this aspect of taxon sampling.
  
  (4) Discussion
  
  Line 448. Our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction that have unusual character combinations have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.
  
  This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but also a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense; see Guo et al. 2023). However, they sit on the lophophorate stem lineage, a synapomorphy-rich group whose monophyly the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation, and convergent evolution of shells in molluscs and brachiopods.
  
  We accept these points (though are clearly not experts on these fossils). We have (slightly tentatively given our lack of expertise) expanded our discussion to include these fossil taxa with their combinations of characters.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.64898/2026.01.25.701568v2
bafybeihwigujdzh7xrbwmf2t2zv5eku6cr3reb5qzqmhgrpnfdd2ryhh7y.ipfs.dweb.link

HyperePost | Future of Text Talk

1
1. indyweb 13 May 2026
  
  in Public
  
  Who is doing the job of helping you to map the expanding frontier of your knowledge
  Paraphrased: Who is doing the job of creating a map of the Web frontier as you explore it? (200????) Or indeed a map of the territory already explored, ready to resume exactly where you left off.

  How HyperPost addresses these problems
  Deep rearrangeability and repurposability supported by a new "Cosmology for Computing"
  Capture Intertwingularity as scaffoldings of everything you care about in one place
  Add new capabilities at the Meta Level
  Linked entities: HyperPost (The Thought Processor for Google+); Google+ (interest based social networking service); Social Knowledge Network (Intersection of Knowledge Graph, Google+ Circles and Thought Graphs); Circles; entire scaffolding; Vannevar Bush (American electrical engineer and science administrator); Memex (hypothetical proto-hypertext system that Vannevar Bush described in 1945); trails; Trail blazing.
  Connected neighborhoods of nodes thus conveyed contain not only the information presented in the narrative trails, but also, as it were, the entire scaffolding with which they were erected.
  Trail blazing as in the Memex

  Thought Vectors in Concept Space
  Kernel for tinkerable Hypermedia
  Direct manipulation interfaces to suit personal needs
  the Lively Kernel project
  Linked entities: tinkerable; through associations ("The Human mind works by association", As We May Think - The Atlantic); kernel (main component of most computer operating systems); Lively Kernel; Direct manipulation interface; Hypermedia (an extension of the term hypertext, a nonlinear medium of information that includes graphics, audio, video, plain text and hyperlinks).
  To have sufficient built-in capability in the kernel to support tinkerable Hypermedia formats incorporating Direct manipulation interfaces to suit personal needs, as was done in the Lively Kernel project.

  Thought Graph
  Search for Things as you write
  The sentences that you write are Nodes
  Structural links
  Linked entities: HyperPost (The Thought Processor for Google+); search and mention; entity (something that exists); write about things; Node (network concept); Thought Graph.
  HyperPost invites us to search and mention all the things that are important in the context of our thoughts and related to the things we write about. The sentences you write down are turned into Nodes in a Thought Graph.

  Public Knowledge Graph
  Incorporate Entities from Google's Knowledge Graph
  WikiData auto-suggest boxes
  Linked entities: Thing; Wikidata (free knowledge database project hosted by Wikimedia and edited by volunteers); Personal Knowledge Graph; Google (American multinational Internet and technology corporation); Knowledge Graph (knowledge base used by Google to enhance its search engine's search results).
  When you want to mention some Thing, the search box autosuggests matching entities drawn from Wikidata. A new node is created in the user's Personal Knowledge Graph that references the node in Google's Knowledge Graph.

  Personal Entities
  When you reach the edge of your recorded knowledge
  The Wiki Gambit
  Create your own on the fly
  Automatic contextualization for thoughts and discovered web resources
  Focus on what you write, not where you put it
  Linked entities: Personal Entity; wiki (type of website that visitors can edit).
  In case no public entity matches the user's search, a new Personal Entity node is created in the user's Personal Knowledge Graph. This is analogous to the greatest gambit of the wiki: when you reach the edges of your knowledge, just create a new page for it. Here it is more fine-grained; it is just a node. You do not need to think up a name for a page, nor do you need to worry about where it is created, because the identity of the node is independent of where you put it.

  Context of discovery and Justification
  Like the eval and apply of LISP
  Linked entities: Context of justification (refers to the later or final phase of research when evidence is applied to and compared with a hypothesis); Context of discovery.
  Lakatos's term Context of discovery can be created by marking trails during your web research with HyperPost, whereas in the Context of justification, linking to the web resources discovered completes the circle.

  Blaze Trails
  Attach Narrative Trails to entity nodes
  Link to web resources
  The context for a sentence automatically contextualizes the linked resource
  Linked entities: Discussion Threads.
  It is possible to attach Narrative Trails to any entity node so that more information about it can be elaborated. These narrative trails comprise sequences of paragraphs, which in turn consist of sentences for individual thoughts. In addition, links to web resources can be attached so that they are linked to relevant contexts and will not be lost.

  Deep Re-arrangeability and Re-purposing
  Reuse through transclusion any trails or context
  Produce social media Posts, blog posts, Presentations, Project Plans, Issue Trackers rooted in your own graph of all your articulated knowledge
  Every sentence is a node; it can be moved and transcluded in any context
  Linked entities: transclusion (technical method of including some or all of one stored document in another document, without having to copy the data itself); Deep Rearrangeability; Ted Nelson (American information technologist, philosopher, and sociologist; coined the terms "hypertext" and "hypermedia"); imitating paper; social media (interaction among people in which they create, share, and/or exchange information and ideas in virtual communities and networks); Posts; blog (discussion or informational site published on the World Wide Web); Presentation slide (a single page of a presentation; collectively, a group of slides may be known as a slide deck).
  By providing suitable structural links, all kinds of presentation formats, like social media Posts, blog posts, Presentation slides, etc., can be applied to an arbitrary network of nodes in the Thought Graph. Combine that with transclusion and we have "Deep Rearrangeability" (ref required) to solve Ted Nelson's problem with "imitating paper" (ref).

  Demo
  This presentation was created in HyperPost
  Posts can be derived from it and will be published
  Linked entities: word processor (computer program used for writing and editing documents).
  HyperPost is used to generate the presentation; it remains the master. Working with it preserves all the familiar characteristics of a word processor, augmented to accommodate thoughts and knowledge in their native associative graph model.

  Conclusion
  HyperPost shows how to overcome the problem with paper
  It is put forward as one possible way to reinvent hypertext for Academia

  Availability
  Hyperpost landing page: Landing Page | hyperPost
  This presentation will shortly be available at the Hyperpost landing page. For people who sign up for the beta, an extended version will be made available presenting a much larger graph containing our development road map. It will be dynamically extended.

  Thanks
  And Thanks for all the fish
  
  map the expanding frontier

Annotators

indyweb

URL

bafybeihwigujdzh7xrbwmf2t2zv5eku6cr3reb5qzqmhgrpnfdd2ryhh7y.ipfs.dweb.link/slides2.html
www.biorxiv.org

Five-layer systems analysis of Leishmania stage differentiation reveals an essential role for protein degradation in parasite development

1
1. Public_Reviews 12 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Comments on revised version:
  
  The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.
  
  Reviewer #2 (Public review):
  
  General comments on the revisions:
  
  My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).
  
  There are two areas where the authors had to make major changes/justifications where further comment is merited, these were:
  
  RNA-seq.
  
  The most significant issue was the originally underpowered RNA-seq, which had only two replicates. This has now been repeated with four replicates, and this has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in their response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered, and that the data are as robust as possible for the minimum sample size, is an important part of experimental design for the ethical use of animal models. Essentially, the replication here could have been avoided if the original study had used one more animal. However, the new RNA-seq dataset brings appropriate confidence to the interpretation of the data.
  
  Phosphoproteomics.
  
  The authors provide a robust justification of their phosphoproteomics strategy and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". They also explain how missing values were handled: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills some of the gaps I identified in the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/systems-based approach such as this one. The authors have also edited the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they present, and have switched to the term "normalised phosphorylation level"; I think this is an appropriate response.
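  The two criteria quoted above (when a phosphosite is included, and when imputation is allowed within a condition) can be sketched as a pair of filters. This is a minimal illustration, assuming a hypothetical per-replicate record holding an identification FDR, a localisation probability and an optional intensity; it does not implement the MLE imputation itself and is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Thresholds quoted in the response above.
MAX_ID_FDR = 0.01    # identification FDR < 1%
MIN_LOC_PROB = 0.75  # localisation probability > 0.75


@dataclass
class Replicate:
    """One replicate measurement of a phosphosite (hypothetical record layout)."""
    condition: str
    id_fdr: float
    loc_prob: float
    intensity: Optional[float]  # None marks a missing value


def passes_inclusion(reps: Iterable[Replicate]) -> bool:
    """Keep a site if at least one replicate has both high-confidence ID and localisation."""
    return any(r.id_fdr < MAX_ID_FDR and r.loc_prob > MIN_LOC_PROB for r in reps)


def imputable_conditions(reps: Iterable[Replicate]) -> Set[str]:
    """Conditions whose missing values may be imputed: at least one observed value present."""
    return {r.condition for r in reps if r.intensity is not None}


site = [
    Replicate("amastigote", 0.005, 0.92, 1.2e6),
    Replicate("amastigote", 0.020, 0.60, None),
    Replicate("promastigote", 0.030, 0.50, None),
]
print(passes_inclusion(site))      # True
print(imputable_conditions(site))  # {'amastigote'}
```

  In this hypothetical example the site is kept, but the missing promastigote value would stay missing rather than being imputed, because that condition has no observed value.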
  
  Overall, in the absence of follow-up experiments on specific individual examples, some of the claims in the original submission have been toned down and now give a more neutral description of the data. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  The authors proposed to use a five-layer systems-level analysis (genomics, transcriptomics, proteomics/protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani. This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.
  
  In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.
  
  Strengths:
  
  Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.
  
  In the revised version, the increased number of RNA-seq replicates means that the authors' assumptions are now adequately supported by their data.
  
  The use of a proteasomal inhibitor adds interesting insight into how protein degradation is involved in parasite differentiation, confirms previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.
  
  Weaknesses:
  
  While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.
  
  Significance and impact in the field:
  
  The different datasets generated in this study will be of great interest to the parasitology community, whether for hypothesis generation or for validating data from other sources.
  
  The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.
  
  In response to the reviewers’ comments, we made the following minor changes:
  
  As suggested by reviewer 1, we have extended the discussion of the results related to the analysis of the ubiquitination pattern by Western blot analysis as follows: “Proteasome inhibition blocked amastigote-to-promastigote differentiation, without inducing rapid global accumulation of ubiquitinated proteins (Figure S7C, upper panel), consistent with a quiescent-like state and low basal ubiquitin–proteasome system activity in amastigotes. After 18 h, ubiquitination levels remained similar to untreated cells, indicating that protein turnover and ubiquitin accumulation are primarily driven by developmental remodeling rather than acute proteasome inhibition. In promastigotes, the lack of detectable change (Fig. S7C, lower panel) may also reflect high basal ubiquitination, engagement of compensatory pathways such as autophagy, and/or only partial proteasome inhibition.”
  
  Recommendations for the authors:
  
  Reviewer #3 (Recommendations for the authors):
  
  Minor comments:
  
  - Supplementary figure 3 is not referenced in the main text.
  
  - The authors removed the "infinite" sign from figures 3 and 4 to better present the data according to their chosen approach to missing values when LFQ=0. However, the sign is still present in the respective figure legends, please adjust.
  
  Supplementary Figure 3 (Figure S3) is now referenced in the main text as requested.
  
  The "infinite" sign has been removed from the legends of Figures 3 and 4 as requested.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.03.24.644963v3
www.biorxiv.org

The role of MICOS in organizing mitochondrial cristae in malaria parasites

1
1. Public_Reviews 12 May 2026
  
  in eLife
  
  Author response:
  
  Reviewer #1 (Evidence, reproducibility and clarity):
  
  Summary:
  
  This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum, an organism that unusually has an acristate mitochondrion during the asexual part of its life cycle, which then develops cristae as the parasite enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, examine the timing of expression during the parasite life cycle and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes, singly and in combination, and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not in asexual stages, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexual stages through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques, they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.
  
  Major comments:
  
  The manuscript is interesting and is an intriguing use of a well-studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail regarding methodology and the statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates, and the statistical tests and data presentation are not clear enough to support the claimed reduction in transmission. Perhaps this could be improved with clearer text?
  
  We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.
  
  To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.
  
  Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence “with the transmission reduction of [numbers]….” and included the sentence “The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers”.
  
  More specific comments to address:
  
  Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section, and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide this information.
  
  We added the information “high molecular mass gels with lower acrylamide percentage” to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).
  
  Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)?
  
  Please clarify.
  
  We thank the reviewer for pointing this out; this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and MitoTracker in U-ExM. The figure annotation has now been corrected.
  
  Figure 2C - what statistical test is being used? The methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution.", but what test is this?
  
  The statistical test is now described in the Materials and Methods section with the sentence “The fitted model was used to obtain estimated means and contrasts, which were evaluated using Wald statistics”. The test is now also mentioned in the figure legend.
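  To make "Wald statistics" concrete: for a single contrast from a fitted model, the Wald test divides the estimated contrast by its standard error and compares the result with a standard normal distribution. The sketch below shows only this final step, assuming the estimate and standard error have already been obtained from the fitted negative binomial mixed model; the numbers and the helper name `wald_test` are hypothetical, and this is not the authors' analysis code.

```python
import math


def wald_test(estimate: float, std_error: float) -> tuple:
    """Two-sided Wald test of H0: contrast == 0.

    z = estimate / SE is compared with a standard normal distribution;
    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    """
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical contrast on the log (link) scale of a negative binomial model,
# e.g. log(mean oocysts, knockout) - log(mean oocysts, wild type).
estimate, se = -0.8, 0.35
z, p = wald_test(estimate, se)
ratio_of_means = math.exp(estimate)  # back-transformed: knockout mean / wild-type mean
print(f"z = {z:.2f}, p = {p:.3f}, ratio of means = {ratio_of_means:.2f}")
```

  Because the negative binomial model uses a log link, exponentiating a contrast gives a ratio of mean oocyst counts, which is usually easier to interpret than the log-scale estimate.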
  
  Also, a log10 scale for oocyst intensity is an unusual choice. How are the mosquitoes with 0 oocysts represented on this graph? They appear to be plotted at 10^-1 (which would be 0.1 oocysts in a mosquito, which is impossible).
  
  As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log-scaled y-axis and relabelled the lowest tick as ‘0’. This ensures that mosquitoes with zero oocysts are shown along the x-axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
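  The display convention described above (zeros drawn at a small stand-in value on a log axis, with the lowest tick relabelled '0', while statistics use the raw counts) can be sketched as follows. This is a minimal illustration with hypothetical helper names; it computes plotting positions and tick labels only and does not reproduce the authors' plotting code or the axis break.

```python
import math

ZERO_STANDIN = 0.001  # stand-in for zero counts, as stated in the response


def plot_positions(counts, zero_standin=ZERO_STANDIN):
    """Log10 y-positions for display only; zeros are drawn at log10(zero_standin)."""
    return [math.log10(c) if c > 0 else math.log10(zero_standin) for c in counts]


def tick_label(position, zero_standin=ZERO_STANDIN):
    """Label the stand-in tick '0' so no artificial value appears on the axis."""
    if math.isclose(position, math.log10(zero_standin)):
        return "0"
    return f"{10 ** position:g}"


counts = [0, 1, 12, 250]            # raw counts: use these for statistics
positions = plot_positions(counts)  # transformed values: use these for plotting only
print([tick_label(t) for t in (-3.0, 0.0, 1.0, 2.0)])  # ['0', '1', '10', '100']
```

  Keeping the raw and transformed values in separate variables makes it harder to accidentally feed the stand-in value into a statistical test.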
  
  Figure 2D - it is great that the data from all feeding replicates have been shared; however, it is difficult to conclude any meaningful impact on transmission in the knockout lines when there is so much variation and so few mosquitoes were dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but Exp2 does not show as great an effect. Similarly, why does the double knockout have better transmission than the single knockouts? Surely there would be a greater effect?
  
  We agree with the reviewer and hope that the new sentence added in response to the major comment clarifies this point. Note that the original Figure 2D has been moved to the supplementary information, as per a minor comment from another reviewer.
  
  Figure 3 legend - Please add which statistical test was used and the number of replicates.
  
  Done
  
  Figure 4 legend - Please add which statistical test was used and the number of replicates.
  
  Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.
  
  Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?
  
  Indeed, the information was missing. We added it to the figure legend.
  
  Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages."
  
  How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?
  
  Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.
  
  Furthermore, even though we do not discuss this in the article, we are aware of mitochondria-targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to distinguish the effect of ETC disruption on energy conversion from its effect on the essential pyrimidine synthesis pathway, which is highly relevant in microgamete formation. Still, a recent paper by Sparkes et al. (2024) showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.
  
  Reviewer #1 (Significance):
  
  This manuscript is a novel approach to studying mitochondrial biology and opens a lot of unanswered questions for further research. Currently there are limitations in the use of statistical tests and the detail of methodology, but these could easily be addressed with a bit more analysis or better explanation in the text.
  
  This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research.
  
  My expertise is in Plasmodium cell biology.
  
  We thank the reviewer for the praise.
  
  Reviewer #2 (Evidence, reproducibility and clarity):
  
  Major comments:
  
  (1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as the conformation of its dimers and rows modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed and likely dysfunctional (see below). Thus, I do not agree with the title's claim that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.
  
  We thank the reviewer for taking the time to review our manuscript.
  
  Based on the reviewer’s interpretation, we conclude that the title does not come across as intended. We have changed the title to: “The role of MICOS in organizing mitochondrial cristae in malaria parasites”
  
  The Discussion section starting from line 373 also suffers from overinterpretation, as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) than in the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate on the basis of such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating that neither is epistatic.
  
  We do agree with the reviewer’s notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.
  
  The following paragraphs (lines 387 to 422) continue with such unnecessary overinterpretation to the point that they are confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript, and it should be shortened. Yes, maybe there are other putative MICOS subunits that linger in the KOs and are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.
  
  We shortened this paragraph.
  
  (2) While the authors went to impressive lengths to detect any effect on life cycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have, without resorting to complexome profiling (although this would be definitive if it is feasible):
  
  i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.
  
  This is an interesting suggestion. As our staining and imaging conditions are suitable for such an analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset that we collected for Figure 3. However, we did not detect any difference in MitoTracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary Figure S6.
  
  ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.
  
  While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.
  
  iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.
  
  While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.
  
  To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs, and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.
  
  Finally, complexome or shotgun proteomics may also be a useful tool for identifying other putative MICOS subunits, by determining whether proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.
  
  (3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus concluded that these are truly acristate. This may well be true, or there may be immature crista forms that evaded detection at the resolution used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are quite small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be smaller still. The minute levels of complex III and IV plus complex V dimers in ABS that the authors previously detected by complexome profiling would argue for the presence of minuscule and/or very few cristae.

  I think that the authors should hedge their claim that ABS are acristate by briefly stating that minuscule cristae may still have been overlooked previously.
  
  We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer’s point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing ‘fully acristate’ to ‘acristate’.
  
  This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag, whereas a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed in ABS, but below the detection limit of the assay, given that the proteins show low expression levels even in gametocytes, when mitochondrial activity is upregulated.
  
  We agree with the reviewer that the absence of a detectable epitope-tag signal does not definitively exclude low-level expression, and we have therefore replaced the term ‘absent’ with ‘undetectable’ throughout the manuscript. In light of previous findings of low-level transcripts in studies by Lopez-Barragan et al. and Otto et al., we also added the sentence “The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.” to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the <25th percentile, suggesting that these low values likely represent background signal rather than biologically meaningful expression. This interpretation is further supported by proteomic datasets in PlasmoDB, which report PfMIC19 and PfMIC60 expression in gametocyte and mosquito stages, but not in asexual blood stages.
  
  To address this point, the authors should determine whether mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs in WT ABS are equivalent to those in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.
  
  We appreciate the reviewer’s suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.
  
  They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).
  
  Searching for the CX9C motifs is a valuable suggestion. In response, we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).
  
  Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.
  
  In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.
  
  (5) Statistical significance. Sometimes I see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods, such as Student's t-test, for pairwise comparisons?
  
  The graphs in Figures 3, 4 and 5 have been revised so that they now use a linear scale and violin plots (also following a suggestion made further down in the reviewer’s comments). We believe that this improves interpretability. We retained ANOVA as the statistical test to ensure correction for multiple comparisons, which a standard t-test does not provide. A full overview of the statistics and exact p-values can also be found in the newly added Supplementary Information 2.
  
  Minor comments:
  
  Line 33. Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria.
  
  We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.
  
  Line 56: Unclear what authors mean by "canonical model of mitochondria"
  
  To clarify we changed this to “yeast or human” model of mitochondria.
  
  Lines 75-76: This applies to Mic10 only
  
  We removed the “high degree of conservation in other cristate eukaryotes” statement.
  
  Line 80: Cite DOI: 10.1016/j.cub.2020.02.053
  
  Done
  
  Fig 2D: I find this table difficult to read. If the authors keep the table format, they should at least remove the 'mean' column, as these data are better depicted in 2C. I suggest depicting these data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and moving a modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier, with reduced infectivity across all cell lines, including WT.
  
  To clarify: the mean reported in the table indicates the mean per replicate while the mean reported in figure 2C is the overall mean for a given genotype that corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided to not include a graph for infected and non-infected mosquitoes as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.
  
  Fig. 3C-G: I feel like these data repeatedly lead to same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondria gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to supplement and replaced by the beautiful images.
  
  Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.
  
  Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.
  
  We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.
  
  Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.
  
  To clarify the biological effect that we can conclude from the measurement, we added an explanation in the respective section of the results, and we replaced the raw results of the plug-in readout with the deduced relative dispersion.
  
  Line 222: Report male/female crista measurements
  
  We added Supplementary information 2, which contains exact statistical test and outcomes on all presented quantifications as well as a per-sex statistical analysis of the data from figure 4. Correspondingly, we extended supplementary information 2 by a per-sex colour code for the thin section TEM data.
  
  Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.
  
  We changed this accordingly.
  
  Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).
  
  This has been changed accordingly.
  
  Line 320: incorrect citation. Related to point 1 above.
  
  Correct citation is now included in the text.
  
  Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.
  
  This has been changed accordingly.
  
  Lines 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. Regarding the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would rewrite this so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least present this as one of several possibilities.
  
  We replaced the sentence as the reviewer suggested, but retained the data from human cells, as this has also been shown by Stephens et al. (2020).
  
  Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.
  
  Done
  
  Lines 376-377: 'deplete functionality' does not make sense, especially in the context of MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.
  
  We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.
  
  Other suggestions for added value
  
  (1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)?
  
  While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead co-migrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial intermembrane space bridging (MIB) complex are not mentioned in the manuscript, we decided not to include this information in the manuscript; it is shown in Author response image 1.
  
  Author response image 1.
  
  Reviewer #2 (Significance):
  
  The manuscript by Tassan-Lugrezin et al. is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask whether MICOS is essential for this process. Based on their data, they conclude that the answer is no, which the authors consider unprecedented. But even if their claim that ABS is acristate is true, this supposed advantage does not bring any meaningful insight into how MICOS works in Plasmodium.
  
  First, the positives of this manuscript. As has been the case with this research team, the experimental approaches in this manuscript are very sophisticated. The highlights are the beautiful and often conclusive microscopy performed by the authors. Unfortunately, only the localization of Mic60 and Mic19 was inconclusive, due to their very low expression.
  
  The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Life cycle stage differentiation tolerates mitochondrial deformation, although a modest but significant reduction in oocyst production is observed.
  
  However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors.
  
  In its current form, the manuscript reports some potentially important findings:
  
  (1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.
  
  (2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.
  
  (3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation.
  
  (4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than about the essentiality of Mic60, i.e. Plasmodium life cycle progression tolerates defects in mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth under the examined conditions differs from its essentiality in other eukaryotes. However, fitness costs were not assayed (e.g. by competition between mutants and WT in infection of mosquitoes).
  
  (5) Decreased fitness of the mutants is implied by a reduction in oocyst formation.
  
  While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.
  
  This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS, and what sets it apart from other iterations, is the apparent absence of the Mic10 subunit. Purification of Plasmodium MICOS via the epitope-tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in Plasmodium.
  
  Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium.
  
  This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology in cell growth and differentiation. I suggested in section A experiments to address this deficit.
  
  Finally, the authors could assay the fitness costs of MICOS ablation and associated phenotypes by testing whether mosquito infectivity is reduced in the mutants when they compete directly with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass the population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation contributes to Plasmodium's remarkable ability to adapt.
  
  I realize that the authors put a lot of effort into their study and again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study than overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.
  
  We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.
  
  With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for the initial formation of cristae (as opposed to their organization) confirms something that has been assumed by the field without being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. We further argue that the finding that MICOS is not involved in the initial formation of cristae is relevant to Plasmodium biology and beyond. As for insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.
  
  Reviewer #3 (Evidence, reproducibility and clarity):
  
  Summary:
  
  MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin section and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.
  
  We thank the reviewer for their time and compliment.
  
  Major comments:
  
  (1) The authors should better present their findings in the right context, in particular by:
  
  i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.
  
  We extended the introduction to include this information.
  
  iii) at the end of the introduction, the motivating hypothesis is formulated ad hoc: "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is the maintenance of cristae junctions (the hypothesis is still plausible and testing it is important).
  
  To clarify we rephrased the sentence to: “Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated.”
  
  (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.
  
  To solve the reference issue, we added the uniprot IDs we compared to see that the annotated ORF is bigger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast all throughout the figure, but then talk about human in this specific sentence.
  
  Regarding whether the true N-terminus is known: the short answer is no, not exactly.
  
  However, we do know that the Pf version is about double the size of the yeast protein.
  
  As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.
  
  To clarify, we now emphasize that we ran the Alphafold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al. now cited in the manuscript), and revised the wording to make clear what we are comparing in which sentence.
  
  (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins." The authors should explain which data they are referring to, as some of the data in Fig 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear whether any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.
  
  In reply to this and other comments from the reviewers, we added multiple-comparison testing among all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests.
  
  (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.
  
  We deleted this statement.
  
  (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:
  
  "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."
  
  This seems in line with what the authors show, rather than "different".
  
  This sentence has been removed.
  
  (6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.
  
  This sentence has been deleted in the revised version of the manuscript.
  
  Minor comments:
  
  (1) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.
  
  Title is changed accordingly
  
  - Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.
  
  Done, the paper is now cited
  
  - Page 2, line 58: for a more complete picture the authors should also cite the work of others here which shows that although at very low levels, e.g. complex III (a drug target) and ATP synthase do assemble (Nina et al, 2011, JBC).
  
  Done
  
  - Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.
  
  The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.
  
  - Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.
  
  We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).
  
  - Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So its existence is unknown, not just its "function".
  
  We adapted the domain description to “a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement “Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown.”
  
  - Fig 1B: The authors show two alphafold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.
  
  We appreciate the reviewer’s suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full-length proteins, we believe that including fragment-based structures would be less informative in this context.
  
  - Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods: how well was the antibody working in their hands, and how do they interpret the absence of a WB band compared to the transcriptomics data?
  
  The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall within the <25 percentile, suggesting that these signals likely represent background. Nevertheless, we acknowledge that low-level protein expression below the detection limit of western blot analysis cannot be excluded. To reflect these considerations, we added the sentence: ‘The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.’
  
  - Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.
  
  Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondria), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, potentially hiding low-abundance, naturally dotty signals such as those of MICOS proteins, which localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we do not have experience with, nor access to, this particular technique.
  
  - Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).
  
  The limitations of other methods are described in the respective results section.
  
  We added a clarifying sentence in the results section of Figure 4:
  
  “Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae.”
  
  This statement refers to the length/width measurements of cristae.
  
  In the context of Figure 4D we mention the following (see preprint lines 229 – 230): “We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both.”
  
  For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 – 273): “Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range.”
  
  - Line 404: perhaps undetected or similar would be a better description than "hidden"?
  
  The sentence does not exist in the revised manuscript.
  
  Reviewer #3 (Significance):
  
  The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle, and defects in infection efficiency are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The limitation of the study stems from what is already known about MICOS and its subunits in great detail in yeast and humans, with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight into MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is, however, important given the divergence found in mitochondrial biology, and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging, as the morphological defects are diverse and the fitness defects appear moderate/mild.
  
  As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.
  
  The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest to specialised audiences such as basic research parasitologists and mitochondrial biologists. My own fields of expertise are mitochondrial biology and structural biology.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.10.13.682069v2
www.biorxiv.org

The insulin / IGF axis is critically important for controlling gene transcription in the podocyte

1. Public_Reviews 12 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the role of the insulin receptor and the insulin-like growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, with proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines in which both receptors were deleted. Loss of both receptors was highly deleterious, with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.
  
  This is primarily a descriptive study and no technical concerns are raised. The mechanism by which insulin and IGF1 signaling regulates splicing is not directly addressed, but the data potentially implicate phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinine might also enhance our understanding of disease progression; proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.
  
  Significance:
  
  With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.
  
  Comments on revised version:
  
  I'm satisfied with the revised manuscript and the responses to my previous concerns.
  
  Thank you.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mouse podocytes. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.
  
  Long-read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.
  
  Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.
  
  Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.
  
  The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well-designed, using bar graphs with superimposed dot plots.
  
  Methods are generally well described.
  
  Comments on revised version:
  
  Coward and colleagues have done an excellent job of responding to all the reviewer comments.
  
  Thank you.
  
  Reviewer #4 (Public review):
  
  Summary and background:
  
  This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. In mammals, the insulin/IGF signaling system evolved as three peptides arising from gene duplication (insulin, IGF-1, and IGF-2) and two receptors, IR and IGF1R, that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling, in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes, including the splicing of RNA for protein synthesis.
  
  Thank you for this additional review and for assessing our paper with new suggestions (we addressed the previous suggestions to the satisfaction of the other reviewers). Of note, regarding this introduction, the podocyte is a terminally differentiated cell and may have unique responses to insulin/IGF, as it is accepted that it does not generally proliferate (hence we consider understanding the actions of insulin/IGF and their receptors to be of interest). Indeed, we have recently shown a contrasting effect of IGF signalling in the podocyte: partial suppression of the IGF1 receptor is beneficial, in contrast to near-complete suppression, which results in mitochondrial dysfunction (PMID:38706850).
  
  Mouse IR/IGF1R double knockdown model:
  
  A double knockdown mouse model was generated by interbreeding mice with different genetic backgrounds carrying floxed sites for IR and IGF-1R to produce mixed-background offspring with both floxed IR and IGF-1R genes. These mice were crossed so that the podocin promoter-driven Cre (which comes on at about embryonic day 12 as podocytes are developing) would delete the IR and IGF-1R genes. Since podocin is believed to be an absolutely podocyte-specific protein, this podocin promoter is predicted to knock down the IR and IGF1R genes only in podocytes. The weight and growth of double KO offspring were not different from controls, but some proportion of the double knockdown mice subsequently developed proteinuria by 6 months and 20% died, although no specific data are provided to identify the cause of the deaths, since eGFR was not decreased. Surviving mice were evaluated at 6 months of age. The efficacy of knockdown was not demonstrated in the mouse model itself, although a temperature-sensitive cell line developed from these double knockdown mice showed that expression of IR and IGF-1R proteins in the Cre-treated cell line was reduced by about 50% for both (no statistical analysis of this result provided).
  
  In the knockout mice, proteinuria was significantly increased by 6 months, but not at earlier time points. Histologic analysis showed proteinaceous casts, glomerulosclerosis and interstitial fibrosis. Podocyte number was stated to be reduced by about 30% in double knockdown mice, although the method by which this was evaluated seems to have been by counting WT1 positive nuclei in glomerular cross-sections, an approach that is well-known not to be a reliable way of assessing true podocyte number. No information is provided about podocyte size, density or glomerular volume.
  
  Comment: If IR/IGF1R deletion plays a role in normal podocyte function significant enough to cause proteinuria and glomerulosclerosis, then reduced IR and IGF1R protein expression would have been expected to produce a phenotype before 6 months. A more likely scenario to explain the overall result is that deleting the IR and IGF1R genes at about embryonic day 12 impacted podocyte development to a variable extent, such that some mice developed fewer podocytes per glomerulus than other mice. As mice grow and their glomeruli and glomerular capillary area increase, those mice with fewer podocytes would not be able to completely cover the filtration surface with foot processes and would develop proteinuria and glomerulosclerosis. If reduced podocyte number per glomerulus is the proximate cause of the observed proteinuria, then modulating body and kidney growth rate by calorie restriction to slow growth (lower circulating IGF-1 levels) would be expected to be protective, while a high-protein, high-calorie diet (higher circulating IGF-1 levels) or uninephrectomy to increase kidney growth rate would be expected to enhance proteinuria and glomerulosclerosis.
  
  Thank you for these comments. In response to them:
  
  (1) WT1 as a marker of podocyte number. We agree that it may not be the most accurate way of precisely measuring podocyte number, but it is widely accepted in the field (PMID:33655004 / PMID:38542564), and we think it convincingly shows fewer podocytes at 6 months.
  
  (2) Podocyte size and density were not measured. This was not the focus of the paper, and the histology clearly showed a significant phenotype in several mice (Figs 1D-F). Of note, we did objectively assess a glomerulosclerosis index (Fig 1D). We took the approach of understanding mechanism through non-biased proteomics and phospho-proteomics of conditionally immortalised podocytes in which we had convincingly knocked down the insulin and IGF1 receptors (Figure 2).
  
  (3) You did not study the mice earlier to ascertain the developmental phenotype. We concede that we did not do this, but there was no significant proteinuria detected early in the mice, so we elected not to increase mouse numbers by studying them at that stage (which we consider good practice for reduction, replacement and refinement). We suspect there would have been subtle changes in those mice that had significantly reduced simultaneous IR and IGF1R knockdown. It was precisely because of this that we generated a conditionally immortalised podocyte cell line with robust simultaneous knockdown of both receptors.
  
  (4) You did not show significant insulin and IGF1 receptor knockdown in the conditionally immortalised cell line (the reviewer states it was 50%). We clearly knocked down both receptors (insulin and IGF1R) in the podocyte line by >80%, which was highly statistically significant (p<0.00001; Figure 2A). We agree this was crucial (and we made the cell line because of the variability in the mouse model).
  
  The model as used may be more representative of a variable degree of podocyte depletion than of an effect of impaired IR/IGF1R signaling. Therefore, although the phenotype may ultimately be attributable to the IR/IGF1R gene deletions, the proteinuria and glomerulosclerotic phenotype itself was probably a consequence of defective podocyte development. Examining podocyte number, size, density and glomerular volume at earlier time points (4 weeks) would help to answer this question. Therefore, a more appropriate title would be "The insulin/IGF axis is critically important (for) normal podocyte development and deployment". In this context, the effect of the knockdowns on splicing would make more sense.
  
  Please see our response above. We think our final conclusion, that in the podocyte the insulin/IGF axis is important for spliceosome activity and control, is valid. This is based on our findings (both total and phospho-proteomics results) and on recent papers showing that this axis can rapidly phosphorylate a variety of spliceosome proteins in different cell types (PMID:39939313 / PMID:32888406). All of this is discussed in detail in the manuscript.
  
  Cell culture studies. A cell line was generated using a temperature sensitive SV40 system that has been previously reported from this laboratory. A detailed analysis is provided to show that double knockout cells exhibited abnormal spliceosome activity. This forms the basis for the conclusion that "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte". There are several concerns that weaken this conclusion.
  
  (1) In the double knockdown cell culture system, about 30% of cells were "lost" by 3 days and about 70% of cells were "lost" by 5 days. The studies were done at the 3-day time point. It is not clear whether the "lost" cells were dying, detaching because of stress, or just growing more slowly than controls due to reduced IR and IGF-1R signaling. These processes could have impacted splicing in a non-specific way, independent of IR/IGF1R signaling itself.
  
  (2) Can a single cell line derived from the double floxed mice be relied on to provide an unbiased picture of the effect of deleting IR and IGF-1R? Presumably, the transfection and selection process will select for cells that survive, thereby introducing unknown biases, possibly related to spliceosome function. Is a single cell line adequate? These investigators have extensive experience with this type of analysis, but this question is not addressed in the Discussion.
  
  (3) To determine whether the effect is specific to reduced IR/IGF1R signaling, the deletion of IR and IGF-1R could be corrected by transfecting full-length IR and IGF-1R cDNAs into the cells to restore normal IR/IGF1R signaling. If restoring intact IR and IGF-1R expression and activity returned spliceosome activity to normal, this would be evidence that the receptors themselves play some role in spliceosome activity, as opposed to a downstream effect of growth limitation or stress on the cells.
  
  (4) Other ways of testing whether the splicing effect is specifically due to reduced IR/IGF-1R signaling would be to (a) block IR and IGF1R receptors using available inhibitors, (b) remove or reduce insulin, IGF-1 and IGF-2 levels in the culture medium, (c) use low-glucose and low-amino-acid culture medium to slow growth rate independently of receptor function, or (d) block intracellular signaling via the IR and IGF-1R receptors through mTORC1 inhibition using rapamycin or other signaling targets.
  
  (5) It would be useful to determine whether stressing the cultured cells in other ways (e.g. ischemia, toxins, etc.) also results in the same splicing abnormalities.
  
  Point 1. The 70% cell loss was observed at day 7 (not day 5); we found approximately 20% loss at day 3. We opted for this early time point, hypothesising that the key detrimental processes would already be evident. The 3-day time point also allows enough time for expression of Cre recombinase, receptor gene excision and degradation of existing endogenous IR/IGF1R following lentiviral transduction. Interestingly, we did not find a major “death or apoptosis” signal in our data at this point, but we agree it should be considered. We think this is a specific pathway because we have previously used proteomics to examine several other conditionally immortalised podocyte cell lines with detrimental manipulations and a much more severe cell-death phenotype (e.g. podocyte GSK3 alpha/beta knockdown), and we detected NO spliceosome signal (PMID:30679422). Furthermore, other podocyte proteomics “stress” studies have now been published in which there is proteinuria and significant cell loss/death but no spliceosome dysfunction. These include the detailed proteomic signature of podocytes stressed with doxorubicin and lipopolysaccharide endotoxin (LPS) in mice (PMID:32047005) and bradykinin stimulation of rat podocytes (PMID:32518694).
  
  Point 2. Yes, we think it is valuable and reproducible. We generated a podocyte cell line from insulin receptor and IGF1 receptor homozygous floxed cells. Hence there is no selection bias in the cells when generating the line as both receptors are effectively intact. We then temporally “knocked down” the receptors with extrinsic lentiviral Cre.
  
  Importantly, we validated our cell line findings both back in the cells (with Western blotting) and in our transgenic receptor knockdown mice, and found evidence of spliceosomal dysregulation (Figure 3E and 3F). Also, as discussed above, the spliceosome has been implicated in the insulin/IGF pathway in other models.
  
  Point 3. We don’t think the experiment of knocking down the receptors and then reconstituting them would prove this hypothesis. This is because, if the splicing abnormality were due to generalised cell dysfunction (which we do not think is the case in this situation), then putting the receptors back may simply restore cell health and spliceosomal function (i.e. it would not prove the effect is via the receptors). Secondly, transduction with multiple lentiviruses may be inherently stressful to the cell, and a high level of extrinsic receptor may be inserted, which may also be confounding or detrimental. Finally, as discussed, there are now several lines of evidence describing insulin/IGF signalling to spliceosomal proteins, which we consider important (discussed in the paper in detail).
  
  Point 4. We think modulating the receptors using the Cre-lox approach is the cleanest approach (with fewer off-target effects) to interrogate the insulin/IGF axis. It allows us to differentiate the cells by thermo-switching (which is crucial for this terminally differentiated cell) and then robustly knock down both receptors simultaneously to investigate mechanism. We agree these supplementary approaches may give some extra information if their limitations (e.g. off-target effects of inhibitors) are also taken into consideration.
  
  Point 5. They do not. Please see response to point 1 above regarding GSK3, Doxorubicin, LPS and bradykinin challenge.
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.05.20.594973v3
www.biorxiv.org

Brain-wide arousal signals are segregated from movement planning in the superior colliculus

1. Public_Reviews 12 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the previous reviews
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies that allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.
  
  Comments on revised manuscript:
  
  The authors have done a very good job of responding to all of the reviewers' concerns.
  
  No weaknesses to address.
  
  Reviewer #2 (Public review):
  
  Weaknesses:
  
  (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior.
  
  (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.
  
  (3) The authors use the analysis shown in Figure 6D to argue that, across recording sessions, the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlations. The authors should test whether this interpretation is correct. If so, where are the low and high correlations, respectively? Are there potentially two functional areas in the SC?
  
  Comments on revised manuscript:
  
  I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., without other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the manuscript finds changes in SC activity and describes them immediately as 'arousal-related'.
  
  Other than this conceptual issue, I do not have major problems with the analysis per se.
  
  We agree with the reviewer that we may have advanced into discussing arousal-related effects in the previous version of the manuscript without providing a thorough explanation for why we think the slow drift axis is associated with changes in the monkey’s arousal levels. Arousal has been linked to the size of the pupil as well as movements of the eyes in numerous previous studies. We have made the following changes in the revised manuscript to address the reviewer’s concern:
  
  (1) When first describing how the spiking responses of SC neurons fluctuate over the course of a recording session (Lines 130-132), we have used the phrase "slow fluctuations in the spiking responses" rather than "arousal-related fluctuations in the spiking responses". Then, when describing these effects in more detail (Lines 136-147), we have explained why we think these fluctuations may be related to arousal. The following text has been added in the revised manuscript for clarification:
  
  “We found that this low-dimensional pattern of activity in the SC was also correlated with pupil size in the present study and with simultaneously recorded data in the prefrontal cortex (PFC), pointing to a link between this brain-wide fluctuation and changes in the monkeys’ arousal levels while performing the task.” (Lines 136-147)
  
  (2) We have changed the subheading in Line 183 of the revised manuscript from "Arousal-related fluctuations are present in the SC and correlated with pupil size and fluctuations in PFC activity" to "Slow fluctuations in SC spiking activity are correlated with pupil size and PFC activity". Given that we have not yet explained the results linking these fluctuations to arousal at this stage of the manuscript, we believe that this revised title is more accurate and avoids jumping too quickly to arousal-related fluctuations without first explaining the link between SC slow drift, pupil size and PFC activity.
  
  (3) We have provided additional justification for using pupil size and PFC activity to assess whether SC slow drift is associated with changes in the monkeys’ arousal levels. In a previous study, we computed an identical slow drift axis for spiking responses in visual cortex (V4) and PFC, and investigated how these low-dimensional neural activity patterns, which were themselves strongly correlated, were associated with various eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). Results showed that pupil size was the strongest predictor of slow drift in V4 and PFC. Given that the eye metrics were also strongly correlated with each other, we believe that the observed relationship between SC slow drift, pupil size and PFC activity provides sufficient evidence to suggest that the fluctuations observed in the SC are arousal-related. The following text has been added to the Results section of the revised manuscript:
  
  “Moreover, previous work in our laboratory computed a similar slow-drift axis using spiking activity in visual cortex (V4) and PFC, and investigated the relationship between these low-dimensional neural activity patterns and different eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). In addition to observing a strong correlation between V4 and PFC slow drift, we found that, relative to the other eye-related metrics, pupil size was the strongest predictor of these fluctuations (Johnston et al., 2022a). Thus, to further confirm the link between the SC slow drift axis and changes in the monkeys’ arousal levels while they performed the MGS task, we next sought to explore if projections onto the SC slow drift axis were associated with pupil size.” (Lines 236-344)
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift, as in the cortex. They then computed a motor index for SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the SC neurons with a lower motor index. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that the SC minimizes the impact of arousal on unwanted eye movements.
  
  Strengths:
  
  The paper is clear and well-written.
  
  Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.
  
  Weaknesses:
  
  The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced, since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by pupil diameter because they lack visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret this as showing that arousal signals are excluded from the "output layers" of the SC.
  
  Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.
  
  I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.
  
  Comments on revised manuscript:
  
  The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:
  
  In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.
  
  We thank the reviewer for this comment and apologize that the citation to Baumann et al., PNAS, 2023 was missing in the previous version of the manuscript. In addition to including this citation in the revised version, we have provided a much more comprehensive description of all three cited studies and clarified that, in addition to replicating the results of Jagadisan and Gandhi, Baumann et al., PNAS, 2023 showed that the subspaces for the visual and motor epochs are orthogonal to each other. The following lines have been added to the Introduction of the revised manuscript:
  
  “A similar separation has been observed for visual and motor responses in the SC (Jagadisan and Gandhi, 2022; Ayar et al., 2023; Baumann et al., 2023). For example, Jagadisan and Gandhi (2022) used linear microelectrode arrays to investigate why early eye movements are not triggered when neuronal responses to a visual target, presented before a delayed saccade to that target, cross a threshold. They found that population activity in the SC was less stable during the visual epoch of a delayed saccade task, relative to the saccade epoch. Moreover, saccades could be evoked more easily by patterned microstimulation when the temporal structure of the microstimulation was stable across electrodes, providing a potential explanation for how downstream regions differentiate between visual and motor responses. Similar results were reported by Baumann et al. (2023) who found that the strength of SC motor responses during a saccade to a visual image depends on the features of that image (e.g., contrast, orientation). When dimensionality reduction was applied to the spiking responses of neuronal populations in the SC, the population trajectory during the initial visual response to the image was orthogonal to that during the motor response. These findings replicate the separation in temporal population structure reported by Jagadisan and Gandhi (2022) and support the results of Ayar et al. (2023). They found that, although not completely orthogonal, population activity in the SC is distinct for visual and motor responses during the same oculomotor task and across different tasks, which could further facilitate the decoding of signals related to sensation, action and context by downstream regions.” (Lines 110-127)
  
  Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye from the gray background also changes. Thus, visually-responsive neurons will have different amounts of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the larger effects seen in the more visual neurons.
  
  We apologize that our analysis did not fully address the reviewer’s concern that the presence of fluctuations in visual neurons and their absence in motor neurons may have arisen indirectly due to changes in the amount of light entering the eye caused by changes in pupil size. As per the reviewer’s suggestion, we have now raised the possibility that visual neurons in the SC may have firing rates that are monotonically related to slow trends in overall luminance induced by pupil size changes, whereas motor neurons do not. Although we believe this to be an unlikely explanation, the paragraph from lines 374-398 has been modified to better describe this possibility, including the following text:
  
  “Given that slow drift is found in traditionally defined visual areas (e.g., area V4) and in regions that show mixed selectivity for multiple task variables (e.g., PFC) (Cowley et al., 2020), it seems unlikely that slow drift is caused by luminance fluctuations alone and more likely that it reflects global changes in arousal. At the same time, these arousal-related fluctuations covary with changes in pupil size (Johnston et al., 2022a), which could modulate the amount of light entering the eye from the display. This might affect visual neurons but not motor neurons due to their lack of visual sensitivity. Because SC neurons exist on a continuum, with visual responses decreasing and motor responses increasing from the intermediate to deep layers (Massot et al., 2019; Heusser et al., 2022) and no clear categorical boundary for motor-only neurons, any readout strategy would still need to avoid corruption of the motor output by slow drift, even if it were caused by changes in the amount of light entering the eye.” (Lines 387-398)
  
  The figures (everywhere, including the responses to reviewers) are very low resolution, and all equations in the Methods are missing.
  
  We thank the reviewer for bringing this to our attention. We believe this issue may have arisen during conversion of the manuscript file for review, as the figures were of sufficient quality and the equations visible in the version that appeared online (https://doi.org/10.7554/eLife.99278.2). In any case, we will ensure that high-resolution figures are submitted with the revised manuscript and apologize that they were low resolution in the previous version.
  
  I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? That is, how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.
  
  We agree that clarification is needed here and thank the reviewer for their comment. The eccentricity of the targets was set to match the endpoints of the evoked saccades, which for some sessions were relatively close to the fovea. The mean eccentricity of the targets across sessions was 4.52° (SD = 2.89°). These values are now reported in the Methods section of the revised manuscript (Line 637). For the neuron shown in Figure 2–figure supplement 2, the eccentricity of the targets was 3°. Previous research has shown that some SC neurons respond during microsaccades as well as slightly larger saccades (see Hafed & Krauzlis, 2012, J. Neurophysiol., Fig. 4B). This likely explains why the neuron shown in Figure 2–figure supplement 2, which had a receptive field at ~3° based on saccades evoked by microstimulation, also responded during microsaccades. We apologize that this was not explained in the previous version and agree that it could have been confusing for the reader. To address this, the legend for this supplementary figure has been edited in the revised version and now reads:
  
  “(B) PSTH for an SC neuron that responded around the time of a microsaccade. Firing rates were computed in 1ms bins, averaged across trials and smoothed using a Gaussian function (σ = 5ms). Note that the targets were set to 3º in this session based on saccades evoked by microstimulation (see Methods). Previous research has shown that some SC neurons respond during microsaccades as well as to slightly larger saccades (Hafed and Krauzlis, 2012). This likely explains why this SC neuron, which had a RF at ~3º based on saccades evoked by microstimulation, also responded around the time of a microsaccade.” (Lines 1026-1031)
  
Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.04.26.591284v3
www.biorxiv.org

Frequency-dependent modulation of foveal contrast sensitivity by fine-scale exogenously triggered attention

1. Public_Reviews 12 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  eLife Assessment
  
  This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target, but additional clarification regarding analyses and implications for vision and oculomotor control would broaden the impact of the study.
  
  We thank the editors and reviewers for their thorough evaluation of our work. We have carefully revised the manuscript and substantially reworked the Discussion to address all of the points raised, eliminate redundancies, streamline the text, and clarify the implications of our findings for vision and oculomotor control. We have also expanded the documentation of our power analyses and conducted the additional analyses requested by the reviewers. Our point-by-point responses are provided.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to mid-range spatial frequencies (4-8 cycles per degree, CPD), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.
  
  Strengths:
  
  The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.
  
  Weaknesses:
  
  The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.
  
  We thank the reviewer for the helpful comments. In the Discussion, we have now considered additional factors that could have contributed to the observed attentional effects. First, the exogenous cue might have functioned as a temporal warning signal. However, the interval between cue and stimulus onset was fixed across trials, meaning that the cue did not provide temporal information beyond what participants could already anticipate. Furthermore, participants completed a large number of trials (≥ 4000), making it highly likely that the temporal relationship between trial onset and target onset was overlearned. These considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions.
  
  Another possibility is that the 100% validity of the exogenous cue could potentially have promoted endogenous attentional engagement. Yet, several characteristics of our task strongly limited the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to the observed attentional benefits in our task.
  
  Regarding the points on statistical reporting and participant details, we followed the reviewer’s suggestions by adding post hoc power analyses and providing more comprehensive reporting of the linear model outputs (see Appendices 1 and 2). We also expanded the description of the training procedures conducted with participants prior to formal data collection in the Methods section.
  
  We thank the reviewer for raising the important question of how our findings may relate to oculomotor control. To address this, we analyzed the trials that had been excluded from the main analyses because they contained saccades. This analysis revealed that saccade latencies were shorter in the valid condition than in the neutral condition (see Figure 2 — Supplementary Figure 2). This earlier saccade onset may reflect exogenously triggered preparatory activity in the oculomotor system in response to the salient cue. Future studies are needed to examine whether this preparatory mechanism serves to efficiently guide microsaccades or saccades toward behaviorally relevant stimuli in everyday vision. We have incorporated this point into the Discussion, highlighting a potential mechanistic link between exogenous attention and oculomotor behavior.
  
  Reviewer #2 (Public review):
  
  Summary:
  
  This study aims to test whether foveal and non-foveal vision share the same mechanisms for exogenous attention. Specifically, the authors aim to test whether previous results regarding the effects of exogenous attention at different spatial frequencies can be replicated in the foveola.
  
  Strengths:
  
  Monitoring the exact location of gaze at this scale requires very precise eye-tracking methods and accurate, stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal and non-foveal vision, adding more data supporting this parallel.
  
  Weaknesses:
  
  The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.
  
  We thank the reviewer for raising these important issues. In response, we have expanded the Discussion to link our findings to prior work. First, we included a direct comparison of our effect sizes with those reported in previous studies, which showed that our effect sizes are highly comparable to the earlier ones (see Figure 3 — Supplementary Figure 4). Second, we contextualized our findings within the widely used normalization model of attention. We detected a mixture of contrast-gain and response-gain effects, consistent with what the normalization framework predicts for our experimental design. Finally, we extended the Discussion to consider potential underlying neural mechanisms. Specifically, we suggest that differences in how attentional modulation manifests (response gain vs. contrast gain) between the fovea and the extrafovea may reflect distinct characteristics of foveal neurons relative to extrafoveal neurons.
  
  Reviewer #3 (Public review):
  
  Summary:
  
  This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.
  
  Strengths:
  
  The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.
  
  Weaknesses:
  
  The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.
  
  We thank the reviewer for this comment. Our Methods section continues to transparently discuss these limitations, as well as the fact that these limitations are shared with most published studies in psychophysics. Additionally, we now include measures of uncertainty for all key effects (see Appendices 1 and 2), and we have reported effect sizes throughout the Results section. Finally, we have added post hoc power analyses to the Methods. Following previous approaches to power calculation for related experiments, we found that our study was sufficiently powered to detect the main effect of attention and had moderate power to detect the interaction between attention and spatial frequency.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  (1) The manipulation of attention raises some interpretive concerns. Since only valid and neutral cue conditions were included, the results might reflect differences in temporal predictability rather than true spatial reorienting of attention. In other words, the valid cue could act mainly as a temporal warning signal that reduces uncertainty about stimulus onset. Without invalid trials or a non-predictive control cue, it remains difficult to separate spatial and temporal contributions to exogenous attention.
  
  We thank the reviewer for raising this point. In this regard, we would like to clarify that there was no temporal uncertainty in stimulus onset: across all conditions and trial types, the stimulus was presented at the same time relative to the start of the trial, i.e., 600 ms after the start. Yet, we acknowledge that the shorter temporal proximity between the cue and stimulus in valid trials could serve as an additional temporal warning signal, potentially conferring an advantage relative to the neutral condition. While we cannot completely rule out a contribution of such temporal cueing within the constraints of the current experimental design, we believe its impact was limited. Specifically, the fixed cue-stimulus interval reduced the cue’s ability to convey additional temporal information. Furthermore, observers completed a large number of trials (≥4000), and the temporal contingency between trial onset and target onset was likely overlearned. Taken together, these considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions. We now mention this in the revised Discussion (lines 309-318).
  
  We recognize that the original Figure 2 illustrating the experimental paradigm may have caused confusion regarding the timing structure of the task. We have therefore updated the figure to illustrate the trial timeline in both conditions more explicitly.
  
  (2) The reported effects seem small, and no power analysis is provided. With only seven participants, the study may not have enough statistical power to confirm that the observed differences are reliable or generalizable. Although the technical precision in gaze and stimulus control is impressive, it cannot offset the limitations of a small sample. The authors should include effect size estimates, confidence intervals, and ideally a post-hoc power analysis.
  
  The statistical results are reported only as χ² values from model comparisons, which do not show the direction or size of the effects. For clarity and transparency, these tests should be accompanied by fixed-effect estimates with their standard errors and confidence intervals, so readers can better assess both the reliability and perceptual relevance of the findings.
  
  The reviewer raised several important points regarding the study's statistical rigor.
  
  In the revised manuscript, we now report effect size estimates (Cohen’s d) in the Results section and Appendices. Effect sizes were in the medium-to-large range, including the effect of attention on contrast sensitivity at 4 and 8 CPD, and the difference in attentional benefit on contrast sensitivity between 4 and 12 CPD and between 8 and 12 CPD. We have also included the full model outputs, including standard errors and confidence intervals, in the Appendices.
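  The response does not say which variant of Cohen’s d was computed. As a hedged illustration only, the Python sketch below computes one common repeated-measures variant, d_z (the mean of the per-observer valid-minus-neutral differences divided by their standard deviation). The function name `cohens_dz` and all numbers are hypothetical; this is not the authors’ analysis code, and the reported d values may have been derived differently (for example, from the fitted models).

```python
import statistics


def cohens_dz(valid, neutral):
    """Paired effect size d_z = mean(differences) / SD(differences).

    `valid` and `neutral` hold one value per observer, in the same order.
    """
    if len(valid) != len(neutral) or len(valid) < 2:
        raise ValueError("need two equal-length lists with at least two observers")
    diffs = [v - n for v, n in zip(valid, neutral)]
    # statistics.stdev uses the sample (n - 1) denominator
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Hypothetical contrast sensitivities for seven observers (not the study's data)
valid_cs = [12.0, 10.5, 14.2, 9.8, 11.1, 13.0, 10.2]
neutral_cs = [10.1, 9.9, 12.0, 9.5, 9.7, 11.2, 9.4]
print(f"d_z = {cohens_dz(valid_cs, neutral_cs):.2f}")
```

  A positive d_z means higher sensitivity with the valid cue; swapping the arguments flips the sign.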
  
  The sample size for the current study was determined based on the magnitude of the attentional effects observed in our previous work (Guzhang et al., 2021). The experimental design and dependent measures were highly similar across the two studies, and the prior study revealed a robust effect, which accounted for a substantial proportion of within-observer variance in a tightly controlled repeated-measures design.
  
  We have revised the manuscript to add bootstrap-based power estimates, following the procedure described by Jigo and Carrasco (2020) and using data from Guzhang et al. (2021). Assuming that the effect size in the current study would be comparable to that of the prior one, we randomly sampled between 2 and 12 observers with replacement and applied a one-way repeated-measures ANOVA with attention as the main factor to each sample. This procedure was repeated 10,000 times, and power was estimated as the proportion of iterations yielding a significant main effect for each sample size. The results indicate that five observers would have been sufficient to achieve approximately 80% power to detect the main effect of attention in the prior study. Based on these estimates, the sample size of the current study (seven observers) provides adequate power.
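  A minimal Python sketch of this bootstrap power procedure, under stated assumptions: attention has two levels (valid, neutral), so the one-way repeated-measures ANOVA reduces to a paired test with F(1, n - 1) = t²; significance uses standard two-tailed critical t values at α = 0.05; the per-observer data and all function names are hypothetical placeholders, not the Guzhang et al. (2021) data or the authors’ code. Resampling an observer’s valid-minus-neutral difference is equivalent to resampling that observer with both conditions.

```python
import math
import random

# Two-tailed critical t values at alpha = 0.05 for df = 1..11 (n = 2..12).
# With two attention levels, the repeated-measures ANOVA statistic F(1, n - 1)
# equals t**2 from a paired t-test, so |t| > t_crit is the same decision.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def attention_effect_significant(diffs):
    """Test valid-minus-neutral differences for a significant attention effect."""
    n = len(diffs)
    if len(set(diffs)) == 1:
        # Degenerate resample (e.g. the same observer drawn every time): the
        # error term is zero, so count it conservatively as non-significant.
        return False
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    return abs(t) > T_CRIT_05[n - 1]


def bootstrap_power(valid, neutral, sizes=range(2, 13), n_iter=10_000, seed=0):
    """Proportion of bootstrap samples with a significant effect, per sample size."""
    rng = random.Random(seed)
    # One paired difference per observer; resampling differences is the same
    # as resampling whole observers (both conditions together).
    diffs = [v - n for v, n in zip(valid, neutral)]
    power = {}
    for size in sizes:
        hits = sum(attention_effect_significant(rng.choices(diffs, k=size))
                   for _ in range(n_iter))
        power[size] = hits / n_iter
    return power


# Hypothetical per-observer contrast sensitivities from a prior study
prior_valid = [12.0, 10.5, 14.2, 9.8, 11.1, 13.0, 10.2]
prior_neutral = [10.1, 9.9, 12.0, 9.5, 9.7, 11.2, 9.4]
power = bootstrap_power(prior_valid, prior_neutral)
smallest = min((n for n, p in power.items() if p >= 0.8), default=None)
print(power)
print("smallest sample size with >= 80% power:", smallest)
```

  The critical-value table covers only n = 2 to 12, the range described in the text; other sample sizes would need a t-distribution routine (for example, `scipy.stats.t.ppf`).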
  
  We also conducted a post hoc power analysis to evaluate the power of our design to detect the main effects and their interaction. It was performed using the R package simr, which estimates statistical power for mixed-effects models through model-based simulation. Specifically, simr generated datasets based on the fixed- and random-effect structure of the fitted model, preserving the observed effect sizes and variance components. For each simulated dataset, the model was refit, and the effect of interest was tested. By repeating this procedure 501 times across different sample sizes, power was estimated as the proportion of simulations in which the effect was statistically significant. Based on these post hoc simulations, we estimated that our study had high power (>95%) to detect the main effects and moderate power (>65%) to detect the interaction. Although the estimated power for the interaction was lower than for the main effects, the observed effect size was substantial (as indexed by Cohen’s d), indicating that the interaction was not trivially small.
  
  We now describe these analyses in lines 501-532 in the Methods section.
  
  (3) The task seems quite demanding, requiring fine spatial discrimination, very small stimuli, and head stabilization with a bite bar. It is not clear whether participants were naïve or experienced observers. If they had prior psychophysical training, practice effects could have influenced the results, particularly given the lack of invalid trials. The manuscript would benefit from clarifying participants' experience level and describing any training or familiarization procedures.
  
  We appreciate the reviewer’s concern regarding potential training effects. All observers had prior experience with similar tasks, but were naïve to the scope of this study. Each participant underwent an initial familiarization phase of approximately 50 trials with the experimental setup of this study. They then completed an additional ~50 trials to estimate their individual contrast thresholds per spatial frequency level before we proceeded with data collection at the five predefined contrast levels.
  
  Based on our experience, we have found that, for experiments similar to the one described here, observers quickly adapt to the setup and are generally able to maintain reliable fixation and stable performance, even during the initial training phase. In addition, each participant completed approximately 400 trials before the data collection started. Even observers who began the session with no prior experience would have become practiced with the setup by the time the actual data-collection phase started, during which ~4000 trials were collected per observer. Therefore, whether an observer participated in previous experiments is unlikely to meaningfully affect the results, as the large number of trials ensures comparable levels of task familiarity across individuals.
  
  Crucially, valid and neutral trials were interleaved throughout the session. Any general learning or practice would therefore influence both conditions equally. Despite this, we still observed clear performance improvements in the valid condition relative to the neutral condition, indicating that the observed benefits cannot be attributed solely to practice and reflect an attentional enhancement. We have added elaboration on the training procedures in Methods (lines 411-429).
  
  Finally, we recognize that the lack of invalid trials may raise concerns given our 100% spatially predictive cue, as noted in Reviewer 3’s first comment. We refer the reader to our response to that point for a more detailed discussion of cue validity and the distinction between exogenous and endogenous influences in our paradigm.
  
  (4) The study would benefit from a clearer connection between the behavioral results and possible underlying neural mechanisms. How might the observed changes in contrast sensitivity relate to known physiological processes at the retinal, thalamic, or cortical level? The discussion could be strengthened by framing the findings within established models of attentional modulation or by referring to known effects of attention in the early visual cortex.
  
  This is an important point, and we agree that framing the findings within established models of attentional modulation can strengthen the discussion. We believe that the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) offers a useful framework for interpreting our behavioral findings, especially the attention-related changes in contrast sensitivity and asymptotic performance observed at the foveal scale. We have now added a more detailed discussion linking our results to this model and considering, explicitly as speculation, how known physiological processes at different stages may contribute to the observed effects in Discussion (lines 264-307).
  
  (5) The ecological relevance of the results is not fully developed. The authors propose that the observed effects may resemble natural attentional shifts triggered by salient events, yet the brief, highly localized flashes used here are somewhat artificial. A more likely interpretation is that these mechanisms relate to oculomotor control within the fovea, perhaps reflecting preparatory activity for microsaccades or fine fixation adjustments. Considering this view could broaden the impact of the findings and link them to current discussions on the relationship between attention and oculomotor control.
  
  We thank the reviewer for raising this important point regarding the ecological relevance of our findings, which we did not sufficiently address in the original manuscript. Although we briefly motivated scenarios that engage exogenous attention at high spatial resolution, such as detecting road signs or traffic lights at a distance while driving, we did not fully elaborate on how such attentional processes may link to downstream visual and oculomotor functions.
  
  In our experiment, observers maintained fixation and avoided saccades throughout the trial. Nevertheless, in a subset of trials (on average 17% ± 3%), observers made saccades after the stimuli disappeared and before providing a response. Typically, these movements were microsaccades with amplitudes smaller than 0.5°, directed toward the target location, in both valid and neutral trials. Trials containing these saccades were excluded from the analyses reported in the manuscript. Inspired by the reviewer’s feedback, we examined saccade latency in these trials relative to the onset of the response cue to assess whether exogenous cueing influenced oculomotor timing. Notably, microsaccades began earlier in valid than in neutral trials (71 ms ± 50 ms faster, p < 0.01). We have now added this observation as Figure 2 — Supplementary Figure 2 in the manuscript. Because the presence of an exogenous pre-cue was the only difference between the two trial types, the earlier microsaccade onset likely reflects exogenously triggered preparatory activity in the oculomotor system in response to the salient pre-cue. Such fine-grained attention may prime eye movements toward behaviorally relevant stimuli for further examination. This interpretation is consistent with the reviewer’s suggestion and supports a mechanistic link between exogenous attention and oculomotor behavior, extending the ecological relevance of our findings. This point has been added to the Discussion (lines 329-340).
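  To make the latency measure concrete, here is a hedged Python sketch that takes, for each trial, the first saccade starting at or after response-cue onset and averages its latency per condition. The trial-record layout, field names, and timings are hypothetical, and ignoring saccades that begin before the response cue is an assumption of this sketch; the response does not specify the saccade-detection method or the statistical test behind p < 0.01.

```python
from statistics import mean


def first_saccade_latency(trial):
    """Latency (ms) of the first saccade at or after response-cue onset, or None."""
    cue_on = trial["response_cue_ms"]
    onsets = [t for t in trial["saccade_onsets_ms"] if t >= cue_on]
    return min(onsets) - cue_on if onsets else None


def mean_latency_by_condition(trials):
    """Mean first-saccade latency per condition, skipping trials with no saccade."""
    latencies = {}
    for trial in trials:
        latency = first_saccade_latency(trial)
        if latency is not None:
            latencies.setdefault(trial["condition"], []).append(latency)
    return {cond: mean(values) for cond, values in latencies.items()}


# Hypothetical saccade trials (times in ms from trial start; not real data)
trials = [
    {"condition": "valid", "response_cue_ms": 700, "saccade_onsets_ms": [880]},
    {"condition": "valid", "response_cue_ms": 700, "saccade_onsets_ms": [920]},
    {"condition": "neutral", "response_cue_ms": 700, "saccade_onsets_ms": [960]},
    {"condition": "neutral", "response_cue_ms": 700, "saccade_onsets_ms": [1010, 1100]},
]
result = mean_latency_by_condition(trials)
print(result, "difference:", result["neutral"] - result["valid"], "ms")
```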
  
  We also conducted an analysis to examine ocular drift behavior following the response cue. Although trials included in the manuscript analyses were constrained such that fixation during target presentation remained within a small window (10’ radius) around the fixation marker, we did not assess whether gaze subsequently drifted closer to the target location after the response cue. One possibility is that exogenous attention might bias ocular drift, shifting the preferred locus of fixation closer to the target. To address this, we computed the average Euclidean distance between gaze position and the target location following response-cue onset for valid and neutral trials. However, we found no significant difference in gaze-target distance between valid and neutral trials (p = 0.57).
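  The gaze-target distance measure can be sketched in Python as below, assuming gaze samples and target positions are expressed in arcminutes relative to the fixation marker. The data layout, the ±20 arcmin target positions, and the function names are hypothetical placeholders; the sketch averages distances within each trial and then across trials, which is one plausible aggregation rather than the authors’ documented procedure.

```python
import math
from statistics import mean


def mean_gaze_target_distance(gaze_samples, target_xy):
    """Average Euclidean distance between gaze samples and the target location."""
    tx, ty = target_xy
    return mean(math.hypot(x - tx, y - ty) for x, y in gaze_samples)


def distance_by_condition(trials):
    """Average the per-trial mean distance within each condition."""
    per_condition = {}
    for condition, target_xy, gaze_samples in trials:
        d = mean_gaze_target_distance(gaze_samples, target_xy)
        per_condition.setdefault(condition, []).append(d)
    return {cond: mean(ds) for cond, ds in per_condition.items()}


# Hypothetical post-response-cue gaze samples in arcmin relative to the
# fixation marker; target positions are placeholders, not the study's layout.
trials = [
    ("valid", (20.0, 0.0), [(1.0, 0.5), (2.0, -0.5), (2.5, 0.0)]),
    ("valid", (-20.0, 0.0), [(-1.5, 1.0), (-2.0, 0.5)]),
    ("neutral", (20.0, 0.0), [(0.5, 0.0), (1.5, 1.0)]),
    ("neutral", (-20.0, 0.0), [(-0.5, -1.0), (-1.0, 0.0)]),
]
print(distance_by_condition(trials))
```

  A drift bias toward the cued target would show up as a smaller mean distance in valid than in neutral trials; the statistical comparison itself is not sketched here.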
  
  Although the spatial cueing approach has long been used to probe exogenous attention in a controlled manner in psychophysical experiments, we fully recognize the importance of understanding attention under more naturalistic viewing conditions that allow observers to freely move their eyes. Developing paradigms that incorporate more naturalistic, salient stimuli would be an important direction for future work, enabling investigation of exogenous attention in ecologically valid settings and its influence on sequential actions and processes, including oculomotor behavior.
  
  (6) There is no statement about the availability of the data and code used for the experiment.
  
  We have now added the data and code for the analysis pipeline to the Open Science Framework (OSF).
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) The study could discuss the strength of the effect and how it relates to previous studies.
  
  We thank the reviewer for raising this point. To facilitate direct comparison with the study by Jigo and Carrasco (2020), we computed attentional benefit as the ratio of contrast sensitivity between the valid and neutral conditions (now shown in Figure 3 — Supplementary Figure 4). In their data, the attentional benefit at 0° eccentricity peaked just below 4 CPD, with a ratio of approximately 1.2, corresponding to a ~20% increase in contrast sensitivity. This magnitude closely matches the benefit we observed for fine-grained attentional shifts within the foveola at spatial frequencies between 4 and 8 CPD (17% ± 12% and 16% ± 14% for 4 and 8 CPD, respectively). We have added this comparison to the Discussion (lines 246-262).
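  The ratio measure can be illustrated with a short Python sketch, assuming contrast sensitivity is the reciprocal of the contrast threshold and that the ratio is computed per observer before averaging (the response does not state the order of operations, nor whether ± denotes SD or SEM; the sketch reports SD). Threshold values and function names are hypothetical. A ratio of 1.2 maps to a 20% increase, as in the comparison with Jigo and Carrasco (2020).

```python
from statistics import mean, stdev


def attentional_benefit(valid_thresholds, neutral_thresholds):
    """Per-observer ratio CS_valid / CS_neutral, with CS = 1 / contrast threshold."""
    ratios = []
    for t_valid, t_neutral in zip(valid_thresholds, neutral_thresholds):
        cs_valid, cs_neutral = 1.0 / t_valid, 1.0 / t_neutral
        ratios.append(cs_valid / cs_neutral)
    return ratios


def percent_increase(ratio):
    """A ratio of 1.2 corresponds to a 20% increase in contrast sensitivity."""
    return (ratio - 1.0) * 100.0


# Hypothetical contrast thresholds (proportion contrast) for four observers
valid_thr = [0.10, 0.08, 0.12, 0.09]
neutral_thr = [0.12, 0.09, 0.14, 0.11]
gains = [percent_increase(r) for r in attentional_benefit(valid_thr, neutral_thr)]
print(f"benefit: {mean(gains):.1f}% +/- {stdev(gains):.1f}% (mean +/- SD across observers)")
```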
  
  In addition, we acknowledge that prior studies have reported heterogeneous attentional effects, including pure contrast gain, pure response gain, or a mixture of the two. We now explicitly reference these findings in the Discussion and use the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) to account for how differences in stimulus configuration, attention field size, and eccentricity may account for discrepancies between our findings and prior studies examining attention in the extrafovea or when broadly distributed across the fovea (lines 264-307).
  
  (2) Minor details:
  
  (a) The abstract mentions gaze-contingent-display, but if I understand correctly, the stimulus was not presented in a gaze-contingent manner.
  
  That’s correct. Although stimuli were not presented gaze-contingently, we used a gaze-contingent calibration procedure (see Methods, lines 386-389) to achieve higher precision in localizing the line of sight. This increased accuracy was essential for selecting trials in which stimuli remained at the intended eccentricity relative to the preferred locus of fixation. To avoid potential confusion, however, we have removed this detail from the abstract.
  
  (b) Line 361: What is the manual calibration the authors are referring to? It does not appear to be described.
  
  The text has been updated to explain more explicitly what auto and manual calibrations are.
  
  (c) Line 402: There may be a typo towards the end of the line "t0" should be "to"?
  
  Text has been updated. Thank you.
  
  (d) Line 405. What are the units of 30?
  
  It’s in arcminutes. Text has been updated.
  
  Reviewer #3 (Recommendations for the authors):
  
  I found this paper very interesting, with a solid methodological approach and excellent data analyses. The authors present a well-designed psychophysical study that contributes valuable insights into the mechanisms of attention in the foveola. The methodology is rigorous, and the analyses are thoughtfully conducted and clearly presented.
  
  That said, I would like to offer a few comments and suggestions for clarification and further consideration:
  
  (1) Exogenous attention:
  
  If a 100% spatially predictive cue is compared to a neutral cue, the observed attentional effect should not be described as (purely) exogenous, since the cue fully predicts where the post-cue will request a response. This situation represents a case in which attention is exogenously driven but endogenously maintained (see e.g., Chica et al., 2013, Behavioural Brain Research). I recommend clarifying this distinction in the manuscript (and title) to avoid conceptual ambiguity.
  
  We thank the reviewer for raising this important conceptual point. We agree that because the pre-cue was 100% spatially predictive, the resulting attentional allocation cannot be considered purely exogenous. Although the abrupt, salient onset of the cue obligatorily triggers an exogenous shift of attention, its validity could also promote endogenous maintenance of attention at the cued location. Yet, several characteristics of our task strongly limit the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to perceptual encoding in our task.
  
  We also considered the possibility that our response cue (a retro-cue indicating the target location) might recruit endogenous attention to the internal perceptual representation. Importantly, however, this retro-cue was equally informative in valid and neutral conditions. Any enhancement driven by the retro-cue should therefore benefit both trial types to the same extent. The fact that we still observe a robust advantage in valid trials supports the conclusion that the performance improvements predominantly reflect fast, spatially specific exogenous facilitation rather than slower endogenous processes.
  
  We have revised the manuscript to clarify that although the cue obligatorily triggers an exogenous attentional shift, its 100% validity could allow for endogenous attention maintenance as shown by Chica et al. (2013). We also added an explanation detailing why such endogenous contributions are unlikely to drive our main results, given the rapid cue-target timing in our task in Discussion (lines 319-327). Finally, to further prevent ambiguity, we updated the manuscript title to refer to “exogenously triggered attention,” rather than simply “exogenous attention.”
  
  (2) Interpretation of statistical effects:
  
  The statement "Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit" seems not to be supported by the data, as the main effect of frequency was not significant.
  
  We thank the reviewer for this helpful observation. We agree that the original phrasing did not accurately reflect the results, as the main effect of spatial frequency was not significant (p = .0545). We have revised the sentence to “Therefore, asymptotic performance reflected an effect of attention alone, with no detectable contribution of spatial frequency or of the interaction between spatial frequency and attention” to avoid implying such an effect (lines 210-211).
  
  If data from two participants were missing in one condition, the authors should consider replacing this data with new participants.
  
  We agree with the reviewer that having two observers with missing data in one condition is not ideal. However, the 20 CPD condition was deliberately positioned near the resolution limit at the tested eccentricity and was therefore extremely demanding. Observers also had to monitor two stimulus locations simultaneously, further increasing task difficulty. This condition was challenging for all observers and, despite testing up to the highest contrast, two of seven observers were unable to perform above chance, indicating that for a non-trivial fraction of observers, this condition was effectively unmeasurable with our paradigm. As noted in the manuscript, the 20 CPD condition also has a statistical limitation: thresholds clustered near the upper bound (approaching 100% contrast), compressing the dynamic range and markedly reducing variance relative to lower spatial frequencies, which violates the homoscedasticity assumption of linear models. For these reasons, we did not pursue additional data collection in this condition. Nevertheless, we report the data that were successfully obtained, as they remain informative about performance near the resolution limit.
  
  Finally, we note that even when the 20 CPD condition is set aside, our data support a spatial-frequency-dependent attentional benefit: comparisons between 4 and 12 CPD, as well as between 8 and 12 CPD, revealed large differences in the magnitude of the attentional benefit (d = 0.65, 95% CI [0.11, 1.18] and d = 0.62, 95% CI [0.08, 1.14], respectively). To further quantify these effects, we now report Cohen’s d for these spatial-frequency comparisons in the Results text and in the Appendix tables.
  
  (3) Sample size:
  
  As this is a psychophysical experiment with many trials and few participants, I am curious about how the authors determined the appropriate sample size and the number of trials required to detect the expected effects. Given that many effects were found to be significant, it seems that statistical power was adequate; however, it would be helpful if the authors could explain how this issue was addressed a priori during experimental planning.
  
  We appreciate that the reviewer raised this point. Please see the reply to the second point from Reviewer 1, who raised a related question about statistical power.
  
  (4) Figure 2 clarification:
  
  In Figure 2B, I do not fully understand the "Valid" and "Neutral" representation. Both conditions include a post-cue indicating the right position; however, in the neutral condition, there is a central fixation square, whereas in the valid condition, there is not. Please clarify this aspect of the figure. I think I understood the paradigm, but this part of the figure is misleading.
  
  The precue was present only in the valid condition. However, panel B mistakenly omitted the fixation marker in the valid condition.
  
  We thank the reviewer for pointing this out. We have updated Figure 2 to explicitly show the sequence of valid vs. neutral trials. The fixation marker remained on the screen throughout the trial in both the valid and neutral conditions. After a 500 ms fixation period, an exogenous cue was presented for 30 ms in valid trials, followed by a 70 ms interval before stimulus onset. In neutral trials, no cue was presented, and the screen remained blank for 100 ms before the stimuli appeared. In both conditions, a response cue appeared 50 ms after stimulus offset.
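  The trial timing described above can be written out as a small Python sketch. The durations are those stated in this response; the constant and function names are hypothetical, and this is not the authors’ experiment code.

```python
FIXATION_MS = 500         # fixation period before the cue interval
CUE_MS = 30               # exogenous cue duration (valid trials only)
CUE_TO_STIM_MS = 70       # blank interval between cue offset and stimulus onset
STIM_MS = 50              # stimulus duration
STIM_TO_RESPONSE_MS = 50  # stimulus offset to response-cue onset


def trial_timeline(condition):
    """Event onsets in ms from trial start; the fixation marker stays on throughout."""
    events = {"fixation_on": 0}
    if condition == "valid":
        events["cue_on"] = FIXATION_MS
        events["cue_off"] = FIXATION_MS + CUE_MS
        stim_on = events["cue_off"] + CUE_TO_STIM_MS
    elif condition == "neutral":
        # No cue: the display stays blank (apart from fixation) for 100 ms
        stim_on = FIXATION_MS + CUE_MS + CUE_TO_STIM_MS
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    events["stim_on"] = stim_on
    events["stim_off"] = stim_on + STIM_MS
    events["response_cue_on"] = events["stim_off"] + STIM_TO_RESPONSE_MS
    return events


for cond in ("valid", "neutral"):
    print(cond, trial_timeline(cond))
```

  Under these durations the stimulus appears 600 ms after trial start in both conditions, matching the response to Reviewer #1’s point (1), and in valid trials the stimulus is removed 150 ms after cue onset, at the lower edge of the ~150-200 ms window in which endogenous benefits typically emerge.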
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.08.27.672541v2
www.biorxiv.org

Identifying a novel mechanism of L-leucine uptake in Mycobacterium tuberculosis using a chemical genomic approach

1. Public_Reviews 12 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.
  
  Strengths:
  
  (1a) The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug to PpsB. However, it is unclear why only the leucine uptake pathway was affected.
  
  We observe interference with the uptake of L-leucine, but not of pantothenate, in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the mechanism(s) by which this small molecule modulates L-leucine uptake.
  
  (1b) We still do not know what PpsB actually does for amino acid uptake - is it a transporter?
  
  Using BLI (Octet), we did not detect any interaction between L-leucine and PpsB. We therefore consider it unlikely that PpsB is an L-leucine transporter.
  
  (1c) Does semapimod binding affect its activity?
  
  Our study suggests that semapimod treatment alters PDIM architecture, making it restrictive to L-leucine. However, the exact mechanism is not yet clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and on PDIM alterations by mass spectrometry.
  
  (1d) Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?
  
  Based on the published report by Mulholland et al. and on the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.
  
  (2) The authors show an interesting result: they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible that semapimod has an immunomodulatory effect on the host, since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.
  
  Semapimod inhibits the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6; this inhibition would indeed help the pathogen establish chronic infection. However, the significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.
  
  (3) The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by validation using PDIM mutants, including del-ppsB Mtb and mutants in other genes of the PDIM locus, to test whether these mutants would be more susceptible (or resistant) to semapimod treatment in vivo.
  
  PDIM is a virulence factor and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it would be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.
  
  (4) Prolonged subculturing can introduce mutations that cause loss of PDIM, which can be prevented by supplementing with propionate (Mulholland et al., Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would arise in Sem<sup>R</sup> strains with propionate supplementation along with prolonged semapimod treatment.
  
  Considering that extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial drug screening, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.
  
  A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.
  
  Weaknesses:
  
  I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.
  
  Reviewer #2 (Public review):
  
  Summary
  
  This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy.
  
  Strengths
  
  The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.
  
  Weaknesses
  
  (1) The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.
  
  While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small-molecule inhibitors, we identified semapimod, which selectively kills the leucine auxotroph (mc2 6206) but not WT Mtb. To understand the basis of the differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoring leuCD and panCD expression on the susceptibility of the auxotrophic strain to semapimod. Interestingly, upon endogenous expression of the leuCD genes, the mc2 6206 strain became resistant to killing by semapimod. In contrast, panCD expression had no effect on the semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which showed differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results provide the first evidence that semapimod treatment perturbs L-leucine uptake in the leucine auxotroph.
  
  To gain further mechanistic insight into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which carries point mutations in four genes, including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored the susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.
  
  As mentioned above, we anticipate that semapimod treatment brings about modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will help examine the effect of semapimod on Mtb physiology.
  
  (2) Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.
  
  We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibition of the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would help the pathogen establish chronic infection. However, the significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake could be a novel therapeutic strategy against TB.
  
  Reviewer #3 (Public review):
  
  (1) Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine.
  
  The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.
  
  As mentioned above, the overall results from this study provide the first evidence that semapimod treatment perturbs L-leucine uptake in the leucine auxotroph. Our observations that semapimod interacts with PpsB and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.
  
  As mentioned above, it appears that semapimod treatment brings about modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insight into the effect of semapimod on Mtb physiology.
  
  (2) Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.
  
  We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.
  
  (3) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.
  
  We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.
  
  Recommendations for the authors:
  
  (1A) Intracellular leucine can decrease from:
  
  inhibition of transport/uptake via semapimod as the authors claim or
  
  decreased uptake/requirement of many metabolites due to cells entering static growth arrest from challenge by semapimod
  
  To rule out an indirect effect of semapimod-induced growth inhibition on L-leucine uptake, we estimated intracellular L-leucine in Mtb after a brief, 24-hour exposure to 50 ng/ml semapimod (kindly refer to Materials and Methods). We confirmed that 24 hours of treatment with 50 ng/ml semapimod does not cause cells to enter static growth arrest.
  
  (1B) increased consumption/utilization of leucine for some programmed response to semapimod challenge
  
  Our results show reduced expression of genes involved in leucine catabolism such as accD1, bkdA and bkdB in semapimod-treated cells, and thus the above hypothesis seems unlikely.
  
  (1C) Additional metabolites should be measured to determine the specificity of the semapimod challenge.
  
  As mentioned below, we measured intracellular valine in the semapimod-treated Mtb 6206 by LC-MS/MS, which shows no change in its level. These observations thus corroborate a specific effect of semapimod on L-leucine level in the cell.
  
  (2) The effect of Semapimod on L-leucine uptake is largely based on indirect evidence, without showing reduced transport of the amino acid. Gene expression data is not enough to prove that the amino acid transport is blocked. More compelling evidence is required to confirm this mechanism.
  
  The authors could perform leucine uptake assays to directly confirm the functioning of Semapimod, inhibiting L-leucine transport. Another possibility would be to try out measuring intra-bacterial leucine levels for drug-treated versus untreated M. tuberculosis strains.
  
  Data presented in Fig. 3b show lower intracellular L-leucine upon semapimod treatment; in contrast, the Sem<sup>R</sup> strain exhibits ~3-fold more intracellular L-leucine, as estimated by mass spectrometry (kindly refer to our response to comment #6 below). Together, these observations indicate an inhibitory effect of semapimod on L-leucine uptake by the auxotroph.
  
  (3) The authors show that the overexpression of leuC-leuD restores Semapimod resistance in the auxotroph (Figs. 3C-3E). Is it possible to examine Semapimod resistance of WT-H37Rv or the complemented mutant grown in leucine-limiting conditions? This sort of evidence will be more direct on the specific drug-target beyond the auxotroph (mc<sup>2</sup> 6206).
  
  Because the endogenous L-leucine synthesis pathway is functional in WT-H37Rv as well as in the complemented auxotrophic strain, leucine-limiting conditions are not expected to affect susceptibility to semapimod.
  
  Author response image 1.
  
  (4) Biolayer Interferometry (BLI) shows that Semapimod binds to PpsB (Fig. 6); however, there is no clear evidence that it disrupts PDIM synthesis. More direct evidence would come from studying the effect of Semapimod on a ppsB mutant (e.g., a knockdown). This would prove the specificity of Semapimod for PpsB. Likewise, it would be worth examining the effect of Semapimod on mutant M. tuberculosis defective in PDIM synthesis.
  
  As recommended by the peer reviewer, we created a ppsB knockdown strain in Mtb mc2 6206 by CRISPRi and examined its vulnerability to semapimod treatment. As can be seen in Author response image 1, the ppsB KD strain shows lower susceptibility to semapimod than the pDcas9-control strain, which exhibits significant growth inhibition on 7H11-OADS-PL agar plates containing 200 nM semapimod.
  
  (5) Metabolomics experiments would benefit from including other control BCAAs like isoleucine and valine to determine if decreased intracellular levels of leucine are specific to semapimod or a general consequence of growth arrest from an antimicrobial agent.
  
  As suggested by the reviewer, we measured intracellular valine as well as proline levels in semapimod-treated Mtb 6206 by LC-MS/MS; data presented in Supplementary Figure 5 clearly show no change in their levels upon semapimod treatment.
  
  (5) Figure 3c, pyrazinamide susceptibility assay could be included on the panCD strain to ensure complementation leads to functional panCD. Parent strain would be resistant to PZA, complement strain would be susceptible. (doi: 10.1038/s41467-019-14238-3).
  
  The wild-type Mtb 6206 is unable to grow in the absence of pantothenate. We verified resumption of growth of Mtb 6206 in 7H9-OADS-L-leucine medium lacking pantothenate upon PanCD overexpression, which provides more direct evidence of the expression of functional copies of panCD genes.
  
  (6) Does the Sem-R mutant have increased levels of leucine?
  
  As can be seen in Supplementary Figure 7, the Sem<sup>R</sup> strain shows a ~3.0-fold increase in intracellular L-leucine level compared with the WT strain. In contrast, comparable levels of another BCAA, valine, are observed in both strains.
  
  AuthorResponse

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2025.04.14.648691v3
www.biorxiv.org

An ancient evolutionary calculus for attention signaling retained in modern music

1. ryanayork 11 May 2026
  
  in Public
  
  In summary, we find striking mathematical similarity among animal ‘song’ vocalizations and human musical sounds regarding the stability of CES, along with some well-defined differences across evolutionary distant taxa that evolved singing behavior independently (i.e. anurans, birds, primates)
  
  I don't think you have yet shown enough to make this claim. The paper would greatly benefit from the use of null models. Given that all the data used in the paper share informational structure (i.e., they're all detectable in some way as "songs" to humans), are the relationships you identify actually surprising? It's very hard to gauge this without a null, either empirical or simulated, to compare to.
  
  Similarly, it's impossible to know whether the comparisons are fair without a deeper examination of how parameter choices (e.g., window sampling size) may differentially affect CES estimation across song types. Even if you find significant similarity between human and animal song types compared to a null, how will you know that this isn't a product of bias in your CES estimation?

Annotators

ryanayork

URL

biorxiv.org/content/10.1101/2025.09.28.679029v6
www.biorxiv.org

ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis

1. Public_Reviews 11 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Reviewer #1 (Public Review):
  
  Summary:
  
  Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects transcriptional activation at the pachytene stage. Znhit1 is a subunit of the SRCAP chromatin remodeling complex and a depositor of H2AZ, and in cKO spermatocytes, H2AZ is not deposited into gene regions. The authors claim that this is why PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during meiotic prophase.
  
  Strengths:
  
  The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.
  
  Weaknesses:
  
  Overall, the data are inconsistent with the authors' claims and do not support their final conclusions. In addition, the sample used may not be the most suitable for the analysis; a more suitable sample would dramatically improve the overall quality of the paper.
  
  Thank you for your comprehensive summary of our study and your thoughtful insights into its strengths and weaknesses. We greatly appreciate this valuable feedback, which helps us further improve our work. Below, we provide a detailed response addressing each of the points you raised.
  
  Reviewer #1 (Recommendations For The Authors):
  
  Major revisions:
  
  Surprisingly, many genes were upregulated in the scRNA-seq results. How many XY genes are included? Discuss why many genes are upregulated in Fig. 5E, whereas bulk RNA-seq showed that only 70 genes were downregulated. Since apoptosis-related factors are upregulated in Fig. 5E, could these upregulated genes reflect a high content of transcripts from dead cells? As you know, once cell death starts, it randomly and violently disrupts the transcriptome, so we think it is not desirable to analyze the transcriptome with dead cells in the mix. Describe this point appropriately in the text or generate new data without dead cells.
  
  We sincerely appreciate the reviewer’s critical points. Below, we address each point sequentially:
  
  (1) To address the question about XY-linked genes, we utilized scRNA-seq data to identify differentially expressed sex chromosome genes in spermatocytes at different stages. Our analysis revealed an aberrant activation of XY-linked genes relative to controls. Specifically, 120 XY-linked genes were aberrantly activated in zygotene-stage spermatocytes, and 119 XY-linked genes showed aberrant activation in pachytene-stage spermatocytes (revised Fig. 4F). This observation directly indicates that Znhit1 knockout impairs Meiotic Sex Chromosome Inactivation (MSCI), a finding that aligns with our prior characterization of XY chromosome synapsis defects in Znhit1-deficient spermatocytes.
  
  (2) Two key reasons explain the discrepancy between scRNA-seq and bulk RNA-seq results:
  
  First, scRNA-seq employs a more permissive threshold for identifying DEGs (log2 fold change [log2FC] = 0.25), thereby enhancing sensitivity to subtle expression changes and enabling the detection of more upregulated genes. In contrast, bulk RNA-seq uses a stricter threshold (log2FC = 1), which filters out these subtly upregulated transcripts, resulting in fewer DEGs overall.
  
  Second, scRNA-seq can capture cell subset-specific differential expression. In contrast, bulk RNA-seq averages signals across mixed cells, masking such subset-specific expression changes.
  
  These clarifications have been included in the Data Analysis section of the revised manuscript.
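  A minimal sketch can show how the two fold-change cutoffs affect DEG counts. The gene names and values below are hypothetical. The extra adjusted-p-value filter (alpha = 0.05) is an assumption, because the response specifies only the log2FC thresholds. The sketch does not model the second reason given above, cell-subset averaging.

```python
def call_degs(log2fc, padj, fc_threshold, alpha=0.05):
    """Return genes with padj < alpha and |log2FC| >= fc_threshold, sorted by name.

    log2fc and padj map gene name -> value; the padj filter is an assumed,
    commonly used second criterion, not one stated in the response.
    """
    return sorted(
        gene for gene, fc in log2fc.items()
        if padj[gene] < alpha and abs(fc) >= fc_threshold
    )


# Hypothetical genes: the same fold changes evaluated under both cutoffs.
log2fc = {"GeneA": 0.30, "GeneB": 1.40, "GeneC": -0.50, "GeneD": 0.10}
padj = {"GeneA": 0.01, "GeneB": 0.001, "GeneC": 0.02, "GeneD": 0.03}

permissive = call_degs(log2fc, padj, fc_threshold=0.25)  # scRNA-seq-style cutoff
strict = call_degs(log2fc, padj, fc_threshold=1.0)       # bulk RNA-seq-style cutoff
```

  With identical inputs, the permissive cutoff retains three of the hypothetical genes and the strict cutoff retains one. Threshold choice alone can therefore produce the kind of difference in DEG counts discussed above.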
  
  (3) We fully agree with the reviewer’s concern that dead cells could confound transcriptomic analyses. Before downstream analysis, we excluded non-viable cells via stringent QC: cells with mitochondrial RNA (mtRNA) content exceeding 15% were removed, as high mtRNA content is a well-established marker of cell death or compromised viability. To further validate that upregulated genes were not driven by dead cell contamination, we analyzed the correlation between the expression of apoptosis-related genes and mtRNA fractions in our data. This analysis revealed no significant correlation (Pearson correlation coefficient, r = -0.02; please see Author response image 1). These results collectively rule out dead cell transcriptome contamination as the primary cause of the observed gene upregulation.
  
  Author response image 1.
  
  Scatter plot showing the Pearson correlation between apoptosis-related genes and mitochondrial RNA fractions in scRNA-seq data.
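  The QC and correlation steps described above might look like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline:
  - Each cell is a hypothetical record holding its mitochondrial RNA fraction and a per-cell apoptosis score. The score could be, for instance, mean expression of an apoptosis gene set.
  - Cells above the 15% mtRNA cutoff are removed first.
  - The Pearson correlation is then computed on the remaining cells. Whether the authors correlated before or after filtering is an assumption here.

```python
import math

MT_CUTOFF = 0.15  # cells with more than 15% mitochondrial RNA are excluded


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def qc_then_correlate(cells):
    """Drop high-mtRNA cells, then correlate apoptosis score with mtRNA fraction.

    `cells` is a list of dicts with hypothetical keys "mt_fraction" (0-1)
    and "apoptosis_score".
    """
    kept = [c for c in cells if c["mt_fraction"] <= MT_CUTOFF]
    r = pearson_r([c["apoptosis_score"] for c in kept],
                  [c["mt_fraction"] for c in kept])
    return kept, r
```

  The two steps are kept separate so that the filtered cell set can be inspected or passed on to downstream clustering independently of the correlation check.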
  
  Lines 280-286: The data in Figures 7I and J are confusing. As shown by KAS-seq, it is natural that ssDNA is not formed in the promoter region in the Znhit1-cKO sample because transcription does not proceed, but why is ssDNA formed in the enhancer region in the first place in the control and then lost in the Znhit1-cKO sample? It is generally held that double-stranded DNA in enhancer regions, including super-enhancer regions, is not dissociated and thus does not form ssDNA. Discuss, with appropriate citations, why the loss of ssDNA in the enhancer region affects transcription. Also, show whether genes downstream of the missing ssDNA in the promoter region have abnormal transcriptional activity, along with the RNA-seq data. Furthermore, explain why the chromatin in the region shown in Figure 7I is even more open in Znhit1-cKO, as shown by ATAC-seq. Discuss whether this is related to transcriptional progression or to aberrant substitution with H2A. If the function of ZNHIT1 is to replace H2A with H2AZ for PGA, it is not necessary to show the H2A level in Znhit1-cKO.
  
  We appreciate the reviewer’s constructive comments.
  
  (1) ssDNA dynamics in enhancer regions: Emerging evidence demonstrates that active enhancers undergo transient DNA unwinding to form ssDNA, a process critical for transcriptional regulation through the transcription of enhancer RNAs (eRNAs). KAS‑seq is sufficiently sensitive to detect ssDNA in enhancer regions (Kim et al., 2010; Wu et al., 2020). It has been shown that H2A.Z (deposited by the ZNHIT1-SRCAP complex) is required for maintaining enhancer accessibility and dynamic unwinding (Sporrij et al., 2023). In this study, we found that Znhit1 deletion and defective H2A.Z incorporation impaired enhancer ssDNA formation, indicating that ZNHIT1-H2A.Z plays an important role in the activity of both promoters and enhancers.
  
  (2) Impact of ssDNA loss on transcription: To address how missing ssDNA affects transcriptional activity, we further analyzed changes in KAS‑seq signals following Znhit1 knockout. Overall, KAS‑seq signals were significantly reduced upon Znhit1 depletion, confirming that Znhit1 is essential for ssDNA formation. Further examination of KAS‑seq signals at promoters of downregulated genes also revealed reduced signals (revised manuscript, Fig. S8). In contrast, KAS-seq signals at upregulated genes remained relatively low and did not differ between the control and knockout groups; their upregulation probably results from indirect regulation. These results underscore the importance of ZNHIT1-mediated chromatin states in regulating ssDNA formation and gene expression.
  
  (3) Aberrant chromatin openness in Znhit1-cKO (ATAC-seq): The increased chromatin accessibility detected by ATAC-seq likely represents a disorganized, nonfunctional state rather than productive transcriptional openness. H2A.Z normally constrains chromatin dynamics to facilitate ordered transcriptional regulation (Cole et al., 2021); its absence in Znhit1-cKO leads to higher ATAC-seq signals, suggesting that this aberrant openness fails to support proper assembly of the transcriptional machinery.
  
  Minor revisions:
  
  Line 106. The text says that they looked for chromatin factors, but the legend says that they looked for epigenetic factors. The text must be consistent.
  
  We have corrected it in the revised manuscript (line 801).
  
  Line 107. Although it is stated that the transcriptional data published here were used, it appears from the cited references that they are scRNA-seq data. A clear explanation is required in the text or legend.
  
  We now specify that these are scRNA-seq data (line 107).
  
  Lines 141-143: Using TUNEL analysis in Figure 4F, the authors show that Znhit1-cKO testis cells contain many dead cells. Describe the type or stage of the apoptotic cells.
  
  We appreciate the reviewer’s suggestion. Specifically, we performed TUNEL staining on testes isolated from P14 mice, a critical time point for pachytene development (revised Fig. 2D). Consistent with apoptosis at this stage, apoptosis-related genes were significantly upregulated in pachytene-stage spermatocytes in the scRNA-seq data (revised Fig. 4D). To further validate this observation, we performed scRNA-seq on P35 testis samples. The results revealed a significant reduction in late pachytene-stage spermatocytes in Znhit1-cKO samples (revised Fig. 2F), consistent with apoptotic loss of pachytene cells. Collectively, these data confirm that Znhit1 knockout impairs pachytene-stage spermatocyte development.
  
  The authors claimed that the loss of Znhit1 lowers the transcription of a group of genes involved in homologous recombination, including Rnf212, causing a delay in homologous recombination; however, if the process of homologous recombination is delayed, homologous chromosome pairing and synapsis are affected unless DSB repair is completed. Provide a satisfactory explanation for the fact that DNA damage remains on autosomes despite complete synapsis, as shown in Figure 3C, which is likely not solely due to delayed homologous recombination.
  
  Thank you for this insightful comment. We fully agree that persistent autosomal DNA damage cannot be explained solely by delayed homologous recombination. To resolve this question, we further analyzed autosomal synapsis through SYCP1 and SYCP3 staining. While autosomal synapsis appeared morphologically complete, we identified subtle but significant synapsis defects in autosomal terminal regions (revised Fig. 3A). This suggests that Znhit1 knockout also results in autosomal synapsis defects. We speculate that these synapsis defects are associated with the unresolved autosomal DNA damage we observed.
  
  Lines 150-163. With regard to XY unpairing in Znhit1-cKO pachytene spermatocytes, there is insufficient discussion as to whether this is due to transcriptional aberrations.
  
  Thank you for highlighting the need to link transcriptional aberrations to XY unpairing in Znhit1-cKO pachytene spermatocytes. To address this, we analyzed sex chromosome transcription using scRNA-seq data. Relative to controls, 120 XY-linked genes were aberrantly activated at zygotene, and 119 were upregulated at pachytene in Znhit1-cKO spermatocytes (revised Fig. 4F), directly demonstrating that Znhit1 knockout disrupts Meiotic Sex Chromosome Inactivation (MSCI). Given that intact MSCI is required to stabilize XY synapsis in pachytene spermatocytes, we conclude that the observed XY unpairing is likely a direct consequence of these sex chromosome transcriptional abnormalities. We have added this information to the revised manuscript (lines 221-226).
  
  Line 187-194. Analysis of the scRNA-seq data is shown in Figure 4, but it lists several genes as stage-specific markers, some of which do not have well-understood meiotic functions. Please cite a reference paper that provides sufficient evidence to qualify this stage.
  
  In response to this comment, we have refined the presentation of marker genes used for cell annotation (revised Fig. S4B). We have incorporated relevant references supporting their utility as stage-specific markers for the meiotic stages (line 187).
  
  Line 225-233: If Znhit1 is important for H2AZ deposition and regulates PGA through it, how does it regulate HR-related genes that are expressed earlier through H2AZ deposition during the pachytene stage? For example, Rnf212 is not specifically expressed during the pachytene stage but is one of the targets of MEIOSIN, so it is expressed at an earlier stage.
  
  Thank you for this insightful comment. We fully acknowledge the reviewer’s key observation that HR-related genes such as Rnf212 are MEIOSIN targets that initiate transcription at earlier meiotic stages, before the pachytene stage. Our stage-resolved scRNA-seq data further showed that the expression of Ccnb1ip1 and Rnf212 was significantly upregulated from zygotene to pachytene, following their initial transcriptional onset. We next showed that the loss of H2A.Z deposition induced by Znhit1 deletion specifically impaired this pachytene-specific secondary transcriptional activation, rather than the early MEIOSIN-driven expression onset (please see Author response image 2).
  
  Author response image 2.
  
  Plots showing the expression levels of the indicated genes in the scRNA-seq data.
  
  Lines 245-251: As shown in Figure 6E, more than 14,000 genes have H2AZ peaks. In contrast, only approximately 60% of the genes downregulated by Znhit1-cKO appear to be directly affected by H2AZ. Are the remaining 40% of genes regulated in a different way that is not mediated by H2AZ? Also, only a few percent of the genes with H2AZ peaks are affected; why are only genes with A-MYB involvement affected, as shown in Figure 7?
  
  Thank you for these insightful and constructive comments. The ~40% of downregulated genes not directly linked to H2A.Z are likely regulated through indirect mechanisms: H2A.Z deposition mediated by ZNHIT1 may influence upstream transcriptional regulators (e.g., transcription factors or coactivators), whose dysregulation in turn affects these genes.
  
  The selective effect of H2A.Z loss on A-MYB target genes is explained by the strict context-dependent function of H2A.Z, which requires stage-specific partner transcription factors to exert its regulatory activity. During the zygotene-to-pachytene transition, A-MYB acts as the master regulator of pachytene gene activation and forms a functional collaborative complex with H2A.Z to drive target gene transcription. Disrupted H2A.Z deposition upon Znhit1 deletion specifically impairs the activity of this A-MYB-H2A.Z complex, leading to selective downregulation of A-MYB targets. Other H2A.Z peak-associated genes may rely on alternative cofactors and compensatory mechanisms.
  
  Lines 245-256: Figures 6 and F show that H2AZ localization is reduced in Znhit1-cKO mice, which means that no substitution with H2A occurs. If so, please show this in the data, because H2A localization should be increased compared with the control.
  
  To clarify the status of H2A, we have now performed immunofluorescence staining for H2A. While H2A.Z deposition was clearly impaired following Znhit1 deletion, the global level of H2A did not change significantly (Author response image 3). We speculate that the absence of a compensatory increase in H2A is likely due to the intrinsically low abundance of the histone variant H2A.Z relative to canonical histone H2A under physiological conditions.
  
  Author response image 3.
  
  Immunostaining of SYCP3 and H2A in testis sections from control and Znhit1-sKO mice. Scale bar, 40 μm.
  
  Reviewer #2 (Public Review):
  
  Summary:
  
  The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.
  
  Strengths:
  
  The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.
  
  Weaknesses:
  
  (1) Current literature demonstrates that meiotic mutants arrest at one of two stages: mid-pachytene (stage IV of the seminiferous cycle) or metaphase I (stage XII of the seminiferous cycle). This study documents that in the Znhit1 KO the mid-pachytene marker H1t appears normally, but that cells arrest before diplotene. If this is true, then arrest must occur during late pachytene, which to my knowledge has never been documented for a meiotic KO. To resolve this, the authors should present stronger histological substaging evidence to support their claim.
  
  Thank you for this insightful and constructive comment. To achieve high-resolution tracking of cell lineage progression, we performed scRNA-seq analysis using P35 testes in this revised manuscript. scRNA-seq data showed that germ cells progressed normally through all meiotic stages and successfully gave rise to spermatids in control groups. By contrast, in the Znhit1 knockout group, late pachytene spermatocytes decreased significantly, and only very few subsequent germ cell types were observable (revised Fig. 2F, G). Although very few diplotene spermatocytes and metaphase I cells were detectable in the scRNA-seq data, these cells still appeared abnormal, as evidenced by their extremely low Pou5f2 expression. We have revised our description of the meiotic arrest stage in the manuscript.
  
  (2) The authors overlooked the possible effects of Znhit1 deletion on MSCI. Defective MSCI is a well-established cause of pachytene arrest. Actually, the fact that they see X-Y pairing failure should alert them even more strongly to this possibility because MSCI failure is often associated with defective X-Y pairing. This could be easily addressed by examination of their RNA-seq data.
  
  To address the concern that Znhit1 deletion may impact meiotic sex chromosome inactivation (MSCI), we analyzed XY-linked gene expression using scRNA-seq data from spermatocytes at distinct stages. Our analysis revealed aberrant activation of XY-linked genes in Znhit1-cKO spermatocytes relative to controls. Specifically, 120 XY-linked genes were activated at zygotene, and 119 XY-linked genes were upregulated at pachytene (revised Fig. 4F). This observation directly demonstrates that Znhit1-cKO impairs MSCI, which aligns with our prior characterization of defective X-Y chromosome synapsis in Znhit1-deficient spermatocytes. To explicitly resolve this concern, we have integrated these MSCI-focused analyses into the revised Results section (lines 221-226).
  
  (3) The recombination assays need attention.
  
  In the text the authors state that they studied RPA2 and DMC1, but the figures show RPA2 and RAD51.
  
  The RPA counts are not quantitated.
  
  The conclusion that crossover formation fails (based on MLH1 staining) is not justified. This marker does not appear in wt males until late pachytene, so if cells in this mutant are dying before that stage, MLH1 cannot be assessed.
  
  The authors state that γH2AX persists in the KO, but I'm not convinced that they are comparing equivalent stages in the wt and KO. In Figure 3C, the pachytene cell is late, whereas in the mutant the pachytene cell is early or mid (when residual γH2AX is expected, even in wt males).
  
  Previous work (PMID: 23824539) has shown that antibodies reportedly detecting pATM in the sex body are non-specific. I therefore advise caution with the data shown in Figure 3D.
  
  We appreciate the reviewer’s detailed feedback on our recombination assays and have addressed each concern as follows:
  
  (1) Discrepancy between text and figures (RPA2/DMC1 vs. RPA2/RAD51): We have corrected this in the revised manuscript.
  
  (2) Quantitation of RPA2 foci: We have supplemented quantitative analysis of RPA2 foci (revised Fig. S3).
  
  (3) Conclusion on crossover failure: Single-cell RNA sequencing data from P35 testes definitively confirmed that Znhit1 knockout spermatocytes successfully progressed to the late pachytene stage, ruling out the possibility that our MLH1 staining results are confounded by cell death or arrest before this critical stage. In addition, analysis of transcriptome datasets revealed significant downregulation of important genes required for homologous recombination and crossover formation, including Ccnb1ip1 and Rnf212. Reduced expression of these essential factors may impair the assembly of MLH1 crossover foci. These data demonstrate that ZNHIT1 is essential for proper homologous recombination and crossover formation during male meiosis. We have revised the text to emphasize this context.
  
  (4) γH2AX persistence and stage matching: We have replaced the images with more representative, stage‑matched pachytene spermatocytes from wild‑type and Znhit1‑KO mice (revised Fig. 2C). Furthermore, prompted by the insightful comment from Reviewer 1, we carefully re‑examined autosomal synapsis and identified abnormal synapsis specifically at the terminal regions of autosomes in Znhit1‑deficient spermatocytes (revised Fig. 3A). These data together confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.
  
  (5) pATM staining issue: Following the reviewer’s advice, we carefully reviewed the relevant literature (PMID: 23824539) and confirmed that the anti‑pATM antibody may exhibit non‑specific staining on the XY chromosomes. Accordingly, we have removed the pATM staining data presented in Figure 3D from the revised manuscript to ensure the accuracy and rigor of our results.
  
  (4) RNA-seq data. The authors show convincingly that Znhit1 activates genes that are normally upregulated at the zyg-pachytene transition. They should repeat the analysis for genes normally upregulated at the prelep-lep and lep-zyg transitions to show that this effect is really pachytene-gene specific.
  
  We appreciate this suggestion. To clarify the stage specificity of ZNHIT1’s regulatory role, we analyzed genes upregulated at the prelep-lep and lep-zyg transitions. Our results showed that Znhit1 knockout had little impact on the overall expression levels of these genes (revised Fig. 4B). In contrast, as we previously reported, genes upregulated at the zygotene-pachytene transition were markedly downregulated in Znhit1-cKO. These findings further confirm the specificity of ZNHIT1 in regulating pachytene gene expression.
  
  (5) I am puzzled that the title and overall gist of the study focuses on H2A.Z, when it is Znhit1 that has been deleted.
  
  We appreciate the reviewer’s observation and have revised the study title as suggested. Specifically, the title is now updated to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis.”
  
  Reviewer #3 (Public Review):
  
  Summary:
  
  Sun et al. present a manuscript detailing the phenotypic characterization of loss of Znhit1 in male germ cells. Znhit1 is a subunit of the chromatin regulating complex SRCAP that functions to deposit the histone variant H2A.Z. Given that meiosis, and specifically meiotic recombination, occurs in the context of the dynamic condensing of chromosomes, the role of chromatin regulators in general, and histone variants specifically, in mammalian meiosis is an active area of research. Previous work has shown that H2A.Z is found at the locations of recombination in plants, although H2A.Z was previously not found at recombination sites in mammalian meiosis. Here the authors use a conditional approach to ablate Znhit1 in spermatocytes and characterize a block in meiosis in prophase I in the transition from pachytene to diplotene stage.
  
  Strengths:
  
  The authors combine current methods in immunohistochemistry and functional genomics to provide strong evidence of meiotic block upon the loss of Znhit1. They find that loss of Znhit1 leads to reduced incorporation of the histone variant H2A.Z, specifically at promoters and enhancers. Further, RNA sequencing found more genes are down-regulated upon loss of Znhit1 compared to upregulated, suggesting that incorporation of H2A.Z is critical for the expression of genes necessary for successful meiotic progression.
  
  A strength of the manuscript is tying the locations of changes in H2A.Z deposition with binding of the transcription factor A-MYB, providing a mechanism that can potentially combine the changes in chromatin regulation with variable binding of a transcription factor in gene expression in pachytene stage spermatocytes.
  
  Weaknesses:
  
  A weakness is the single-cell RNA-seq experiment using cells from 16-day-old male mice. The authors suggest that the rationale for the experiment was to determine where the Znhit1-sKO mutant showed an arrest in meiosis, and claim that this is the pachytene stage. However, in the 'first wave' of meiosis, 16-day-old mice are just beginning to enter pachytene, so cells from later meiotic stages will be largely absent in these tubules. This is clear from the UMAP showing a similar pattern of cell distributions between wild-type and mutant mice. Using older mice would have better demonstrated where the mutant and wild-type mice differ in cell-type composition.
  
  We appreciate the reviewer’s constructive comment. To resolve this issue, we have added new scRNA‑seq data from testes of P35 mice, which contain a full spectrum of meiotic stages, including late pachytene, diplotene, and metaphase I spermatocytes, and post-meiotic spermatids. Compared with wild-type controls, Znhit1-sKO testes exhibited a marked reduction in late pachytene spermatocytes and a near-complete loss of post-pachytene cell types, directly validating the pachytene-stage meiotic arrest (revised Fig. 2F, G). All updated analyses have been integrated into the manuscript to strengthen our conclusions.
  
  The authors use the term pachytene genome activation (PGA) in the manuscript to suggest a novel process by which genes are specifically increased in expression in the pachytene stage of meiotic prophase I, without reference to literature that establishes the term. If the authors are putting forward a new concept defined by this term, it would strengthen the manuscript to describe it further, delineate which genes are activated, and discuss potential mechanisms.
  
  We appreciate the reviewer’s valuable feedback on our use of the term "pachytene genome activation (PGA)".
  
  To address this, we have revised the text to explicitly frame PGA as a stage-specific transcriptional program observed in our data, defined by the coordinated upregulation of a distinct set of genes during the pachytene stage of meiotic prophase I.
  
  (1) Definition and Gene Set: Using the scRNA-seq dataset, we formally defined PGA as the transcriptional wave characterized by genes with increased expression in pachytene vs. zygotene spermatocytes (n = 1,560 genes). Functional enrichment analysis shows these genes are primarily involved in DNA repair, cilium organization, and spermatid development (Table S3), consistent with the biological process of germ cell development.
  
  (2) Relationship to existing literature: While PGA as a term is not widely established, our data align with prior observations of pachytene-specific transcriptional upregulation (Alexander et al., 2023; Ernst et al., 2019; Turner, 2015). Importantly, Alexander et al. (2023) reported that in late meiotic stages, starting from pachynema, transcription increases approximately 3-fold. We have added these citations to clearly illustrate the relevant advances in the field (lines 68-71).
  
  (3) Regulation of pachytene-stage gene expression: We further delineate that PGA is regulated by ZNHIT1-dependent H2A.Z deposition. Znhit1 deletion resulted in significant downregulation of 70.1% (1,094 out of 1,560) of these genes. This links PGA to chromatin-based regulation, where ZNHIT1-dependent H2A.Z deposition enables pachytene-specific transcription.
  
  Generally speaking, the authors present solid evidence for a pachytene block in male germ cell development in mice lacking Znhit1 in spermatocytes. The evidence for a change in gene expression during pachytene, with more genes downregulated than upregulated in the mutant, together with changes in histone modification dynamics and H2A.Z placement, supports a role in meiotic gene regulation. However, the evidence that changes in H2A.Z directly affect meiotic recombination (as suggested in the manuscript title), rather than acting through a general pachytene arrest leading to cell death, is weaker. The conclusions about Znhit1 directly influencing meiotic recombination need further justification or a mechanistic hypothesis.
  
  We acknowledge the reviewer’s comments. Indeed, existing data support the presence of a pachytene block in spermatocytes of Znhit1-deficient mice, along with aberrant pachytene gene expression and impaired H2A.Z deposition.
  
  In response, we made the following revisions: (1) we adjusted the manuscript title and conclusion to reduce emphasis on a direct H2A.Z-recombination link, and focus instead on ZNHIT1/H2A.Z in pachytene gene regulation and meiotic progression; (2) recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery (lines 314-319).
  
  Reviewer #3 (Recommendations For The Authors):
  
  Quality of the images for meiotic spreads - images have low contrast and are tiny. It is difficult to see the SYCP3 results even when the images are magnified on the computer screen.
  
  We have provided new images with high resolution to ensure a clear visualization of SYCP3 signals.
  
  Line 165 - indicates the results for DMC1, although the figure suggests the results are for RAD51 foci.
  
  We have corrected this mistake.
  
  Line 306 - this manuscript 'confirms' that H2AZ is not found at mammalian recombination sites, a result already in the literature.
  
  We have corrected this mistake (lines 309-312).
  
  Reviewing Editor Comments:
  
  Major points and revisions highlighted by the reviewers:
  
  (1) Meiotic prophase in Znhit1KO: The main questions to clarify are the stage and status of progression, the analysis of apoptosis, and the consequences for X- and Y-linked gene expression. Additional analysis of DSB repair foci and γH2AX is also required. These analyses are needed to respond to Reviewer 2. Even if H2AZ was not detected at recombination hotspots, it may play a role in DSB repair at levels too low for detection. This should be discussed, as H2AZ has been shown to be involved in DNA repair.
  
  We sincerely appreciate the reviewing editor’s constructive comments.
  
  (1) Stage and progression of meiotic prophase: We performed additional scRNA-seq on P35 testes. The results confirmed that Znhit1-KO spermatocytes arrest at late pachytene, and post-pachytene stages (diplotene, metaphase I) were nearly absent (revised Fig. 2F, G).
  
  (2) Apoptosis analysis: We addressed this by showing that apoptosis-related genes were upregulated in pachytene spermatocytes at the single-cell level (revised Fig. 4D). To further validate this finding, we performed scRNA-seq analysis on P35 testis samples. Our results revealed a marked reduction in late pachytene spermatocytes in Znhit1-cKO testes (revised Fig. 2F, G), consistent with apoptotic depletion of pachytene-stage cells. Together, these data confirm that Znhit1 ablation impairs pachytene-stage spermatocyte development.
  
  (3) X/Y gene expression consequences: To address this key point, we performed stage-resolved analysis of XY-linked gene expression using scRNA-seq data from spermatocytes at different stages. Compared with controls, we detected aberrant ectopic activation of XY-linked genes in Znhit1-KO spermatocytes: 120 XY-linked genes were inappropriately activated at zygotene, and 119 remained abnormally upregulated at pachytene (revised Fig. 4F). These results provide direct evidence that Znhit1 deletion impairs meiotic sex chromosome inactivation (MSCI).
  
  (4) DSB repair issue: We have replaced the images with more representative, stage‑matched pachytene spermatocytes (revised Fig. 3C). The revised images show consistently increased γH2AX signals in Znhit1-KO spermatocytes. Prompted by Reviewer 1’s comment, we identified abnormal synapsis at autosomal terminal regions in mutant cells. Together, these results confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.
  
  (5) Potential role of H2A.Z in DSB repair: Although H2A.Z was nearly undetectable at recombination hotspots, we discuss two possibilities: (1) ZNHIT1-H2A.Z depletion dysregulates DSB repair-related genes; (2) current ChIP-seq sensitivity may miss low-abundance H2A.Z at hotspots, which could support repair via chromatin remodeling. We propose future high-resolution assays (super-resolution imaging, DSB-targeted ChIP-seq) to test these possibilities. We agree that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery.
  
  (2) Gene expression analysis. The first consequence of H2AZ depletion is gene expression downregulation. However, it may not be surprising that some genes are down- and others upregulated. There are likely secondary and indirect effects, including the upregulation of some genes. The authors should explain and discuss this point so as to answer the questions raised by Reviewers 1 and 2.
  
  The primary consequence of H2A.Z depletion in pachytene spermatocytes is indeed widespread downregulation of genes. We explain the coexistence of upregulated genes through three key points.
  
  (1) Technical differences between scRNA-seq and bulk RNA-seq (addressing Reviewer 1): scRNA-seq captures cell-type-specific differentially expressed genes that bulk RNA-seq masks (bulk averages signals across mixed cells, hiding changes in rare subsets). Additionally, scRNA-seq uses a lower log2(fold change) threshold (0.25 vs. 1 in bulk RNA-seq), detecting subtle upregulations missed by bulk analysis.
  
  (2) No dead cell contamination (addressing Reviewer 1): Stringent quality control excluded cells with >15% mitochondrial RNA. Apoptosis-related genes showed no significant correlation with mitochondrial RNA fractions (Pearson correlation coefficient, r = -0.02; please see Author response image 1), ruling out dead cell transcriptome interference.
  
  (3) Secondary/indirect effects (addressing Reviewers 1 & 2): Upregulated genes likely result from indirect regulatory cascades. H2A.Z depletion may disrupt upstream transcription factors, leading to compensatory upregulation of their downstream genes, or may trigger cell stress responses to meiotic arrest. Notably, Znhit1 knockout specifically impacts genes upregulated at the zygotene-pachytene transition, while genes upregulated at preleptotene-leptotene or leptotene-zygotene transitions remain largely unaffected (revised Fig. 4B), confirming the specificity of H2A.Z’s direct regulatory role and framing upregulation as non-targeted indirect effects.
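  The two analysis choices cited in points (1) and (2), a mitochondrial-fraction filter followed by a correlation check and an absolute log2(fold change) cutoff, can be sketched with NumPy as below. This is a hypothetical illustration, not the authors' pipeline: the helper names, the per-cell apoptosis score input (for example, a mean expression score over apoptosis-related genes), and the choice to correlate only among retained cells are assumptions. Real differential-expression calls also apply adjusted p-value cutoffs, which are omitted here; only the 15% and 0.25 vs. 1 values come from the text.

```python
import numpy as np

def filter_and_correlate(mito_frac, apoptosis_score, max_mito=0.15):
    """Hypothetical helper: drop cells whose mitochondrial RNA fraction
    exceeds max_mito, then return the keep-mask and the Pearson r between
    mitochondrial fraction and apoptosis score among the retained cells."""
    mito = np.asarray(mito_frac, dtype=float)
    score = np.asarray(apoptosis_score, dtype=float)
    keep = mito <= max_mito                  # exclude cells with >15% mito RNA
    r = np.corrcoef(mito[keep], score[keep])[0, 1]
    return keep, r

def call_by_log2fc(log2fc, threshold):
    """Hypothetical helper: return (up, down) boolean masks from an
    |log2FC| cutoff alone; adjusted p-values are ignored here."""
    lfc = np.asarray(log2fc, dtype=float)
    return lfc >= threshold, lfc <= -threshold

# A modest change (log2FC = 0.5) passes a 0.25 cutoff but not a cutoff of 1.
lfc = [0.5, 1.5, -0.4, -1.2]
up_sc, down_sc = call_by_log2fc(lfc, 0.25)
up_bulk, down_bulk = call_by_log2fc(lfc, 1.0)

# Toy cells: the third cell (20% mito RNA) is removed before correlating.
keep, r = filter_and_correlate([0.05, 0.10, 0.20, 0.12], [0.1, 0.2, 0.4, 0.24])
```

  The toy numbers only show why a lower cutoff admits subtler changes; they are not data from the study.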
  
  (3) The authors should also test the effect of Znhit1KO on the 1196 genes (up PreL/L) and 1325 (up L/Z) as shown in Figure 5D for the PGA. Also in Figure 5B, there is no evaluation of the statistical significance of the variation, this should be revised. X and Y genes should be analysed. KAS-Seq should be correlated with gene expression analysis, and several points as mentioned in the reviews below should be better explained and discussed.
  
  (1) Effect of Znhit1-KO on PreL/L- and L/Z-upregulated genes: we analyzed the 1196 genes upregulated at the PreL/L transition and 1325 genes upregulated at the L/Z transition. Znhit1 knockout had minimal effect on the expression of these early meiotic gene sets (revised Fig. 4B), whereas genes activated at the zygotene‑pachytene transition were strongly downregulated in Znhit1-KO spermatocytes. These results confirm the specific role of ZNHIT1 in regulating pachytene‑stage gene expression. We have also added a statistical evaluation for the variation shown in Fig. 4B.
  
  (2) X/Y-linked gene analysis: Analysis of stage‑resolved scRNA‑seq revealed aberrant ectopic activation of 120 XY‑linked genes at zygotene and 119 at pachytene in Znhit1-KO spermatocytes (revised Fig. 4F), demonstrating impaired meiotic sex chromosome inactivation (MSCI).
  
  (3) KAS-seq correlation with gene expression: We analyzed the link between KAS‑seq signals and gene expression, and we found that Znhit1 depletion caused a global reduction in KAS‑seq signals, especially at promoters of downregulated genes (revised Fig. S8). Genes with increased expression showed low KAS‑seq signals in both control and mutant groups, likely reflecting indirect regulation. These results highlight the essential role of ZNHIT1 in transcriptional regulation.
  
  (4) The title should refer to Znhit1, and the effect on meiotic recombination activities may be an indirect consequence of prophase progression arrest, even if some recombination genes are downregulated. This point is important as noted by reviewer 3.
  
  We fully acknowledge Reviewer 3’s key point and have revised the manuscript title to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis” to reduce emphasis on a direct H2A.Z-recombination link.
  
  Regarding meiotic recombination activities: The downregulation of recombination-related genes (e.g., Ccnb1ip1, Rnf212) stems from impaired pachytene-stage transcriptional programs caused by defective ZNHIT1-dependent H2A.Z deposition, which in turn leads to prophase progression arrest. Thus, the observed recombination abnormalities may be a secondary consequence of the meiotic prophase arrest, rather than a direct regulatory effect of ZNHIT1 on recombination machinery. This clarification has been integrated into the Discussion section (lines 314-318).
  
  (5) The recent structural analysis of SRCAP should be cited: Yu et al. Cell Discovery (2024) 10:15 https://doi.org/10.1038/s41421-023-00640-1.
  
  We have cited this reference in this revised manuscript (lines 234-236).
  
  (6) The authors should read and answer the specific revisions asked for by the reviewers.
  
  We have thoroughly read and systematically addressed all specific revisions requested by Reviewers 1, 2, and 3, as detailed in the revised manuscript and supplementary data.
  
  References
  
  Alexander, A.K., Rice, E.J., Lujic, J., Simon, L.E., Tanis, S., Barshad, G., Zhu, L., Lama, J., Cohen, P.E., and Danko, C.G. (2023). A-MYB and BRDT-dependent RNA Polymerase II pause release orchestrates transcriptional regulation in mammalian meiosis. Nature communications 14.
  
  Cole, L., Kurscheid, S., Nekrasov, M., Domaschenz, R., Vera, D.L., Dennis, J.H., and Tremethick, D.J. (2021). Multiple roles of H2A.Z in regulating promoter chromatin architecture in human cells. Nature communications 12, 2524.
  
  Ernst, C., Eling, N., Martinez-Jimenez, C.P., Marioni, J.C., and Odom, D.T. (2019). Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nature communications 10, 1251.
  
  Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.
  
  Sporrij, A., Choudhuri, A., Prasad, M., Muhire, B., Fast, E.M., Manning, M.E., Weiss, J.D., Koh, M., Yang, S., Kingston, R.E., et al. (2023). PGE(2) alters chromatin through H2A.Z-variant enhancer nucleosome modification to promote hematopoietic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America 120, e2220613120.
  
  Turner, J.M. (2015). Meiotic Silencing in Mammals. Annu Rev Genet 49, 395-412.
  
  Wu, T., Lyu, R., You, Q., and He, C. (2020). Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ. Nature methods 17, 515-523.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/2024.06.06.597721v3
www.biorxiv.org

Dorsal/NF-κB exhibits a dorsal-to-ventral mobility gradient in the Drosophila embryo

1. Public_Reviews 11 May 2026
  
  in eLife
  
  Author response:
  
  The following is the authors’ response to the original reviews.
  
  Public Reviews:
  
  Reviewer #1 (Public review):
  
  Summary:
  
  Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopy approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IκB is implicated in preventing Dl from binding to DNA.
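  For readers unfamiliar with RICS, its core quantity is the spatial autocorrelation of intensity fluctuations within a raster-scanned frame, G(ξ, ψ) = ⟨δI(x, y) δI(x+ξ, y+ψ)⟩ / ⟨I⟩². The sketch below computes it with NumPy FFTs under stated assumptions: a single frame, periodic (wrap-around) lags, no moving-average subtraction of immobile structures, and no fit to a diffusion model (which in practice uses pixel dwell time, line time, and PSF waist to extract diffusivity). It is a generic textbook illustration, not the authors' analysis code, and the function name `rics_autocorrelation` is hypothetical.

```python
import numpy as np

def rics_autocorrelation(image):
    """Spatial autocorrelation G(xi, psi) of one raster-scanned frame.

    G(xi, psi) = <dI(x, y) dI(x + xi, y + psi)> / <I>^2, with dI = I - <I>.
    Lags wrap around the frame edges (circular correlation via FFT).
    """
    img = np.asarray(image, dtype=float)
    mean = img.mean()
    if mean == 0:
        raise ValueError("mean intensity must be non-zero")
    fluct = img - mean                          # intensity fluctuations dI
    spectrum = np.fft.fft2(fluct)
    # Wiener-Khinchin: the inverse FFT of the power spectrum is the summed
    # circular autocorrelation; dividing by the pixel count gives the mean.
    corr = np.fft.ifft2(spectrum * np.conj(spectrum)).real / img.size
    return np.fft.fftshift(corr / mean**2)      # move zero lag to the center

# Synthetic shot-noise frame with no diffusing molecules: the zero-lag value
# equals var/mean^2, which is roughly 1/mean for Poisson counts.
rng = np.random.default_rng(0)
frame = rng.poisson(50, size=(64, 64)).astype(float)
g = rics_autocorrelation(frame)
```

  In practice G is averaged over many frames before fitting, and the decay of G along the fast (pixel) and slow (line) scan directions is what encodes the diffusion coefficient.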
  
  Strengths:
  
  The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.
  
  Weaknesses:
  
  The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text, and could benefit from a more clear presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.
  
  Thank you for your thoughtful feedback. Regarding the discrepancy between experiment and theory in relation to Michaelis-Menten kinetics, we recognize that our initial explanation may not have been explicit enough. Our intent was to illustrate that if DNA binding is a saturable process, then while the absolute concentration of Dl bound to DNA will increase with total Dl levels, the fraction of Dl bound to DNA will decrease. We used Michaelis-Menten kinetics only as a familiar example to convey this concept but did not intend to suggest that the system strictly follows Michaelis-Menten behavior. To clarify this point, we removed the Michaelis-Menten analogy and now describe the system only as “saturating.” This primarily affected text in the paragraph starting on Line 204, but also Lines 323-325.
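  One way to make the saturation argument concrete: let D be total nuclear Dl and B the DNA-bound amount, and assume, purely for illustration, a hyperbolic saturating dependence with capacity B_max and half-saturation constant K. These symbols and this functional form are assumptions; the response deliberately avoids committing to any specific law. Bound Dl then rises with D while the bound fraction f = B/D falls:

```latex
B(D) = \frac{B_{\max} D}{K + D}, \qquad
\frac{dB}{dD} = \frac{B_{\max} K}{(K + D)^{2}} > 0, \qquad
f(D) = \frac{B(D)}{D} = \frac{B_{\max}}{K + D}, \qquad
\frac{df}{dD} = -\frac{B_{\max}}{(K + D)^{2}} < 0
```

  More generally, any bound amount B(D) that is increasing and concave with B(0) = 0 gives a non-increasing fraction B(D)/D, so the qualitative conclusion does not depend on the hyperbolic form.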
  
  Regarding the concern about potential confounding effects due to the presence of wild-type GFP-tagged Dorsal (Dl[wt]-GFP): we understand the importance of addressing this point more directly. Therefore, we have imaged the Dorsal-GFP gradient in embryos expressing the UAS-dl[S280P]-GFP or the UAS-dl[S317A]-GFP constructs in the absence of the BAC-recombineered Dl-GFP construct. In both cases, the dl mutants by themselves were not able to recapitulate enough of the Dl gradient to test our hypotheses. We have added this analysis to Supplemental Figure 4 and mentioned this figure on Lines 333-336 and 354-358. Furthermore, we explicitly note that our failure to reject the null hypothesis in the Toll phosphorylation mutant case may be due to the additional copy of Dl[wt]-GFP (the BAC-recombineered construct), with text added to Lines 343-345, 365-369 (Results) and 408-418 (Discussion).
  
  Reviewer #2 (Public review):
  
  Summary:
  
  In this manuscript, Al Asafen, Clark et al. use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the dorsal-ventral (DV) axis of the early Drosophila embryo. Dl is essential for DV patterning, and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.
  
  Strengths:
  
  The manuscript is well written and the authors' conclusions are convincingly supported by their methodology and analysis. As befits a quantitative study, the biophysical analysis seems generally rigorous.
  
  Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.
  
  Weaknesses:
  
  In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent. As can be seen in Figure 3F, the fraction of immobile Dl is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is also unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling, since the fraction is very low anyway.
  
  We thank the reviewer for pointing out the places where we could strengthen our explanations. Here we first address the criticism, also raised by the other reviewer, that the fraction of immobile Dl increases only a small amount (Fig. 5A). [In our reply to the next comment, we address the question of biological implications.] We attempted to explain this small effect size in the manuscript; however, we understand that we could clarify further and, given that eLife has no constraints on space, we have added more explanation to the main text.
  
  In essence, even though the effect was statistically significant, the effect size was small because the mutation was “diluted” by the presence of a wildtype Dl protein tagged with GFP. We were willing to deal with this dilution because the alternative was that, according to previous literature, without any wildtype Dl, no Dl gradient would be present in the reduced Toll phosphorylation mutants, and only a very weak Dl gradient (weakened on both ends) would be present in mutants that reduced Cact binding. We were confident that, with our quantitative approaches, we would be able to detect the diluted effect.
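  As a rough illustration of why dilution shrinks the measurable effect, the Python sketch below assumes (hypothetically; this is not stated in the response) that the measured immobile fraction is a simple signal-weighted average of the wildtype and mutant GFP-tagged pools. The function names, the linear-mixing assumption, and the example values are illustrative only, not measurements.

```python
def diluted_psi(psi_wt: float, psi_mut: float, mutant_share: float) -> float:
    """Immobile fraction measured when wildtype and mutant GFP-tagged Dl coexist.

    Hypothetical linear-mixing model: the measurement reports the average of the
    two pools, weighted by the share of GFP-tagged Dl that is mutant.
    """
    if not 0.0 <= mutant_share <= 1.0:
        raise ValueError("mutant_share must lie in [0, 1]")
    return mutant_share * psi_mut + (1.0 - mutant_share) * psi_wt


def observed_effect(psi_wt: float, psi_mut: float, mutant_share: float) -> float:
    """Apparent mutant effect relative to embryos carrying only wildtype Dl-GFP."""
    return diluted_psi(psi_wt, psi_mut, mutant_share) - psi_wt


# Illustrative (not measured) values: the mutant's intrinsic effect is 0.10,
# but with half of the tagged Dl being wildtype only about half of it is visible.
print(observed_effect(psi_wt=0.05, psi_mut=0.15, mutant_share=0.5))
```

  Under this assumption the apparent effect scales linearly with the mutant share, which is one way to see why a real but diluted effect needs larger samples to reach significance.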
  
  However, because both reviewers have criticized this diluted effect, in this resubmission, we have included analysis of GFP-tagged mutants without the presence of wildtype Dl protein. Unfortunately, these embryos lack a discernible Dl gradient and cannot be analyzed in such a way as to test the hypotheses that the mutants were generated for.
  
  Even so, the effect of the Cact-binding mutant was strong enough that we were able to statistically distinguish it from embryos expressing only wildtype Dl-GFP, even with the dilution effect. On the other hand, we have also included a caveat that our failure to statistically distinguish Toll phosphorylation mutants from wildtype may be due to the dilution effect. We now also explicitly state the concerns about a lack of a discernible Dl gradient and have included figures of full mutants in the supplement. See also our discussion of Reviewer 1’s similar comment.
  
  While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is ok, but I think that the authors should discuss further what the biological implications of these findings are, beyond the contribution to uncovering the biophysical parameters.
  
  Here we underscore the biological implications of our discovery that Cact is present in the nucleus on the dorsal side. The reviewer mentioned that Cact in the nucleus on the dorsal side appears to have little overall effect, because this is the location of the embryo where there is very little Dl in the first place, which raises the question of whether this discovery is impactful.
  
  While we previously used the final paragraph of the discussion to touch on the implications of this discovery, we acknowledge that we could have spent more time on the explanation. As such, we have expanded this final paragraph into two paragraphs. In the first of the two, we discuss in more detail the implications specifically of the Dl/Cact interactions in the dorsal-most nuclei, as understood by the results of this paper. In brief, knowing that Dl in the dorsal-most nuclei is bound by Cact results in an updated understanding of the Dl gradient, with increased dynamic range, robustness, and precision (but unknown shape).
  
  In the second of the two paragraphs, we discuss this result in light of our recent work on imaging Cact in live embryos, in which we have shown that Cact is present in all nuclei at roughly uniform levels. Taken together, we suggest that it is possible that Cact is bound to Dl in all nuclei (not just the dorsal-most), which would allow us to estimate the shape of the overall Dl gradient by subtracting off the fluorescence that stems from the Dl/Cact complex.
  
  For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What, then, is the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should also report in Figure 5 what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.
  
  We appreciate the reviewer’s suggestion that the rejection of the hypothesis that phosphorylation of Dl by Toll impacts Dl/DNA binding could be expanded upon further. For the role of Dl phosphorylation by Toll: we previously mentioned that this phosphorylation is known to enhance the nuclear import or retention of Dl, and that mutation of serine 317 to an alanine abolishes Toll-mediated phosphorylation of Dl, which results in embryos with no Dl gradient. We had also mentioned that phosphorylation of Dl is not known to affect its DNA binding, which is the hypothesis we sought to test by creating the dl[S317A]-GFP mutants. We did not image any mutants, or the UAS-dl[wt]-GFP control, in the lateral regions, for two reasons. First, this region is easily the smallest of the three regions, in terms of the percentage of the DV axis (see Fig. 1A). Second, because of the dilution effect, we knew the effect size would be small, and as such, we imaged only on the extreme ends of the gradient so that the most clear conclusion could be drawn about the effect that Toll phosphorylation might have on DNA binding of Dl.
  
  The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.
  
  We agree that there is some distortion of the relative spatial extents of the Dorsal gradient when NCR is used as an independent variable on a plot. However, we prefer the NCR on the horizontal axis because it is closer to the functional variable (Dl concentration, rather than spatial location) for the properties we studied.
  
  Recommendations for the authors:
  
  Reviewer #1 (Recommendations for the authors):
  
  I really enjoyed the first part of this paper and have only minor suggestions for improvement of the presentation. I am confused about the experimental approach for the final figure, distinguishing phosphorylation and cactus-dependent effects. I'll divide my comments between "First Part/General Suggestions", "Last Part", and finish with some minor typo observations.
  
  The gist of the issues with the last part of the paper could boil down to insufficient detail/explanation of the section. The discrepancy with the expectations of Michaelis-Menten kinetics is presented in a total of three sentences and is not necessarily obvious to the general readership of eLife. The mutants chosen to distinguish the phosphorylation and cactus mechanisms could be described more (why these? aren't other residues phosphorylated?), as could the reason why also having wild-type GFP-Dl in the measurements isn't confounding. Since there is unlimited space in this journal, it may be advisable to use this space to fill out these rationales and ideas.
  
  First part/General Suggestions:
  
  (1) For the RICS data, (Figures 1 and 2) there is a nice correlation between WT NC ratio and the selected low/med/hi Dl activity mutants. More-or-less the median values in, say, Figure 1E-G are reflected in Figure 1H. However, with the ccRICS data (Figure 3), it looks like there is less correspondence between the range of fraction bound estimates in, for instance, "ventral" in Figure 3D and '10b' in Figure 3E. Can the authors comment on this? Should the reader be able to make this kind of comparison, or does something about data collection for the wt/NCR measurements preclude direct comparison of magnitudes with the panel of mutants? (imaging setup, laser power, etc)?
  
  The reviewer is correct that there seems to be a discrepancy in the values of ψ between the wt embryos (ventral side) and the Toll10B embryos. It should be noted that the Toll10B embryos are not “ventral-like” in every way, in part because they have unknown activated Toll levels that might be above or below what is seen at the ventral midline in wildtype embryos, and in part because there is no DV gradient, and thus no shuttling in these embryos that would accumulate total Dorsal on the ventral midline. As such, comparisons between Toll10B embryos and the ventral side of wildtype embryos are not exactly one-to-one, and we are more confident in comparing among the mutants in an allelic series. To address this question, we have added a sentence to the end of the second paragraph of the “Dorsal/DNA binding exhibits a spatial gradient” subsection of the Results (Lines 233-235).
  
  (2) Materials and methods: Mounting and imaging of Drosophila embryos: the authors cite the "488 nm laser intensity ranged from 0.5% to 3.0%..." The values presented here are not useful for the general reader or an individual looking to replicate these conditions, as emission power produced from such values will vary from instrument to instrument. It is standard in these cases to report an estimated laser power (measured in watts) for each laser line, and a clear description of how such measurements were made (stationary beam, under scanning conditions, with what detector, etc). These measurements are valuable and the authors are strongly encouraged to report such measurements for their setup.
  
  We appreciate the reviewer’s suggestion and understand the importance of providing absolute laser power values for reproducibility. We have now included the laser power (in watts) for the laser lines on both microscopes used in this study. The revised text can be found in the Materials and Methods section, on Lines 535-536 and 540.
  
  (3) The presentation of the data in Figure 4 is difficult to understand. Are the kymographs (A lower) representing the entire length of the big white arrow in A upper? Or do the dashed lines indicate the x-axis limits of the kymograph? It is difficult to tell from the figure legend, where the dashed lines are described as "areas where Dl-GFP movement is measured out of the nucleus." I believe that the authors can make these measurements and that Figure 4B reflects properties of "movement" of Dl out of the nucleus, but how they get there from these data is not clear to this reader. Perhaps a cartoon explaining the green lines and the orange lines in the kymograph or tightening the legend would help.
  
  We thank the reviewer for their feedback and understand the need for greater clarity in the text of the pCF section and in Figure 4. The widths of the kymographs in the lower panels correspond to the full widths of the images in the upper panels. The pCF measurements were taken at the y-coordinates at the level of the white arrows. The dashed vertical lines connecting the upper and lower panels illustrate two cases of locations along the x-axis of the image where Dl is crossing from inside a nucleus to outside. In the two illustrated cases, these crossings are accompanied by either zero Dl molecules being observed to cross the nuclear barrier (ventral image/kymograph on left) or delayed crossing of Dl molecules (dorsal image/kymograph on right). To address this concern, we have added more detail to the Fig. 4 legend and greatly expanded on a discussion of what pCF does in the text (the second and third paragraphs of the section). We have also updated Fig. 4 to align with new explanations from the text: namely, describing the y-axis of the kymographs as Δt (instead of log(time)) and explicitly showing that the pair correlation is for pairs of pixels that are Δx = 6 pixels apart. Further details were also added to the relevant Methods section.
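  For readers unfamiliar with pCF, the sketch below shows the standard pair-correlation calculation on a line-scan kymograph for pixels Δx apart, using the common normalization G(Δt) = ⟨F(x,t)·F(x+Δx,t+Δt)⟩ / (⟨F(x)⟩⟨F(x+Δx)⟩) − 1. It is a minimal NumPy sketch, not the authors' pipeline: the array layout (rows are successive scans), the function name `pair_correlation`, and the linearly spaced lags are assumptions, and real analyses may add detrending or log-spaced lags.

```python
import numpy as np


def pair_correlation(kymo: np.ndarray, dx: int = 6, max_lag=None) -> np.ndarray:
    """Pair correlation between pixels dx apart in a line-scan kymograph.

    kymo has shape (n_times, n_pixels): each row is one scan of the same line.
    Returns G with shape (max_lag, n_pixels - dx), where G[lag, x] compares
    F(x, t) with F(x + dx, t + lag):
        G = <F(x, t) F(x + dx, t + lag)>_t / (<F(x)>_t <F(x + dx)>_t) - 1
    """
    n_t, n_x = kymo.shape
    if not 0 < dx < n_x:
        raise ValueError("dx must be between 1 and n_pixels - 1")
    if max_lag is None:
        max_lag = n_t // 2
    left = kymo[:, : n_x - dx]   # F(x, t)
    right = kymo[:, dx:]         # F(x + dx, t)
    norm = left.mean(axis=0) * right.mean(axis=0)
    G = np.empty((max_lag, n_x - dx))
    for lag in range(max_lag):
        # Pair each time point at x with the time point `lag` frames later at x + dx.
        G[lag] = (left[: n_t - lag] * right[lag:]).mean(axis=0) / norm - 1.0
    return G
```

  With this definition, a peak in G at some lag means molecules seen at x tend to reappear at x + Δx about that many frames later; an absent or delayed peak where the pixel pair straddles a nuclear boundary corresponds to the kind of ventral versus dorsal behavior described above.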
  
  (4) DV position in the wild-type imaging experiments is operationally determined through measurement of the Dorsal NC ratio. This makes sense, but the strategy is buried in the first paragraph of the results, and not discussed in the M & M. For readers unfamiliar with imaging the fly embryo or the nuances of the Dl gradient, perhaps add a sentence or two explaining that embryos were oriented randomly along the DV axis, and that DV positions of the imaging region were estimated by measuring the Dl NC ratio.
  
  We thank the reviewer for this helpful suggestion. To improve clarity, we have added a description of how DV position was determined to the Materials & Methods section (paragraph starting on Line 520). Specifically, we now state that embryos were randomly oriented along the DV axis and that we used the Dorsal NC ratio of intensity as a proxy for measuring the DV position in imaging experiments. Additionally, we have added a statement to the Results section to ensure that this strategy is more clearly introduced (Lines 143-144). We appreciate this recommendation, as it will help readers unfamiliar with fly embryo imaging better understand our approach.
  
  (5) It would be nice to report the corresponding NC-ratio values for Dl in each of the mutant conditions, perhaps as a supplement to Figure 1. Currently, Figure 1H relies on the (admittedly well-established) properties of the three mutants, but it feels that an additional nice quantitative link in the data can be drawn out here. Do the authors see the strict correlation between the wt and mutant diffusivity measurements at specific NC-ratios?
  
  We are hesitant to try to draw direct comparisons between the mutants and the behavior of the wildtype embryo at the corresponding NCR. This is because, in the context of these uniform mutants, the NCR is determined by a combination of at least three factors that we cannot measure or control for: the unknown strength of Toll signaling, the unknown capacity of Toll signaling (i.e., the potential saturation of the cytoplasmic enzymes controlled by Toll signaling), and, most importantly, the lack of a shuttling mechanism that concentrates Dl on the ventral side of the embryo. As such, the NCR does not represent a continuous variable that transforms the behavior of one mutant into another (or from mutants into wt DV coordinates), as it does along the DV axis in wildtype embryos. This is why the mutant studies are presented as boxplots. At best, we were comfortable only in using the uniform mutants as an allelic series to produce gross trends. We have added a brief statement describing the shuttling caveat to the Results section (Lines 173-177).
  
  (6) In the section related to Dl nuclear export, the language used to describe Dl kinetics is ambiguous. The term "movement" is used seemingly as a catch-all for nuclear import-export as distinguished from diffusion. However, diffusion is also a form of movement. Could this section be reworked to explicitly distinguish nuclear import-export and diffusive movements?
  
  We appreciate the reviewer’s suggestion and agree that the language used to describe Dl kinetics could be more precise. By way of explanation, the pCF analysis calculates the time scale on which Dl can exit the nucleus. pCF only gives a signal if it sees the same Dl molecule twice, at two different locations after some Δt amount of time has passed. Because of this, if a given Dl molecule in a ventral nucleus is being tracked, then that molecule has some probability that it is bound to DNA initially, which means it will take, on average, longer to exit the nucleus than a Dl molecule not initially bound to DNA. Therefore, on the ventral side, the time scale on which Dl exits the nucleus is longer than on the dorsal side (where DNA binding is not happening). This can be true even if the nuclear export rate constants are the same on the ventral side vs the dorsal side. As such, we were careful to choose language that did not imply that we were talking about a nuclear export rate constant. We have added this discussion to the end of the relevant Results section (Lines 308-315).
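  The point that initially bound molecules take longer to leave the nucleus, even when export rate constants are identical, can be made concrete with a deliberately simplified two-step model. This is an assumption for illustration, not the authors' model: a molecule that starts bound (probability ψ) must first unbind at rate k_off, every free molecule then exits at rate k_exp, and rebinding is ignored. The mean exit time is then ψ/k_off + 1/k_exp. The function names and rate values below are hypothetical.

```python
import random


def mean_exit_time(psi: float, k_off: float, k_exp: float) -> float:
    """Mean time to leave the nucleus in a two-step toy model.

    A molecule that starts bound (probability psi) must first unbind (rate k_off);
    every free molecule then exits at rate k_exp. Rebinding is ignored.
    """
    return psi / k_off + 1.0 / k_exp


def simulate_exit_times(psi: float, k_off: float, k_exp: float,
                        n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean, as a sanity check on the formula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.expovariate(k_exp)          # export step, same rate everywhere
        if rng.random() < psi:
            t += rng.expovariate(k_off)     # extra wait for initially bound molecules
        total += t
    return total / n


# Hypothetical rates in arbitrary units: identical k_exp on both sides,
# but a larger bound fraction on the "ventral" side.
print(mean_exit_time(psi=0.3, k_off=0.5, k_exp=1.0))  # "ventral"
print(mean_exit_time(psi=0.0, k_off=0.5, k_exp=1.0))  # "dorsal"
```

  In this toy model the longer ventral time scale comes entirely from the ψ/k_off term, which is why a time scale of exiting the nucleus need not reflect a different export rate constant.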
  
  We have also revised this section to explicitly distinguish between the mobility associated with exiting the nucleus and diffusive movement, while still trying to distinguish between the time scale of exiting the nucleus vs the nuclear export rate. Specifically, we now refer to ‘time scale of nuclear export’ when discussing transport across the nuclear envelope and reserve the term ‘diffusion’ for passive intracellular movement. Furthermore, we have edited a sentence in this section (Lines 291-293) to describe the distinction we are making between the time scale measured by pCF and the time scale commonly associated with nuclear export (that is, the reciprocal of the rate constant). We hope this clarification improves readability and conceptual clarity.
  
  Last Part:
  
  (1) There is an undersold argument centered on Michaelis-Menten kinetics that needs to be explicitly presented, especially since it motivates the final experiments of the paper, which are challenging. In the two sections describing how the data do not adhere to expectations based on Michaelis-Menten kinetics, the assertion that "the fraction of immobile Dl is expected to decrease with increasing nuclear total Dl concentration" is only intuitively true if the system is saturated. Is the system demonstrably saturated? Another interpretation of this would be that these results demonstrate that the system is likely not saturated. In any case, the authors need to devote some space in the introduction and/or results and/or discussion to fully motivate this point.
  
  We agree that the reviewer has raised an important point: if the system is very far from saturation, then the fraction of immobile Dl is not expected to decrease with increasing nuclear total Dl concentration. But neither would it increase; it would instead stay flat. To correct this mistake, we have edited the sentences in question to acknowledge the far-from-saturation scenario, saying “at best, [the fraction bound] remain[s] constant” (Line 209). As such, our original point, which is that in no case would the fraction immobile increase [unless something else is going on besides affinity-based binding to DNA], is still valid.
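  The distinction between bound concentration and bound fraction can be checked numerically with a generic saturating model. The sketch below assumes, purely for illustration and without claiming anything about actual Dl kinetics, 1:1 equilibrium binding of total Dl to a fixed pool of DNA sites with dissociation constant kd. The function names and parameter values are hypothetical.

```python
import math


def bound_concentration(total: float, sites: float, kd: float) -> float:
    """Bound Dl for 1:1 binding to a fixed pool of DNA sites (toy model).

    Solves B**2 - (total + sites + kd) * B + total * sites = 0 for the
    physical root, written in a form that avoids cancellation at small B.
    """
    b = total + sites + kd
    disc = b * b - 4.0 * total * sites
    return 2.0 * total * sites / (b + math.sqrt(disc))


def fraction_bound(total: float, sites: float, kd: float) -> float:
    """Fraction of total Dl that is bound (analogous to the immobile fraction)."""
    return bound_concentration(total, sites, kd) / total


# Illustrative parameters in arbitrary units: raising total Dl raises the bound
# concentration but never raises the bound fraction.
for total in (0.01, 0.1, 1.0, 5.0):
    print(total, bound_concentration(total, 1.0, 0.1), fraction_bound(total, 1.0, 0.1))
```

  In this toy model the bound fraction approaches the constant sites/(sites + kd) when Dl is scarce relative to its sites and falls as the sites fill, so it stays flat at best and never increases with total Dl, consistent with the argument above.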
  
  (2) Wouldn't any argument on the basis of Michaelis-Menten need to rely on the assumption that the system is at steady-state? Reeves 2012 concludes that during the times measured here, Dl does not reach a steady state. It would be good, in the context of the point above, for the authors to clarify how this impacts the expectations of saturation and the application of M/M kinetics.
  
  We thank the reviewer for raising this important point. We apologize for not being clear on our points about M/M kinetics and would like to stress again that we are not claiming the system has M/M kinetics. We appealed to M/M kinetics only as a simple, intuitive example of a saturating system to point out the difference between bound concentration vs bound fraction as functions of total concentration. We did this because previous feedback on our manuscript suggested that the difference between these two variables needed to be made clearer. Because this point seemed controversial with both reviewers, we removed all mention of M/M kinetics and simply refer to the system as “saturating.” For further explanation, see the first paragraph of our response to Reviewer 1’s “weaknesses” in the public review.
  
  (3) It is not clear to me how the inclusion of wild-type, GFP-tagged dorsal in the experimental setup for Figure 5 is not confounding. For the S317 (phospho-) mutant, GFP-tagged alleles of both phospho- and wild-type Dl are expressed. The reasoning is that not enough phospho-mutant Dl gets into the nucleus, and this makes it difficult to distinguish the dorsal from the ventral side of the embryo, so in a dl mutant background, there is expression of wt GFP-dl from a BAC, and nos>Gal4 driven expression of a GFP-tagged S317A mutant dl. The measurements show that on the ventral side of the embryo, there is no difference in the fraction of bound Dl. Couldn't this be predominantly binding of wildtype GFP-Dl? How is this interpretable? Wouldn't it be easier to perform these measurements in a Tl 10b background (or to cross in UAS>Tl[10b]) and for the only GFP-tagged dl to be S317A? The same goes for the S234 mutant (could be done in the pelle mutant background).
  
  We thank the reviewer for raising the point that the confounding effect of wildtype Dl makes it difficult to interpret the results from the S317A mutant. Under the circumstances of the experimental design, we can best conclude that, if the null hypothesis is incorrect, the effect size was too small to detect with our sample size. As such, we have modified our discussion of the results of this experiment to carefully explain this caveat (rather than confidently saying that Toll phosphorylation has no effect). For further explanation, see the second paragraph of our response to Reviewer 1’s “weaknesses” in the public review, as well as our response to the related question raised by Reviewer 2 in the public review.
  
  Minor issues/typo stuff:
  
  (1) This reviewer notes that the submitted materials contain neither line numbers nor page numbers.
  
  We appreciate the reviewer’s feedback. We have now included line numbers and page numbers in the revised manuscript for easier reference.
  
  (2) First paragraph of results: "We imaged small regions of the embryo..." The parenthetical statement only cites pixel size and directs the reader to the methods. Without the total number of pixels, the pixel size value does not clarify how "small" the imaged region is. Consider including the xy area, pixel dimensions, and pixel size here to assert the smallness of the imaged area.
  
  We have added the requested information.
  
  (3) Second paragraph, Introduction: "Dorsal, one of three (Drosophila) homologs to mammalian NF-kB" (Add Drosophila). Also, aren't these orthologs?
  
  We have made these changes.
  
  (4) Last sentence of last paragraph in the introduction: Kind of a throw-away sentence. Consider revising.
  
  We thank the reviewer for making this point; the sentence was originally constructed to state that our quantitative measurements resulted in a biologically significant discovery. However, because Reviewer 2 also mentioned the question of biological significance, we have changed this final sentence to explicitly mention what the biological significance is: namely, an understanding of the Dl gradient that has superior dynamic range, spatial range, robustness, and precision.
  
  (5) Where is the median line in the S317A boxplot in Fig 5C?
  
  The median line is at ψ = 0. We have added an explanation of this to the Figure legend.
  
  (6) Materials & Methods: Fly transformation, typo: "Drosophila embryos were injected with 0.5 µl of each pUAST construct..." The volume of an entire Drosophila embryo is less than 0.5 µl; please revise the units to reflect the value injected. Most likely an absolute volume was stated when a concentration of the injection solution, delivered in much smaller volumes, was intended.
  
  We thank the reviewer for catching this typo. It was intended to indicate a concentration of 0.5 ng/μL, and we have made the appropriate changes.
  
  Reviewer #2 (Recommendations for the authors):
  
  (1) Perhaps this has been described in a prior publication (if this is the case, please simply state this somewhere in the Methods section where Dl-GFP embryos are described), but since Dl-GFP embryos have one copy of endogenous dl and one copy of Dl-GFP, how do potential differences in tagged vs. non-tagged Dl interactions with DNA or Cact affect their findings?
  
  The reviewer brings up a good point, and we acknowledge that any time a protein is tagged with GFP, the behavior of the protein may be affected. We have now explicitly added this caveat to our discussion in a new paragraph on Lines 420-429.
  
  (2) In the Discussion section, the authors argue that a major implication of their findings is that, if Cact binds Dl in the nuclei, the true (active) Dl gradient may be unknown unless the unbound Dl is separated from the Dl/Cact complex (inactive form). While this is an interesting point, this idea is not supported by the findings of Figure 5B, where there is no effect on the fraction of Dl bound to DNA in the reduced Cactus binding mutants. The authors should report what happens in lateral regions in Figure 5 because perhaps there is an effect there (see comment on this in the Public Review).
  
  We thank the reviewer for the insight, as we did not directly discuss the implications of the middle column of Fig. 5B on our hypothesis. Indeed, our hypothesis is not supported by Fig. 5B; it is instead inconclusive (failure to reject H0). This is why we designed the second experiment (Fig. 5C) to test the Cactus hypothesis, because the effect size would be greater on the dorsal side.
  
  Furthermore, as pointed out by both reviewers, the presence of wildtype Dl-GFP in these experiments is confounding. We have discussed this elsewhere in our rebuttal, but briefly, this problem resulted in needing larger effect sizes to detect a statistically significant difference between wt and the mutant populations. This was a necessary evil that we were willing to deal with in order to ensure the Dl gradient could be established so that the dorsal vs ventral sides would be distinguishable. We have added a fuller discussion of these issues to the relevant Results section (Lines 333-336, 343-345, 354-359, 365-369) and also the Discussion section (Lines 412-418), including underscoring the fact that, from a falsification standpoint, the results in Fig. 5B do not allow us to reject either null hypothesis, possibly due to the confounding effect of wildtype Dl. We appreciate the reviewer’s point about this, and believe the changes suggested by the reviewer have improved the manuscript.
  
  On the other hand, we respectfully disagree with the reviewer that investigating either mutant in the lateral regions of the embryo would bear fruit. To a first approximation, the result would be the average between the behaviors on the ventral vs. dorsal sides. For the S317A mutant, neither the ventral nor the dorsal side was conclusive with regard to our hypotheses. (Although we admit here that further investigation into why the S317A column in Fig. 5C was statistically different from wildtype, in the opposite direction from the S234P mutant, may be interesting in future work.) For the S234P mutant, the data were more conclusive on the side of the embryo where the effect size was expected to be large enough to detect a difference. In the lateral regions, the expectation would be that the effect size would be intermediate, which would make the interpretation of the results more difficult (i.e., more likely to be inconclusive). Because Fig. 5C is already conclusive, we are not confident there would be more information gained by imaging the lateral regions.
  
  (3) Is Figure 5A a wild-type embryo? If so, I think that the labels are misleading or unclear. Also, is it the same image as in Figure 1A? If so, I suggest replacing this with a schematic since it does not add any new data.
  
  We have eliminated the labels for the mutants and have added the following comment to the Figure 5 legend: “Same embryo as in Fig. 1A”.
  
  (4) Also in Figure 5, I suggest using labels to indicate the schematics instead of simply using their location. You could use 5A', 5A'', and 5A''', for example.
  
  We have made the suggested changes.
  
  (5) The use of some technical labels makes some figures difficult to read. I suggest using more simple labels for mutants in Figure 3F (replace R063C) or Figure 5B, C (replace S234P and S317A).
  
  We have made changes to Fig. 3F, Fig. 5B,C, and the corresponding places in the figure legends. We have labeled R063C as ↓DNA, S317A as ↓Toll, and S234P as ↓Cact.
  
  (6) I suggest reporting p-values consistently. For example, in Figure 4B, they use one or two asterisks to denote p-values less than 0.07 and 0.05, respectively, which is somewhat arbitrary and unconventional. Why not report the actual values as in Figure 5C, for example? (By the way, I would report in Figure 5B the actual p-values as well, since a nonsignificant value is also reported in Figure 5C. Also in Figure 5C, report values in the same notation (decimal or scientific), i.e., either put 0.005 as 5x10^-3 or 10^-3 as 0.001).
  
  We have made the suggested changes.
  

Tags

AuthorResponse

Annotators

Public_Reviews

URL

biorxiv.org/content/10.1101/320754v3